CLINICAL ORTHOPAEDICS AND RELATED RESEARCH Number 320, pp 247-278 © 1995 Lippincott-Raven Publishers

SECTION III

REGULAR AND SPECIAL FEATURES
Tutorial
Molecular Biology for the Clinician
Part II. Tools of Molecular Biology
Eileen M. Shore, PhD*; and Frederick S. Kaplan, MD*,**

This is the second of a series of tutorials on molecular biology for the clinician. Part I introduced general principles of nucleic acid and gene structure. This tutorial will build on those general principles and describe some of the commonly used techniques employed in research and diagnosis to examine genes and gene function. The tools of molecular biology include recombinant DNA techniques that have revolutionized the study of DNA and genetic information. Recombinant DNA technology allows the isolation of specific regions of chromosomal DNA and the recovery of unlimited quantities of that DNA for analyses, including determination of the nucleotide sequence of genes, examination of the mechanisms of gene regulation, and the identification of gene products (RNA and protein).

From *Department of Orthopaedic Surgery, University of Pennsylvania School of Medicine, Philadelphia, PA. **Division of Metabolic Bone Disease, Department of Orthopaedic Surgery, Hospital of the University of Pennsylvania, Philadelphia, PA. Reprint requests to Eileen M. Shore, PhD, University of Pennsylvania, Department of Orthopaedics, 424 Stemmler Hall, 36th and Hamilton Walk, Philadelphia, PA 19104-6081.

Historically, the practice of medicine began as the treatment of symptoms (such as fever, skin rash). Later, physicians began to search for and treat the causes of these symptoms (such as bacterial and viral infections, poor nutrition). Later still, physicians used this knowledge to cure or prevent the diseases themselves. However, some diseases could not be prevented by the treatment of exogenous factors, because the diseases were not external to the individual, but rather an innate part of an individual's genetic makeup. Modern clinical and human genetics strive to identify gene alterations that lead to disease. Gene mutations can lead to the synthesis of a nonfunctional gene product (ribonucleic acid [RNA] or protein) or to the dysregulated production of an RNA or protein. Understanding gene structure and function has led to the development of improved methods of diagnosis and treatment of many diseases. Identification of the gene that is responsible for a disease is often the first step in identifying the protein and the biochemical pathway that is perturbed in that disease. Molecular biology technology provides the tools that are being used to determine the structure of genes, to understand the regulation and expression of gene activity, and to design therapies that will lead to the effective treatment of diseases caused by the malfunction of genes.

INTRODUCTION TO CLONING AND RECOMBINANT DNA TECHNOLOGY
The analysis of a gene would be impossible if only a single molecule of that gene were available for study. Recombinant deoxyribonucleic acid (DNA) technology and molecular cloning provide the means to recover multiple copies of a segment of DNA, producing large amounts of material for analysis. The human genome contains 50,000 to 100,000 genes encoded by hundreds of millions of base pairs of DNA. Molecular cloning is the isolation of a specific segment of DNA (such as a gene or part of a gene) and the generation of many identical copies, or clones, of that segment of DNA. An isolated segment of DNA cannot replicate itself. This limitation is overcome by joining (or ligating) the DNA segment to a vector (or carrier) DNA molecule which is capable of its own replication and which also can replicate the DNA segment that has been ligated to it. This new combination of the DNA segment of interest with the vector DNA is a recombinant DNA molecule. To provide the necessary enzymes and other cellular machinery required for DNA replication, recombinant DNA molecules are inserted into host cells, typically bacteria such as Escherichia coli, which become factories to reproduce large quantities of the recombinant DNA molecule. This procedure is summarized in Figure 1 and described in detail in the following sections. The authors describe the molecular biology techniques that are used to construct a recombinant DNA molecule and some of the technology that is used to characterize the structure and function of cloned genes. The last sections introduce methods of RNA and protein analysis and the analysis of transcriptional regulation and gene expression. This article is not intended to provide protocols for these techniques, but to present a general overview that will aid the reader in understanding the principles behind the technology.

Fig 1. Recombinant DNA. In eukaryotic cells, DNA resides in the chromosomes within the cell nucleus. The DNA is extracted from the cells and enzymatically cut into small pieces. These pieces of DNA are ligated to vector DNAs, self-replicating DNA molecules that will facilitate the synthesis of many copies of any piece of DNA inserted into them. A commonly used vector is plasmid DNA, small circular molecules of double-stranded DNA that are found in bacterial cells. The joining of plasmid vector DNA with eukaryotic cellular DNA (DNAs from 2 different sources) forms recombinant DNA molecules. The recombinant DNA molecules are introduced into bacterial host cells by a process known as transformation, under conditions that favor the entry of a single recombinant molecule per host cell. Each transformed host cell becomes a factory for the synthesis of many copies of a single recombinant molecule (see also Fig 5A).
[Fig 1 labels: eukaryotic cell with nucleus; isolate DNA segments from cell; vector DNA; ligation to join DNA segments with vector DNAs; recombinant DNA molecules; transformation into host cells (bacteria); replication of recombinant DNA.]

ISOLATION AND CHARACTERIZATION OF SPECIFIC GENES
Restriction Endonucleases
One of the most important discoveries leading to the isolation and manipulation of specific genes was the identification of restriction endonucleases. These enzymes are synthesized by many bacterial species as part of a defense system to degrade the DNA of invading viruses. Each restriction endonuclease (also referred to as a restriction enzyme) recognizes and binds a specific sequence of DNA nucleotides and cuts double-stranded DNA within or near the recognition sequence (restriction site), breaking the phosphodiester bonds of the nucleotide chain. Recognition sites for restriction endonucleases occur in the DNA of the bacteria as well; however, the bacteria contain enzymes that modify their own DNA by methylation of nucleotides that contain adenine (A) or cytosine (C) bases, blocking cleavage of that DNA by the restriction endonuclease. The unmethylated viral DNA is unprotected and will be cut, or restricted (prevented from functioning), when it enters the bacterial cell.

Double-stranded DNA will be broken only if phosphodiester bonds, which form the backbone of a polynucleotide chain, are cut in each strand. Restriction endonuclease recognition sites are typically 4 to 8 nucleotides long and occur as inverted palindromes: the same sequence occurs on both strands when each strand is read in the 5' to 3' direction (Fig 2). The enzyme therefore can recognize and cut the same sequence on both strands. Some restriction enzymes cut both DNA strands at the center of symmetry, generating fragments of DNA with blunt ends. Others cut each DNA strand at the same site on each strand, but offset from the center of symmetry, generating fragments with single-stranded ends (Fig 2).

The isolation and commercial manufacture of the restriction endonucleases have provided a set of highly specific molecular scalpels that allow DNA to be cut into specific and reproducible pieces. More than 150 different restriction enzymes, most of which recognize different nucleotide sequences, are available commercially. Restriction endonucleases have been purified from many species of bacteria. The enzymes are named by a 3- to 4-letter abbreviation that identifies the bacterial species from which the enzyme was isolated, followed by Roman numerals to distinguish enzymes that have the same bacterial origin. For example, EcoRI was the first restriction endonuclease purified from Escherichia coli; HindIII was the third restriction enzyme isolated from Haemophilus influenzae Rd; and PvuI and PvuII were recovered from Proteus vulgaris.

A specific restriction endonuclease will cut, or digest, any (unmethylated) double-stranded DNA molecule into specific segments of DNA because the enzyme will cut the DNA reproducibly wherever there are recognition sites for that enzyme. Such a digestion of DNA results in the production of a set of DNA segments called restriction fragments. A specific region of DNA can be cut with several restriction enzymes, singly or in combination, and a restriction map of that region of DNA, showing the position of each restriction enzyme recognition site relative to the others, can be constructed (Fig 3). Because each restriction endonuclease recognizes a short DNA sequence, a restriction map provides some DNA sequence information about the DNA region without determining the complete nucleotide sequence. Comparison of restriction maps for a particular gene region among individuals or species can indicate the degree of similarity among them. Restriction maps also provide important information about a gene region that is used in gene cloning and genetic engineering (to be discussed in detail in later sections).
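The recognition-and-cut behavior described above can be sketched in a few lines of Python. This is an illustration only, not a laboratory protocol: the recognition sequences and cut positions are those of the enzymes shown in Fig 2, whereas the function names, the `ENZYMES` table layout, and the test sequence are assumptions made for this example. Only the top strand is tracked.

```python
# Recognition sequence and cut offset (number of top-strand bases to the
# left of the cut) for the enzymes shown in Fig 2.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "TaqI": ("TCGA", 1),       # T^CGA
    "SmaI": ("CCCGGG", 3),     # CCC^GGG (blunt ends)
    "PvuI": ("CGATCG", 4),     # CGAT^CG
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_inverted_palindrome(site):
    """True if both strands read the same in the 5' to 3' direction."""
    return site == reverse_complement(site)


def cut_positions(seq, enzyme):
    """Top-strand cut positions for every recognition site in seq."""
    site, offset = ENZYMES[enzyme]
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, enzyme):
    """Split a linear top-strand sequence into restriction fragments."""
    fragments = []
    previous = 0
    for cut in cut_positions(seq, enzyme):
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


# Hypothetical 22-base sequence containing two EcoRI sites.
sequence = "TTGAATTCAAGGCCGAATTCTT"
print(digest(sequence, "EcoRI"))  # ['TTG', 'AATTCAAGGCCG', 'AATTCTT']
```

Because the cut positions depend only on the sequence, repeating the digest always yields the same fragments, which is the reproducibility the text emphasizes.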

DNA Cloning
A detailed analysis of a gene or a specific segment of DNA requires large amounts of a specific DNA fragment. This can be accomplished by using recombinant DNA molecules produced by DNA cloning.

[Fig 2 content, reconstructed; ^ marks the cut position shown by an arrowhead in the original figure]

Restriction      Recognition/Cut Site    Ends Produced
Endonuclease
Eco RI           5' G^AATTC 3'           5' G 3'        5' AATTC 3'
                 3' CTTAA^G 5'           3' CTTAA 5'    3' G 5'
Hind III         5' A^AGCTT 3'           5' A 3'        5' AGCTT 3'
                 3' TTCGA^A 5'           3' TTCGA 5'    3' A 5'
Taq I            5' T^CGA 3'             5' T 3'        5' CGA 3'
                 3' AGC^T 5'             3' AGC 5'      3' T 5'
Sma I            5' CCC^GGG 3'           5' CCC 3'      5' GGG 3'
                 3' GGG^CCC 5'           3' GGG 5'      3' CCC 5'
Pvu I            5' CGAT^CG 3'           5' CGAT 3'     5' CG 3'
                 3' GC^TAGC 5'           3' GC 5'       3' TAGC 5'

Fig 2. Restriction endonuclease recognition and cut sites. A restriction endonuclease binds to a specific sequence of nucleotides in double-stranded DNA. The recognition sequences for restriction endonucleases are inverted palindromes: reading in the 5' to 3' direction, both strands of the recognition site have the same sequence. Most commonly used restriction endonucleases recognize sequences of 4 to 8 nucleotides and cut within the recognition site. Cuts in the DNA strands are made at symmetrical positions on each DNA strand, as indicated by black arrowheads in the examples shown. Some restriction endonucleases, such as Sma I, cut the double-stranded DNA at the center of symmetry, generating fragments of DNA with flush (or blunt) ends. Other enzymes will cut at offset (or staggered) sites, generating single-stranded tails, or overhangs. Eco RI, Hind III, and Taq I generate 5' overhangs; Pvu I creates a 3' overhang. Most of the examples (Eco RI, Hind III, Sma I, Pvu I) recognize sequences of 6 nucleotides and commonly are referred to as 6-cutters. Taq I is a 4-cutter.

[Fig 3 content: a linear 20-kb molecule of DNA. Digest with enzyme A: 5-kb and 15-kb fragments. Digest with enzyme B: 8-kb and 12-kb fragments. Double digest with A and B: 5-kb, 7-kb, and 8-kb fragments. The restriction map shows the sites for A and B positioned accordingly.]

Fig 3. Restriction map. A restriction map shows the position of restriction sites within a segment of DNA. For the linear 20-kb DNA molecule depicted, the positions of restriction sites for restriction enzymes A and B can be determined by cutting the DNA with these 2 enzymes in single digests with each enzyme and by a double digest using both enzymes together. The resulting fragments will be <20 kb if the enzyme has cut within the 20-kb molecule. The positions of cut sites that are consistent with the restriction fragment sizes generated can be determined, and a restriction map for those sites can be constructed.
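The reasoning behind a restriction map such as the one in Fig 3 can be made concrete with a small brute-force search. The sketch below is a minimal example under stated assumptions: the molecule is linear and 20 kb long, each enzyme (A and B) cuts exactly once, sites fall on whole-kilobase positions, and the fragment sizes are those read from Fig 3 (A: 5 and 15 kb; B: 8 and 12 kb; A + B: 5, 7, and 8 kb). The function names are hypothetical.

```python
def fragment_sizes(length_kb, cut_sites_kb):
    """Sorted fragment sizes after cutting a linear molecule at the given sites."""
    boundaries = [0] + sorted(cut_sites_kb) + [length_kb]
    return sorted(right - left for left, right in zip(boundaries, boundaries[1:]))


def consistent_maps(length_kb, digest_a, digest_b, digest_ab):
    """Find (site_a, site_b) positions that reproduce all three digests.

    Assumes one site per enzyme at a whole-kilobase position.
    """
    maps = []
    for site_a in range(1, length_kb):
        if fragment_sizes(length_kb, [site_a]) != sorted(digest_a):
            continue
        for site_b in range(1, length_kb):
            if fragment_sizes(length_kb, [site_b]) != sorted(digest_b):
                continue
            if fragment_sizes(length_kb, [site_a, site_b]) == sorted(digest_ab):
                maps.append((site_a, site_b))
    return maps


# Fragment sizes (kb) as read from Fig 3.
print(consistent_maps(20, [5, 15], [8, 12], [5, 7, 8]))  # [(5, 12), (15, 8)]
```

The two solutions are the same map read from opposite ends of the molecule: site A lies 5 kb from one end and site B lies 8 kb from the other, with 7 kb between them. The single digests alone cannot distinguish this arrangement from one with the sites 3 kb apart; the double digest resolves it.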
Ligation
Many restriction endonucleases cleave double-stranded DNA by staggered cuts that generate single-stranded tails at the cut site (Figs 2, 4). These ends of DNA are described as cohesive ends, or sticky ends, because the single strands can form complementary base pairs with single-stranded tails on other DNA molecules that have been generated by cutting with the same enzyme. Any 2 DNA molecules with complementary single-stranded tails can be joined together by complementary base pairing followed by reformation of the phosphodiester bonds on each DNA strand by the enzyme DNA ligase (Fig 4). Note that some restriction enzymes generate blunt ends, with no single-stranded tail, rather than sticky ends (Fig 2). Blunt ends also can be ligated together by DNA ligase. The ligation of 2 DNA molecules from different sources creates a recombinant DNA molecule.
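Whether 2 ends can be joined in this way comes down to a simple test on their single-stranded tails. The following sketch is illustrative only: the overhang sequences are those produced by the enzymes in Fig 2, but the `OVERHANGS` table and the `compatible` function are names invented for this example, and only 5' overhangs and blunt ends are modeled (Pvu I, which leaves a 3' overhang, is omitted to keep the rule simple).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Single-stranded 5' overhang left by each enzyme, written 5' to 3'.
# An empty string denotes a blunt end.
OVERHANGS = {
    "EcoRI": "AATT",    # from G^AATTC
    "HindIII": "AGCT",  # from A^AGCTT
    "TaqI": "CG",       # from T^CGA
    "SmaI": "",         # from CCC^GGG
}


def compatible(enzyme_1, enzyme_2):
    """True if ends made by the two enzymes can base pair (or are both blunt).

    Two 5' overhangs anneal in antiparallel orientation, so one must be the
    reverse complement of the other.
    """
    tail_1 = OVERHANGS[enzyme_1]
    tail_2 = OVERHANGS[enzyme_2]
    return len(tail_1) == len(tail_2) and tail_1 == reverse_complement(tail_2)


print(compatible("EcoRI", "EcoRI"))    # True: AATT pairs with AATT
print(compatible("EcoRI", "HindIII"))  # False: AATT cannot pair with AGCT
print(compatible("SmaI", "SmaI"))      # True: blunt ends can be ligated
```

Because each recognition site is an inverted palindrome, the overhang left by a given enzyme is its own reverse complement, which is why any 2 fragments cut with the same enzyme are compatible.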
Fig 4A-C. Ligation of DNA. (A) A double-stranded segment of DNA with an Eco RI recognition site is depicted. The nucleotides of the inverted palindrome recognition site (5' GAATTC 3' on both strands) are represented by the boxes containing the base (A = adenine; C = cytosine; G = guanine; T = thymine) of each nucleotide. The hydrogen bonds between base pairs of the 2 DNA strands are indicated by striped lines: 2 hydrogen bonds between A-T pairs and 3 between G-C pairs. Arrows indicate the cutting site of the enzyme. (B) On digestion with Eco RI, the phosphodiester bonds between the nucleotides at the cut site are broken, leaving a phosphate group (P) at free 5' ends and a hydroxyl group (OH) at free 3' ends. After disruption of the phosphodiester bonds, the hydrogen bonds between base pairs are not sufficient to hold the double strands together, and the fragments of double-stranded DNA to either side of the cut site will separate. Because Eco RI cuts at staggered sites, the cut ends of the DNA fragments will have single-stranded tails, or overhangs. (C) The overhangs that result from staggered cuts commonly are referred to as sticky ends because the single-stranded ends can reform hydrogen bonds between A-T and G-C pairs. If the single-stranded overhangs contain complementary base sequences to allow formation of hydrogen bonds between base pairs, they are described as being compatible. (A fragment generated by cutting with Eco RI and 1 generated by Hind III would not be compatible because their single-stranded overhangs could not form complementary base pairs [see Fig 2].) The enzyme DNA ligase catalyzes the formation of phosphodiester bonds and is used to join fragments of DNA having compatible sticky ends, for example in the formation of recombinant DNAs.
[Fig 4 labels: digestion with restriction enzyme (Eco RI); ligation with DNA ligase.]

Vector DNAs
To generate many copies of a DNA molecule, the DNA must have the ability to replicate itself. Most isolated pieces of DNA do not have this ability, and therefore require a carrier DNA molecule, or vector. The vectors that commonly are used in molecular biology studies can self-replicate in bacterial or yeast host cells. Recombinant DNA molecules that are formed using restriction enzymes and DNA ligase to join a segment of human DNA and vector DNA can be introduced into the host cells and replicated to generate large quantities of the recombinant molecule. Because the host cells can be grown indefinitely in the laboratory, unlimited amounts of the recombinant DNA can be obtained.

Plasmids and phagemids are commonly used vectors for the cloning of short (up to 10 kb) segments of DNA. Plasmids, which occur in nature, are circular double-stranded DNA molecules that are found in bacteria and replicate independently of the host chromosome. Phagemids are genetically altered, or engineered, plasmids to which regions of bacteriophage M13 (a bacterial virus) have been added. The M13 genome can be replicated into double-stranded or single-stranded forms. By inserting the information that encodes single-strand synthesis into a plasmid, the plasmid can be induced to generate single-stranded DNA, a property that has been especially useful for DNA sequencing technology (see below). Plasmids and phagemids have been engineered to meet the specific needs of molecular cloning. They contain an origin of replication to allow propagation in bacteria, 1 or more selectable markers (to identify bacteria that carry the plasmid or phagemid), and multiple recognition sites for restriction endonucleases for ease of insertion of a foreign DNA segment. The method of propagation of plasmid DNA is shown schematically in Figure 5A.

Fig 5A-B. Growth of phage and plasmid DNAs. (A) Plasmids are small circular double-stranded DNAs. Plasmid DNA molecules are depicted here as small circles; the heavy line within each circle represents a segment of DNA inserted within the plasmid DNA. In the laboratory, bacteria (Escherichia coli) cells can be made permeable to the uptake of DNA. The plasmid enters these competent cells in a process known as transformation, under conditions that favor the entry of a single plasmid molecule per cell. Plasmid DNA will replicate in each host bacterial cell, generating many copies of the plasmid within the cell. A single cell with its replicated plasmid DNA is shown in the figure. After transformation, the bacterial cell mixture is plated on agar containing a selective antibiotic, and only transformed cells that contain an antibiotic resistance gene carried within the plasmid DNA will grow; most plasmids used in recombinant DNA techniques contain an antibiotic resistance gene. Because the bacterial cells are spread across the agar surface at a low density, the plasmid-containing cells will be well isolated from each other. As each of these bacterial cells grows and divides, its plasmid DNA will be distributed to its daughter cells, in which the plasmid DNA continues to replicate. Each transformed bacterial cell will form a colony of cells (a cluster of cells, all of which are descended from a single parental cell) containing the plasmid DNA. Cells that carry no plasmid, and therefore no antibiotic resistance gene, have been plated on the agar surface as well, but are not seen because they cannot grow and form visible colonies. (B) Bacteriophage λ (lambda) DNA is double-stranded and can exist in a linear or circular form. Phage λ DNA molecules are depicted here as circles; the heavy line represents a segment of DNA inserted within the phage DNA. To enter bacterial host cells (Escherichia coli), λ DNA must be packaged into a protein coat to form infectious phage particles. The phage particles insert the λ DNA into the bacteria by a process known as transfection. The λ DNA enters the host cell in a linear form and then circularizes within the host cell to replicate itself. Generally, only a single phage particle will infect a given cell. In the host cell, the λ DNA replicates and new phage particles form. A single cell containing replicated and packaged λ DNA is depicted in the figure. Over time, the phage growth will cause the bacterial cell to lyse, or burst, releasing the phage particles to infect additional host cells. In the laboratory, phage can be grown on an agar surface in a petri dish. The phage are mixed with bacterial cells and then plated at a low density such that an infected cell is well isolated from other infected cells on the dish. As the phage grows and is released by cell lysis, the phage particles reinfect additional host cells, which in turn are lysed. The cell lysis eventually results in a cleared area against an as yet uninfected bacterial lawn. (Bacteriophage M13 is a filamentous bacteriophage that is not packaged into the same type of protein coat as bacteriophage λ; M13 does not lyse its host cells, but it slows their growth and so still forms plaques.)
[Fig 5 panel labels: (A) Plasmid and phagemid growth: plasmid DNA; transform bacteria cells; colonies. (B) Bacteriophage λ growth: package DNA; transfect bacteria cells; phage growth (DNA replication, phage particle assembly); bacterial cell lysis.]

Larger fragments of DNA can be cloned into other types of vectors (Table 1). Bacteriophage lambda (λ), a bacterial virus (or phage), can accommodate up to 23 kb of DNA. Fragments of approximately 45 kb can be inserted into cosmid vectors. Cosmids are hybrid molecules derived from plasmid and phage DNAs. The DNA of these hybrid molecules is packaged into phage particles but replicates like plasmids within the host cells (Fig 5). Very large pieces of chromosomal DNA (range, 100-1500 kb) can be recovered in yeast artificial chromosomes (YACs). Yeast artificial chromosomes have been constructed to contain centromeres and telomeres to allow replication and segregation during
yeast cell division like normal yeast chromosomes.

TABLE 1. Cloning Vectors

Cloning Vector                        Size of Insert
Plasmid and phagemid                  <10 kb
Bacteriophage λ                       <23 kb
Cosmid                                <45 kb
Yeast artificial chromosome (YAC)     100-1500 kb

DNA Libraries
The discussion thus far has described how pieces of DNA are copied or cloned. The next sections will begin to describe how a specific gene of interest may be recovered. The first step in isolating a particular DNA sequence is to construct a set of recombinant DNA molecules from a source that contains the sequence of interest. This set of clones is called a library and ideally contains all sequences contained in the original source (genomic DNA or messenger ribonucleic acid [mRNA]). Once a library is constructed, it is screened to identify the specific clone containing the DNA sequence of interest.

Genomic Libraries
A genomic DNA library includes all DNA sequences contained in the chromosomes of a cell (genomic DNA). Typically, purified genomic DNA is digested with a restriction enzyme in a brief incubation, so that the enzyme is not given sufficient time to cut at all the recognition sites for that enzyme. Assuming that the limited number of cuts that are made occur without bias for specific positions along the DNA molecules, a set of large overlapping fragments will be obtained. These fragments are ligated into bacteriophage DNA, and the recombinant bacteriophage DNA molecules are packaged into infectious phage (viral) particles. These viral particles are incubated with Escherichia coli cells, some of which will be infected by a single phage particle. The phage DNA will replicate in its host bacterial cell, each bacterium producing DNA from a single recombinant clone. When the phage-cell mixture is plated on agar in a petri dish, clear areas of cell lysis (called plaques) appear against a lawn of uninfected bacterial growth (Fig 5B). Each plaque is derived from a single infecting phage particle: the initially infected bacterium is lysed after the production of multiple new phage particles, releasing the newly synthesized phage to infect additional cells.

Complementary DNA (cDNA) Libraries
Only a portion (estimated to be approximately 10%) of the total DNA in the human genome contains protein coding gene sequences. The function of most of the remaining DNA sequences of the genome is not understood. However, some nonprotein coding sequences contain gene regulatory sequences or are transcribed sequences (introns) that are removed from the primary RNA transcript (Fig 6). After splicing to remove the introns, the resulting mRNA contains a continuous protein coding region flanked on either side by untranslated regions (UTRs).

Messenger ribonucleic acid (mRNA) for a particular gene will be present only in cells that transcribe that gene. Some genes are transcribed (or expressed) in all cell types; these frequently are referred to as housekeeping genes because they are necessary for general cell maintenance. Other genes are regulated specifically to be expressed in only certain cell types (for example, hemoglobin in red blood cells) or at very specific times (for example, during development or tissue regeneration as in fracture healing). Such specific gene regulation is what makes 1 type of cell functionally different from another type.

A genomic DNA library contains the entire DNA content of a cell. A complementary DNA (cDNA) library represents the entire mRNA content of a cell or tissue. In other words, a cDNA library contains only the gene sequences that are transcribed in the cells from which the library was made. Because any given cell type expresses only a subset of its genes, the cDNA library used as a source for a specific gene must be derived from cells that transcribe that gene. This is a very important distinction between these 2 libraries. For example, a human genomic DNA library will contain the same DNA sequences regardless of the cells or tissue from which the DNA is derived. This is because all the cells of an organism, with few exceptions, have the same DNA content. In contrast, a cDNA library is derived from the mRNA contained in a cell or tissue. Because the mRNA content of each tissue is different, reflecting different cellular functions, cDNA libraries will reflect only the genes that are being actively transcribed by the cells of that tissue.

Fig 6. Gene and mRNA structure. Genes, located on double-stranded DNA within chromosomes, are the fundamental physical and functional units of heredity. A gene is an ordered sequence of nucleotides that contains the coding information for an RNA transcript and the regulatory information that controls the synthesis of that transcript. The top portion of the figure depicts a segment of chromosomal DNA with the transcribed region of a hypothetical gene shown as boxes (exons: white boxes; introns: striped boxes). By convention, a gene is depicted with its transcription start site (the beginning of exon 1) to the left. Transcriptional regulatory sequences, including the site for RNA polymerase complex binding, occur immediately upstream of the transcriptional start site (to the left of the transcript start site in the figure) in a region called the promoter. (Nontranscribed DNA sequences adjacent to the transcribed region are indicated by dashed lines.) Additional regulatory sequences (not shown) can occur further upstream, downstream, or even within the transcribed portion of the gene. Ribonucleic acid polymerase synthesizes a single-stranded primary RNA transcript that is colinear with the gene sequences. For most genes, the primary RNA transcript contains intron and exon regions. The exons (numbered, white boxes) include all the coding sequences for the protein product of the gene, plus nonprotein coding sequences upstream from the initiation codon (start site of translation) and downstream from the termination codon (end of translation) that are known as UTRs (untranslated regions). The introns (striped boxes) are sequences that interrupt the coding sequences of a gene. Introns are transcribed into RNA, but are removed from the primary RNA transcript when it is processed to form the mature messenger RNA (mRNA). Two other RNA processing steps also occur to RNAs that encode proteins. The first is a modification of the 5' end (or start) of the RNA molecule; this occurs as soon as transcription begins and is found on both primary RNA transcripts and mRNA. The other is the addition of a poly(A) tail to the 3' end of the mRNA. The primary RNA transcript extends beyond what will be the end of the mRNA; it is trimmed back to a specific site and the poly(A) tail is added. Note that a genomic DNA clone of a gene, which is derived directly from the chromosomal DNA, can contain intron and exon sequences, and regulatory sequences that flank the transcribed regions. In contrast, a cDNA clone is obtained by synthesizing a DNA copy of mRNA; this cDNA will therefore only contain exon sequences.
[Fig 6 labels: chromosomal DNA; start of transcript; primary RNA transcript; mRNA with poly(A) tail (AAAAAAA); start of translation (initiation codon); end of translation (termination codon).]
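The difference between what a genomic clone and a cDNA clone contain can be traced on a toy sequence. The sketch below is a deliberately simplified illustration: the 24-base "gene", its exon coordinates, and all function names are hypothetical, and the 5' end modification, UTRs, and poly(A) tail are left out. It follows a gene through transcription, removal of the intron, and copying of the mRNA back into double-stranded DNA.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe(coding_strand):
    """Primary RNA transcript: same sequence as the coding strand, U for T."""
    return coding_strand.replace("T", "U")


def splice(primary_transcript, exons):
    """Join exon intervals (0-based, end-exclusive), discarding the introns."""
    return "".join(primary_transcript[start:end] for start, end in exons)


def first_strand_cdna(mrna):
    """DNA strand complementary to the mRNA, written 5' to 3'."""
    return mrna.replace("U", "T").translate(DNA_COMPLEMENT)[::-1]


def second_strand_cdna(first_strand):
    """DNA strand complementary to the first cDNA strand, written 5' to 3'."""
    return first_strand.translate(DNA_COMPLEMENT)[::-1]


# Hypothetical gene: exon 1 (ATGGCC), one intron (GTAAGTTTTCAG), exon 2 (TGGTAA).
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "TGGTAA"
exons = [(0, 6), (18, 24)]

mrna = splice(transcribe(genomic), exons)
first = first_strand_cdna(mrna)
cdna = second_strand_cdna(first)

print(mrna)   # AUGGCCUGGUAA
print(first)  # TTACCAGGCCAT
print(cdna)   # ATGGCCTGGTAA
```

The genomic sequence is 12 bases longer than the cDNA because the intron is present only in the former; the cDNA reads as an uninterrupted coding sequence.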

Complementary DNA libraries are constructed from DNA copies of mRNA. A DNA copy of an RNA will reflect the same information (nucleotide sequence) contained in the RNA, while giving the additional advantage of manipulations (such as cloning) that are possible with a DNA molecule but not RNA. To construct a cDNA library, total RNA is isolated from the cells or tissue of interest and then processed to obtain a fraction enriched for mRNAs. Most eukaryotic mRNAs contain a series of adenine nucleotide residues at their 3' ends. This run of adenines (which are not encoded within the gene, but are added to the mRNA after transcription) is known as a poly(A) tail (Fig 6). The cellular functions of the poly(A) tail are not well understood; however, it serves as a useful way to separate mRNA (1%-2% of the total RNA) away from transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs) (Fig 7).

Fig 7. Poly(A) RNA selection. To construct a cDNA library, total RNA is processed to obtain a fraction enriched for mRNAs. Most eukaryotic mRNAs contain a series of adenine nucleotide residues at their 3' ends. This run of adenines (which are not encoded within the gene, but are added to the mRNA after transcription) is known as a poly(A) tail. The cellular functions of the poly(A) tail are not well understood; however, it serves as a useful way to separate mRNA (1%-2% of the total RNA) away from tRNAs and rRNAs. Short oligonucleotides containing only deoxythymidine residues [oligo(dT)] are attached to cellulose beads [oligo(dT) cellulose] that are packed into a small column. When total RNA in solution is poured into the column, the poly(A) tails of the mRNA (represented by As in the figure) will hybridize to the oligo(dT), causing the mRNA to bind to the column. Ribosomal RNAs and tRNAs will not bind and are washed away. The poly(A) mRNA can be eluted easily from the column using a low salt buffer that is unfavorable to hybridization of poly(A) to oligo(dT).
[Fig 7 labels: mRNA with AAAAA 3' tail; oligo(dT) cellulose; only mRNA with poly(A) tail will bind to oligo(dT) cellulose; rRNA and tRNA flow through the column; low salt solution; poly(A) mRNA is eluted from the oligo(dT) column with a low salt buffer.]

After the enrichment for poly(A) mRNA, an enzyme called reverse transcriptase (an RNA-dependent DNA polymerase) is used to synthesize cDNA from RNA templates (Fig 8). Reverse transcriptase requires a primer (starting nucleotides with a free 3' hydroxyl group) to initiate synthesis of DNA: either oligo(dT) primers, which will anneal to the poly(A) tail of mRNAs, or random primers, which anneal within the RNA sequence, are used. Through a series of enzymatic steps, the RNA strands of the RNA:cDNA hybrid molecules are removed and replaced by DNA. The double-stranded cDNA molecules are ligated into a cloning vector to create a cDNA library that represents all the transcripts in the starting cells or tissue.

Fig 8. Complementary DNA synthesis. Complementary DNA (cDNA) is a DNA copy of an RNA molecule. Most messenger RNAs (mRNAs) are synthesized with a poly(A) tail at the 3' end of the RNA. A short oligonucleotide of poly(dT) can hybridize (complementary base pair) to the poly(A) tail and serve as a primer (provide a 3' OH from which nucleic acid synthesis can extend) for DNA synthesis. Reverse transcriptase is an RNA-dependent DNA polymerase that is able to use an RNA template for the synthesis of a complementary strand of DNA. To form a double-stranded cDNA molecule, the RNA template is removed enzymatically and the second strand of DNA can be synthesized by DNA polymerase using the first cDNA strand as a template.
[Fig 8 labels: mRNA 5' to AAAAA 3'; synthesize cDNA strand from mRNA with reverse transcriptase; first strand cDNA; remove mRNA and synthesize second DNA strand; double-stranded cDNA.]

Because most cDNAs are only a few kilobases in length, they can be cloned into either a bacteriophage or a plasmid vector. Cloned DNA in a plasmid vector is inserted into bacterial cells for replication through a process known as transformation (Fig 5A). Bacterial cells must first be treated to make their cell walls permeable to the uptake of DNA; these cells are described as being competent. Once the bacteria have been transformed with the recombinant DNA, the cells are grown under selective conditions, usually with an antibiotic that will allow only the cells containing plasmid DNA to grow. When the cells are plated at low density on agar under selective conditions, the cells containing plasmid DNA will grow and replicate, forming a colony of cells on the agar surface. Because only a single DNA molecule typically enters a given cell, each colony is synthesizing the DNA of a single recombinant molecule.

At the start of this section, a distinction was made between the gene content of genomic DNA libraries (which contain all genes and DNA sequences) and cDNA libraries (which contain only the expressed gene sequences for a given cell type or tissue). The differences between a cDNA clone and a genomic DNA clone for a given gene also need to be emphasized (Fig 6). Because a cDNA clone was derived by copying an mRNA, the only gene sequences represented in a cDNA clone are the exons contained in the processed or mature mRNA of the gene. However, a genomic DNA clone for the same gene will contain, in addition to the exon regions, the introns and the sequences flanking the transcribed region, which typically contain information that regulates gene expression. The 2 types of clones, cDNA and genomic DNA, provide useful but different information in gene analysis. For example, the amino acid sequence of a protein can be determined easily from a cloned cDNA because its nucleotide sequence is not interrupted by introns as it is for genomic DNA. The protein and cDNA sequences can be compared with databases of known sequences to provide clues to the function of the gene. Complementary DNAs frequently are used to identify mutations that cause disease or to study structure-function relationships of the protein product.
However, other questions can be answered only by isolating genes from a genomic DNA library, for example, the examination of the gene sequences that regulate gene expression (such as promoter regions) or the analysis of the structural organization of a cluster of genes. Genomic DNA contains intron and exon sequences, allowing an analysis of mutations that affect splicing.
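The difference between the 2 clone types can be illustrated with a short sketch. The toy gene, its exon coordinates, the abbreviated codon table, and the function names below are all hypothetical and were invented for illustration; the sketch only shows that joining the exons of a genomic sequence yields the uninterrupted coding sequence found in a cDNA clone.

```python
# Toy illustration: a cDNA clone contains only exon sequence, whereas a
# genomic clone also contains introns. The sequence and the exon coordinates
# are hypothetical, invented for this example.

# Partial codon table: only the codons that occur in the toy gene below.
CODON_TABLE = {
    "ATG": "M", "AAA": "K", "TGC": "C", "ACT": "T", "GTA": "V", "TAA": "*",
}

def splice(genomic, exons):
    """Join the exon intervals (0-based, end-exclusive) of a genomic sequence."""
    return "".join(genomic[start:end] for start, end in exons)

def translate(coding):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(coding) - 2, 3):
        amino_acid = CODON_TABLE[coding[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Exon 1 (uppercase), intron (lowercase, begins gt and ends ag), exon 2 (uppercase)
genomic_clone = "ATGAAATGC" + "gtaagtttttcag" + "ACTGTATAA"
exons = [(0, 9), (22, 31)]

cdna_clone = splice(genomic_clone, exons)
print(cdna_clone)             # ATGAAATGCACTGTATAA
print(translate(cdna_clone))  # MKCTV
```

In this toy case the protein sequence can be read directly from the cDNA, whereas the genomic sequence would first have to be split into exons and introns.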

Screening a DNA Library
Once a phage or plasmid DNA library has been plated to separate individual recombinant DNA clones on an agar plate, a method of screening, or probing, for a specific clone containing the DNA sequence of interest is needed. The first step in screening a library is to transfer the cell colonies or phage plaques to a solid support, usually a nitrocellulose or nylon filter. A filter disk is placed on the agar surface containing the cells or phage and then carefully lifted off, carrying some of the cells or phage in an exact replica of the position each clone has on the agar surface (Fig 9). The filters are treated to lyse the cells and/or phage and release their DNAs. The DNA then is denatured to single strands and fixed to adhere to the filter. The filters containing the recombinant DNAs now are ready to be hybridized with a labeled probe to detect specific DNA sequences.

[Fig 9 panel labels: A filter disk is placed on surface of plate; Phage stick to filter; Filter is lifted from agar plate; DNA on filter is denatured and hybridized with labeled probe; Hybridized filter is exposed to xray film to identify positive clone; Positive clone can be selected from phage plaques on original plate.]
Fig 9. Screening a recombinant DNA library. Cloned DNAs can be screened for the presence of a specific sequence. In the example shown, recombinant DNA molecules in a phage vector are plated and grown as plaques on agar plates (see Fig 5B). When a filter disk made of nitrocellulose or nylon is placed over the plaques, some of the phage particles will stick to the filter when the filter is lifted from the plate. The filter, called a plaque lift, is a replica of the plaque positions as they occur on the agar surface. The phage particles on the plaque lift are treated to release the phage DNA, and the DNA is denatured to single strands and fixed, or adhered, to the filter. The filter is incubated with a labeled probe specific for the DNA sequence of interest under conditions that promote the hybridization of the probe with phage DNA through the formation of base pairs between complementary sequences (see Fig 10). After hybridization, the filter is washed to remove unbound probe molecules and then exposed to xray film. Recombinant DNAs containing sequence complementary to the probe are recognized by positive (dark) spots on the film where probe DNA has bound. A positive clone is identified by matching the autoradiogram with the plaque lift and with the original agar plate containing the plaques. The plaque corresponding to the positive signal can be selected for further analysis of its recombinant DNA. This screening method also can be applied to recombinant DNAs in plasmid vectors. After growing colonies on plates (see Fig 5A), colony lifts can be made and processed in the same way as plaque lifts.

Nucleic Acid Hybridization and Probes
The most commonly used screening method is nucleic acid hybridization. Hybridization is the formation of hydrogen bonds between base pairs of complementary strands of DNA or RNA. Under the appropriate conditions, single-stranded DNAs can anneal, or hybridize, to one another if their DNA sequences are complementary (Fig 10). Conversely, a double-stranded DNA molecule can be denatured to separate the strands into 2 single-stranded DNA molecules. A probe is a DNA or RNA sequence that carries a tag or label so that it readily can be recognized and detected. Incorporation of a radiolabeled molecule, such as 32P, is used frequently to tag the probe DNA or RNA, although nonradioactive methods also can be used. Standard labeling methods for DNA include end-labeling, nick translation, and random priming (Fig 11). Labeled RNAs are synthesized by in vitro transcription.

When a single-stranded probe DNA or RNA is incubated with filters containing clones from a recombinant DNA library, the probe will hybridize to any recombinant clone whose DNA sequence is complementary to it. After hybridization and washing to remove unhybridized probe, the filter is exposed to xray film. Energy emitted from the radioactive probe (now hybridized to its complementary recombinant clone) will cause a dark spot to appear on the film at the position of the positive clone. The film and the original agar plate containing the recombinant clones are aligned with each other, and the recombinant cell colony or phage plaque containing the positive recombinant DNA can be selected (Fig 9).
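The base-pairing rule that underlies probe hybridization can be sketched in a few lines of code. The 2 sequences are those drawn in Fig 10; the function names are illustrative, and the sketch assumes perfect complementarity, whereas hybridization on a filter can tolerate some mismatches depending on the conditions used.

```python
# Sketch of probe-target hybridization as base complementarity.
# The sequences are those drawn in Fig 10; the function names are illustrative.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def hybridization_sites(probe_5to3, target_5to3):
    """0-based positions on the target where the probe is fully complementary."""
    site = reverse_complement(probe_5to3)
    return [i for i in range(len(target_5to3) - len(site) + 1)
            if target_5to3[i:i + len(site)] == site]

target = "ATGAAATGCACTGTATTGCCTAGGCATATCCTAGCGAT"   # 5' -> 3'
oligo_3to5 = "CATAACGGATCCGTATAGGA"                  # as drawn in Fig 10 (3' -> 5')
oligo = oligo_3to5[::-1]                             # rewritten 5' -> 3'

print(hybridization_sites(oligo, target))  # [12]
```

The 20-mer pairs with the target beginning at the 13th nucleotide, which is the antiparallel alignment shown in the figure.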

oligonucleotide
                3' CATAACGGATCCGTATAGGA 5'
5'---- ATGAAATGCACTGTATTGCCTAGGCATATCCTAGCGAT ----3'
target DNA
Fig 10. Nucleic acid hybridization. A target sequence is shown in the 5' to 3' orientation. Nucleotides are represented by their base content (A, C, G, T). Above the target sequence, a short sequence of nucleotides (an oligonucleotide) with a complementary sequence is shown in the 3' to 5' direction (in the antiparallel orientation with respect to the target strand). This oligonucleotide of 20 nucleotides can be referred to as a 20-mer. Two nucleic acids that have complementary nucleotide sequences can form base pairs between the complementary bases, resulting in a region of double-stranded DNA. This process is known as hybridization. If the oligonucleotide is labeled with a detectable tag, such as a radioisotope, it can be used to identify a piece of DNA containing a complementary sequence. Hybridization probes can be synthetically made oligonucleotides or sequences that are up to a few kilobases in length, such as a cDNA.

Where do probes come from? When no information about the gene of interest is available, the protein of interest is purified and a portion of its amino acid sequence is determined. The genetic code, which relates a single amino acid to its several encoding nucleotide triplet codons, is used to anticipate the putative nucleotide sequence of the gene that encodes the protein. Oligonucleotides (short sequences of nucleotides, Fig 10) corresponding to the several possible gene sequences are synthesized to be used as probes for screening a cDNA library. Once a cDNA clone for the gene of interest has been recovered, the cDNA itself can be labeled and used as a probe to screen a genomic DNA library. As discussed above, cDNA and genomic DNA clones provide important, but different, information for gene analysis. Because of the greater complexity of genomic DNA, it is often difficult to use an oligonucleotide probe to identify a clone of interest from a genomic DNA library. A cDNA clone, however, is used easily for this purpose.

An alternative to screening directly for gene sequences with a nucleic acid probe is to screen for the protein encoded by the gene. In this case, a cDNA library is constructed using a special type of vector known as an expression vector, which facilitates the translation of the cDNA into a protein. An expression library can be screened using an antibody to the protein product. Expression vectors typically contain bacterial promoters that allow the transcription of the cloned cDNA in Escherichia coli cells. The host cell is capable of protein synthesis from these transcripts. The expression library is plated as has been described above, and then screened by incubating filter lifts of the plated library with an antibody specific for the protein of interest. Detection methods, similar to those used for Western blotting and described in a later section, are used to identify the clone that expressed the protein. The positive clone is recovered, and the sequence of the DNA can be readily determined for gene analysis.
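Returning to oligonucleotide probes deduced from an amino acid sequence, the number of different oligonucleotides needed to cover every possible coding sequence can be counted directly. The sketch below uses the number of codons per amino acid in the standard genetic code; the peptide sequences and function names are hypothetical, and choosing the stretch of protein with the fewest possibilities is offered only as one plausible design heuristic.

```python
# Sketch: how many oligonucleotides are needed to cover every possible coding
# sequence for a short peptide. Codon counts follow the standard genetic code;
# the peptides used below are hypothetical.

CODONS_PER_AMINO_ACID = {
    "M": 1, "W": 1,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "I": 3,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "L": 6, "R": 6, "S": 6,
}

def probe_degeneracy(peptide):
    """Number of distinct nucleotide sequences that could encode the peptide."""
    count = 1
    for amino_acid in peptide:
        count *= CODONS_PER_AMINO_ACID[amino_acid]
    return count

def best_window(protein, length):
    """Return (degeneracy, peptide) for the window needing the fewest oligonucleotides."""
    windows = [protein[i:i + length] for i in range(len(protein) - length + 1)]
    return min((probe_degeneracy(w), w) for w in windows)

print(probe_degeneracy("MWKDE"))     # 8
print(probe_degeneracy("LRSAG"))     # 3456
print(best_window("LRSMWKDEAG", 5))  # (8, 'MWKDE')
```

A 15-nucleotide probe for the first peptide requires a mixture of only 8 sequences, whereas the second would require 3456, which is why peptide regions rich in methionine and tryptophan are attractive for probe design.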

Gel Electrophoresis
Gel electrophoresis is a technique that separates DNA molecules by size and is used to visualize fragments of DNA. Different gel compositions and electrophoretic conditions are used depending on the type of separation required. A polyacrylamide gel is used to separate DNA molecules of <500 base pairs. An agarose gel is used to separate DNA molecules of 300 to 10,000 nucleotide pairs. Pulsed-field gel electrophoresis is a variation of agarose gel electrophoresis that is used to separate very large DNA molecules, including entire bacterial and yeast chromosomes. For gel electrophoresis, the DNA sample is placed in a well at 1 end of a gel. Because each nucleotide in a nucleic acid molecule carries a single negative charge, when an electric current is applied the DNA migrates toward the positive electrode. The agarose (or acrylamide) matrix acts like a sieve for the DNA, allowing smaller DNA molecules to move more quickly through the matrix and thereby separating the DNA fragments according to their size (molecular weight).


[Fig 11 panel labels. (A) Nick translation: double-stranded DNA; treat DNA with DNase to produce single-stranded nicks; DNA synthesis, extending from the DNA nick, incorporates labeled dNTPs. (B) Random priming: double-stranded DNA; denature DNA to single strands; DNA synthesis, extending from oligonucleotide primers, incorporates labeled dNTPs.]

Fig 11A-B. Labeling of double-stranded DNA to generate probes for nucleic acid hybridization. Two methods for uniform labeling of double-stranded DNA molecules are shown. In the nick translation method (A), double-stranded DNA is treated with an enzyme (DNase I) that produces single-stranded nicks in the DNA strands. These nicks break the phosphodiester bond between 2 adjacent nucleotides on the same DNA strand. The 3' OH that is now present at the nick site can serve as a primer from which DNA polymerase can synthesize DNA (displacing the existing unlabeled DNA) using the complementary DNA strand as a template. Labeled (such as with 32P) deoxynucleoside triphosphates (dNTPs) are included in the DNA synthesis reaction so that the newly synthesized DNA will contain a detectable tag. (Arrows show the direction of DNA synthesis, always 5' to 3', and asterisks (*) represent the newly synthesized labeled DNA.) In the random priming method (B), double-stranded DNA is denatured (either by heat or alkali) to single-stranded DNA molecules. The single strands are incubated with short oligonucleotides (typically 6-9 nucleotides) of random sequences under conditions that favor hybridization of the oligonucleotides to the single-stranded DNAs. Deoxyribonucleic acid polymerase, in the presence of labeled dNTPs, synthesizes DNA extending from the 3' OH of the primers and using the single-stranded DNA molecules as the template to direct the incorporation of complementary nucleotides. In recent years, random priming has become more commonly used than nick translation to label probes for filter hybridization (see Figs 9, 13). In addition to these 2 methods, there are several other ways to generate labeled nucleic acid probes. For example, single-stranded RNA probes can be generated by the transcription of DNA sequences that have been cloned into a vector that contains a phage RNA polymerase promoter. A phage RNA polymerase is used to synthesize uniformly labeled probes to detect homologous DNA or RNA sequences. Unlike probes that are generated from double-stranded DNA molecules, single-stranded probes have the advantage of avoiding the problem of hybridization competition between the complementary strands of the probe DNA and the target DNA or RNA. In addition to the uniform labeling methods that have been described, DNA molecules also can be tagged by end labeling techniques. End labeling with polynucleotide kinase, which transfers a labeled phosphate to the 5' end of the DNA, is typically the preferred method for labeling oligonucleotides.

All DNA fragments of a given size will migrate at the same rate through the gel. Deoxyribonucleic acid fragments separated by gel electrophoresis can be visualized by staining the gel with the fluorescent dye ethidium bromide. Molecules of ethidium bromide insert between the stacked bases of a DNA double helix, and when the stained gel is exposed to ultraviolet light the ethidium-DNA will fluoresce. The length of a particular DNA fragment can be determined by comparing the distance the fragment has migrated through the gel to the positions of marker DNAs of known lengths.

Deoxyribonucleic acid from a plasmid or bacteriophage is relatively short in length, and therefore recognition sites for most restriction enzymes typically occur only a few times, cutting the DNA into only a few fragments. When plasmid and phage DNAs are digested with a restriction enzyme and then electrophoresed through a gel, the DNA fragments are visualized as discrete bands (Fig 12). In contrast, the DNA sequence of genomic DNA is much more complex. When genomic DNA is digested and electrophoresed, there are too many different-sized fragments to see each as an individual band and instead the DNA appears as a smear (Fig 12).
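The comparison with marker DNAs of known length can be sketched as a small calculation. The marker sizes and migration distances below are hypothetical values chosen for illustration, and the sketch assumes the common approximation that, over a limited size range, migration distance is roughly linear in the logarithm of fragment length.

```python
import math

# Sketch: estimating fragment length from migration distance using size
# markers. Marker sizes and distances are hypothetical illustrative values.
# The sketch interpolates log10(length) linearly between flanking markers.

markers = [  # (length in base pairs, migration distance in mm)
    (10000, 10.0),
    (5000, 18.0),
    (2000, 30.0),
    (1000, 40.0),
    (500, 50.0),
]

def estimate_length(distance_mm, markers):
    """Interpolate log10(length) linearly between the 2 flanking markers."""
    ordered = sorted(markers, key=lambda marker: marker[1])
    for (len_a, dist_a), (len_b, dist_b) in zip(ordered, ordered[1:]):
        if dist_a <= distance_mm <= dist_b:
            fraction = (distance_mm - dist_a) / (dist_b - dist_a)
            log_a, log_b = math.log10(len_a), math.log10(len_b)
            return 10 ** (log_a + fraction * (log_b - log_a))
    raise ValueError("distance lies outside the range of the markers")

print(round(estimate_length(35.0, markers)))  # 1414
```

A band lying midway between the 2000 and 1000 base pair markers is therefore estimated at approximately 1400 base pairs rather than 1500, because the relationship is logarithmic rather than linear.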

To visualize individual DNA fragments from a restriction digestion of genomic DNA, the techniques of Southern blotting and hybridization with a labeled probe are used. Although this section has focused on the analysis of DNA, RNA and protein also can be fractionated by gel electrophoresis and analyzed by blotting techniques. Northern blotting is used for the analysis of RNA; Western blotting is used for proteins. These techniques will be described in later sections.

Southern Blotting
Nucleic acid hybridization using a labeled DNA probe can be used to identify specific DNA molecules. However, DNA embedded in agarose gel after electrophoresis is not readily accessible for hybridization. Southern blotting, or Southern transfer (named for E. M. Southern,

[Fig 12 labels. Lanes, running from the wells at the negative electrode (-) toward the positive electrode (+): plasmid, phage, E. coli, human, mw std. Genome size (kb): plasmid, 3; phage, 50; E. coli, 4 x 10^3; human, 3 x 10^6.]
Fig 12. Agarose gel electrophoresis of DNA. Agarose gel electrophoresis separates fragments of DNA (or RNA) by size (molecular weight). Because DNA (and RNA) molecules are negatively charged, they will migrate toward the positive pole when an electrical current is applied to the gel. An agarose gel is a rectangular slab formed with depressions, or wells, at 1 end into which the DNA sample is placed with the negative electrode near the wells and the positive electrode at the opposite end. Agarose extracted from seaweed is a linear polymer that forms a matrix which acts like a sieve for the migrating fragments; smaller fragments will move more rapidly through the gel than larger fragments. After digestion of DNA with a restriction enzyme, small DNA molecules such as plasmid and phage DNAs will be cut into only a few fragments because the recognition site for the enzyme will typically occur only a few times. Examples are shown of a 3-kb circular plasmid DNA with a single-cut site that results in a single linear fragment of DNA and a 50-kb phage DNA that is cut into 6 pieces. Genomic DNAs, isolated from human cells and Escherichia coli bacteria (E. coli), for example, are much more complex in DNA sequence content than plasmid or phage DNA. Restriction enzymes cut genomic DNAs into many fragments of various sizes that do not resolve from each other, instead appearing as a smear of DNA. The Escherichia coli genome contains approximately 1000-fold less DNA than the human genome. The figure is drawn to illustrate that comparable genome equivalents of these DNAs would be visualized as lighter (Escherichia coli) or darker (human) smears of DNA on a gel. In the rightmost lane of the gel, a lane of molecular weight (mw) standards is depicted. Molecular weight standards are DNA fragments of known size; the distance that these fragments migrate through the gel can be measured and plotted graphically against the size of the fragment to produce a standard curve. By measuring the migration distance of a fragment of unknown size and comparing with the standard curve, the size of this fragment can be determined. As described in the text, fragments of DNA resolved by electrophoresis are visualized by staining with ethidium bromide.


the developer of the technique), is a technique of transferring DNA that has been electrophoresed through an agarose gel to a solid support for hybridization. After restriction digestion and electrophoresis, the gel containing the size-fractionated DNA is incubated in a solution that contains a strong base (sodium hydroxide) to denature the DNA, separating the complementary strands of the double-stranded DNA fragments. The now single-stranded DNA molecules are transferred to a piece of nitrocellulose or nylon filter by blotting and capillary action (Fig 13A). This process preserves the relative positions of each of the DNA fragments, resulting in a filter replica of the gel. (The colony and plaque lifts described above for screening a DNA library are variations of the Southern blot.) To identify a specific DNA fragment, a labeled probe containing the sequences of interest is hybridized with the Southern blot filter. After washing to remove excess probe, the filter is exposed to xray film. Bands of DNA that have hybridized to the labeled probe will produce dark bands on the developed film. The sizes of the hybridized bands can be determined by their positions relative to the positions of molecular size standards.

Southern blotting has become a standard procedure for medical diagnostics. The DNA of a patient can be examined as to whether a particular gene is present, deleted, or altered in that individual. For example, by examining the pattern of restriction fragments (restriction map), it can be determined if the gene appears to have a grossly normal structure or if portions of the gene are deleted or rearranged.
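The logic of detecting a deletion by Southern blotting can be sketched as follows. The DNA sequences, the 10 base pair deletion, and the function names are invented for illustration; EcoRI, which cuts within GAATTC after the G, is used only as a familiar example enzyme, and the probe is matched exactly although hybridization on a real filter tolerates some mismatch.

```python
# Sketch of the logic of a Southern blot: cut DNA at restriction sites, then
# ask which fragment sizes contain a sequence matching the probe. The DNA
# sequences are invented; EcoRI (G^AATTC) serves as an example enzyme.

def digest(dna, site="GAATTC", cut_offset=1):
    """Cut a linear DNA at every occurrence of the site; return the fragments."""
    cut_points = []
    start = dna.find(site)
    while start != -1:
        cut_points.append(start + cut_offset)
        start = dna.find(site, start + 1)
    bounds = [0] + cut_points + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]

def southern_bands(dna, probe):
    """Lengths of the restriction fragments that contain the probed sequence."""
    return sorted(len(fragment) for fragment in digest(dna) if probe in fragment)

body = "ACGTGCATGGCCATTGCTAGGTCA"
normal = "TTAC" + "GAATTC" + body + "GAATTC" + "CCA"
deleted = normal.replace("ATTGCTAGGT", "")   # hypothetical 10-bp deletion
probe = "GCATGGCC"                           # sequence detected by the probe

print(southern_bands(normal, probe))   # [30]
print(southern_bands(deleted, probe))  # [20]
```

The hybridizing band is 10 nucleotides shorter in the deleted allele, which on a real blot would appear as a band that has migrated farther than the normal band.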

DNA Sequencing
The information encoded by DNA is contained in the sequence of nucleotides of the DNA. Although several methods for determining DNA sequence have been developed, the most commonly used method is dideoxy chain termination. Dideoxy chain termination is an enzymatic sequencing method that uses a DNA polymerase to read the sequence of nucleotides from a single-stranded DNA molecule (template). The segment of DNA to be sequenced usually is cloned into a vector, allowing sufficient quantities of DNA for the sequencing reaction to be obtained. Single-stranded template DNA can be obtained directly by cloning the target DNA into a bacteriophage M13 or phagemid vector; alternatively, single-stranded template can be obtained by denaturation of double-stranded DNA that has been cloned into a plasmid or phagemid vector. Deoxyribonucleic acid synthesis by DNA polymerase can occur only by extending the newly synthesized strand from a 3' OH (hydroxyl) group. A short sequence of nucleotides that is complementary in sequence to the template DNA can serve to prime the DNA synthesis (Fig 14). No prior knowledge of the target DNA sequence is necessary, because an oligonucleotide primer that has sequence complementary to vector DNA can be used to extend DNA synthesis into the target region.

The chain terminators used in this sequencing method are dideoxy forms of the 4 DNA nucleotides (dNTPs: dATP, dCTP, dGTP, dTTP). Dideoxy nucleotides (ddNTPs) can be incorporated into a growing nucleotide chain, but they will block the addition of any further nucleotides because they lack the 3' hydroxyl (OH) needed for phosphodiester bond formation (Fig 15). When a small amount of 1 ddNTP is included with the 4 dNTPs in a DNA synthesis reaction, there is competition between extension and termination. The result is a set of DNA molecules whose lengths are determined by the distance between the primer used to initiate synthesis and the site of ddNTP termination (Fig 16). For each template DNA and primer, 4 reactions are done; each contains the 4 dNTPs (deoxynucleoside triphosphates) and a chain-terminating nucleotide specific for 1 of the 4 DNA nucleotides. The 4 reactions will produce a set of newly synthesized DNA molecules that terminate at positions occupied by every nucleotide in the template strand (Fig 16). A sequencing reaction contains some labeled (usually 35S or 32P) nucleotide so that


[Fig 13 panel labels: Size fractionation by agarose gel electrophoresis; Transfer to filter; Hybridize to labeled probe and autoradiograph.]
Fig 13A-B. Southern and Northern transfers. Identification of specific sequences of DNA or RNA is accomplished by the transfer techniques known as Southern and Northern blotting. (A) A Southern blot is a filter replica of DNA that has been size fractionated by agarose gel electrophoresis. After restriction enzyme digestion of DNA, the DNA fragments are electrophoresed through an agarose gel. A fractionated digest of human genomic DNA contains too many differently sized fragments to be resolved as individual bands and will appear as a smear when stained with ethidium bromide (see Fig 12). The DNA is denatured in the gel and then transferred to a solid support, usually a nitrocellulose or nylon filter, such that the relative positions of the DNA fragments are maintained. The DNA bound to the filter is hybridized to a labeled probe (see Fig 11). Autoradiography (exposure of the hybridized filter to xray film) is used to locate the positions of DNA fragments complementary to the probe. (B) A Northern blot is a filter replica of RNA that has been fractionated by agarose gel electrophoresis. The procedure is nearly identical to that used for Southern blotting. The main differences are that the RNA molecules can be fractionated by agarose gel electrophoresis without prior digestion and RNA does not need to be denatured before transfer since it is already single stranded.

the newly synthesized DNA can be detected by autoradiography. When the 4 reactions for a given template are electrophoresed through a polyacrylamide gel (which can resolve DNA bands that differ in length by a single nucleotide) and the gel is exposed to xray film, the sequence of the DNA can be read easily from the relative positions of bands on the film (Fig 16).

Variations in the dideoxy DNA sequencing method have been developed to automate the procedure. One modification is to use fluorescently labeled ddNTPs that not only eliminate the need for radioisotope labeling, but also allow the DNA sequencing of a template to occur in a single reaction rather than 4, since each of the 4 ddNTPs is tagged with a different fluorescent compound. All synthesized DNA strands that terminate at the same ddNTP will carry the same fluorescent tag, and can be distinguished from DNA strands that terminate with the other 3 ddNTPs. In the DNA sequencing reaction, the 4 labeled ddNTPs are incubated with the template DNA, primers, dNTPs, and DNA polymerase. After DNA synthesis, all the dideoxy-terminated products are electrophoresed in a single lane of an acrylamide gel in an automated DNA sequencer. The sequencer scans the lane of the gel and interprets the distinct fluorescent signal of each of the 4 tags as a specific base to determine the sequence of the DNA.
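The way a sequence is read from the 4 termination reactions can be sketched in code. The template and primer are those drawn in Fig 16; the function names are illustrative, and the sketch is deterministic, listing every possible termination product rather than modeling the random competition between dNTPs and ddNTPs.

```python
# Deterministic sketch of dideoxy chain termination, using the template and
# primer drawn in Fig 16. Every possible termination product is listed rather
# than sampled at random; function names are illustrative.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def termination_products(template, primer):
    """Map each ddNTP to the lengths of the products it terminates."""
    full_product = reverse_complement(template)   # primer + extension, 5' -> 3'
    if not full_product.startswith(primer):
        raise ValueError("primer does not anneal to the 3' end of the template")
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length in range(len(primer) + 1, len(full_product) + 1):
        last_base = full_product[length - 1]      # the incorporated ddNTP
        lanes[last_base].append(length)
    return lanes

def read_gel(lanes):
    """Read bands from shortest to longest product, as on an autoradiogram."""
    bands = sorted((length, base)
                   for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)

template = "ATCGATATAGCCATAGCTGTGATCCGTAATGTAC"   # 5' -> 3'
primer = "GTACATTACGGATCACAGCT"                    # 5' -> 3'

lanes = termination_products(template, primer)
print(lanes["A"])       # [21, 27, 29, 33]
print(read_gel(lanes))  # ATGGCTATATCGAT
```

The sequence read from the gel is that of the newly synthesized strand, beginning with A and then T as stated in the legend to Fig 16; the template sequence is its reverse complement.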


target DNA
5'---- ATGAAATGCACTGTATTGCCTAGGCATATCCTAGCGAT ----3'
                 <-- OH 3' ATCCGTATAGGATCGCTA 5'   oligonucleotide primer
(arrow: direction of DNA synthesis)

+ DNA polymerase, dNTPs (dATP, dCTP, dGTP, dTTP)

5'---- ATGAAATGCACTGTATTGCCTAGGCATATCCTAGCGAT ----3'
    3' TACTTTACGTGACATAACGGATCCGTATAGGATCGCTA 5'
newly-synthesized DNA
Fig 14. Primer extension. Deoxyribonucleic acid synthesis by DNA polymerase occurs through the sequential addition of deoxynucleotides to free 3' hydroxyl (OH) groups. A short sequence of nucleotides (an oligonucleotide primer) provides the 3' OH to start, or prime, DNA synthesis. For a DNA sequencing reaction, the oligonucleotide is designed and in vitro synthesized to have a nucleotide sequence that is complementary to the template DNA strand. Alternatively, the primer can be complementary to vector sequences at a site adjacent to the target sequence. The primer hybridizes to the template through complementary base pairing and positions the primer, and therefore the start of DNA synthesis, at a specific location along the template. The specific nucleotide that is added (A, C, G, or T) is determined by the template DNA that directs the addition of a complementary nucleotide. Deoxynucleoside triphosphates (dNTPs: dATP, dCTP, dGTP, dTTP) are the substrates used by DNA polymerase for DNA synthesis (see also Fig 15). The direction of DNA synthesis is described as occurring in the 5' to 3' direction.

Eventually, the DNA sequence of the approximately 6 billion nucleotides that encode the entire human genome will be determined. This information will be invaluable for predicting the amino acid sequence of the protein product of newly identified genes, for providing the background to compare and identify gene mutations in individual patients, and for designing probes and primers to be used for clinical diagnostics.

Polymerase Chain Reaction
The polymerase chain reaction (PCR) is a powerful technique that has revolutionized molecular genetics during the past 10 years. Polymerase chain reaction is used to amplify (make many copies of) a specific DNA or RNA sequence (Fig 17), even if the target DNA is present in very small amounts in a mixture of DNA. Polymerase chain reaction uses a DNA polymerase to synthesize DNA from 2 primers (short oligonucleotides of typically 18-30 nucleotides) that hybridize to opposite strands of the template (or target) DNA. The primers are arranged such that the primer extension reaction from each strand directs the synthesis of DNA toward the other primer. In this way, the DNA strand synthesized from 1 primer can itself be primed by the other primer. Both


[Fig 15 structures shown: deoxyribonucleotide (dNTP); dideoxyribonucleotide (ddNTP).]
Fig 15. Deoxynucleotides and dideoxynucleotides. Deoxyribonucleic acid molecules are chains of deoxynucleotides. Synthesis of DNA occurs through the removal of 2 phosphate groups from a deoxynucleoside triphosphate (dNTP) and the addition of the resulting deoxynucleoside monophosphate to a 3' hydroxyl (OH) group of the growing nucleotide chain. The 3' OH of 1 nucleotide is used for phosphodiester bond formation with the next nucleotide. Because a dideoxynucleotide contains no 3' OH group, it acts as a chain terminator for DNA synthesis when incorporated into a growing nucleotide chain.

strands of DNA in the target sequence will be replicated, synthesizing a segment of DNA defined by the 2 primers (Fig 17). The DNA polymerase used in a polymerase chain reaction has been isolated from a heat-resistant strain of bacteria. (Several heat-stable DNA polymerases currently are available for polymerase chain reaction. The first to be used, Taq polymerase, was derived from the bacterium Thermus aquaticus.) A heat-resistant polymerase will retain its enzymatic activity through the high temperatures needed to denature the double strands of the DNA template to single strands that can hybridize with the oligonucleotide primers. The stability of the polymerase allows the DNA synthesis reaction to go through many cycles of strand denaturation and DNA synthesis, resulting in an exponential amplification of the target DNA.

A polymerase chain reaction contains the template DNA, a heat-stable DNA polymerase, dNTPs (dATP, dCTP, dGTP, dTTP), the 2 primers, and the appropriate buffer. Each polymerase chain reaction cycle has 3 steps: (1) denaturation (at approximately 95°C) to separate the double-stranded DNA template to single strands; (2) annealing (at approximately 55°C, depending on the primers used) to allow hybridization of the primers to the single strands of the template; and (3) primer extension (at approximately 72°C) to synthesize DNA by extending from the 3' end of the primer (Fig 17). Repetition of these 3 steps over 25 to 30 cycles results in an amplification of the number of copies of the target sequence by approximately 1 million-fold (less than predicted with exponential amplification of 100% efficiency, but a substantial increase of the starting DNA nevertheless). With use of polymerase chain reaction machines or thermal cyclers, an amplification cycle takes only approximately 10 minutes.

One major advantage of polymerase chain reaction is the small amount of starting material necessary for an amplification reaction. Polymerase chain reaction can selectively amplify the DNA or RNA from a few cells, or even a single cell, generating sufficient amounts of DNA to be used, for example, for DNA sequence analysis or as a hybridization probe. In many cases polymerase chain reaction eliminates the need to clone a DNA fragment before these analyses; however, polymerase chain reaction does not eliminate the necessity for the initial cloning of a gene because it is necessary to know enough DNA sequence information about the target sequence to be able to design the primers that are used in the amplification reaction.

When polymerase chain reaction is used to examine an mRNA of interest, a DNA copy first must be synthesized using the same enzyme, reverse transcriptase, that is used to prepare cDNA libraries (Fig 8). Primers and DNA polymerase then are used to synthesize the second DNA strand, followed by several cycles of amplification as in DNA polymerase chain reaction. This technique is called reverse transcriptase polymerase chain reaction, or RT-PCR.
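The statement that 25 to 30 cycles yield less amplification than perfect doubling would predict can be checked with simple arithmetic. The sketch below assumes the common simplified model in which a constant fraction of the templates (the efficiency) is copied in every cycle, so that the fold amplification after n cycles is (1 + efficiency) raised to the power n; the function names are illustrative, and real reactions also plateau in later cycles.

```python
import math

# Sketch of PCR amplification arithmetic under a simplified model: in each
# cycle a constant fraction (the efficiency) of the templates is copied.

def fold_amplification(cycles, efficiency=1.0):
    """Fold increase in copy number after the given number of cycles."""
    return (1.0 + efficiency) ** cycles

def efficiency_for(fold, cycles):
    """Per-cycle efficiency that would give the stated fold increase."""
    return fold ** (1.0 / cycles) - 1.0

print(f"{fold_amplification(25):.3g}")   # 3.36e+07
print(f"{fold_amplification(30):.3g}")   # 1.07e+09
print(f"{efficiency_for(1e6, 30):.2f}")  # 0.58
print(math.ceil(math.log2(1e6)))         # 20
```

Perfect doubling would give roughly 34 million-fold amplification after 25 cycles and 1 billion-fold after 30 cycles, and would reach 1 million-fold in only 20 cycles. Under this model, a 1 million-fold yield after 30 cycles corresponds to an average efficiency of approximately 58% per cycle.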
Polymerase chain reaction is used extensively in the molecular analysis of genetic disease, molecular diagnosis, and forensic medicine. Polymerase chain reaction can be


[Figure 16: template 5'---ATCGATATAGCCATAGCTGTGATCCGTAATGTAC---3' annealed to primer 3' TCGACACTAGGCATTACATG; extension with dATP, dCTP, dGTP, dTTP, labeled dNTP, and DNA polymerase; 4 reactions (+ddATP, +ddCTP, +ddGTP, +ddTTP); gel lanes A, C, G, T.]
Fig 16. Deoxyribonucleic acid sequencing by dideoxy chain termination. Using a single-stranded DNA template, a complementary primer with a 3' OH group, and all 4 nucleoside triphosphates (dATP, dCTP, dGTP, dTTP), DNA polymerase will catalyze chain extension from the primer and synthesize a DNA strand complementary to the template (see also Fig 14) (top). The template DNA is the target DNA whose sequence will be determined. The primer hybridizes to the template through complementary base pairing and positions the primer, and the start of DNA synthesis, at a specific location along the template. Therefore, the first complementary nucleotide that is added to the primer will be the same for every newly synthesized DNA molecule generated from that primer and a given DNA template. In the example shown, the first nucleotide added is an A, the second is a T, and so on. If DNA synthesis is stopped at intervals, all fragments that are the same length will end in the same nucleotide (middle). For the dideoxy chain termination sequencing method, dideoxynucleotides (ddNTPs) are used to terminate DNA synthesis at each of the nucleotides (A, C, G, or T) in 4 separate reactions. In the figure, DNAs in each of the 4 reactions are depicted: The top strand shows the nucleotide sequence of a portion of the template adjacent to the site of primer binding, and the bottom strand represents the newly synthesized DNA that extends from the primer and is complementary to the template strand. Insertion of a ddNTP is represented by *A, *C, *G, *T. In the A reaction, for example, a combination of dATP and ddATP is included and for each position where an A can be incorporated, either a dATP or a ddATP can be added. If ddATP (*A) is inserted the chain will terminate, but if dATP (A) is used DNA synthesis will continue incorporating the other dNTPs. When the next A is to be inserted, again DNA synthesis may or may not be terminated (bottom). Electrophoresis and autoradiography are used to detect the DNA extended from the primer. A small amount of a tagged (usually with 32P or 35S) dNTP is included in the sequencing reactions to label the newly synthesized DNA. The fragment lengths of the newly synthesized DNA molecules are determined for each of the 4 reactions by electrophoresis through a high resolution polyacrylamide gel, which resolves DNA fragments that differ in length by a single nucleotide. Each DNA fragment synthesized from the A reaction must terminate at an A, and so on. The shortest fragments will have terminated after the addition of only a few nucleotides to the primer. By size ordering the fragments generated from each of the 4 reactions, the DNA sequence of the newly synthesized DNA is determined. The sequence is read from the bottom of the gel (the shortest fragments, closest to the primer) to the top of the gel (longer fragments extending from the primer) in the 5' to 3' direction. This sequence is complementary to the template strand.
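The logic of reading a dideoxy sequencing gel can be illustrated with a short Python sketch that uses the template and primer sequences shown in Fig 16. It is an idealized model: it assumes that a terminated fragment is recovered at every position and that the gel resolves every length, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extension_product(template: str, primer: str) -> str:
    """Bases added to the 3' end of the primer (5' to 3') on this template."""
    site = reverse_complement(primer)      # where the primer anneals
    start = template.find(site)
    if start < 0:
        raise ValueError("primer does not anneal to the template")
    # Synthesis copies the template 5' of the annealing site.
    return reverse_complement(template[:start])


def termination_lengths(extension: str) -> dict[str, list[int]]:
    """For each of the 4 ddNTP reactions, the fragment lengths (in added
    bases) expected when a chain terminates at every possible position."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(extension, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the gel from the shortest fragment to the longest."""
    by_length = {n: base for base, lengths in lanes.items() for n in lengths}
    return "".join(by_length[n] for n in sorted(by_length))


TEMPLATE = "ATCGATATAGCCATAGCTGTGATCCGTAATGTAC"   # 5' to 3', from Fig 16
PRIMER = "GTACATTACGGATCACAGCT"                   # Fig 16 primer, written 5' to 3'

if __name__ == "__main__":
    added = extension_product(TEMPLATE, PRIMER)
    lanes = termination_lengths(added)
    for base in "ACGT":
        print(base, lanes[base])
    print("read from gel:", read_gel(lanes))
```

The first two bases read from the gel are A and T, as stated in the caption, and the sequence read is complementary to the template strand.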


used to analyze specific genes or gene segments from the DNA of individual patients for mutations or variations in DNA sequence. Before the availability of polymerase chain reaction technology,

this was possible only by constructing a cDNA or genomic DNA library from each individual patient and then screening the library for the gene of interest, a procedure that takes many days. Using polymerase


[Figure 17: double-stranded template taken through Cycle 1, Cycle 2, and Cycle 3; each cycle is labeled denature, anneal primers, DNA synthesis (extension).]
Fig 17. Polymerase chain reaction (PCR). Polymerase chain reaction is used for the amplification of a specific region of DNA. Any double-stranded DNA can be amplified by this method providing that the sequence flanking the region to be amplified has been determined. This sequence information is used to design 2 oligonucleotides, 1 complementary to each strand of the target template DNA (bold lines), that will serve as primers for DNA synthesis across the region to be amplified. One primer is shown as a white box and the other primer as a striped box. Three basic steps occur in each polymerase chain reaction cycle: denaturation of double-stranded DNA, annealing of primers, and extension of DNA synthesis. In the first cycle, double-stranded template DNA is heated to denature, or separate, the double strands. Subsequent cooling of the DNA in the presence of a large excess of the 2 oligonucleotide primers allows the hybridization of the primers to their complementary sequences in the template DNA. Deoxyribonucleic acid polymerase, with the 4 nucleoside triphosphates, extends DNA synthesis from the 3' ends of the 2 primers. The cycle of denaturation, annealing, and extension is repeated, usually 25 to 30 times, to synthesize many copies of the DNA region between the 2 primers. Many of the products of the first few rounds of amplification are heterogeneous in size because extension of the DNA strands can continue past the primer site when copying directly from the template DNA strands. However, as more strands that end at the primer site are synthesized, the portion of DNA between the primers will be preferentially amplified and becomes the dominant product of the reaction.

chain reaction, the targeted gene region can be amplified (using primers designed from the known sequence of a normal copy of the gene) in a single day, and the DNAs from many patients can be analyzed easily and simultaneously. After polymerase chain reaction, the amplified DNA can be cloned for further analysis or the polymerase chain reaction product can be analyzed directly. In addition to screening for mutations, polymerase chain reaction is also useful to detect viral infections, such as infection with the human immunodeficiency virus (HIV), which causes acquired immunodeficiency syndrome (AIDS). Even if the virus is present in low abundance, as long as its nucleotide sequence is known, the viral DNA can be detected by polymerase chain reaction amplification. Polymerase chain reaction also is becoming increasingly important in forensics. Although the DNA of all people is very similar, each individual's DNA is slightly different. These differences can be detected through polymerase chain reaction and used to generate a unique DNA profile, analogous to a person's fingerprints.
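How 2 primers define the amplified segment can be shown with a small "in silico" polymerase chain reaction in Python. This is a sketch under simplifying assumptions: primers must match the template exactly, only the first matching site is used, and the template and primer sequences are invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(top_strand: str, forward_primer: str, reverse_primer: str):
    """Return the segment bounded by the 2 primers, or None if either
    primer fails to anneal.

    The forward primer matches the top strand; the reverse primer anneals
    to the top strand and therefore matches the bottom strand.
    """
    start = top_strand.find(forward_primer)
    if start < 0:
        return None
    reverse_site = reverse_complement(reverse_primer)
    site = top_strand.find(reverse_site, start + len(forward_primer))
    if site < 0:
        return None
    return top_strand[start:site + len(reverse_site)]


# Hypothetical template: flank + forward site + target + reverse site + flank.
TEMPLATE = "GGATCC" "TTAGCATGC" "AAGTCCGATTAC" "GGCTAAGCTT" "GGCATCGA"
FORWARD = "TTAGCATGC"
REVERSE = reverse_complement("GGCTAAGCTT")

if __name__ == "__main__":
    product = in_silico_pcr(TEMPLATE, FORWARD, REVERSE)
    print(product, len(product))
```

The product begins with one primer sequence and ends with the complement of the other, which is why the sequence flanking a region must be known before the region can be amplified.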

ANALYSIS OF GENE STRUCTURE
Many techniques for analyzing DNA have been described above: gene cloning, gel electrophoresis, Southern blotting, DNA sequencing, and polymerase chain reaction amplification. In this section, several additional applications of recombinant DNA technology are described.


[Figure 18: (A) restriction maps of allele A and allele B on a scale of 0 to 10 kb, with the probe position marked beneath each map; (B) Southern blot of Eco RI digests of AA, AB, and BB genomic DNA, with bands at 6 kb and 2 kb.]
Fig 18A-B. Restriction fragment length polymorphisms (RFLPs). (A) Restriction maps for 2 alleles are shown. Both alleles have recognition sites for Hind III (H3), Eco RI (RI), and Xho I (X) at the same locations, with the exception of an Eco RI site that is present in allele A (at 5 kb from the leftmost Hind III site) but is absent in allele B. (B) Genomic DNA from individuals that are homozygous for allele A (AA), heterozygous for alleles A and B (AB), or homozygous for allele B (BB) is digested with Eco RI, electrophoresed through an agarose gel, and Southern blotted (see Fig 13A). The blot is probed with the 0.75 kb Xho I-Eco RI fragment from the region of interest (shown in [A] beneath each restriction map, at positions 6.25 to 7 kb). The probe detects a single Eco RI fragment of 2 kb in AA homozygotes. This fragment is the result of Eco RI sites at positions 5 kb and 7 kb, as shown in the allele A map above. The probe detects a single Eco RI fragment of 6 kb in BB homozygotes. This fragment is the result of Eco RI sites at positions 1 kb and 7 kb, as shown in the allele B map above. The probe detects 2 Eco RI fragments in AB heterozygotes: 1 of 2 kb generated from the A allele and 1 of 6 kb generated from the B allele.

Restriction Mapping and Restriction Fragment Length Polymorphisms
A restriction endonuclease will cut a double-stranded DNA molecule into pieces of DNA known as restriction fragments. The recognition site of each enzyme is specific for a particular short nucleotide sequence (Fig 2). When a sample of the same DNA is digested with a different restriction enzyme, cleavage will occur at different sites within the DNA; the sizes and numbers of fragments from the 2 digestions will differ. By using several enzymes and combinations of these enzymes, a restriction map of the DNA can be constructed (Fig 3). This map shows the positions of each restriction site relative to the others. Restriction maps are useful in the process of gene cloning (see above). Specific restriction fragments can be visualized after gel electrophoresis and staining with ethidium bromide for cloned DNAs, or by Southern blotting and hybridization with a specific probe for genomic DNA.

Because restriction sites characterize specific regions of DNA sequence, restriction maps also serve as a means of comparing the same region of DNA from several sources (such as among individuals or between species) without determining a complete DNA sequence. Differences in the size (length of the DNA fragment) and/or number of restriction fragments generated from a DNA region when the DNAs are cut with the same restriction endonuclease are known as restriction fragment length polymorphisms (RFLPs) (Fig 18). The variation in a DNA region detected by a restriction fragment length polymorphism reflects differences in the DNA sequences being compared. This difference may be attributable to a nucleotide alteration that changes the restriction endonuclease recognition site itself, or can reflect a mutation such as an insertion or deletion of DNA sequence in the restriction fragment that is generated. Southern blotting and restriction fragment length polymorphism analysis are used to


examine variation among individuals of a family, leading to the recognition of polymorphisms that represent either normal variation in the population or variation that may be associated with a particular disease phenotype (Fig 18). In many cases a restriction fragment length polymorphism is a marker that is linked, or coinherited, with the disease allele.
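The band patterns in Fig 18 follow directly from the positions of the Eco RI sites and the probe, as the following Python sketch shows. It uses only the site positions stated in the figure legend (1, 5, and 7 kb for allele A; 1 and 7 kb for allele B) and ignores any fragments that extend beyond the outermost mapped sites; the function names are hypothetical.

```python
# Eco RI site positions (kb) and probe interval (kb) from the Fig 18 legend.
ECO_RI_SITES = {"A": [1, 5, 7], "B": [1, 7]}
PROBE = (6.25, 7.0)


def fragments(sites_kb):
    """(start, end) of each fragment lying between neighboring sites."""
    ordered = sorted(sites_kb)
    return list(zip(ordered, ordered[1:]))


def detected_sizes(sites_kb, probe):
    """Sizes (kb) of the fragments that overlap the probe interval."""
    probe_start, probe_end = probe
    return [end - start
            for start, end in fragments(sites_kb)
            if start < probe_end and end > probe_start]


def band_pattern(genotype):
    """Band sizes (kb) expected on the Southern blot for a genotype."""
    sizes = set()
    for allele in genotype:
        sizes.update(detected_sizes(ECO_RI_SITES[allele], PROBE))
    return sorted(sizes)


if __name__ == "__main__":
    for genotype in ("AA", "AB", "BB"):
        print(genotype, band_pattern(genotype), "kb")
```

The 4 kb fragment between the sites at 1 kb and 5 kb in allele A is produced by the digestion but is not seen on the blot, because it does not overlap the probe.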

Genetic Linkage Analysis and Positional Cloning
Genetic mapping through genetic linkage analysis and positional cloning have become very important in medical genetics. These are the only methods that allow the identification of some genes, such as certain disease genes, for which the gene product has not been identified yet. Functional cloning, which has been the traditional strategy for cloning genes, begins by identifying and purifying the protein product of the gene, followed by determining a partial amino acid sequence of the protein and using that information to design oligonucleotide probes that can be used to screen a library and recover the gene that encodes the protein of interest. In contrast, positional cloning relies only on the chromosomal location of the gene. Genetic linkage analysis maps the disease gene to a specific chromosomal position. Genes in that chromosome region then are identified, cloned, and examined for correlation to the disease in question. The RNA and protein products of genes identified in this manner can be characterized and examined to reveal the cellular pathways involved in the disease condition.

When 2 genes are located on different chromosomes, they are not linked, and the probability that any allele of the first gene will be coinherited with a specific allele of the second gene is random, or 50%. This also is true for 2 genetic loci that are located far apart on the same chromosome, because a crossover event will almost always occur between these loci. The closer that 2 genes or genetic markers are located on the same

chromosome, the greater the probability that specific alleles for these genes will be inherited together or show linkage. In genetic linkage analysis, polymorphic markers distributed throughout the chromosomes are tested for coinheritance with a particular disease phenotype within a family. Linkage analysis is facilitated greatly by large numbers of polymorphic markers spaced at small intervals along the DNA of the chromosomes, and the efforts of many laboratories during the past few years have resulted in a catalog of such markers. Many of the useful markers are restriction fragment length polymorphisms, but most of the markers are derived from polymerase chain reaction amplification of short tandemly repeated sequences. Scattered throughout the human genome are many loci that contain a variable number of identical sequences joined together in tandem. These variable number tandem repeat (VNTR) sequences are 2 to 100 nucleotides long and occur in 3 to 60 or more copies. Microsatellites, or short tandem repeat polymorphisms (STRPs), form a subgroup of variable number tandem repeat sequences that consists of 1 to 5 nucleotides repeated in a tandem arrangement. At a specific chromosomal locus, the number of repeats in a short tandem repeat polymorphism is highly variable among individuals, such that only individuals who are related to one another will tend to contain a specific short tandem repeat polymorphism with the same number of repeats. Because short tandem repeat polymorphisms are very polymorphic, they are extremely useful as genetic linkage markers. The number of repeats within a short tandem repeat polymorphism at a specific chromosomal locus can be detected in individuals through polymerase chain reaction methodology. Polymerase chain reaction primers are designed from the sequence flanking the repeat region, and used to amplify the repeat region from an individual's genomic DNA.
The polymerase chain reaction products are electrophoresed through polyacrylamide gels that


can resolve amplified DNAs that differ in as little as 1 nucleotide in length. By comparing the inheritance pattern of the marker among affected and unaffected members of a family with the presence of the disease, linkage can be established or excluded. In addition to their use as genetic linkage markers, variable number tandem repeat sequences have important uses in forensic medicine and identification. Some variable number tandem repeat sequences, such as short tandem repeat polymorphisms, are detected readily by polymerase chain reaction techniques as described above. Other variable number tandem repeat sequences are detected through restriction digestion and Southern blotting of genomic DNA (Fig 19). If the blot is incubated with a probe that will hybridize only to a single locus containing the variable number tandem repeat sequence (for example, the probe contains unique sequence adjacent to the repeat region), the 2 alleles for that locus in an individual can be visualized. An example of the inheritance of allelic polymorphisms at a single locus is shown in Figure 19. The same blot also can be incubated with a probe that will hybridize to the repeated sequence. This method will not identify a specific variable number tandem repeat sequence at a single chromosomal locus, but instead will identify multiple loci that contain that repeated sequence. The pattern of DNA fragments generated from the DNA of a single individual is essentially unique for that person. Use of variable number tandem repeat sequences to compare the identities of 2 people in this manner is known as DNA fingerprinting.
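The relationship between the number of repeats in a short tandem repeat polymorphism and the length of its polymerase chain reaction product can be sketched in Python. The repeat unit, the flanking length, and the band sizes below are hypothetical values chosen for illustration, and the flanking length is assumed to include both primers.

```python
def amplicon_length(flank_bp: int, unit: str, repeats: int) -> int:
    """Length of the PCR product: flanking sequence plus the repeat block."""
    return flank_bp + len(unit) * repeats


def repeats_from_length(length_bp: int, flank_bp: int, unit: str) -> int:
    """Number of repeats implied by a band of `length_bp` on the gel."""
    core = length_bp - flank_bp
    if core < 0 or core % len(unit) != 0:
        raise ValueError("band length is not consistent with this marker")
    return core // len(unit)


def genotype(band_lengths, flank_bp: int, unit: str):
    """Repeat numbers of the alleles seen as bands in one individual."""
    return tuple(sorted(repeats_from_length(b, flank_bp, unit)
                        for b in band_lengths))


if __name__ == "__main__":
    FLANK_BP = 118      # hypothetical: both primers plus flanking sequence
    UNIT = "CA"         # hypothetical dinucleotide repeat
    print(genotype([142, 148], FLANK_BP, UNIT))
    print(amplicon_length(FLANK_BP, UNIT, 13))
```

Because alleles that differ by a single dinucleotide repeat differ in length by only 2 nucleotides, the products must be separated on gels with single nucleotide resolution.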

Physical Mapping of Genes
Cloned DNAs (cDNA or genomic) can be used to identify the chromosomal location of a gene. This can be important in associating a specific gene with a disease, for example, when a disease has been genetically mapped to a chromosomal locus or the disease is associated with a chromosome translocation. Two methods for identifying the chromosome on which a gene is located are fluorescence in situ hybridization (FISH), which hybridizes a probe labeled with a fluorescent tag to chromosomes and determines the gene location microscopically, and hybridization to a somatic cell hybrid mapping panel, which uses restriction enzyme digestion of DNA, Southern blotting, and hybridization to correlate probe hybridization with a specific human chromosome or chromosome region.

METHODS OF RNA ANALYSIS
Many of the techniques applied to DNA analysis can also be applied with slight variations to RNA analysis, such as gel electrophoresis, blotting to a membrane, and hybridization. Other methodologies, such as restriction endonuclease digestion, cloning, and sequencing, can be applied only to DNA. One way to circumvent these limitations is to synthesize a complementary DNA copy (cDNA) of the RNA of interest (Fig 8) and then analyze the cDNA. To understand gene function and regulation, it is necessary to examine the cells and tissues in which a particular gene is active and determine the abundance of the RNA transcript and the timing of its expression. Several methods are available for the detection of RNA and are described briefly.

Northern Analysis
Northern analysis is very similar in procedure to Southern analysis (see above), except that in Northern analysis RNA is analyzed (Fig 13B). After isolation of RNA from cells or tissue, the RNA is size fractionated by agarose gel electrophoresis. Unlike DNA, which is digested with restriction enzymes before electrophoresis, RNAs are typically of lengths that resolve easily on an agarose gel. A replica of the gel is made by transferring, or blotting, the RNA from the gel to a piece of nitrocellulose or nylon membrane. Specific RNAs are detected by hybridization with a labeled probe, followed by autoradiography.


[Figure 19: family pedigree above an autoradiogram; band positions are labeled Allele 1 through 8.]
Fig 19. Detection of allelic polymorphisms. A hypothetical example of the inheritance of alleles for a single locus in a family is described. The top portion of the figure shows the family pedigree. Circles are females and squares are males. The parents (5 and 6) and grandparents (1 and 2; 3 and 4) of 6 children (7-12) are represented. Genomic DNA from each individual can be obtained easily from the white blood cells in a peripheral blood sample. In this example, it has been previously determined that the DNA at the locus of interest shows 8 polymorphic variants, or alleles, when the DNA has been cut with a particular restriction endonuclease. The polymorphic variants are restriction fragments of unique lengths that can be resolved by agarose gel electrophoresis (see also Fig 18). After digestion of the DNA from each individual with the restriction endonuclease, the DNA is electrophoresed through an agarose gel. The gel is Southern blotted and the filter is hybridized with a labeled probe. The probe is complementary to DNA sequences that occur at a single site, or locus, of interest within the chromosomal DNA. After probe hybridization, the filter is exposed to x-ray film, and the resulting autoradiogram is shown below the pedigree. Each lane of the autoradiogram corresponds to the DNA of the individual in the pedigree with which it lines up. Each individual is found to contain 2 different alleles for the locus examined, and the inheritance pattern of the alleles can be followed. For example, individual 5 has inherited a '1' allele from his father and a '5' allele from his mother; individual 6 has inherited a '7' allele from her father and an '8' allele from her mother. This type of analysis can be used to examine family relationships, or used as a diagnostic tool if a specific allele is found to be associated with a disease phenotype.
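The inheritance pattern followed in Fig 19 can be expressed as a simple rule: each child receives 1 allele from each parent. The Python sketch below applies that rule to the parental genotypes given in the legend (individual 5 carries alleles 1 and 5; individual 6 carries alleles 7 and 8). The children's genotypes are not listed in the legend, so none are assumed here; the function names are hypothetical.

```python
from itertools import product


def possible_children(father, mother):
    """All genotypes a child can inherit: 1 allele from each parent."""
    return sorted({tuple(sorted(pair)) for pair in product(father, mother)})


def is_consistent(child, father, mother) -> bool:
    """True if the child's 2 alleles could have come from these parents."""
    return tuple(sorted(child)) in possible_children(father, mother)


FATHER = (1, 5)   # individual 5 in Fig 19
MOTHER = (7, 8)   # individual 6 in Fig 19

if __name__ == "__main__":
    print(possible_children(FATHER, MOTHER))
    print(is_consistent((8, 5), FATHER, MOTHER))
    print(is_consistent((1, 5), FATHER, MOTHER))
```

A lane showing a genotype outside this set would argue against the stated family relationship, which is the basis for using such markers to examine family relationships.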

Northern analysis can be used to determine the relative abundance of a particular mRNA, for example, in comparing the RNA from a normal tissue with that from a disease tissue. Northern analysis also will reveal the size of the detected RNA, indicating, for example, if a mutation has caused a shortened form of the mRNA to be synthesized. The RNA that is applied to the gel may be either total cellular RNA or the poly(A) fraction of the total RNA. Poly(A) RNA, or polyadenylated mRNA, is the RNA fraction that is translated during protein synthesis. Abundant mRNAs can be detected by Northern analysis of total RNA; however, less abundant mRNAs are detected more easily when enriched in the poly(A) fraction (Figs 6, 7).

Ribonuclease Protection Assay
An alternate method for the detection of a specific RNA is the ribonuclease protection assay. In this method, a short labeled RNA probe (typically 100-500 nucleotides) specific for the mRNA of interest is hybridized in solution to the isolated cellular RNA. The hybridization reaction is treated enzymatically to preferentially digest single stranded RNAs (unhybridized molecules and single stranded ends on hybrid molecules). The resulting RNA-RNA hybrids then are electrophoresed through a high resolution polyacrylamide gel for identification and quantification. The ribonuclease protection assay is more sensitive than Northern analysis in detecting very low amounts of a specific mRNA. This method also allows for quantitation of the RNA. It cannot, however, determine the size of the synthesized mRNA because only the protected fragment of RNA corresponding to the size of the probe will be detected.

In Situ Hybridization
When RNA is extracted from cells or tissue, the information about the localization of the RNA is lost. In situ hybridization is used to identify the distribution of specific RNAs in intact cells and tissues. Cells or sectioned tissue are prepared on microscope slides and


treated using methods that retain the integrity of the RNA and favor hybridization of a labeled probe to the RNA. The probes that are used for in situ hybridization can be DNA or RNA probes, and can be tagged with radioactive, histochemical, or fluorescent labels. The same hybridization principles that apply to Northern and Southern analysis apply to in situ hybridization as well. After hybridization, the distribution of the target RNA is determined microscopically. (Note that the term in situ hybridization also is used to describe the hybridization of a labeled probe to the DNA in chromosomes.)

METHODS OF PROTEIN ANALYSIS
Much information about normal and abnormal gene function can be understood by the examination of gene structure. However, to understand the effect that a gene defect has on its function, the protein products need to be examined.

Western Analysis
Western blotting is analogous to Southern and Northern blotting but applies to the analysis of proteins rather than nucleic acids. Western blotting is used to determine the size and relative abundance of proteins. As with Northern analysis, this methodology can provide information on the differential expression of the gene product in a patient with a genetic disease. Proteins are extracted from cells or tissue and size-fractionated by electrophoresis through a polyacrylamide gel. The gel typically contains sodium dodecyl sulfate that binds to proteins to give them a negative charge, allowing them to migrate to the positive electrode during electrophoresis. Because of the small pore size of a polyacrylamide gel, transfer of the proteins to a nylon membrane is facilitated by electrophoretic transfer (rather than capillary action, which is typically used for nucleic acid transfer from agarose gels). The filter containing the fractionated proteins is incubated with antibodies that will specifically bind the target protein. This antibody-protein complex is detected by incubation with a labeled secondary antibody, which is detected by a radiographic, fluorescent, or histochemical signal.

Immunohistochemistry
Immunohistochemistry methods are analogous to in situ hybridization for nucleic acids and are used to detect the distribution of a specific protein within a cell or tissue. The localization of a protein to specific cells of a tissue or to specific sites within the cell (such as the cell membrane, nucleus, or cytoplasm) provides important information about the function or malfunction of the protein. As in Western blotting, a specific antibody serves as the probe to detect the protein of interest.

ANALYSIS OF TRANSCRIPTIONAL REGULATION AND GENE EXPRESSION
Many genetic defects cause disease because an incorrect form of the gene product is synthesized. However, disease also can be the result of the overabundance or underabundance of a gene product, or the synthesis of that product at an inappropriate place or time. In these instances, the genetic defect affects the regulation of the gene expression, often at the transcriptional level. Analysis of gene regulation requires the identification and characterization of the regulatory regions of the gene: the regions of a gene that encode binding sites for the RNA polymerase complex and for other promoter DNA binding proteins that activate or repress the synthesis of the RNA transcript by RNA polymerase. The promoter is the gene regulatory region that determines the position of transcription initiation by directing the site of RNA polymerase complex binding to the gene. It is located upstream (away from the direction of transcription) of the mRNA start site that codes for the 5' end of the mRNA (Fig 6). Additional DNA sequences (or elements) within


the promoter region can direct the rate and tissue specificity of mRNA synthesis. By mapping the precise location of the RNA start site of an mRNA, the position of the promoter region can be determined. Several techniques currently are used to map the 5' ends of mRNAs, including primer extension, S1 nuclease digestion, and rapid amplification of cDNA ends (RACE) analysis. Once a promoter region has been identified and cloned, analyses to identify the DNA binding sites for transcriptional regulatory proteins can be done and sequence specific DNA binding proteins can be isolated and identified. DNA sequencing of the promoter will reveal DNA sequence elements that have homology to known transcription factor binding sites; however, actual use of these sites must be verified either by evidence of specific protein binding or by a functional assay showing the influence of the sites on transcription. Techniques such as DNA footprinting and the gel mobility shift assay are used to confirm protein binding to a specific promoter sequence. Promoter regions can be tested functionally in reporter gene expression assays.

Gel Mobility Shift Assay
A gel mobility shift assay (also known as a gel retardation assay) is a means of showing the binding of proteins to a promoter region. In this assay, a specific segment of a promoter region is labeled with a radioisotope tag and incubated with a nuclear protein extract. (Because transcription occurs within the nucleus of a eukaryotic cell, the transcriptional regulatory proteins can be recovered in a nuclear extract.) The mixture is electrophoresed through a polyacrylamide gel. If proteins bind to the promoter DNA, the resulting DNA:protein complex will migrate more slowly through the gel than DNA alone. The identity and specificity of binding can be determined by competition of protein binding using oligonucleotides. Gel shift assays can be used to identify binding of previously characterized regulatory proteins. However, novel transcription factors can be identified by this method as well. The DNA:protein complex can be recovered from the gel and the proteins purified and analyzed. This methodology will help to identify the nuclear proteins that regulate the transcriptional activation of a gene of interest.

DNA Footprinting
In DNA footprinting, a specific segment of a promoter region is labeled with a radioisotope tag and incubated with a specific transcription regulatory protein. The DNA:protein complex is partially digested with DNase I. The DNA regions that are protected from digestion due to protein binding are identified after electrophoresis through a high resolution polyacrylamide gel. This technique helps determine the exact region of DNA:protein interaction involved in the regulation of gene activity.

Reporter Gene Expression Assays
To study the function of putative gene regulatory regions, these regions can be cloned into an expression vector containing a reporter gene. The inserted DNA fragment sits adjacent to a gene that encodes a protein that readily can be detected in an assay (reporter gene). The cloned DNA is transfected into mammalian cells in culture. If the DNA cloned into the expression vector contains sequences that promote transcriptional activation, an mRNA will be synthesized from the reporter gene and the protein product will be made and detected by an appropriate assay. Several reporter systems are currently in use, including chloramphenicol acetyltransferase and firefly luciferase. In a typical experimental approach, regions of DNA sequence in the promoter are tested systematically to determine their effect on the level of reporter transcript that can be synthesized. In this way, functional regulatory sequences can be identified.

Transgenic Mice
Analysis of gene expression in cell culture (in vitro) can provide much information about the regulation of gene expression. However,


many questions, such as the genetic control of mammalian development, require an animal (in vivo) system. In recent years, technical advances have provided the ability to introduce foreign genes into mice (forming transgenic mice) and to examine the pattern of expression and the phenotypic effects of the introduced gene (called a transgene). Transgenic mice are produced by microinjecting several hundred copies of the cloned gene into fertilized mouse eggs. The eggs then are transferred to a foster mother where some of the injected eggs will develop to term and contain the foreign DNA integrated into the mouse chromosomes. If the transgene is integrated within the chromosomes of the germ line cells, the transgene will be stably inherited by the progeny of the transgenic mouse. Although the site of DNA integration appears to be random regarding chromosomal location, the transgene often is expressed in a tissue-specific pattern similar to that of the endogenous copy of the gene. This allows the expression of the gene, or of variants of the transgene, to be examined. The insertion site of a transgene also can be within a chromosomal location that contains a gene which, now interrupted, results in a mutant phenotype. Characterization of the disrupted locus has led to the identification of previously unknown genes. Transgenes also can be targeted to be expressed in specific tissues: Cloning the gene of interest with a tissue specific promoter will direct its expression to occur only in tissues that can activate that promoter. The timing of transgene expression also can be regulated by cloning the gene of interest with an inducible promoter that can be activated at specific times. Valuable information on protein function also can be gained through observation of the effects of the absence of a specific gene product. Knockout mice are a type of transgenic mouse in which a null mutation has been introduced into the gene of interest. Through homologous recombination, which targets the transgene to the site of the endogenous copy of the gene in the mouse chromosomes, the endogenous mouse gene can be disrupted. Mice homozygous for the disrupted gene are bred and examined.

SUMMARY
The techniques described in this article have been developed during the past 25 years and provide a powerful set of tools that have revolutionized the understanding of gene structure and function in health and disease. All these tools readily are applicable to a more fundamental understanding of the genes involved in the embryogenesis, maintenance, repair, regeneration, and aging of the musculoskeletal system. Many of these techniques are being used daily in medical research laboratories, in medical diagnostic laboratories, and in forensic medicine laboratories. This article presented the basic tools and principles of molecular biology, but only touched on the many variations and applications of this technology. It is incumbent on the informed physician to understand the principles behind these methods, their possible use, and their inherent limitations. They will continue to transform our world, enlighten our understanding of the biologic basis of our existence, and enhance our ability to prevent and treat human disease.

Further Reading
Alberts B, Bray D, Lewis J, et al: Molecular Biology of the Cell. Ed 3. New York, Garland Publishing, Inc 1994.
Drlica K: Understanding DNA and Gene Cloning: A Guide for the Curious. Ed 2. New York, J Wiley and Sons, Inc 1992.
Lewin B: Genes V. New York, Oxford University Press 1994.
Sambrook J, Fritsch EF, Maniatis T: Molecular Cloning: A Laboratory Manual. Ed 2. New York, Cold Spring Harbor Laboratory Press 1989.
Shore EM, Kaplan FS: Molecular Biology for the Clinician. Part I. General Principles. Clin Orthop 306:264-283, 1994.
Thompson MW, McInnes RR, Willard HF: Genetics in Medicine. Ed 5. Philadelphia, WB Saunders Company 1991.
Watson JD, Gilman M, Witkowski J, Zoller M: Recombinant DNA. Ed 2. New York, WH Freeman and Company 1992.

GLOSSARY
5’ and 3’: Nucleic acids have 5’ (5 prime) and 3’ (3 prime) ends, designating the orientation of the ribose (in RNA) or deoxyribose (in DNA) sugar-phosphate backbone of the polynucleotide. Synthesis of DNA or RNA always occurs by the addition of a nucleotide to the 3’ end of a polynucleotide; synthesis is described as occurring in the 5’ to 3’ direction. By convention, the end of a gene where the promoter region is located (and therefore the position of the transcription start site) is called the 5’, or upstream, region; the opposite end is the 3’, or downstream, region.
A, C, G, T: Abbreviations for the 4 types of nucleotides in DNA, representing the base contained in each nucleotide: A (adenine), C (cytosine), G (guanine), or T (thymine). In a double-stranded DNA molecule, the 2 polynucleotide chains are held together by hydrogen bonds between pairs of bases: A always pairs with T; C always pairs with G. (Also see nucleotide, base.)
U: Abbreviation for the nucleotide base uracil, which replaces thymine (T) in RNA.
allele: Alleles are alternate forms of a genetic locus. For each locus, a single allele is inherited from each parent.
Alu family: A set of related sequences, each about 300 bp long, that are dispersed throughout the human genome. The name was derived from the Alu I cleavage site that each element contains.
amino acids: The units, or building blocks, of proteins.
antisense strand of DNA: The strand of double-stranded DNA that is complementary to mRNA. This strand is the transcribed DNA strand, serving as the template for RNA synthesis; it sometimes is referred to as the noncoding DNA strand. (Also see sense strand.)
antiparallel: The 2 polynucleotide strands of the DNA double helix associate with each other in opposite orientations: the 5’ end of 1 strand is aligned with the 3’ end of the other.
autosome: A chromosome that is not 1 of the sex (X or Y) chromosomes.
bacteriophage: A bacterial virus. Bacteriophage DNAs are used as cloning vectors. Two types of bacteriophage commonly used are bacteriophage lambda (λ) and bacteriophage M13.
base: The nitrogenous base that is part of the nucleotide building blocks of DNA and RNA. Frequently, the name of the base is used as a shorthand term for the nucleotide because the base is the distinctive component of each nucleotide. When referring to double-stranded DNA, the term base often is used instead of base pair.
base pair (bp): Two complementary nucleotide bases joined by hydrogen bonds. The 2 strands of double-stranded DNA are held together through the hydrogen bonds of base pairing. C always pairs with G; A pairs with T (in DNA) or U (in RNA).
box: A highly conserved DNA sequence or secondary (folded) structure of DNA. An example is the TATA box, which is recognized and bound by the RNA polymerase II complex.
cap: A modified nucleotide (7-methyl-guanosine) that is added to the 5’ end of mRNAs. The cap appears necessary for mRNA processing, stability, and translation.
cDNA: Complementary DNA or copy DNA is synthetic DNA that has been copied from an mRNA template through the activity of the enzyme reverse transcriptase. cDNA, distinct from genomic DNA, will include only those regions of the genome that are contained in mature RNA (exons).
chromatin: The compact structure of double-stranded DNA and associated proteins that compose the chromosomes.
chromosome: An autonomous unit of the genome that contains many genes. Each chromosome contains a single long molecule of double-stranded DNA that is complexed with proteins to form chromatin.
clone: This term is used to describe a DNA molecule produced by recombinant DNA technology. Most accurately, a clone is a large number of identical cells or molecules that are derived from a single ancestral cell or molecule.
coding DNA strand: See sense strand of DNA.
coding region: The RNA coding region of a gene is the DNA segment that is transcribed to form an RNA. The protein coding region is the segment of mRNA that is translated into the amino acid sequence of the protein product of a gene. Note that although the protein coding region is always contained within the mRNA coding region, the mRNA coding region will include segments that are not used for protein coding.
codon: A sequence of 3 consecutive bases (a triplet) in a DNA or RNA polynucleotide that specifies an amino acid or a stop signal for translation. (Also see degeneracy.)
cohesive ends: Ends of double-stranded DNA molecules in which 1 of the strands extends further than the other. The protruding strand is single stranded and therefore has the capacity to base pair with another single-stranded end that contains a complementary base sequence. This forms a basis for joining 2 DNA molecules together. Cohesive ends are also known as sticky ends.
colinearity: The linear sequence of nucleotide bases in DNA (and in the RNA transcribed from that DNA) corresponds directly to the sequence of amino acids in the protein encoded by that DNA (or RNA).
colony: A visible cluster of bacterial cells, usually grown on an agar surface. All cells within a colony are derived from a single parent cell; therefore, all the cells are identical. The colony often is referred to as a clone.
consensus sequence: An idealized sequence of nucleotides derived from comparison of several DNA sequence elements with similar function. The nucleotide at each position in the consensus sequence is the one most frequently found at that position. For example, a novel DNA sequence can be screened for similarity to the consensus sequence of the TATA box to predict potential sites of RNA polymerase II complex binding.
cosmid: An artificially constructed cloning vector that has been derived from plasmid and phage DNAs. Cosmids contain the cos site of bacteriophage lambda and can be packaged into lambda phage particles for transfection into Escherichia coli cells, but are able to replicate like plasmids within the host cells. Cosmids facilitate the cloning of larger fragments than can be obtained with plasmid vectors.
degeneracy: The genetic code is described as degenerate because of the redundancy in coding specificity: Most of the 20 amino acids are encoded by more than 1 of the 64 codons.
deletion: The loss of a portion of DNA; it can be a single base or a large segment of a chromosome.
dideoxy chain termination: An enzymatic DNA sequencing method that depends on termination of DNA synthesis at specific nucleotides.
dideoxy nucleotides (ddNTPs): Nucleotides that contain no hydroxyl group at either the 2’ or 3’ position of the ribose. DNA synthesis cannot be extended from a ddNTP by DNA polymerase because there is no 3’ hydroxyl group with which to form a phosphodiester bond.
diploid cell: A cell that contains 2 homologs of each chromosome. A human diploid cell contains 46 chromosomes: 2 copies of each of the 22 autosomes plus 2 sex chromosomes.
DNA (deoxyribonucleic acid): DNA is a polynucleotide that contains the code of information (genes) that specifies the cellular functions of all living organisms (with the exception of some viruses that use RNA as their genetic material).
DNA fingerprinting: The process of using highly polymorphic DNA markers, such as restriction fragment length polymorphisms and variable number tandem repeats, to compare the identities of individuals.
DNA ligase: An enzyme that joins the ends of 2 DNA molecules together by catalyzing the formation of phosphodiester bonds between 3’-OH and 5’-P ends.
DNA polymerase: An enzyme that synthesizes DNA.
DNA synthesis: DNA synthesis always occurs in the 5’ to 3’ direction and requires an existing DNA strand to serve as a template. The newly synthesized DNA strand has a complementary base sequence relative to the template strand, and therefore is able to hydrogen bond through complementary base pairing to the template DNA strand.
downstream: A term used to describe sequences proceeding further in the direction of expression of a gene. For example, the coding region is downstream of the transcriptional start site. It also is used to refer to the noncoding region flanking the 3’ end of the RNA coding region of a gene.
enhancer: Transcriptional regulatory DNA sequences (or elements) that can function from upstream or downstream of the transcribed region of a gene and from distances of several kilobases. Enhancers function only in cis; therefore, they can affect only genes on the same chromosome.
exon: A transcribed region of a gene that is present in a mature (postsplicing) RNA. An exon can contain translated regions and untranslated regions (UTRs).
expression library: A set of DNA fragments (typically cDNAs) that have been cloned into a vector that facilitates the translation of the cDNA into a protein.
FISH (fluorescence in situ hybridization): A hybridization technique that uses fluorescently labeled probes to locate complementary molecules in chromosomes or tissue sections.
flanking regions: DNA sequences that are upstream or downstream of the transcribed region of a gene.
frameshift mutation: A deletion or insertion in DNA that is not a multiple of 3 nucleotides. This changes the reading frame (the selection of triplet bases that form codons), resulting in a protein that contains an altered amino acid sequence, typically of an incorrect length.
functional cloning: Gene cloning that is accomplished by first identifying and partially characterizing the protein product of the gene.
gel electrophoresis: A method of separating molecules (DNA, RNA, or protein) based on their size and electrical charge. The molecules are drawn through a gel matrix (such as agarose or acrylamide) by an electrical field. Small molecules will migrate through the gel faster than larger molecules (assuming charge and conformation have neutral effects).
gene: A unit of heredity. In molecular biology, a segment of genomic (chromosomal) DNA that is required for production of a functional product (protein or RNA). A gene contains coding and regulatory regions.
gene expression: Describes the process by which the DNA sequences of a gene are converted into RNA, or into RNA and protein.
genetic code: The triplet codons that specify the 20 amino acids. The genetic code interprets nucleotide (DNA and RNA) language into protein.
genetic linkage analysis: Chromosomal loci (genes or DNA markers) on the same chromosome show linkage if they have a tendency to be inherited together. Linkage analysis within families is used to show linkage of 1 or more chromosomal loci with a particular disease phenotype. In this way, disease genes can be identified through positional cloning.
genome: The entire genetic information contained within the complete DNA sequence of the chromosomes in a cell, an individual, a population, or a species.
genomic DNA: DNA that has been isolated directly from cells or chromosomes, or the cloned copies of all or part of that DNA. This DNA includes DNA sequences from coding and noncoding regions of the genome.
histones: Positively charged (basic) proteins that associate with DNA in the chromosomes. Histones provide the structural components for the primary level of chromatin compaction and have been highly conserved through evolution.
homologous chromosomes (homologs): Chromosomes that contain the same gene loci in the same order. In a diploid cell, which contains 2 homologs of each chromosome, 1 homolog is inherited from each parent.
hybridization: The process of matching complementary strands of DNA or RNA or both to form a double-stranded molecule.
immunohistochemistry: Analogous to in situ hybridization for the localization of nucleic acids, immunohistochemistry techniques are used to localize a specific protein within cells or tissue through antibody binding.
in situ hybridization: Hybridization of a DNA or RNA probe to a target molecule that has not been extracted from its cellular location, for example, within a chromosome or in a fixed tissue section.
independent assortment: The random inheritance of chromosomes and unlinked genes.
insertion: The addition of nucleotides into a region of DNA; it can be a single base or a large segment of a chromosome.
intron: A transcribed region of a gene that is removed from the primary RNA transcript.
inversion: A rearrangement of chromosomal DNA that flips a region of DNA end to end.
kilobase (kb): One thousand nucleotide bases in a DNA or RNA sequence. When referring to double-stranded DNA, the term is used as shorthand for kilobase pairs.
library: A collection of clones (cloned DNA) from a particular source such as an organism or specific tissue.
ligation: The joining of 2 DNA molecules through the formation of phosphodiester bonds. (Also see DNA ligase.)
marker: An identifiable locus on a chromosome (such as a gene, a restriction endonuclease recognition site, or a VNTR) whose inheritance can be followed. Markers can be within or outside of genes. The most useful markers are those that have many allelic forms (are highly polymorphic) within the population.
meiosis: The type of cell division that produces 4 haploid gamete cells from a diploid cell.
meiotic recombination: The reciprocal (equal) exchange of portions of chromosomes that occurs between chromatids of homologous chromosomes. This recombination, also called crossing over, occurs during the first prophase of meiosis.
messenger RNA (mRNA): An RNA that contains protein coding information. Eukaryotic mRNAs are transcribed from “Class II” genes by RNA polymerase II.
methylation: This typically involves the modification of a cytosine base in DNA by the addition of a methyl group. Methylation appears to play an important, though not well understood, role in the regulation of gene expression.
missense mutation: A change of 1 or more nucleotides in a codon that alters the amino acid specified by that codon.
molecular cloning: The isolation of a specific segment of DNA (such as a gene or part of a gene) and the generation of many identical copies, or clones, of that segment of DNA.
mutation: A heritable change in the genomic DNA sequence. A mutation may be neutral or deleterious, or may simply provide genetic variability. Mutations (gene changes) are the foundation of evolution.
noncoding DNA strand: See antisense strand of DNA.
nonsense mutation: A base substitution in DNA that changes an amino acid codon to a termination codon.
Northern transfer: The process of transferring RNA that has been electrophoresed through an agarose gel to a solid filter support for hybridization.
nucleic acid hybridization: See hybridization.
nucleosome: The basic unit of chromatin structure, which consists of approximately 200 bp of DNA and a histone octamer (146 base pairs of DNA wrapped around a histone octamer core plus a segment of linker DNA that associates with histone H1).
nucleotide: A molecule composed of a nitrogenous base, a 5-carbon sugar, and a phosphate group. A nucleic acid (such as DNA) is a polymer of many nucleotides. A nucleoside is composed of a nitrogenous base and a 5-carbon sugar (no phosphate groups). The 4 nucleosides in DNA are deoxyadenosine, deoxycytidine, deoxyguanosine, and deoxythymidine; in RNA they are adenosine, cytidine, guanosine, and uridine (uridine replaces thymidine). Nucleotides are named by their nucleoside and the number of added phosphate groups, for example, adenosine triphosphate (ATP). (Also see A, C, G, T, U.)
oligonucleotide: A short sequence of nucleotides.
phage: See bacteriophage.
plaque: A clear area in a lawn of bacteria on an agar plate. The clearing contains bacteriophage particles that have been released when cells infected with phage were lysed, or killed, by the viral infection. Released phage can infect additional bacterial cells that will be lysed subsequently, increasing the plaque size.
phagemid: A genetically engineered plasmid to which regions of bacteriophage M13 have been added for increased versatility as a cloning vector.
plasmid: A small, circular double-stranded DNA molecule that is found in bacteria and replicates independently of the host cell chromosome. Plasmids commonly are used as vectors in molecular cloning.
point mutation: A change in a single nucleotide of DNA.
polymerase chain reaction (PCR): A DNA synthesis reaction in which a specific region of DNA is copied, or amplified, many times.
polymorphism: A detectable difference in a DNA sequence among individuals at a specific locus.
polypeptide: A chain of amino acids. A protein may be composed of 1 or more polypeptides.
positional cloning: Gene cloning that is based on the chromosomal location of the gene as identified through genetic linkage analysis.
primer: A short oligonucleotide to which nucleotides can be added by DNA polymerase.
probe: A DNA or RNA molecule that is labeled, or tagged, and used to locate a complementary DNA or RNA molecule through hybridization.
promoter: DNA sequences, usually located in the region flanking the 5’ end of the RNA coding region of a gene, that determine the site of initiation of transcription and the quantity and sometimes the tissue specificity of the mRNA. The promoter determines which of the 2 DNA strands will be the template strand for transcription. The promoter is part of the regulatory region of a gene.
recombinant DNA molecule: A DNA molecule that contains segments of DNA from different origins (for example, a piece of human DNA that has been joined to a plasmid DNA).
regulatory gene: A gene that encodes a product, either RNA or protein, that regulates the expression of other genes.
restriction endonucleases: Enzymes that recognize and cut double-stranded DNA at specific nucleotide sequences (restriction sites). In bacterial cells, from which these enzymes have been isolated, these enzymes function to protect the cell against infection by foreign DNA.
restriction fragment: A segment of DNA that is generated when a larger DNA molecule is cut with a restriction enzyme. For example, a 3-kb linear DNA molecule could be cut by the restriction endonuclease EcoRI to generate 2 restriction fragments: one of 1 kb and one of 2 kb.
restriction fragment length polymorphism (RFLP): Cleavage of DNA from 2 sources (for example, from 2 individuals) may generate restriction fragments of different lengths in a specific region of the DNA. This occurs if there are differences (or polymorphisms) in the DNA sequence, such as a single nucleotide difference that creates or removes a recognition site for a specific restriction endonuclease, or an insertion or deletion of DNA. Note that there are many DNA sequence differences among individuals and not all differences cause a gene defect or functional alteration.
restriction map: A map of a segment of DNA that shows the relative positions of restriction endonuclease recognition sites.
reverse transcriptase: An RNA-dependent DNA polymerase that is used to synthesize cDNA from RNA templates.
ribosomal RNA (rRNA): In eukaryotic cells, a primary rRNA transcript is synthesized by RNA polymerase I and processed to yield 28S, 18S, and 5.8S rRNA. 18S rRNA complexes with proteins to form the small ribosomal subunit; 28S, 5.8S, and 5S (synthesized by RNA polymerase III) rRNAs complex with proteins to form the large ribosomal subunit.
ribosomes: Cellular organelles that mediate the interaction between mRNA and tRNAs during translation of the mRNA to protein. Ribosomes are composed of 2 distinct subunits, both of which contain rRNA and proteins.
RNA (ribonucleic acid): A polynucleotide synthesized from a DNA template. RNA structure is similar to DNA structure, except that the sugar component is a ribose instead of a deoxyribose and the base uracil replaces thymine.
ribonuclease protection assay: A highly sensitive method for the detection of RNA molecules. This is the preferred method for quantitation of RNAs but, unlike Northern analysis, cannot be used to determine the size of a particular RNA molecule.
RT-PCR (reverse transcriptase PCR): A polymerase chain reaction in which the starting template to be amplified is an RNA instead of a DNA molecule. A cDNA is first synthesized from the RNA using reverse transcriptase, followed by polymerase chain reaction amplification of the double-stranded cDNA.
screening: The process of identifying a fragment of DNA containing a specific DNA sequence among all of the clones in a DNA library.
semiconservative replication: The mode by which double-stranded DNA is copied to produce 2 identical double-stranded DNA molecules. The 2 strands of the double helix separate (hydrogen bonds between base pairs are broken) and each strand serves as a template for the synthesis of a complementary strand. The process is semiconservative because each of the resulting double-stranded molecules contains a newly synthesized DNA strand and a strand from the original DNA molecule.
sense strand of DNA: The strand of double-stranded DNA that has the same 5’ to 3’ sequence (or sense) as mRNA; also called the coding strand. This strand does not serve as the template strand for transcription. (Also see antisense strand.)
sequence: Refers to the nucleotide sequence, or order, of a given segment of DNA or RNA. Each nucleotide in a DNA sequence is identified by the base that the nucleotide contains: adenine (A), thymine (T), guanine (G), or cytosine (C). In RNA, uracil (U) replaces T.
silent mutation: A change in the DNA sequence that has no discernible effect.
Southern transfer: The process of transferring DNA that has been electrophoresed through an agarose gel to a solid filter support for hybridization.
splice site: The donor splice site is the boundary between the 3’ end of an exon and the 5’ end of the adjacent intron. The acceptor splice site is the boundary between the 3’ end of an intron and the 5’ end of the adjacent exon.
splicing: The removal (splicing out) of introns and the joining (splicing together) of exons to form mature mRNA from the primary transcript.
stop codon: See termination codon.
short tandem repeat polymorphisms (STRPs): Also called microsatellites; these repeated sequences are variable number tandem repeats in which a unit of 5 or fewer nucleotides is repeated in a tandem arrangement.
structural gene: A gene that codes for an RNA or protein product.
TATA box: A conserved AT-rich sequence found in the promoter region of many eukaryotic genes. This sequence element, located approximately 25 base pairs upstream of the transcription start site, determines the site of transcription initiation.
template: A DNA strand that directs the sequence of newly synthesized DNA or RNA through complementary base pairing. See antisense strand of DNA and semiconservative replication.
termination codon: One of the 3 codons that specify the end of polypeptide synthesis; also called a stop codon.
transcript: The RNA product synthesized by an RNA polymerase enzyme.
transcription: The synthesis of RNA by an RNA polymerase. This synthesis requires a DNA template.
transcription factor: A protein that is needed for transcription to occur, but is not part of RNA polymerase itself. A transcription factor may bind directly to a DNA sequence element or to other proteins in promoter and enhancer regions of a gene.
transfection: The infection of a bacterial cell by a bacteriophage (bacterial virus).
transgene: See transgenic mice.
transgenic mice: Mice that carry a foreign gene (or transgene) within their genomic DNA.
transformation: The process by which a bacterial cell takes up DNA molecules such as plasmid DNA. (Note that the term transformation also is used to describe the conversion of a normal cell into a tumor cell.)
translocation: The transfer of a portion of 1 chromosome to another chromosome.
upstream: A term used to describe sequences lying in the direction opposite to that of gene expression. For example, the initiation codon is upstream from the termination codon. Also used to describe the noncoding region flanking the 5’ end of the RNA coding region of a gene.
untranslated region (UTR): A segment of a gene that is transcribed and contained within an exon but is not translated into amino acid sequence. Untranslated regions commonly occur at the 3’ and 5’ ends of an mRNA.
vector: A DNA molecule, such as a plasmid or bacteriophage, that is used as a carrier molecule for cloned DNAs. The vector contains information that will allow recombinant molecules to be replicated in host bacterial cells.
variable number tandem repeats (VNTRs): The human genome contains many loci that contain multiple tandem copies of short DNA sequences. These repeated sequences can occur within or outside of genes. Variable number tandem repeat loci are highly polymorphic within the population. Variable number tandem repeats are used as polymorphic markers in genetic linkage analyses and in the identification of individuals, such as in forensics and paternity testing.
Western transfer: The process of transferring proteins that have been electrophoresed through an acrylamide gel to a solid filter support for detection of a specific protein by antibody binding.
yeast artificial chromosome (YAC): Cloning vectors that contain the centromere and telomere sequences of yeast chromosomes and are used to clone very large (>1000 kb) DNA fragments.
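
Several glossary entries (antiparallel, sense strand of DNA, restriction fragment) describe simple operations on nucleotide sequences. For readers who find a worked example helpful, the following is a minimal Python sketch of three of them. It is an illustration only and not part of the methods described in this tutorial: the function names are hypothetical, only one DNA strand is modeled, and the default recognition site assumes EcoRI, which recognizes GAATTC and cuts after the G.

```python
# Illustrative sketch only; all function names here are hypothetical.

# Watson-Crick pairing in DNA: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Return the antiparallel complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def transcribe(sense_strand):
    """Return the mRNA sequence: the sense strand with U in place of T."""
    return sense_strand.replace("T", "U")


def digest(dna, site="GAATTC", cut_offset=1):
    """Cut a linear DNA strand at every occurrence of a recognition site.

    The defaults model EcoRI (site GAATTC, cut after the first base).
    Only the strand passed in is modeled, not the sticky-end overhang.
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


if __name__ == "__main__":
    sense = "ATGGAATTCTAA"
    print(reverse_complement(sense))  # TTAGAATTCCAT
    print(transcribe(sense))          # AUGGAAUUCUAA

    # A made-up 3-kb linear molecule with a single EcoRI site,
    # mirroring the glossary example of 1-kb and 2-kb fragments.
    three_kb = "A" * 999 + "GAATTC" + "A" * 1995
    print([len(f) for f in digest(three_kb)])  # [1000, 2000]
```

The 3-kb molecule in the usage lines is artificial and was chosen only so that the fragment lengths match the 1-kb and 2-kb fragments used as the example in the restriction fragment entry.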