Nucleic Acid Structure and Function

"The importance of deoxyribonucleic acid (DNA) within living cells is undisputed" (Watson & Crick, 1953). This opening sentence of James Watson and
Francis Crick's second major paper, published shortly after the announcement of their proposed structure for the genetic material, has proven to be an
understatement. Today, it is readily apparent that Watson and Crick's breakthrough set off a firestorm of discovery and innovation that has continued for
over 50 years.

The material in this set of articles describes the science surrounding the structure and function of DNA. Here, you will find information on the chemical
structure of DNA; details about the organization of DNA into chromosomes, genes, and gene families; and data regarding important categories of
sequences within DNA, such as introns, exons, promoters, telomeres, and centromeres.

DNA Transcription
By: Suzanne Clancy, Ph.D. © 2008 Nature Education 

Citation: Clancy, S. (2008) DNA transcription. Nature Education 1(1):41

If DNA is a book, then how is it read? Learn more about the DNA transcription process, where DNA is converted to
RNA, a more portable set of instructions for the cell.
The genetic code is frequently referred to as a "blueprint" because it contains the instructions a cell requires in order to sustain itself. We now know that
there is more to these instructions than simply the sequence of letters in the nucleotide code, however. For example, vast amounts of evidence
demonstrate that this code is the basis for the production of various molecules, including  RNA and protein. Research has also shown that the instructions
stored within DNA are "read" in two steps: transcription and translation. In transcription, a portion of the double-stranded DNA template gives rise to a
single-stranded RNA molecule. In some cases, the RNA molecule itself is a "finished product" that serves some important function within the cell. Often,
however, transcription of an RNA molecule is followed by a translation step, which ultimately results in the production of a protein molecule.

Visualizing Transcription

The process of transcription can be visualized by electron microscopy (Figure 1); in fact, it was first observed using this method in 1970. In these early
electron micrographs, the DNA molecules appear as "trunks," with many RNA "branches" extending out from them. When DNAse and RNAse (enzymes
that degrade DNA and RNA, respectively) were added to the molecules, the application of DNAse eliminated the trunk structures, while the use of RNAse
wiped out the branches.

DNA is double-stranded, but only one strand serves as a template for transcription at any given time. This  template strand is called the noncoding strand.
The nontemplate strand is referred to as the coding strand because its sequence will be the same as that of the new RNA molecule. In most organisms, the
strand of DNA that serves as the template for one gene may be the nontemplate strand for other genes within the same chromosome.
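
The relationship between the two strands can be sketched in a few lines of Python. This is only an illustration of base-pairing, not a model of the enzyme: the function names and the example sequence are made up for this sketch, and every sequence is written 5' to 3'.

```python
# Hypothetical helpers for illustration; none of these names come from the article.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3: str) -> str:
    """Return the RNA (5'->3') copied from a template strand given 5'->3'."""
    # The template is read 3'->5', so walk it in reverse while pairing bases.
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5to3))


def coding_strand(template_5to3: str) -> str:
    """Return the nontemplate (coding) strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(template_5to3))


template = "TTACGCCAT"            # made-up template strand, 5'->3'
rna = transcribe(template)        # "AUGGCGUAA"
coding = coding_strand(template)  # "ATGGCGTAA"

# The RNA matches the coding strand, with U in place of T.
print(rna, coding, rna == coding.replace("T", "U"))
```

The final comparison is the point of the sketch: the new RNA has the same sequence as the coding strand, apart from uracil replacing thymine.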

Figure 1: Nascent pre-mRNAs.


The RNA processing machinery is contained inside structures called "terminal knobs" in nascent RNA transcripts. Terminal knobs are visible in this
electron micrograph of chromatin spreads from yeast.

The Transcription Process


The process of transcription begins when an enzyme called RNA polymerase (RNA pol) attaches to the template DNA strand and begins to catalyze
production of complementary RNA. Polymerases are large enzymes composed of approximately a dozen subunits, and when active on DNA, they are also
typically complexed with other factors. In many cases, these factors signal which gene is to be transcribed.
Three different types of RNA polymerase exist in eukaryotic cells, whereas bacteria have only one. In eukaryotes, RNA pol I transcribes the genes that
encode most of the ribosomal RNAs (rRNAs), and RNA pol III transcribes the genes for one small rRNA, plus the transfer RNAs that play a key role in the
translation process, as well as other small regulatory RNA molecules. Thus, it is RNA pol II that transcribes the messenger RNAs, which serve as the
templates for production of protein molecules.
Transcription Initiation
The first step in transcription is initiation, when the RNA pol binds to the DNA upstream (5′) of the gene at a specialized sequence called a promoter (Figure
2a). In bacteria, promoters are usually composed of three sequence elements, whereas in eukaryotes, there are as many as seven elements.
In prokaryotes, most genes have a sequence called the Pribnow box, with the consensus sequence TATAAT positioned about ten base pairs away from
the site that serves as the location of transcription initiation. Not all Pribnow boxes have this exact nucleotide sequence; these nucleotides are simply the
most common ones found at each site. Although substitutions do occur, each box nonetheless resembles this consensus fairly closely. Many genes also
have the consensus sequence TTGACA at a position 35 bases upstream of the start site, and some have what is called an upstream element, which is an
A-T rich region 40 to 60 nucleotides upstream that enhances the rate of transcription (Figure 3). In any case, upon binding, the RNA pol "core enzyme"
binds to another subunit called the sigma subunit to form a holoenzyme capable of unwinding the DNA double helix in order to facilitate access to the gene.
The sigma subunit conveys promoter specificity to RNA polymerase; that is, it is responsible for telling RNA polymerase where to bind. There are a number
of different sigma subunits that bind to different promoters and therefore assist in turning genes on and off as conditions change.
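
The idea that a Pribnow box "resembles" the consensus without matching it exactly can be made concrete by counting mismatches. The sketch below is a simplified illustration, not a promoter-prediction method: the function names and the test sequence are invented, and real promoter recognition also depends on spacing and the -35 element, which this ignores.

```python
PRIBNOW_CONSENSUS = "TATAAT"  # -10 consensus sequence given in the text


def mismatches(candidate: str, consensus: str = PRIBNOW_CONSENSUS) -> int:
    """Count positions where a candidate hexamer differs from the consensus."""
    if len(candidate) != len(consensus):
        raise ValueError("candidate and consensus must be the same length")
    return sum(1 for a, b in zip(candidate.upper(), consensus) if a != b)


def best_match(sequence: str, consensus: str = PRIBNOW_CONSENSUS):
    """Slide along the sequence; return (index, window, mismatches) of the closest window.

    Assumes the sequence is at least as long as the consensus.
    """
    k = len(consensus)
    windows = [(i, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]
    index, window = min(windows, key=lambda w: mismatches(w[1], consensus))
    return index, window, mismatches(window, consensus)


# Made-up stretch of DNA containing a near-consensus hexamer.
print(best_match("GGCTATGATGCC"))  # (3, 'TATGAT', 1)
```

Here the closest window differs from TATAAT at a single position, which is the kind of substitution the consensus description allows for.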
Eukaryotic promoters are more complex than their prokaryotic counterparts, in part because eukaryotes have the aforementioned three classes of RNA
polymerase that transcribe different sets of genes. Many eukaryotic genes also possess  enhancer sequences, which can be found at considerable
distances from the genes they affect. Enhancer sequences control gene activation by binding with activator proteins and altering the 3-D structure of the
DNA to help "attract" RNA pol II, thus regulating transcription. Because eukaryotic DNA is tightly packaged as chromatin, transcription also requires a
number of specialized proteins that help make the template strand accessible.
In eukaryotes, the "core" promoter for a gene transcribed by pol II is most often found immediately upstream (5′) of the start site of the gene. Most pol II
genes have a TATA box (consensus sequence TATAAA) 25 to 35 bases upstream of the initiation site, which affects the transcription rate and determines
location of the start site. Eukaryotic RNA polymerases use a number of essential cofactors (collectively called general transcription factors), and one of
these, TFIID, recognizes the TATA box and ensures that the correct start site is used. Another cofactor, TFIIB, recognizes a different common consensus
sequence, G/C G/C G/C G C C C, approximately 38 to 32 bases upstream (Figure 4).

Figure 2: The three stages of DNA transcription.


(A) The transcription process is initiated when the enzyme RNA polymerase binds to a DNA template at a promoter sequence. (B) During the elongation
process, the DNA double helix unwinds. RNA polymerase reads the template DNA strand and adds nucleotides to the three-prime (3’) end of a growing
RNA transcript. (C) When RNA polymerase reaches a termination sequence on the DNA template strand, transcription is terminated and the mRNA
transcript and RNA polymerase are released from the complex.
Figure 3: Prokaryotic transcription units.
A prokaryotic transcription unit is composed of a transcription start site (or initiation site), a -10 DNA region, and a -35 DNA region. The -10 region is
located ten nucleotides upstream of the transcription start site; the -35 region is located 35 nucleotides upstream of the transcription start site. Many
prokaryotes share a common, or similar, sequence at their -35 and -10 regions. These shared sequences are called consensus sequences.

Figure 4: Eukaryotic core promoter region. In eukaryotes, genes transcribed into RNA transcripts by the enzyme RNA polymerase II are controlled by a
core promoter. A core promoter consists of a transcription start site, a TATA box (at the -25 region), and a TFIIB recognition element (at the -35 region).
The terms "strong" and "weak" are often used to describe promoters and enhancers, according to their effects on transcription rates and thereby on  gene
expression. Alteration of promoter strength can have deleterious effects upon a cell, often resulting in disease. For example, some tumor-promoting viruses
transform healthy cells by inserting strong promoters in the vicinity of growth-stimulating genes, while translocations in some cancer cells place genes that
should be "turned off" in the proximity of strong promoters or enhancers.
Enhancer sequences do what their name suggests: They act to enhance the rate at which genes are transcribed, and their effects can be quite powerful.
Enhancers can be thousands of nucleotides away from the promoters with which they interact, but they are brought into proximity by the looping of DNA.
This looping is the result of interactions between the proteins bound to the enhancer and those bound to the promoter. The proteins that facilitate this
looping are called activators, while those that inhibit it are called repressors.

Transcription of eukaryotic genes by polymerases I and III is initiated in a similar manner, but the promoter sequences and transcriptional activator proteins
vary.

Strand Elongation
Once transcription is initiated, the DNA double helix unwinds and RNA polymerase reads the template strand, adding nucleotides to the 3′ end of the
growing chain (Figure 2b). At a temperature of 37 degrees Celsius, new nucleotides are added at an estimated rate of about 42-54 nucleotides per second
in bacteria (Dennis & Bremer, 1974), while eukaryotes proceed at a much slower pace of approximately 22-25 nucleotides per second (Izban & Luse,
1992).
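
To get a feel for what these rates mean, the following back-of-the-envelope sketch estimates how long elongation alone would take for a transcript of a given length. The 1,000-nucleotide length is a hypothetical example, and the calculation assumes a constant rate with no pausing, which is a simplification.

```python
def transcription_time_seconds(length_nt: int, rate_nt_per_s: float) -> float:
    """Time to elongate a transcript at a constant rate (no pausing assumed)."""
    if rate_nt_per_s <= 0:
        raise ValueError("rate must be positive")
    return length_nt / rate_nt_per_s


TRANSCRIPT_LENGTH = 1000      # hypothetical transcript length, in nucleotides
BACTERIAL_RATES = (42, 54)    # nucleotides per second, from the text
EUKARYOTIC_RATES = (22, 25)   # nucleotides per second, from the text

for label, (slow, fast) in [("bacteria", BACTERIAL_RATES),
                            ("eukaryotes", EUKARYOTIC_RATES)]:
    fastest = transcription_time_seconds(TRANSCRIPT_LENGTH, fast)
    slowest = transcription_time_seconds(TRANSCRIPT_LENGTH, slow)
    print(f"{label}: {fastest:.1f} to {slowest:.1f} seconds")
```

Under these assumptions, the hypothetical transcript takes roughly 19 to 24 seconds in bacteria and 40 to 45 seconds in eukaryotes.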

Transcription Termination
Terminator sequences are found close to the ends of noncoding sequences (Figure 2c). Bacteria possess two types of these sequences. In rho-
independent terminators, inverted repeat sequences are transcribed; they can then fold back on themselves in hairpin loops, causing RNA pol to pause
and resulting in release of the transcript (Figure 5). On the other hand, rho-dependent terminators make use of a factor called rho, which actively unwinds
the DNA-RNA hybrid formed during transcription, thereby releasing the newly synthesized RNA.
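
The inverted repeats behind rho-independent termination can be illustrated with a small search for a stem-loop in an RNA sequence. This is a deliberately naive sketch: the function names, the default stem and loop sizes, and the example transcript are all assumptions made for illustration, and only strict Watson-Crick pairs are considered (no G-U wobble pairs, no folding energies).

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the sequence that would base-pair with `rna`, read 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def find_hairpin(rna: str, stem: int = 5, min_loop: int = 3, max_loop: int = 8):
    """Return (stem_start, loop_length) for the first stem-loop found, else None."""
    for start in range(len(rna) - 2 * stem - min_loop + 1):
        left = rna[start:start + stem]
        for loop in range(min_loop, max_loop + 1):
            right_start = start + stem + loop
            right = rna[right_start:right_start + stem]
            if len(right) < stem:
                break  # ran off the end of the transcript
            if right == reverse_complement(left):
                return start, loop
    return None


# Made-up transcript end: stem GGCAC, loop UUCG, stem GUGCC, then a run of U.
print(find_hairpin("ACGGCACUUCGGUGCCUUUUUU"))  # (2, 4)
```

The two halves of the stem are an inverted repeat: one is the reverse complement of the other, so the transcript can fold back on itself as described above.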
In eukaryotes, termination of transcription occurs by different processes, depending upon the exact polymerase utilized. For pol I genes, transcription is
stopped using a termination factor, through a mechanism similar to rho-dependent termination in bacteria. Transcription of pol III genes ends after
transcribing a termination sequence that includes a polyuracil stretch, by a mechanism resembling rho-independent prokaryotic termination. Termination of
pol II transcripts, however, is more complex.

Transcription of pol II genes can continue for hundreds or even thousands of nucleotides beyond the end of a noncoding sequence. The RNA strand is
then cleaved by a complex that appears to associate with the polymerase. Cleavage seems to be coupled with termination of transcription and occurs at a
consensus sequence. Mature pol II mRNAs are polyadenylated at the 3′-end, resulting in a poly(A) tail; this process follows cleavage and is also
coordinated with termination.
Both polyadenylation and termination make use of the same consensus sequence, and the interdependence of the processes was demonstrated in the late
1980s by work from several groups. One group of scientists working with mouse globin genes showed that introducing mutations into the consensus
sequence AATAAA, known to be necessary for poly(A) addition, inhibited both polyadenylation and transcription termination. They measured the  extent of
termination by hybridizing transcripts with the different poly(A) consensus sequence mutants with wild-type transcripts, and they were able to see a
decrease in the signal of hybridization, suggesting that proper termination was inhibited. They therefore concluded that polyadenylation was necessary for
termination (Logan et al., 1987). Another group obtained similar results using a monkey viral system, SV40 (simian virus 40). They introduced mutations
into a poly(A) site, which caused mRNAs to accumulate to levels far above wild type (Connelly & Manley, 1988).
The exact relationship between cleavage and termination remains to be determined. One model supposes that cleavage itself triggers termination; another
proposes that polymerase activity is affected when passing through the consensus sequence at the cleavage site, perhaps through changes in associated
transcriptional activation factors. Thus, research in the area of prokaryotic and eukaryotic transcription is still focused on unraveling the molecular details of
this complex process, data that will allow us to better understand how genes are transcribed and silenced.

RNA Transcription by RNA Polymerase: Prokaryotes vs Eukaryotes


By: Suzanne Clancy, Ph.D. © 2008 Nature Education 

Citation: Clancy, S. (2008) RNA transcription by RNA polymerase: prokaryotes vs eukaryotes. Nature Education 1(1):125

Gene expression is linked to RNA transcription, which cannot happen without RNA polymerase. However, this is where
the similarities between prokaryote and eukaryote expression end.
Every nucleated, diploid cell in the body contains the same DNA, or genome, yet different cells appear committed to different specialized tasks—for
example, kidney cells absorb sodium, while pancreatic cells produce insulin. How is this possible? The answer lies in differential use of the genome; in
other words, different cells within the body express different portions of their DNA. This process, which begins with the  transcription of DNA into RNA,
ultimately leads to changes in cell function. Changes in transcription are thus a fundamental means by which cell function is regulated across  species. In
fact, even single-celled organisms, such as bacteria, regulate gene transcription depending on cues in their environments. Therefore, understanding how
transcription is regulated is fundamental to deciphering the mysteries of the genome.
Central to the process of transcription is the complex of proteins known as the RNA polymerases. RNA polymerases have been found in all species, but
the number and composition of these proteins vary across taxa. For instance, bacteria contain a single type of RNA polymerase, while eukaryotes
(multicellular organisms and yeasts) contain three distinct types. In spite of these differences, there are striking similarities among transcriptional
mechanisms. For example, all species require a mechanism by which transcription can be regulated in order to achieve spatial and temporal changes
in gene expression. In order to fully understand what this means, it is first necessary to examine the mechanisms of RNA transcription in more detail.

Transcription: An Overview
In all species, transcription begins with the binding of the RNA polymerase complex (or holoenzyme) to a special DNA sequence at the beginning of the
gene known as the promoter. Activation of the RNA polymerase complex enables transcription initiation, and this is followed by elongation of the transcript.
In turn, transcript elongation leads to clearing of the promoter, and the transcription process can begin yet again. Transcription can thus be regulated at two
levels: the promoter level (cis regulation) and the polymerase level (trans regulation). These elements differ among bacteria and eukaryotes.
Transcription in Bacteria
In bacteria, all transcription is performed by a single type of RNA polymerase. This polymerase contains four catalytic subunits and a single regulatory
subunit known as sigma (σ). Interestingly, several distinct sigma factors have been identified, and each of these oversees transcription of a unique set
of genes. Sigma factors are thus discriminatory, as each binds a distinct set of promoter sequences.
A striking example of the specialization of sigma factors for different gene promoters is provided by bacterial sporulation in the species  Bacillus subtilis.
This bacterium exists in two states: vegetative (growing) and sporulating. Genes involved in spore formation are not normally expressed during vegetative
growth. Remarkably, expression of a gene encoding a novel sigma factor turns on the first genes for sporulation. Subsequent expression of different sigma
factors then turns on new sets of genes needed later in the sporulation process (Losick & Stragier, 1992). Each of these sigma factors recognizes the
promoters of the genes in its group, not those "seen" by other sigma factors. This simple example illustrates how transcription can be regulated in both cis
and trans to cause changes in cell function. Therefore, while bacteria accomplish transcription of all genes using a single kind of RNA polymerase, the use
of different sigma factor subunits provides an extra level of control.

Transcription in Eukaryotes
Eukaryotic cells are more complex than bacteria in many ways, including in terms of transcription. Specifically, in eukaryotes, transcription is achieved by
three different types of RNA polymerase (RNA pol I-III). These polymerases differ in the number and type of subunits they contain, as well as the class of
RNAs they transcribe; that is, RNA pol I transcribes ribosomal RNAs (rRNAs), RNA pol II transcribes RNAs that will become messenger RNAs (mRNAs)
and also small regulatory RNAs, and RNA pol III transcribes small RNAs such as transfer RNAs (tRNAs).

Because RNA pol II transcribes protein-encoding genes, it has been of particular importance to scientists who study the regulation of eukaryotic gene
expression, and its function is well understood. For example, researchers know that RNA pol II can bind to a DNA sequence within the promoter of many
genes, known as the TATA box, to initiate transcription. Together with other common motifs (short recognition sequences in the DNA), these elements
constitute the core promoter. However, changes in RNA pol II affinity and, therefore, gene expression can be influenced by surrounding DNA sequences
(enhancers), which in turn recruit transcription factors. While these properties of transcription regulation are very important, they remain an area of active
research.

Interestingly, RNA pol II is uniquely sensitive to amatoxins, such as α-amanitin of the extremely toxic Amanita genus of mushrooms (Weiland, 1968), a fact
that researchers have been able to exploit for the purposes of polymerase studies - although recreational mushroom hunters should beware! Thus, while
eukaryotic transcription is far more complex than bacterial transcription, the main difference between the two types of transcription lies in RNA polymerase.

Figure 5: Rho-independent termination in bacteria.

Inverted repeat sequences at the end of a gene allow folding of the newly transcribed RNA sequence into a hairpin loop. This terminates transcription and
stimulates release of the mRNA strand from the transcription machinery.
Translation: DNA to mRNA to Protein
By: Suzanne Clancy, Ph.D. & William Brown, Ph.D. (Write Science Right) © 2008 Nature Education 

Citation: Clancy, S. & Brown, W. (2008) Translation: DNA to mRNA to Protein. Nature Education 1(1):101

How does the cell convert DNA into working proteins? The process of translation can be seen as the decoding of
instructions for making proteins, involving mRNA in transcription as well as tRNA.

The genes in DNA encode protein molecules, which are the "workhorses" of the cell, carrying out all the functions necessary for life. For example,
enzymes, including those that metabolize nutrients and synthesize new cellular constituents, as well as DNA polymerases and other enzymes that make
copies of DNA during cell division, are all proteins.
In the simplest sense, expressing a gene means manufacturing its corresponding protein, and this multilayered process has two major steps. In the first
step, the information in DNA is transferred to a messenger RNA (mRNA) molecule by way of a process called transcription. During transcription, the DNA
of a gene serves as a template for complementary base-pairing, and an enzyme called RNA polymerase II catalyzes the formation of a pre-mRNA
molecule, which is then processed to form mature mRNA (Figure 1). The resulting mRNA is a single-stranded copy of the gene, which next must be
translated into a protein molecule.

Figure 1: A gene is expressed through the processes of transcription and translation. During transcription, the enzyme RNA polymerase (green)
uses DNA as a template to produce a pre-mRNA transcript (pink). The pre-mRNA is processed to form a mature mRNA molecule that can be translated to
build the protein molecule (polypeptide) encoded by the original gene.

During translation, which is the second major step in gene expression, the mRNA is "read" according to the genetic code, which relates the DNA sequence
to the amino acid sequence in proteins (Figure 2). Each group of three bases in mRNA constitutes a codon, and each codon specifies a particular amino
acid (hence, it is a triplet code). The mRNA sequence is thus used as a template to assemble—in order—the chain of amino acids that form a protein.
Figure 2: The amino acids specified by each mRNA codon. Multiple codons can code for the same amino acid. The codons are written 5' to 3', as
they appear in the mRNA. AUG is an initiation codon; UAA, UAG, and UGA are termination (stop) codons.
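
The triplet code can be sketched as a simple lookup table. The example below builds the standard genetic code and reads an mRNA codon by codon; the function name and the example mRNA are made up, and treating the first AUG in the sequence as the start codon is a simplifying assumption.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
# One-letter amino acid codes; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate from the first AUG (an assumption) up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up mRNA: CC | AUG GCG AAA UCU ACG UAA | GG
print(translate("CCAUGGCGAAAUCUACGUAAGG"))  # MAKST
```

Note that several different codons map to the same amino acid in the table, and that the three stop codons map to no amino acid at all.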

But where does translation take place within a cell? What individual substeps are a part of this process? And does translation differ between prokaryotes
and eukaryotes? The answers to questions such as these reveal a great deal about the essential similarities between all species.

Where Translation Occurs


Within all cells, the translation machinery resides within a specialized organelle called the  ribosome. In eukaryotes, mature mRNA molecules must leave
the nucleus and travel to the cytoplasm, where the ribosomes are located. On the other hand, in prokaryotic organisms, ribosomes can attach to mRNA
while it is still being transcribed. In this situation, translation begins at the 5' end of the mRNA while the 3' end is still attached to DNA.
In all types of cells, the ribosome is composed of two subunits, one large and one small: 50S and 30S in bacteria, and 60S and 40S in eukaryotes (S, for svedberg unit, is a measure of
sedimentation velocity and, therefore, mass). Each subunit exists separately in the cytoplasm, but the two join together on the mRNA molecule. The
ribosomal subunits contain proteins and specialized RNA molecules—specifically, ribosomal RNA (rRNA) and transfer RNA (tRNA). The tRNA molecules
are adaptor molecules—they have one end that can read the triplet code in the mRNA through complementary base-pairing, and another end that attaches
to a specific amino acid (Chapeville et al., 1962; Grunberger et al., 1969). The idea that tRNA was an adaptor molecule was first proposed by Francis
Crick, co-discoverer of DNA structure, who did much of the key work in deciphering the genetic code (Crick, 1958).
Within the ribosome, the mRNA and aminoacyl-tRNA complexes are held together closely, which facilitates base-pairing. The rRNA catalyzes the
attachment of each new amino acid to the growing chain.

The Beginning of mRNA Is Not Translated


Interestingly, not all regions of an mRNA molecule correspond to particular amino acids. In particular, there is an area near the 5' end of the molecule that
is known as the untranslated region (UTR) or leader sequence. This portion of mRNA is located between the first nucleotide that is transcribed and the start
codon (AUG) of the coding region, and it does not affect the sequence of amino acids in a protein (Figure 3).
So, what is the purpose of the UTR? It turns out that the leader sequence is important because it contains a ribosome-binding site. In  bacteria, this site is
known as the Shine-Dalgarno box (AGGAGG), after scientists John Shine and Lynn Dalgarno, who first characterized it. A similar site in vertebrates was
characterized by Marilyn Kozak and is thus known as the Kozak box. In bacterial mRNA, the 5' UTR is normally short; in human mRNA, the median length
of the 5' UTR is about 170 nucleotides. If the leader is long, it may contain regulatory sequences, including binding sites for proteins, that can affect
the stability of the mRNA or the efficiency of its translation.
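
As a rough illustration of the leader sequence, the sketch below measures the 5' UTR of a bacterial-style mRNA and looks for the Shine-Dalgarno sequence within it. The function name, the returned field names, and the example mRNA are hypothetical; the code assumes the first AUG is the start codon and looks only for an exact AGGAGG match, whereas real ribosome-binding sites vary.

```python
SHINE_DALGARNO = "AGGAGG"  # ribosome-binding site sequence given in the text


def leader_summary(mrna: str) -> dict:
    """Describe the 5' UTR: its length and where the Shine-Dalgarno site sits.

    Assumes the first AUG is the start codon and requires an exact SD match.
    """
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no AUG start codon found")
    leader = mrna[:start]
    sd_index = leader.find(SHINE_DALGARNO)
    if sd_index == -1:
        return {"utr_length": start, "sd_index": None, "spacer": None}
    # Spacer: nucleotides between the end of the SD site and the start codon.
    spacer = start - (sd_index + len(SHINE_DALGARNO))
    return {"utr_length": start, "sd_index": sd_index, "spacer": spacer}


# Made-up mRNA: GC | AGGAGG | UUACCC | AUG GCG AAA
print(leader_summary("GCAGGAGGUUACCCAUGGCGAAA"))
# {'utr_length': 14, 'sd_index': 2, 'spacer': 6}
```

In this invented example the leader is 14 nucleotides long, none of which contribute to the amino acid sequence.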
Figure 3: A DNA transcription unit. A DNA transcription unit is composed, from its 3' to 5' end, of an RNA-coding region (pink rectangle) flanked by a
promoter region (green rectangle) and a terminator region (black rectangle). Regions to the left, or moving towards the 3' end, of the transcription start site
are considered "upstream;" regions to the right, or moving towards the 5' end, of the transcription start site are considered "downstream."

Translation Begins After the Assembly of a Complex Structure


The translation of mRNA begins with the formation of a complex on the mRNA (Figure 4). First, three initiation factor proteins (known as IF1, IF2, and IF3)
bind to the small subunit of the ribosome. This preinitiation complex and a methionine-carrying tRNA then bind to the mRNA, near the AUG start codon,
forming the initiation complex.

Figure 4: The translation initiation complex. When translation begins, the small subunit of the ribosome and an initiator tRNA molecule assemble on the
mRNA transcript. The small subunit of the ribosome has three binding sites: an amino acid site (A), a polypeptide site (P), and an exit site (E). The initiator
tRNA molecule carrying the amino acid methionine binds to the AUG start codon of the mRNA transcript at the ribosome’s P site where it will become the
first amino acid incorporated into the growing polypeptide chain. Here, the initiator tRNA molecule is shown binding after the small ribosomal subunit has
assembled on the mRNA; the order in which this occurs is unique to prokaryotic cells. In eukaryotes, the free initiator tRNA first binds the small ribosomal
subunit to form a complex. The complex then binds the mRNA transcript, so that the tRNA and the small ribosomal subunit bind the mRNA simultaneously.
Although methionine (Met) is the first amino acid incorporated into any new protein, it is not always the first amino acid in mature proteins—in many
proteins, methionine is removed after translation. In fact, if a large number of proteins are sequenced and compared with their known gene sequences,
methionine (or formylmethionine) occurs at the N-terminus of all of them. However, not all amino acids are equally likely to occur second in the chain, and
the second amino acid influences whether the initial methionine is enzymatically removed. For example, many proteins begin with methionine followed by
alanine. In both prokaryotes and eukaryotes, these proteins have the methionine removed, so that alanine becomes the  N-terminal amino acid (Table 1).
However, if the second amino acid is lysine, which is also frequently the case, methionine is not removed (at least in the sample proteins that have been
studied thus far). These proteins therefore begin with methionine followed by lysine (Flinta et al., 1986).

Table 1 shows the N-terminal sequences of proteins in prokaryotes and eukaryotes, based on a sample of 170 prokaryotic and 120  eukaryotic proteins
(Flinta et al., 1986). In the table, M represents methionine, A represents alanine, K represents lysine, S represents serine, and T represents threonine.
Table 1: N-Terminal Sequences of Proteins

N-Terminal    Percent of Prokaryotic Proteins    Percent of Eukaryotic Proteins
Sequence      with This Sequence                 with This Sequence

MA*           28.24%                             19.17%
MK**          10.59%                             2.50%
MS*           9.41%                              11.67%
MT*           7.65%                              6.67%

* Methionine was removed in all of these proteins
** Methionine was not removed from any of these proteins
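
The percentages in Table 1 can be converted back into approximate protein counts using the stated sample sizes (170 prokaryotic and 120 eukaryotic proteins). This is a back-calculation for illustration only; it assumes each percentage is a fraction of the full sample, and the variable and function names are made up.

```python
SAMPLE_SIZES = {"prokaryotic": 170, "eukaryotic": 120}  # from the text

# N-terminal sequence -> (percent prokaryotic, percent eukaryotic), from Table 1
PERCENTAGES = {
    "MA": (28.24, 19.17),
    "MK": (10.59, 2.50),
    "MS": (9.41, 11.67),
    "MT": (7.65, 6.67),
}


def implied_count(percent: float, sample_size: int) -> int:
    """Nearest whole number of proteins implied by a percentage of the sample."""
    return round(percent / 100 * sample_size)


counts = {
    seq: (implied_count(pro, SAMPLE_SIZES["prokaryotic"]),
          implied_count(euk, SAMPLE_SIZES["eukaryotic"]))
    for seq, (pro, euk) in PERCENTAGES.items()
}

for seq, (pro, euk) in counts.items():
    print(f"{seq}: {pro} prokaryotic, {euk} eukaryotic")
# MA: 48 prokaryotic, 23 eukaryotic
# MK: 18 prokaryotic, 3 eukaryotic
# MS: 16 prokaryotic, 14 eukaryotic
# MT: 13 prokaryotic, 8 eukaryotic
```

Each percentage corresponds almost exactly to a whole number of proteins, which is consistent with the stated sample sizes.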

Once the initiation complex is formed on the mRNA, the large ribosomal subunit binds to this complex, which causes the release of IFs (initiation factors).
The large subunit of the ribosome has three sites at which tRNA molecules can bind. The A (amino acid) site is the location at which the aminoacyl-tRNA
anticodon base pairs up with the mRNA codon, ensuring that the correct amino acid is added to the growing polypeptide chain. The P (polypeptide) site
is the location at which the amino acid is transferred from its tRNA to the growing polypeptide chain. Finally, the E (exit) site is the location at which the
"empty" tRNA sits before being released back into the cytoplasm to bind another amino acid and repeat the process. The initiator methionine tRNA is the
only aminoacyl-tRNA that can bind in the P site of the ribosome, and the A site is aligned with the second mRNA codon. The ribosome is thus ready to bind
the second aminoacyl-tRNA at the A site, which will be joined to the initiator methionine by the first peptide bond (Figure 5).

Figure 5: The large ribosomal subunit binds to the small ribosomal subunit to complete the initiation complex. The initiator tRNA molecule,
carrying the methionine amino acid that will serve as the first amino acid of the polypeptide chain, is bound to the P site on the ribosome. The A site is
aligned with the next codon, which will be bound by the anticodon of the next incoming tRNA.

The Elongation Phase


The next phase in translation is known as the elongation phase (Figure 6). First, the ribosome moves along the mRNA in the 5'-to-3' direction, which
requires the elongation factor G, in a process called translocation. The tRNA that corresponds to the second codon can then bind to the A site, a step that
requires elongation factors (in E. coli, these are called EF-Tu and EF-Ts), as well as guanosine triphosphate (GTP) as an energy source for the process.
Upon binding of the tRNA-amino acid complex in the A site, GTP is cleaved to form guanosine diphosphate (GDP), then released along with EF-Tu to be
recycled by EF-Ts for the next round.

Next, peptide bonds between the now-adjacent first and second amino acids are formed through a  peptidyl transferase activity. For many years, it was
thought that an enzyme catalyzed this step, but recent evidence indicates that the transferase activity is a catalytic function of rRNA (Pierce, 2000). After
the peptide bond is formed, the ribosome shifts, or translocates, again, thus causing the tRNA to occupy the E site. The tRNA is then released to the
cytoplasm to pick up another amino acid. In addition, the A site is now empty and ready to receive the tRNA for the next codon.
This process is repeated until all the codons in the mRNA have been read by tRNA molecules, and the amino acids attached to the tRNAs have been
linked together in the growing polypeptide chain in the appropriate order. At this point, translation must be terminated, and the nascent protein must be
released from the mRNA and ribosome.
Termination of Translation
There are three termination codons that are employed at the end of a protein-coding sequence in mRNA: UAA, UAG, and UGA. No tRNAs recognize these
codons. Thus, in the place of these tRNAs, one of several proteins, called release factors, binds and facilitates release of the mRNA from the ribosome and
subsequent dissociation of the ribosome.

Comparing Eukaryotic and Prokaryotic Translation


The translation process is very similar in prokaryotes and eukaryotes. Although different elongation, initiation, and termination factors are used, the genetic
code is generally identical. As previously noted, in bacteria, transcription and translation take place simultaneously, and mRNAs are relatively short-lived. In
eukaryotes, however, mRNAs have highly variable half-lives, are subject to modifications, and must exit the nucleus to be translated; these multiple steps
offer additional opportunities to regulate levels of protein production, and thereby fine-tune gene expression.

Figure 6: The elongation phase. At the beginning of elongation, an initiator tRNA molecule occupies the P site of a ribosome assembled on the mRNA
transcript. This initiator tRNA carries the amino acid formylmethionine. The ribosome's A site is open and ready to receive a second, incoming tRNA
molecule. The amino acid bound to the tRNA that occupies the P site is added to the amino acid bound to the tRNA that occupies the A site, forming a
growing peptide chain. As the ribosome moves from one codon to the next along the mRNA molecule, the tRNA molecule that occupies the A site is shifted
to the P site. The A site therefore cycles between occupied and exposed states, and is able to receive the incoming tRNA molecule that corresponds to
each sequential mRNA codon. The growing peptide chain is continuously transferred to the amino acid associated with the tRNA molecule located at the A
site.

Major Molecular Events of DNA Replication


By: Leslie A. Pray, Ph.D. © 2008 Nature Education 

Citation: Pray, L. (2008) Major molecular events of DNA replication. Nature Education 1(1):99

Arthur Kornberg compared DNA to a tape recording of instructions that can be copied over and over. How do cells
make these near-perfect copies, and does the process ever vary?
Scientists have devoted decades of effort to understanding how deoxyribonucleic acid (DNA) replicates itself. In simple terms, replication involves use of an
existing strand of DNA as a template for the synthesis of a new, identical strand. American enzymologist and Nobel Prize winner Arthur Kornberg
compared this process to a tape recording of instructions for performing a task: "[E]xact copies can be made from it, as from a tape recording, so that this
information can be used again and elsewhere in time and space" (Kornberg, 1960).
In reality, the process of replication is far more complex than suggested by Kornberg's analogy. Researchers typically utilize simple bacterial cells in their
experiments, but they still do not have all the answers, particularly when it comes to eukaryotic replication. Nonetheless, scientists are familiar with the
basic steps in the replication process, and they continue to rely on this information as the basis for continued research and experimentation.

The Molecular Machinery of Bacterial DNA Replication


A typical bacterial cell has anywhere from about 1 million to 4 million base pairs of DNA, compared to the 3 billion base pairs in the  genome of the common
house mouse (Mus musculus). Still, even in bacteria, with their smaller genomes, DNA replication involves an incredibly sophisticated, highly coordinated
series of molecular events. These events are divided into four major stages: initiation, unwinding, primer synthesis, and elongation.
Initiation and Unwinding
During initiation, so-called initiator proteins bind to the replication origin, a specific sequence of base pairs known as oriC. This binding triggers events
that unwind the DNA double helix into two single-stranded DNA molecules. Several groups of proteins are involved in this unwinding (Figure 1). For
example, the DNA helicases are responsible for breaking the hydrogen bonds that join the complementary nucleotide bases to each other; these hydrogen
bonds are an essential feature of James Watson and Francis Crick's three-dimensional DNA  model. Because the newly unwound single strands have a
tendency to rejoin, another group of proteins, the single-strand-binding proteins, keep the single strands stable until elongation begins. A third family of
proteins, the topoisomerases, reduce some of the torsional strain caused by the unwinding of the double helix.

Figure 1: Facilitation of DNA unwinding. During DNA replication, several proteins facilitate the unwinding of the DNA double helix into two single
strands. Topoisomerases (red) reduce torsional strain caused by the unwinding of the DNA double helix; DNA helicase (yellow) breaks hydrogen bonds
between complementary base-pairs; single-strand binding proteins (SSBs) stabilize the separated strands and prevent them from rejoining.

As previously mentioned, the location at which a DNA strand begins to unwind into two separate single strands is known as the  origin of replication. As
shown in Figure 1, when the double helix unwinds, replication proceeds along the two single strands at the same time but in opposite directions (i.e., left to
right on one strand, and right to left on the other). This forms two replication forks that move along the DNA, replicating as they go.
Primer Synthesis
Primer synthesis marks the beginning of the actual synthesis of the new DNA  molecule. Primers are short stretches of nucleotides (about 10 to 12 bases in
length) synthesized by an RNA polymerase enzyme called primase. Primers are required because DNA polymerases, the enzymes responsible for the
actual addition of nucleotides to the new DNA strand, can only add deoxyribonucleotides to the 3'-OH group of an existing chain and cannot begin
synthesis de novo. Primase, on the other hand, can add ribonucleotides de novo. Later, after elongation is complete, the primer is removed and replaced
with DNA nucleotides.

Elongation
Finally, elongation--the addition of nucleotides to the new DNA strand--begins after the primer has been added. Synthesis of the growing strand involves
adding nucleotides, one by one, in the exact order specified by the original (template) strand. Recall that one of the key features of the Watson-Crick DNA
model is that adenine is always paired with thymine and cytosine is always paired with guanine. So, for example, if the original strand reads A-G-C-T, the
new strand will read T-C-G-A.
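
The pairing rule in the example above can be written as a short Python sketch. The names PAIRS and complement are illustrative assumptions. The new strand is written position by position against the template, as in the text, so strand orientation is ignored.

```python
# Watson-Crick pairing: A with T, C with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(template):
    """Return the base that pairs with each template base, in order."""
    return "".join(PAIRS[base] for base in template)


print(complement("AGCT"))
```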
DNA is always synthesized in the 5'-to-3' direction, meaning that nucleotides are added only to the  3' end of the growing strand. As shown in Figure 2, the
5'-phosphate group of the new nucleotide binds to the 3'-OH group of the last nucleotide of the growing strand. Scientists have yet to identify a polymerase
that can add bases to the 5' ends of DNA strands.

Figure 2: New DNA is synthesized from deoxyribonucleoside triphosphates (dNTPs). (A) A deoxyribonucleoside triphosphate (dNTP). (B) During
DNA replication, the 3'-OH group of the last nucleotide on the new strand attacks the 5'-phosphate group of the incoming dNTP. Two phosphates are
cleaved off. (C) A phosphodiester bond forms between the two nucleotides, and phosphate ions are released.

The Discovery of DNA Polymerase


While studying E. coli bacteria, enzymologist Arthur Kornberg discovered that DNA polymerases catalyze DNA synthesis. Kornberg's experiment involved
mixing all of the basic "ingredients" necessary for E. coli DNA synthesis in a test tube, including nucleotides, E. coli extract, and ATP, and then purifying
and testing the enzymes involved. Using this method, Kornberg not only discovered DNA polymerases, but he also performed some of the initial work
demonstrating how enzymes add new nucleotides to growing DNA chains (Kornberg, 1959).
Scientists have since identified a total of five different DNA polymerases in E. coli, each with a specialized role. For example, DNA polymerase III does
most of the elongation work, adding nucleotides one by one to the 3' end of the new and growing single strand. Other enzymes, including  DNA polymerase
I and RNase H, are responsible for removing the RNA primer after DNA polymerase III has begun its work, replacing it with DNA nucleotides (Ogawa &
Okazaki, 1984). When these enzymes finish, they leave a nick between the section of DNA that was formerly the primer and the elongated section of DNA.
Another enzyme called DNA ligase then acts to seal the bond between the two adjacent nucleotides.

DNA Polymerase Only Moves in One Direction


After a primer is synthesized on a strand of DNA and the DNA strands unwind, synthesis and elongation can proceed in only one direction. As previously
mentioned, DNA polymerase can only add to the 3' end, so the 5' end of the primer remains unaltered. Consequently, synthesis proceeds immediately only
along the so-called leading strand. This immediate replication is known as continuous replication. The other strand (in the 5' direction from the primer) is
called the lagging strand, and replication along it is called discontinuous replication. The double helix has to unwind a bit before the synthesis of another
primer can be initiated further up on the lagging strand. Synthesis can then occur from the 3' end of that new primer. Next, the double helix unwinds a bit
more, and another spurt of replication proceeds. As a result, replication along the lagging strand can only proceed in short, discontinuous spurts (Figure 3).

Figure 3: Replication of the leading DNA strand is continuous, while replication along the lagging strand is discontinuous. After a short length of
the DNA has been unwound, synthesis must proceed in the 5' to 3' direction; that is, in the direction opposite that of the unwinding.

The fragments of newly synthesized DNA along the lagging strand are called Okazaki fragments, named in honor of their discoverer, Japanese molecular
biologist Reiji Okazaki. Okazaki and his colleagues made their discovery by conducting what is known as a pulse-chase experiment, which involved
exposing replicating DNA to a short "pulse" of isotope-labeled nucleotides and then varying the length of time that the cells would be exposed to
nonlabeled nucleotides. This later period is called the "chase" (Okazaki et al., 1968). The labeled nucleotides were incorporated into growing DNA
molecules only during the initial few seconds of the pulse; thereafter, only nonlabeled nucleotides were incorporated during the chase. The scientists then
centrifuged the newly synthesized DNA and observed that the shorter chases resulted in most of the radioactivity appearing in "slow" DNA. The
sedimentation rate was determined by size: smaller fragments sedimented more slowly than larger fragments because of their lower mass. As the
investigators increased the length of the chases, radioactivity in the "fast" DNA increased with little or no increase of radioactivity in the slow DNA. The
researchers correctly interpreted these observations to mean that, with short chases, only very small fragments of DNA were being synthesized along the
lagging strand. As the chases increased in length, giving DNA more time to replicate, the lagging strand fragments started integrating into longer, heavier,
more rapidly sedimenting DNA strands. Today, scientists know that the Okazaki fragments of bacterial DNA are typically between 1,000 and 2,000
nucleotides long, whereas in eukaryotic cells, they are only about 100 to 200 nucleotides long.
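
As a rough sense of scale, the fragment lengths above can be turned into a fragment count. The sketch below is a back-of-the-envelope estimate under two stated assumptions: a bacterial chromosome of 4 million base pairs (the upper figure quoted earlier in this article), and lagging-strand synthesis that totals about one chromosome length of new DNA, since at each position one new strand is made continuously and the other discontinuously. The function name is hypothetical.

```python
import math


def okazaki_fragment_count(lagging_nt, fragment_nt):
    """Estimate how many fragments are needed to cover the lagging-strand DNA."""
    return math.ceil(lagging_nt / fragment_nt)


CHROMOSOME_BP = 4_000_000  # assumed bacterial chromosome size

for fragment_nt in (1_000, 2_000):
    count = okazaki_fragment_count(CHROMOSOME_BP, fragment_nt)
    print(fragment_nt, count)
```

Under these assumptions, one round of bacterial replication would involve on the order of a few thousand Okazaki fragments, each needing its own primer.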

The Challenges of Eukaryotic Replication


Bacterial and eukaryotic cells share many of the same basic features of replication; for instance, initiation requires a primer, elongation is always in the 5'-
to-3' direction, and replication is always continuous along the leading strand and discontinuous along the lagging strand. But there are also important
differences between bacterial and eukaryotic replication, some of which biologists are still actively researching in an effort to better understand the
molecular details. One difference is that eukaryotic replication is characterized by many replication origins (often thousands), not just one, and the
sequences of the replication origins vary widely among species. Bacterial replication origins (oriC) likewise vary in length (from about
200 to 1,000 base pairs) and, except among closely related organisms, in sequence; nonetheless, all bacteria have just a single replication origin
(Mackiewicz et al., 2004).
Eukaryotic replication also utilizes a different set of DNA polymerase enzymes (e.g., DNA polymerase δ and DNA polymerase ε instead of DNA
polymerase III). Scientists are still studying the roles of the 13 eukaryotic polymerases discovered to date. In addition, in eukaryotes, the DNA template is
compacted by the way it winds around proteins called histones. This DNA-histone complex, called a nucleosome, poses a unique challenge both for the
cell and for scientists investigating the molecular details of eukaryotic replication. What happens to nucleosomes during DNA replication? Scientists know
from electron micrograph studies that nucleosome reassembly happens very quickly after replication (the reassembled nucleosomes are visible in the
electron micrograph images), but they still do not know how this happens (Annunziato, 2005).
Also, whereas bacterial chromosomes are circular, eukaryotic chromosomes are linear. During circular DNA replication, the excised primer is readily
replaced by nucleotides, leaving no gap in the newly synthesized DNA. In contrast, in linear DNA replication, there is always a small gap left at the very end
of the chromosome because of the lack of a 3'-OH group for replacement nucleotides to bind. (As mentioned, DNA synthesis can proceed only in the 5'-to-
3' direction.) If there were no way to fill this gap, the DNA molecule would get shorter and shorter with every generation. However, the ends of linear
chromosomes—the telomeres—have several properties that prevent this.
DNA replication occurs during the S phase of the cell cycle. In E. coli, the entire genome is replicated in just 40 minutes, at a pace of
approximately 1,000 nucleotides per second. In eukaryotes, the pace is much slower: about 40 nucleotides per second. The coordination of
the protein complexes required for the steps of replication and the speed at which replication must occur in order for cells to divide are impressive,
especially considering that enzymes are also proofreading, which leaves very few errors behind.
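
The rates quoted above can be checked with simple arithmetic. The sketch below assumes a bacterial genome of 4 million base pairs and a mouse genome of 3 billion base pairs (figures from earlier in this article), and two replication forks per origin, each moving at the stated rate. The function name and the single-origin mouse scenario are hypothetical and included only for comparison.

```python
def replication_minutes(genome_bp, nt_per_second, forks=2):
    """Time to copy a genome when `forks` forks each advance at nt_per_second."""
    return genome_bp / (nt_per_second * forks) / 60


# Bacterium: one origin, two forks, about 1,000 nucleotides per second each.
bacterial_minutes = replication_minutes(4_000_000, 1_000)
print(round(bacterial_minutes, 1))

# Hypothetical: a mouse-sized genome copied from a single origin at 40 nt/s.
mouse_days = replication_minutes(3_000_000_000, 40) / (60 * 24)
print(round(mouse_days))
```

The bacterial estimate lands in the same range as the 40 minutes quoted above, while the single-origin mouse scenario would take more than a year, which illustrates why eukaryotic chromosomes rely on many replication origins.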

Summary
The study of DNA replication started almost as soon as the structure of DNA was elucidated, and it continues to this day. Currently, the stages of initiation,
unwinding, primer synthesis, and elongation are understood in the most basic sense, but many questions remain unanswered, particularly when it comes to
replication of the eukaryotic genome. Scientists have devoted decades to the study of replication, and researchers such as Kornberg and Okazaki have
made a number of important breakthroughs. Nonetheless, much remains to be learned about replication, including how errors in this process contribute to
human disease.

Semi-Conservative DNA Replication: Meselson and Stahl


By: Leslie A. Pray, Ph.D. © 2008 Nature Education 

Citation: Pray, L. (2008) Semi-conservative DNA replication: Meselson and Stahl. Nature Education 1(1):98

Watson and Crick's discovery of DNA structure in 1953 revealed a possible mechanism for DNA replication. So why
didn't Meselson and Stahl finally explain this mechanism until 1958?
This structure has novel features which are of considerable biological interest . . . It has not escaped our notice that the specific pairing we have postulated
immediately suggests a possible copying mechanism for the genetic material.
—Watson & Crick (1953)
Perhaps the most significant aspect of Watson and Crick's discovery of DNA structure was not that it provided scientists with a three-dimensional model of
this molecule, but rather that this structure seemed to reveal the way in which DNA was replicated. As noted in their 1953 paper, Watson and Crick strongly
suspected that the specific base pairings within the DNA double helix existed in order to ensure a controlled system of DNA replication. However, it took
several years of subsequent study, including a classic 1958 experiment by American geneticists Matthew Meselson and Franklin Stahl, before the exact
relationship between DNA structure and replication was understood.

Three Proposed Models for DNA Replication


Replication is the process by which a cell copies its DNA prior to division. In humans, for example, each parent cell must copy its entire six billion base
pairs of DNA before undergoing mitosis. The molecular details of DNA replication are described elsewhere, and they were not known until some time after
Watson and Crick's discovery. In fact, before such details could be determined, scientists were faced with a more fundamental research concern.
Specifically, they wanted to know the overall nature of the process by which DNA replication occurs.
Defining the Models

Figure 1: Three models of DNA replication. According to the conservative model of DNA replication, the original double-stranded DNA molecule serves
as the complete template for a new DNA molecule. In the dispersive replication model, the original DNA molecule breaks into fragments and the fragments
serve as templates for new DNA fragments. In semiconservative replication, the two strands of the original DNA molecule separate, and each strand
serves as a template for a new DNA strand. Each model predicts a different distribution of original DNA and new DNA in the DNA molecules produced after
the first and second rounds of replication.

As previously mentioned, Watson and Crick themselves had specific ideas about DNA replication, and these ideas were based on the structure of the DNA
molecule. In particular, the duo hypothesized that replication occurs in a "semiconservative" fashion. According to the semiconservative replication model,
which is illustrated in Figure 1, the two original DNA strands (i.e., the two complementary halves of the double helix) separate during replication; each
strand then serves as a template for a new DNA strand, which means that each newly synthesized double helix is a combination of one old (or original) and
one new DNA strand. Conceptually, semiconservative replication made sense in light of the double helix structural model of DNA, in particular its
complementary nature and the fact that adenine always pairs with thymine and cytosine always pairs with guanine. Looking at this model, it is easy to
imagine that during replication, each strand serves as a template for the synthesis of a new strand, with complementary bases being added in the order
indicated.

Semiconservative replication was not the only model of DNA replication proposed during the mid-1950s, however. In fact, two other prominent hypotheses
were also put forth: conservative replication and dispersive replication. According to the conservative replication model, the entire original DNA double helix
serves as a template for a new double helix, such that each round of cell division produces one daughter cell with a completely new DNA double helix and
another daughter cell with a completely intact old (or original) DNA double helix. On the other hand, in the dispersive replication model, the original DNA
double helix breaks apart into fragments, and each fragment then serves as a template for a new DNA fragment. As a result, every cell division produces
two cells with varying amounts of old and new DNA (Figure 1).
Making Predictions Based on the Models
When these three models were first proposed, scientists had few clues about what might be occurring at the molecular level during DNA replication.
Fortunately, the models yielded different predictions about the distribution of old versus new DNA in newly divided cells, no matter what the underlying
molecular mechanisms. These predictions were as follows:

• According to the semiconservative model, after one round of replication, every new DNA double helix would be a hybrid that consisted of one
strand of old DNA bound to one strand of newly synthesized DNA. Then, during the second round of replication, the hybrids would separate, and
each strand would pair with a newly synthesized strand. Afterward, only half of the new DNA double helices would be hybrids; the other half
would be completely new. Every subsequent round of replication therefore would result in a smaller proportion of hybrids and a larger proportion
of completely new double helices.
• According to the conservative model, after one round of replication, half of the new DNA double helices would be composed of completely old, or
original, DNA, and the other half would be completely new. Then, during the second round of replication, each double helix would be copied in its
entirety. Afterward, one-quarter of the double helices would be completely old, and three-quarters would be completely new. Thus, each
subsequent round of replication would result in a greater proportion of completely new DNA double helices, while the number of completely
original DNA double helices would remain constant.
• According to the dispersive model, every round of replication would result in hybrids, or DNA double helices that are part original DNA and part
new DNA. Each subsequent round of replication would then produce double helices with greater amounts of new DNA.
Meselson and Stahl’s Elegant Experiment

Figure 2: Meselson and Stahl’s experiment supporting the semi-conservative model of DNA replication.
In lab experiments, Meselson and Stahl used isotope labels to differentiate new DNA from original DNA in E. coli cultures. First, they grew several
generations of E. coli in a growth medium that contained only one isotope of nitrogen: 15N, which the E. coli cells incorporated into their DNA. Next,
Meselson and Stahl transferred the E. coli cells into a new medium that contained a different isotope of nitrogen: the less-dense 14N. DNA synthesized after
the culture was transferred to the new growth medium was composed of 14N as opposed to 15N; thus, Meselson and Stahl could determine the distribution
of original DNA (containing 15N) and new DNA (containing 14N) after replication. Because DNA containing the two nitrogen isotopes differs in density and appears at
different positions in a density gradient, the two could be differentiated in E. coli extracts. The distribution of original DNA and new DNA after each round of
replication was consistent with a semiconservative model of replication.

Matthew Meselson and Franklin Stahl were well acquainted with these three predictions, and they reasoned that if there were a way to distinguish old
versus new DNA, it should be possible to test each prediction. Aware of previous studies that had relied on isotope labels as a way to differentiate between
parental and progeny molecules, the scientists decided to see whether the same technique could be used to differentiate between parental and progeny
DNA. If it could, Meselson and Stahl were hopeful that they would be able to determine which prediction and replication model was correct.

The duo thus began their experiment by choosing two isotopes of nitrogen—the common and lighter 14N, and the rare and heavier 15N (so-called "heavy"
nitrogen)—as their labels and a technique known as cesium chloride (CsCl) equilibrium density gradient centrifugation as their sedimentation method.
Meselson and Stahl opted for nitrogen because it is an essential chemical component of DNA; therefore, every time a cell divides and its DNA replicates, it
incorporates new N atoms into the DNA of either one or both of its two daughter cells, depending on which model was correct. "If several different
density species of DNA are present," they predicted, "each will form a band at the position where the density of the CsCl solution is equal to the buoyant
density of that species. In this way, DNA labeled with heavy nitrogen (15N) may be resolved from unlabeled DNA" (Meselson & Stahl, 1958).
The scientists then continued their experiment by growing a culture of E. coli bacteria in a medium that had the heavier 15N (in the form of 15N-labeled
ammonium chloride) as its only source of nitrogen. In fact, they did this for 14 bacterial generations, which was long enough to create a  population of
bacterial cells that contained only the heavier isotope (all the original 14N-containing cells had died by then). Next, they changed the medium to one
containing only 14N-labeled ammonium salts as the sole nitrogen source. So, from that point onward, every new strand of DNA would be built with 14N rather
than 15N.
Just prior to the addition of 14N and periodically thereafter, as the bacterial cells grew and replicated, Meselson and Stahl sampled DNA for use
in equilibrium density gradient centrifugation to determine how much 15N (from the original or old DNA) versus 14N (from the new DNA) was present. For the
centrifugation procedure, they mixed the DNA samples with a solution of cesium chloride and then centrifuged the samples for enough time to allow the
heavier 15N and lighter 14N DNA to migrate to different positions in the centrifuge tube.
By way of centrifugation, the scientists found that DNA labeled entirely with 15N (i.e., DNA collected just prior to changing the culture from
one containing only 15N to one containing only 14N) formed a single distinct band, because both of its strands were made entirely in the "heavy" nitrogen
medium. Following a single round of replication, the DNA again formed a single distinct band, but the band was located in a different position along the
centrifugation gradient. Specifically, it was found midway between where all the 15N and all the 14N DNA would have migrated—in other words, halfway
between "heavy" and "light" (Figure 2). Based on these findings, the scientists were immediately able to exclude the conservative model of replication as a
possibility. After all, if DNA replicated conservatively, there should have been two distinct bands after a single round of replication; half of the new DNA
would have migrated to the same position as it did before the culture was transferred to the  14N-containing medium (i.e., to the "heavy" position), and only
the other half would have migrated to the new position (i.e., to the "light" position). That left the scientists with only two options: either DNA replicated
semiconservatively, as Watson and Crick had predicted, or it replicated dispersively.
To differentiate between the two, Meselson and Stahl had to let the cells divide again and then sample the DNA after a second round of replication. After
that second round of replication, the scientists found that the DNA separated into two distinct bands: one in a position where DNA containing only 14N would
be expected to migrate, and the other in a position where hybrid DNA (containing half 14N and half 15N) would be expected to migrate. The scientists
continued to observe the same two bands after several subsequent rounds of replication. These results were consistent with the semiconservative model of
replication and the reality that, when DNA replicated, each new double helix was built with one old strand and one new strand. If the dispersive model were
the correct model, the scientists would have continued to observe only a single band after every round of replication.
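
The elimination argument above can be expressed as a short Python sketch. It makes a simplifying assumption: each band is identified by the fraction of 15N in the double helix (1 for "heavy", 1/2 for hybrid, 0 for "light") rather than by a measured density. The function names and the OBSERVED table, which encodes the band patterns reported in the text, are constructs of this sketch.

```python
from fractions import Fraction

HEAVY, LIGHT = Fraction(1), Fraction(0)


def replicate(duplex, model):
    """Return the two daughter duplexes; each strand is its 15N fraction."""
    a, b = duplex
    if model == "semiconservative":
        return [(a, LIGHT), (b, LIGHT)]
    if model == "conservative":
        return [(a, b), (LIGHT, LIGHT)]
    if model == "dispersive":
        mixed = (a + b) / 4  # parental material spread evenly over 4 strands
        return [(mixed, mixed), (mixed, mixed)]
    raise ValueError(f"unknown model: {model}")


def predicted_bands(model, generations):
    """Set of duplex 15N fractions expected after growth in 14N medium."""
    population = [(HEAVY, HEAVY)]
    for _ in range(generations):
        population = [d for duplex in population for d in replicate(duplex, model)]
    return {(a + b) / 2 for a, b in population}


# Band patterns described in the text for generations 0, 1, and 2.
OBSERVED = {0: {HEAVY}, 1: {Fraction(1, 2)}, 2: {Fraction(1, 2), LIGHT}}


def consistent_models():
    models = ["semiconservative", "conservative", "dispersive"]
    return [m for m in models
            if all(predicted_bands(m, g) == bands for g, bands in OBSERVED.items())]


print(consistent_models())
```

In this sketch, the conservative model fails at the first generation and the dispersive model fails at the second, leaving only the semiconservative model consistent with all three observations.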

Straight or Circular?
Following publication of Meselson and Stahl's results, many scientists confirmed that semiconservative replication was the rule, not just in  E. coli, but in
every other species studied as well. To date, no one has found any evidence for either conservative or dispersive DNA replication. Scientists have found,
however, that semiconservative replication can occur in different ways—for example, it may proceed in either a circular or a  linear fashion, depending
on chromosome shape.
In fact, in the early 1960s, English molecular biologist John Cairns performed another remarkably elegant experiment to demonstrate that E. coli and other
bacteria with circular chromosomes undergo what he termed "theta replication," because the structure generated resembles the Greek letter theta (Θ).
Specifically, Cairns grew E. coli bacteria in the presence of radioactive nucleotides such that, after replication, each new DNA molecule had one radioactive
(hot) strand and one nonradioactive strand. He then isolated the newly replicated DNA and used it to produce an electron micrograph image of the Θ-
shaped replication process (Figure 3; Cairns, 1961).
But how does theta replication work? It turns out that this process results from the original double-stranded DNA unwinding at a single spot on the
chromosome known as the replication origin. As the double helix unwinds, it creates a loop known as the replication bubble, with each newly separated
single strand serving as a template for DNA synthesis. Replication occurs as the double helix unwinds.
Eukaryotes undergo linear, not circular, replication. As with theta replication, as the double helix unwinds, each newly separated single strand serves as a
template for DNA synthesis. However, unlike bacterial replication, because eukaryotic cells carry vastly more DNA than bacteria do (for example, the
common house [and laboratory] mouse Mus musculus has about three billion base pairs of DNA, compared to a bacterial cell's one to four million base
pairs), eukaryotic chromosomes have multiple replication origins, with multiple replication bubbles forming. For example, M. musculus has as many as
25,000 replication origins, whereas the smaller-genomed fruit fly (Drosophila melanogaster), with its approximately 120 million base pairs of DNA, has only
about 3,500 replication origins.
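
The origin counts above imply an average spacing between origins, which can be worked out directly. The sketch below uses only the genome sizes and origin counts quoted in this paragraph. The function name is hypothetical, and the calculation assumes origins are spread evenly, which real chromosomes need not do.

```python
def mean_bp_per_origin(genome_bp, origins):
    """Average stretch of DNA served by one replication origin."""
    return genome_bp / origins


mouse = mean_bp_per_origin(3_000_000_000, 25_000)
fruit_fly = mean_bp_per_origin(120_000_000, 3_500)
print(round(mouse), round(fruit_fly))
```

On these figures, each mouse origin serves roughly 120,000 base pairs on average, and each fruit fly origin roughly 34,000.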
Thus, the discovery of the structure of DNA in 1953 was only the beginning. When Watson and Crick postulated that form predicts  function, they provided
the scientific community with a challenge to determine exactly how DNA functioned in the cell, including how this molecule was replicated. The work of
Meselson and Stahl demonstrates how elegant experiments can distinguish between different hypotheses. Understanding that replication occurs
semiconservatively was just the first step toward understanding the key enzymatic events responsible for the physical copying of the genome.

Chemical Structure of RNA


By: Suzanne Clancy, Ph.D. © 2008 Nature Education 

Citation: Clancy, S. (2008) Chemical structure of RNA. Nature Education 7(1):60

The more researchers examine RNA, the more surprises they continue to uncover. What have we learned about RNA
structure and function so far?
Figure 1: The chemical structures of DNA (left) and RNA (right). In double-stranded DNA, two hydrogen bonds connect the nucleotides thymine (T) to
adenine (A); three hydrogen bonds connect the nucleotides guanine (G) to cytosine (C). The sugar-phosphate backbones (grey) run anti-parallel to each
other, so that the 3’ and 5’ ends of the two strands are aligned. RNA is usually single-stranded. In DNA, the sugar that composes the sugar-phosphate
backbone is deoxyribose; in RNA, the sugar is ribose. The nucleotide thymine (T) is not present in RNA: rather than binding with thymine, the nitrogenous
base adenine (A) base-pairs with the nitrogenous base uracil (U) in RNA.

With the discovery of the molecular structure of the DNA double helix in 1953, researchers turned to the structure of ribonucleic acid (RNA) as the next
critical puzzle to be solved on the road to understanding the molecular basis of life. Indeed, RNA may be the only molecule to have inspired the formation
of a club, known as the RNA Tie Club, whose members included Nobel Laureates James Watson and Francis Crick, the discoverers of DNA structure, as
well as Sydney Brenner, who was awarded the Nobel Prize in 2002 for his work involving gene regulation in the model organism Caenorhabditis elegans.
The members of this club, each nicknamed for a particular amino acid, exchanged letters in which they presented various unpublished ideas in an attempt
to understand the structure of RNA and how this molecule participates in the building of proteins. During the following 50 years, many questions were
answered and many surprises were uncovered.
Early Discoveries of RNA Structure

Figure 2: The primary and secondary structures of RNA. (A) A region of single-stranded RNA. The sugar component of the sugar-phosphate backbone
is ribose: a five carbon sugar with a hydroxyl group on its 2’ carbon atom. The sugar component of the sugar-phosphate backbone in DNA is deoxyribose:
a five carbon sugar with a hydrogen atom, instead of a hydroxyl group, on its 2’ carbon atom. Additionally, RNA does not contain the nitrogenous base
thymine (T), found in DNA. Instead, the nitrogenous base uracil (U), not found in DNA, binds with the base adenine. (B) The primary and secondary
structures of RNA. The primary structure refers to the molecule’s nucleotide sequence; the secondary structure refers to its three-dimensional conformation
after folding has occurred.

Today, researchers know that cells contain a variety of forms of RNA—including messenger RNA (mRNA), transfer RNA (tRNA), and ribosomal
RNA (rRNA)—and each form is involved in different functions and activities. Messenger RNA is essentially a copy of a section of DNA and serves as a
template for the manufacture of one or more proteins. Transfer RNA binds to both mRNA and amino acids (the building blocks of proteins) and brings the
correct amino acids into the growing polypeptide chain during protein formation, based on the nucleotide sequence of the mRNA. The process by which
proteins are built is called translation. Translation occurs on ribosomes, which are cellular organelles composed of protein and rRNA.

Although there are multiple types of RNA molecules, the basic structure of all RNA is similar. Each kind of RNA is a polymeric molecule made by stringing
together individual ribonucleotides, always by adding the 5'-phosphate group of one nucleotide onto the 3'-hydroxyl group of the previous nucleotide. Like
DNA, each RNA strand has the same basic structure, composed of nitrogenous bases covalently bound to a sugar-phosphate backbone (Figure 1).
However, unlike DNA, RNA is usually a single-stranded molecule. Also, the sugar in RNA is ribose instead of deoxyribose (ribose contains one more
hydroxyl group on the second carbon), which accounts for the molecule's name. RNA contains four nitrogenous bases: adenine, cytosine, uracil,
and guanine. Uracil is a pyrimidine that is structurally similar to thymine, another pyrimidine that is found in DNA. Like thymine, uracil can base-pair with
adenine (Figure 2).
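
The base composition and pairing rules described above can be sketched in a few lines of Python. This is only an illustration of the alphabet change (T to U) and of anti-parallel pairing; the function names dna_to_rna and rna_complement are hypothetical and chosen for this example.

```python
# Watson-Crick partners for RNA bases (uracil takes the place of thymine).
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def dna_to_rna(coding_strand: str) -> str:
    """Rewrite a DNA coding-strand sequence in the RNA alphabet (T -> U)."""
    return coding_strand.upper().replace("T", "U")


def rna_complement(rna: str) -> str:
    """Return the strand that would base-pair with `rna`, written 5'->3'."""
    # Paired strands run anti-parallel, so the partner is read in reverse.
    return "".join(RNA_PAIR[base] for base in reversed(rna.upper()))


print(dna_to_rna("ATGC"))      # same sequence, RNA alphabet
print(rna_complement("AUGC"))  # pairing partner, written 5'->3'
```

Both sequences are written 5' to 3', which is why the complement is built from the reversed input.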

Figure 3 : Maturation and architecture of tRNA. The global organization of the tRNA structure is highly conserved in all forms of life. Prominent features of
the tRNA structure, as shown in the figure, include: an acceptor stem with the CCA trinucleotide at the 3' end (the site of aminoacylation and trans-
peptidation reactions in the protein-biosynthesis cycle); a D loop (named after a tandem dihydrouridine modification, labelled "D" in the figure, which is
commonly found in this loop); an anticodon loop that includes the anticodon, which is a nucleotide triplet responsible for recognition of a coding triplet in
mRNA; and the T loop, named after the highly conserved TΨC triplet. Filled circles represent nucleotides that are not usually modified; open circles represent
commonly modified nucleotides other than D, T and psi.

Although RNA is usually a single-stranded molecule, researchers soon discovered that it can form double-stranded structures, which are important to its function.
In 1956, Alexander Rich—an X-ray crystallographer and member of the RNA Tie Club—and David Davies, both working at the National Institutes of Health,
discovered that single strands of RNA can "hybridize," sticking together to form a double-stranded molecule (Rich & Davies, 1956). Later, in 1960, the
discovery that an RNA molecule and a DNA molecule could form a hybrid double helix was the first experimental demonstration of a way in which
information could be transferred from DNA to RNA (Rich, 1960).

Single-stranded RNA can also form many secondary structures in which a single RNA molecule folds over and forms hairpin loops, stabilized by
intramolecular hydrogen bonds between complementary bases. Such base-pairing of RNA is critical for many RNA functions, such as the ability of tRNA to
bind to the correct sequence of mRNA during translation (Figure 3).
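
As a rough illustration of how intramolecular base-pairing produces a hairpin, the sketch below scans a sequence for a short stem whose two arms are reverse complements of each other, separated by a small loop. It is a toy search under stated assumptions, not a folding algorithm: only Watson-Crick pairs are considered, and the stem length and loop-size limits are arbitrary defaults chosen for the example.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Sequence that pairs with `rna` when the strand folds back on itself."""
    return "".join(RNA_PAIR[b] for b in reversed(rna))


def find_hairpins(rna, stem=4, min_loop=3, max_loop=8):
    """Return (stem_start, loop_length) for each simple stem-loop candidate.

    A candidate is a `stem`-nucleotide arm followed by a loop and then by
    the reverse complement of the arm. Parameter values are illustrative.
    """
    rna = rna.upper()
    hits = []
    for i in range(len(rna) - 2 * stem - min_loop + 1):
        left = rna[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            right = rna[j:j + stem]
            if len(right) < stem:
                break  # ran off the end of the sequence
            if right == reverse_complement(left):
                hits.append((i, loop))
    return hits


# GGGC ... GCCC can pair across a four-nucleotide AAAA loop.
print(find_hairpins("GGGCAAAAGCCC"))
```

Real RNA folding also allows G-U pairs, bulges, and mismatches, and weighs candidate structures by their stability, none of which this sketch attempts.
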

Robert Holley, a chemist at Cornell University, was the first researcher to work out the structure of tRNA (Holley et al., 1965). This molecule turned out to
be the elusive structure that Francis Crick proposed in his so-called "adapter hypothesis" of 1955—a structure that carried amino acids and arranged them
in a certain order that corresponded to the sequence in the nucleic acid strand. In 1968, Holley was awarded the Nobel Prize in Physiology or Medicine
together with Gobind Khorana, at the University of Wisconsin, and Marshall Nirenberg, at the National Institutes of Health. Nirenberg and Khorana devised
the key experiments to decipher the genetic code—in other words, which sequences of three nucleotides (codons) in an mRNA molecule would code for
which amino acids.
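
The codon-to-amino-acid mapping that these experiments deciphered can be written down compactly. The sketch below builds the standard genetic code from a 64-letter string of one-letter amino acid codes (with * marking stop codons), ordered with the bases U, C, A, G in each position, and uses it to translate a short mRNA. The function name translate and the example sequence are hypothetical; the code starts reading at the first nucleotide and does not search for a start codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' = stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA string codon by codon until a stop codon."""
    mrna = mrna.upper()
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGUUUGGGUAA"))  # Met-Phe-Gly, then a stop codon
```

Of the 64 entries in the table, three are stop codons and the remaining 61 specify amino acids.
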
mRNA and Splicing

Figure 4 : Several forms of RNA are involved in gene expression. During transcription, DNA (grey rectangle) is used as a template to produce an RNA
transcript (purple rectangle). The RNA is translated to build the protein molecule (polypeptide) encoded by the original gene. Both prokaryotic and
eukaryotic cells contain messenger RNA (mRNA), ribosomal RNA (rRNA), and transfer RNA (tRNA). Pre-messenger RNA (pre-mRNA), small nuclear RNA
(snRNA), small nucleolar RNA (snoRNA), small cytoplasmic RNA (scRNA), micro RNA (miRNA), and small interfering RNA (siRNA) are found exclusively
in eukaryotic cells

Several forms of RNA play pivotal roles in gene expression—the process responsible for manifesting the instructions stored in the sequence of DNA
nucleotides in either RNA or protein molecules that carry out the cell's activities (Figures 4 & 5). Messenger RNA (mRNA) is particularly important in this
process. mRNA is primarily composed of coding sequences; that is, it carries the genetic information for the amino acid sequence of a protein to
the ribosome, where that particular protein is synthesized. In addition, each mRNA molecule also contains noncoding, or untranslated, sequences that may
carry instructions for how the mRNA is handled by the cell (Figure 6). For example, the untranslated region at the 5' end of the mRNA molecules found
in bacteria and other prokaryotes contains what is called a Shine-Dalgarno sequence, which aids in the binding of the mRNA to ribosomes.

Figure 5 : Location and functions of different classes of RNA molecules.

In contrast, the mRNA of eukaryotic organisms is prepared for translation through more complex mechanisms. For one, the addition of a guanine
nucleotide with a methyl (CH3) group to the 5' end of the mRNA, called the 5' cap, increases the stability of the mRNA and assists in the binding of the
mRNA to the ribosome for translation. Meanwhile, another untranslated region is added to the 3' end of the mRNA, thereby further affecting the stability of
the molecule. In this case, a "tail" consisting of anywhere from 50 to 250 adenine nucleotides is added to the 3' end. This poly(A) tail can increase the
stability of many mRNA molecules, depending on the proteins that attach to it. The greater the stability, and thus the longer an mRNA molecule persists in a cell,
the more protein can be made from that molecule.

In eukaryotes (and to a lesser extent, prokaryotes), when RNA is first transcribed from DNA, it may contain additional noncoding sequences that are
interspersed within the coding sequence. This immature RNA molecule is referred to as precursor mRNA (pre-mRNA) or heterogeneous nuclear RNA
(hnRNA). The intervening noncoding sequences are called introns, and the segments of coding material are known as exons. The introns are then
removed by a process known as RNA splicing to produce the mature mRNA molecule (Figure 7). An organelle called the spliceosome, composed of
protein and small nuclear RNAs (snRNAs), is responsible for recognizing and removing the introns from pre-mRNA.

Figure 6 : mRNA molecules contain noncoding sequences in addition to protein-coding sequences

The surprising discovery of RNA splicing caused a paradigm shift in genetics. Much early work indicated that mRNA and the genes in DNA were colinear;
that is, they were thought to match up, base for base, with the exception of the 3' poly(A) tail. In the late 1970s, however, seminal studies of gene
expression in cells infected with an adenovirus demonstrated that the RNA transcripts produced by viral infection contained sequences that were not next
to one another in the viral genome. Further study revealed that these mRNAs were produced after material had been removed or spliced out of a larger
primary transcript (Berget et al., 1977; Evans et al., 1977). Since that time, introns have been found to occur in many eukaryotic cellular genes and some
prokaryotic genes.

Probably the most thoroughly studied class of introns consists of those found in protein-coding genes. The 5' end of these introns almost always begins
with the dinucleotide GU, and the 3' end typically contains AG. Changing one of these nucleotides precludes splicing. Another important sequence occurs
at the branch point, anywhere from 18 to 40 nucleotides upstream from the 3' end of an intron. This sequence always contains an adenine, but it is
otherwise loosely conserved. A typical sequence at a branch point is YNYYRAY, where Y indicates a pyrimidine, N denotes any nucleotide, R any purine,
and A is for adenine (Figure 8) (Pierce, 2000; Patel & Steitz, 2003).
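
The three sequence signals just described (GU at the 5' end, AG at the 3' end, and a YNYYRAY branch point near the 3' end) can be checked with a short script. This is a minimal sketch, assuming the intron is supplied as an RNA string; the function name intron_signals is hypothetical, and the way distance is counted (from the branch-point adenine to the end of the intron) is an assumption of the example rather than a convention taken from the article.

```python
import re

# Y = pyrimidine (C or U), N = any base, R = purine (A or G).
# The lookahead lets overlapping candidate motifs be reported.
BRANCH_MOTIF = re.compile(r"(?=([CU][ACGU][CU][CU][AG]A[CU]))")


def intron_signals(intron):
    """Report the consensus splice signals found in one intron sequence."""
    seq = intron.upper()
    branch_sites = []
    for m in BRANCH_MOTIF.finditer(seq):
        a_index = m.start() + 5           # the conserved A in YNYYRAY
        distance = len(seq) - a_index     # nucleotides to the 3' end
        if 18 <= distance <= 40:
            branch_sites.append(a_index)
    return {
        "starts_with_GU": seq.startswith("GU"),
        "ends_with_AG": seq.endswith("AG"),
        "branch_point_A_positions": branch_sites,
    }


# A made-up 41-nucleotide intron containing the motif UACUAAC.
example = "GU" + "A" * 10 + "UACUAAC" + "U" * 20 + "AG"
print(intron_signals(example))
```

Because the branch-point sequence is only loosely conserved, a pattern match like this one flags candidates; it cannot by itself establish that a site is used.
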

Many eukaryotic genes can be spliced in a number of different ways by choosing between different potential 5′ and 3′ splice junctions, thereby creating
different combinations of exons and introns in the final mRNAs. This mix-and-match process allows the creation of several different proteins from a single
gene sequence. The first example of such "alternative splicing" (Figure 9) was discovered in the adenovirus in 1977 (Berget et al., 1977). The first example
in cellular genes was reported in 1980 in the IgM gene, which encodes an immunoglobulin, one of several proteins created by immune cells to fight
infection by foreign organisms and particles (Early et al., 1980).

The Dscam gene of Drosophila, which encodes proteins involved in guiding embryonic nerves to their target destinations during formation of the fly's
nervous system, exhibits an especially impressive number of alternative splicing patterns. Dozens of different forms of Dscam mRNAs and corresponding
proteins have been identified, while analysis of the gene's sequence reveals a staggering 38,000 potential additional mRNAs, based on the large number of
introns found. The ability to produce so many different proteins from a single gene may be necessary for forming as complex a structure as the nervous
system (Schmucker et al., 2000). In general, the existence of multiple mRNA transcripts from single genes may account for the complexity of some
organisms, such as humans, even though these organisms have relatively few genes (in the case of humans, approximately 25,000).
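
The mix-and-match arithmetic behind such large numbers is easy to reproduce. The sketch below assumes a simplified model in which a gene has several clusters of mutually exclusive exons and exactly one exon is chosen from each cluster; the toy gene and the function names are hypothetical. The cluster sizes 12, 48, 33, and 2 used at the end are the figures commonly quoted for Dscam in the literature; they are not given in this article and are included here as an assumption.

```python
from itertools import product
from math import prod


def count_isoforms(cluster_sizes):
    """Number of distinct mRNAs when one exon is chosen per cluster."""
    return prod(cluster_sizes)


def enumerate_isoforms(clusters):
    """List every exon combination for a small, hypothetical gene."""
    return ["-".join(choice) for choice in product(*clusters)]


# Hypothetical gene: exons 1 and 3 are constant, 2 and 4 have alternatives.
toy_gene = [["1"], ["2a", "2b"], ["3"], ["4a", "4b", "4c"]]
print(enumerate_isoforms(toy_gene))

# Cluster sizes often quoted for Dscam (assumption, not from this article).
print(count_isoforms([12, 48, 33, 2]))
```

Under this model the number of possible transcripts is simply the product of the cluster sizes, which is how a single gene can reach tens of thousands of potential mRNAs.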

Figure 7: Introns are removed during RNA splicing. Non-coding sequences, or introns, are removed during RNA splicing to produce a mature mRNA
transcript composed of exons (coding sequences).

tRNA and rRNA: Their Role in Translation

Figure 8 : YNYYRAY is a common sequence at the branching points of introns. Y indicates a pyrimidine; N, a nucleotide; R, a purine; and A, the base
adenine

Two additional categories of RNA play a critical role in the translation process: tRNA and rRNA. Ribosomal RNA (rRNA) molecules were initially
characterized by how rapidly they would "sink" in a centrifuge tube—in other words, they were described by their sedimentation velocity as measured in
Svedberg (S) units. Prokaryotic organisms contain one type of rRNA gene that encodes three distinct RNA species: the 23S, 5S, and 16S rRNAs. In
comparison, eukaryotic cells contain two types of rRNA genes that give rise to four rRNA species: the 28S, 5.8S, 5S, and 18S rRNAs. Both the eukaryotic
and prokaryotic genomes contain multiple copies of these rRNA genes to be able to manufacture the large number of ribosomes required by a cell. Mature
rRNAs are produced by cleavage and modification of initial transcripts (Pierce, 2000).

Figure 9 : Alternative splicing. Alternative splicing refers to the process by which a given gene is spliced into more than one type of mRNA molecule.

Transfer RNA (tRNA) molecules serve as molecular adaptors that bind to mRNA on one end and carry amino acids into position on the other. Most types of
cells possess approximately 30 to 40 different tRNAs, with more than one tRNA corresponding to each amino acid. tRNAs fold into a cloverleaf structure
held together by the pairing of complementary nucleotides. Structural studies using X-ray crystallography have demonstrated that the cloverleaf is further
folded into an L shape (Figure 10). A loop at one end of the folded structure base-pairs with three nucleotides on the mRNA that are collectively called
a codon; the complementary three nucleotides on the tRNA are called the anticodon.

Figure 10 : The secondary structure of tRNA. The base sequence in the flattened model is for tRNA-Ala.

Although the pairing between codon and anticodon takes place over three nucleotides, strict complementary base-pairing is only necessary between the
first two nucleotides. The third position is referred to as the "wobble" position (Figure 11), and the rules for base-pairing are less stringent at this position.
Because of this flexibility, the 30 to 40 tRNAs present in a cell can "read" all 61 codons in mRNA.
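
To make the wobble idea concrete, the sketch below lists the codons that a given anticodon could read. The article does not spell out the relaxed pairing rules, so the WOBBLE table here follows the classic wobble hypothesis as usually presented in textbooks (for example, G in the first anticodon position pairs with C or U, and inosine, written I, pairs with U, C, or A); treat it as an assumption of the example. The function name codons_read is hypothetical.

```python
# Strict Watson-Crick pairing, used for the first two codon positions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Relaxed pairing at the wobble position: 5' anticodon base -> codon bases
# it can pair with (textbook wobble rules; an assumption of this sketch).
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}


def codons_read(anticodon):
    """Return the codons (5'->3') readable by an anticodon written 5'->3'."""
    a1, a2, a3 = anticodon.upper()
    # Codon and anticodon pair anti-parallel: codon position 1 pairs with
    # anticodon position 3, and codon position 3 with anticodon position 1.
    first = WATSON_CRICK[a3]
    second = WATSON_CRICK[a2]
    return [first + second + third for third in WOBBLE[a1]]


print(codons_read("GAA"))  # two phenylalanine codons
print(codons_read("IGC"))  # three alanine codons
```

Because one anticodon can cover two or three codons that differ only in the third position, fewer than 61 tRNAs are enough to read every sense codon.
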

The opposite end of the folded structure, which is the 3' end of the tRNA, binds to its corresponding amino acid at an attachment site that is also three
nucleotides long, invariably CCA. Enzymes called aminoacyl-tRNA synthetases attach the correct amino acid to each tRNA, based on the three-
dimensional structure of the tRNA molecule.

More and More RNAs


Finally, there are still more forms of RNA beyond mRNA, rRNA, and tRNA. For instance, short RNAs are not only part of organelles like ribosomes and
spliceosomes, but also of some enzymes. For example, the enzyme telomerase, which adds nucleotides to the ends of chromosomes, is composed of a
451-nucleotide RNA and several proteins. Juli Feigon at the University of California, Los Angeles, together with postdoctoral scholar Carla Theimer and
graduate student Craig Blois, first solved the structure of an essential piece of this RNA by nuclear magnetic resonance spectroscopy (Theimer et al.,
2005). They revealed a unique RNA structure with extensive RNA folding, which is necessary for telomerase activity.

Other classes of RNA species include microRNAs, small interfering RNAs, and sRNAs, none of which are translated into proteins but all of which still perform
important functions in the cell. The discovery of these RNAs has been one of the most exciting advances in recent years, and there is currently considerable
interest in the use of these molecules as possible therapies. Structurally, however, these RNAs all share the same basic single-stranded
chemical structure with, in some cases, higher-order structures obtained through complementary base-pair folding.

From the RNA Tie Club to today, the more scientists have studied RNA, the more surprises they have uncovered. New functions for RNA, new
modifications to RNA, and other surprises undoubtedly await discovery in the years to come.

Figure 11: The "wobble" position. Base-pairing rules between the tRNA anticodon and the mRNA codon are less stringent at the third nucleotide
position. This base-pairing flexibility is also called "wobble."

Eukaryotic Genome Complexity


By: Leslie A. Pray, Ph.D. © 2008 Nature Education 

Citation: Pray, L. (2008) Eukaryotic genome complexity. Nature Education 1(1):96

How many genes are there? This question is surprisingly not very important, and has nothing to do with the organism's
complexity. There is more to genomes than protein-coding genes alone.

Consider these fundamental facts about the eukaryotic nuclear genome. It is linear, as opposed to the typically circular DNA of bacterial cells. It conforms
to the Watson-Crick double-helix structural model. Furthermore, it is embedded in nucleosomes—complex DNA-protein structures that pack together to
form chromosomes. Beyond these basic, universal features, eukaryotic genomes vary dramatically in terms of size and gene counts. Even so, genome
size and the number of genes present in an organism reveal little about that organism's complexity (Figure 1).

Figure 1: Chromatin has a highly complex structure with several levels of organization. The simplest level is the double-helical structure of DNA.

Does Size Matter?

Figure 2 : Extensive variation in genome size within and among the main groups of life. Ever since the first general surveys of nuclear DNA content
were carried out in the early 1950s, it has been apparent that eukaryotic genome sizes vary enormously and that this is unrelated to intuitive ideas of
morphological complexity. This discrepancy between genome size and complexity remains clear more than half a century later, with genome sizes now
available for nearly 9,000 species of animals and plants. In prokaryotes, genome size and gene number are strongly correlated, but in eukaryotes the vast
majority of nuclear DNA is non-coding. Nevertheless, there is some overlap in genome size between the largest bacteria and the smallest parasitic protists.
The figure illustrates the means and overall ranges of genome size that have been observed so far in the main groups of living organisms, and are loosely
arranged according to common ideas of complexity to further emphasize the disparity between this parameter and genome size. Some commonly cited
extreme values for amoebae (700,000 Mb) have been omitted, as there is considerable uncertainty about the accuracy of these measurements and the
ploidy level of the species involved.

How big is it? That is usually the first question asked about an organism's genome. Over the past 60 years, scientists have estimated the genome sizes of
more than 10,000 plants, animals, and fungi. However, while information about an organism's genome size might seem like a good starting point for
attempting to understand the genetic content, or "complexity," of the organism, this approach often belies the tremendous complexity of the eukaryotic
genome. As Van Straalen and Roelofs (2006) explain, "There is a remarkable lack of correspondence between genome size and organism complexity,
especially among eukaryotes. For example, the marbled lungfish, Protopterus aethiopicus, has more than 40 times the amount of DNA per cell than
humans!" (Figure 2). Indeed, the marbled lungfish has the largest recorded genome of any eukaryote. One haploid copy of this fish's genome is composed
of a whopping 132.8 billion base pairs, while one copy of a human haploid genome has only 3.5 billion. (Genome size is usually measured in picograms
[pg] and then converted to nucleotide number. One pg is equivalent to approximately 1 billion base pairs.) Therefore, genome size is clearly not an
indicator of the genomic or biological complexity of an organism. Otherwise, humans would have at least as much DNA as the marbled lungfish, although
probably much more.

As further clarification, when scientists talk about the eukaryotic genome, they are usually referring to the haploid genome—this is the complete set of DNA
in a single haploid nucleus, such as in a sperm or egg. So, saying that the human genome is approximately 3 billion base pairs (bp) long is the same as
saying that each set of chromosomes is 3 billion bp long. In fact, each of our diploid cells contains twice that number of base pairs. Moreover, scientists are
usually referring only to the DNA in a cell's nucleus, unless they state otherwise. All eukaryotic cells, however, also have mitochondrial genomes, and many
additionally contain chloroplast genomes. In humans, the mitochondrial genome has only about 16,500 nucleotide base pairs, a mere fraction of the length
of the 3 billion bp nuclear genome (Anderson et al., 1981).
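
The arithmetic in this section can be checked directly. The short script below uses only figures quoted in the text: the approximate conversion of 1 pg to 1 billion bp, the 132.8 billion bp lungfish genome, the two human haploid estimates that appear above (3.5 billion and roughly 3 billion bp), and the 16,500 bp mitochondrial genome. Variable names are hypothetical.

```python
BP_PER_PG = 1e9  # the text's approximation: 1 pg is about 1 billion bp


def pg_to_bp(picograms):
    """Convert a DNA mass in picograms to an approximate base-pair count."""
    return picograms * BP_PER_PG


LUNGFISH_BP = 132.8e9    # marbled lungfish, haploid
HUMAN_BP_HIGH = 3.5e9    # human haploid figure quoted with the lungfish
HUMAN_BP_ROUND = 3.0e9   # rounded human haploid figure used later
HUMAN_MITO_BP = 16_500   # human mitochondrial genome

ratio_high = LUNGFISH_BP / HUMAN_BP_HIGH
ratio_round = LUNGFISH_BP / HUMAN_BP_ROUND
diploid_bp = 2 * HUMAN_BP_ROUND
mito_fraction = HUMAN_MITO_BP / HUMAN_BP_ROUND

print(f"lungfish/human (3.5 Gb): {ratio_high:.1f}x")
print(f"lungfish/human (3.0 Gb): {ratio_round:.1f}x")
print(f"human diploid nucleus:   {diploid_bp:.1e} bp")
print(f"mitochondrial fraction:  {mito_fraction:.1e}")
```

The ratio comes out at roughly 38 with the 3.5 billion bp estimate and roughly 44 with the 3 billion bp estimate, so the quoted "more than 40 times" depends on which human figure is used.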

How Many Protein-Coding Genes Are in That Genome?


Interestingly, the same "remarkable lack of correspondence" can be noted when discussing the relationship between the number of protein-coding genes
and organism complexity. Scientists estimate that the human genome, for example, has about 20,000 to 25,000 protein-coding genes. Before completion of
the draft sequence of the Human Genome Project in 2001, scientists made bets as to how many genes were in the human genome. Most predictions were
between about 30,000 and 100,000. Nobody expected a figure as low as 20,000, especially when compared to the number of protein-coding genes in an
organism like Trichomonas vaginalis. T. vaginalis is a single-celled parasitic organism responsible for an estimated 180 million urogenital tract infections in
humans every year. This tiny organism features the largest number of protein-coding genes of any eukaryotic genome sequenced to date: approximately
60,000.

In fact, compared to almost any other organism, humans' 25,000 protein-coding genes do not seem like many. The fruit fly Drosophila melanogaster, for
example, has an estimated 13,000 protein-coding genes. Or consider the mustard plant Arabidopsis thaliana, the "fruit fly" of the plant world, which
scientists use as a model organism for studying plant genetics. A. thaliana has just about the same number of protein-coding genes as humans—actually, it
has slightly more, coming in at about 25,500. Moreover, A. thaliana has one of the smallest genomes in the plant world! It would seem obvious that humans
would have more protein-coding genes than plants, but that is not the case. These observations suggest that there is more to the genome than
protein-coding genes alone.

As shown in Table 1 (adapted from Van Straalen & Roelofs, 2006), there is no clear correspondence between genome size and number of protein-coding
genes—another indication that the number of genes in a eukaryotic genome reveals little about organismal complexity. The number of protein-coding
genes usually caps off at around 25,000 or so, even as genome size increases.

Table 1: Genome Size and Number of Protein-Coding Genes for a Select Handful of Species

Species and Common Name                                 Estimated Total Size of Genome (bp)*   Estimated Number of Protein-Encoding Genes*
Saccharomyces cerevisiae (unicellular budding yeast)    12 million                             6,000
Trichomonas vaginalis                                   160 million                            60,000
Plasmodium falciparum (unicellular malaria parasite)    23 million                             5,000
Caenorhabditis elegans (nematode)                       95.5 million                           18,000
Drosophila melanogaster (fruit fly)                     170 million                            14,000
Arabidopsis thaliana (mustard; thale cress)             125 million                            25,000
Oryza sativa (rice)                                     470 million                            51,000
Gallus gallus (chicken)                                 1 billion                              20,000-23,000
Canis familiaris (domestic dog)                         2.4 billion                            19,000
Mus musculus (laboratory mouse)                         2.5 billion                            30,000
Homo sapiens (human)                                    2.9 billion                            20,000-25,000

* There may be other estimates in the literature, but most estimates approximate those listed here.
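
One way to see the lack of correspondence in Table 1 is to compute gene density, that is, protein-coding genes per million base pairs. The sketch below transcribes the table's values; where the table gives a range (chicken and human), the midpoint is used, which is an assumption made only for this illustration. The names TABLE_1, genes_per_megabase, and DENSITY are hypothetical.

```python
# (genome size in bp, protein-coding genes), transcribed from Table 1.
# Ranges in the table are represented by their midpoints (an assumption).
TABLE_1 = {
    "Saccharomyces cerevisiae": (12e6, 6_000),
    "Trichomonas vaginalis": (160e6, 60_000),
    "Plasmodium falciparum": (23e6, 5_000),
    "Caenorhabditis elegans": (95.5e6, 18_000),
    "Drosophila melanogaster": (170e6, 14_000),
    "Arabidopsis thaliana": (125e6, 25_000),
    "Oryza sativa": (470e6, 51_000),
    "Gallus gallus": (1e9, 21_500),
    "Canis familiaris": (2.4e9, 19_000),
    "Mus musculus": (2.5e9, 30_000),
    "Homo sapiens": (2.9e9, 22_500),
}


def genes_per_megabase(size_bp, genes):
    """Protein-coding genes per million base pairs of genome."""
    return genes / (size_bp / 1e6)


DENSITY = {
    species: genes_per_megabase(size_bp, genes)
    for species, (size_bp, genes) in TABLE_1.items()
}

# Print species from the most gene-dense genome to the least.
for species, density in sorted(DENSITY.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{species:26s} {density:8.1f} genes per Mb")
```

With these figures the density falls from hundreds of genes per megabase in yeast to fewer than ten in mammals, which is another way of saying that gene number levels off while genome size keeps growing.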

While the majority of emphasis has been placed on protein-coding genes in particular, scientists have continued to refine their definition of what exactly a
gene is, partly in response to the realization that DNA encodes more than just proteins. For instance, in a study of the mouse genome, scientists found that
more than 60% of this 2.5 billion bp genome is transcribed, but less than 2% is actually translated into functional protein products (FANTOM Consortium et
al., 2005). Within this article, the discussion focuses on protein-coding genes unless otherwise stated. Note, however, that much of the
genome's transcription is dedicated to making tRNA, rRNA, and many RNAs involved in splicing and gene regulation.

While scientists have been measuring genome size for decades, they have only recently had the technological capacity and know-how to count genes. To
estimate the number of protein-coding genes in a genome, scientists often start by using what are known as gene-prediction programs: computational
programs that align the sequence of interest with one or more known genome sequences. Other computer programs can predict gene location by looking
for sequence characteristics of genes, such as open reading frames within exons and CpG islands within promoter regions.
However, all of these computer programs only predict the presence of genes. Each prediction must then be experimentally validated, such as by
using microarray hybridization to confirm that the predicted genes are represented in RNA (Yandell et al., 2005). As Michael Brent, a professor of computer
engineering at Washington University, explained in Nature Biotechnology, gene prediction has become much more accurate over the past several years
(Brent, 2007). Its improved precision accounts for why estimates of the number of genes in the human genome have decreased from 45,000 about 10
years ago, to Venter et al.'s estimate of 26,588 upon completion of the Human Genome Project (Venter et al., 2001), to the current estimate of between
20,000 and 21,000. In short, the older computational methods generated a lot of false positives, meaning that they predicted the presence of protein-coding
genes that weren't actually there.
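
As a hedged illustration of the simplest signal mentioned above, the sketch below scans the forward strand of a DNA sequence for open reading frames: a start codon (ATG) followed, in the same reading frame, by a stop codon. It is a toy, not a gene-prediction program; real tools also examine the reverse strand, splice sites, CpG islands, and similarity to known genes. The function name find_orfs and the min_codons threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end) index pairs of forward-strand ORFs.

    `end` is exclusive and includes the stop codon, so dna[start:end] is
    the full ORF. Only ORFs with at least `min_codons` codons before the
    stop codon are reported.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it at the stop codon
    return orfs


sequence = "CCATGAAATTTGGGTAACC"
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])
```

A scan like this produces candidates only, and short random ORFs are common, which is one reason why predictions of this kind need experimental validation.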

Beyond Estimating the Number of Protein-Coding Genes


As with genome size, having more protein-coding genes does not necessarily translate into greater complexity. This is because the eukaryotic genome has
evolved other ways to generate biological complexity. Much of this complexity derives from how the genome "behaves," or more precisely, how various
genes are expressed.

Alternative splicing was the first phenomenon scientists discovered that made them realize that genomic complexity cannot be judged by the number of
protein-coding genes. During alternative splicing, which occurs after transcription and before translation, introns are removed and exons are spliced
together to make an mRNA molecule. However, the exons are not necessarily all spliced back together in the same way. Thus, a single gene,
or transcription unit, can code for multiple proteins or other gene products, depending on how the exons are spliced back together. In fact, scientists have
estimated that there may be as many as 500,000 or more different human proteins, all coded by a mere 20,000 protein-coding genes.

Scientists have since come across several other mechanisms that contribute to the eukaryotic genome's capacity to generate phenotypic complexity.
These include RNA editing, trans-splicing, and tandem chimerism. RNA editing is the alteration of an mRNA molecule after transcription—for example, the
modification of a cytosine to a uracil before an mRNA molecule is translated into a protein. The phenotypic consequences of RNA editing vary among
genes and species. While sometimes detrimental (e.g., some RNA editing events have been associated with disease), those RNA editing events that lead
to slight changes in protein structure could be selectively advantageous (Reenan, 2005). Trans-splicing is the splicing together of separate transcripts to
form an mRNA molecule, as opposed to alternative splicing, which is the splicing together of exons from the same transcript. Tandem chimerism occurs
when adjacent transcription units are transcribed together to form a single "chimeric" mRNA molecule (Parra et al., 2005).

Consider again those 60,000 protein-coding genes in Trichomonas vaginalis. If all of those 60,000 genes operated at the same level of complexity as the
20,000 or so genes in Homo sapiens, then shouldn't T. vaginalis be a much more complex organism than it is? As it turns out, its genes do not operate at
that same level of complexity. For starters, few of the genes have any introns at all, which means that alternative splicing is not a major source of protein
variation. Rather, scientists suspect the large number of genes—which, incidentally, is 10 times more than they expected they would find before they
started the sequencing project—is due to duplication (Carlton et al., 2007). In other words, many of the genes are simply copies of each other.
Furthermore, about half are believed to be "pseudogenes," or DNA sequences that are similar to functional protein-coding genes but have lost their
protein-encoding capacities. Scientists still don't know why the T. vaginalis genome has so many genes, including so many defunct genes.

Organismal complexity is thus the result of much more than the sheer number of nucleotides that compose a genome and the number of coding sequences
in that genome. Not only may one coding sequence encode a large number of separate protein products via alternative splicing, but many genomes are
also rich with noncoding RNA sequences that work to coordinate gene expression. When one combines these elements with other regulatory elements,
such as enhancers and promoters, as well as with potential sequences that remain uncharacterized, it becomes clear that while size is one component of
organismal complexity, its contribution to that complexity is small.

Genome Packaging in Prokaryotes: the Circular Chromosome of E. coli


By: Ann Griswold, Ph.D. © 2008 Nature Education 

Citation: Griswold, A. (2008) Genome packaging in prokaryotes: the circular chromosome of E. coli. Nature Education 1(1):57

How do bacteria, lacking a nucleus, organize and pack their genome into the cell? Supercoiling enables this but forces
a different kind of transcription and translation in prokaryotes.

Most students learn at an early age that organisms can be broadly divided into two types: prokaryotes and eukaryotes. In primary school, children are
taught that the main difference between these organisms is that eukaryotic cells contain membrane-bound organelles, such as the nucleus, while
prokaryotic cells do not. There is much more to the story, however, particularly with regard to chromosomal structure and organization.

E. coli: A Model Prokaryote


Much of what is known about prokaryotic chromosome structure was derived from studies of Escherichia coli, a bacterium that lives in the human colon and
is commonly used in laboratory cloning experiments. In the 1950s and 1960s, this bacterium became the model organism of choice for prokaryotic
research when a group of scientists used phase-contrast microscopy and autoradiography to show that the essential genes of E. coli are encoded on a
single circular chromosome packaged within the cell nucleoid (Mason & Powelson, 1956; Cairns, 1963).

Prokaryotic cells do not contain nuclei or other membrane-bound organelles. In fact, the word "prokaryote" literally means "before the nucleus." The
nucleoid is simply the area of a prokaryotic cell in which the chromosomal DNA is located. This arrangement is not as simple as it sounds, however,
especially considering that the E. coli chromosome is several orders of magnitude larger than the cell itself. So, if bacterial chromosomes are so huge, how
can they fit comfortably inside a cell—much less in one small corner of the cell?

DNA Supercoiling
The answer to this question lies in DNA packaging. Whereas eukaryotes wrap their DNA around proteins called histones to help package the DNA into
smaller spaces, most prokaryotes do not have histones (with the exception of those species in the domain Archaea). Thus, one way prokaryotes compress
their DNA into smaller spaces is through supercoiling (Figure 1). Imagine twisting a rubber band so that it forms tiny coils. Now twist it even further, so that
the original coils fold over one another and form a condensed ball. When this type of twisting happens to a bacterial genome, it is known as supercoiling.
Genomes can be negatively supercoiled, meaning that the DNA is twisted in the opposite direction of the double helix, or positively supercoiled, meaning
that the DNA is twisted in the same direction as the double helix. Most bacterial genomes are negatively supercoiled during normal growth.
Proteins Involved in Supercoiling

Figure 1: Supercoiled chromosome of E. coli. In memory of Dr. Ruth Kavenoff (1944–1999).

During the 1980s and 1990s, researchers discovered that multiple proteins act together to fold and condense prokaryotic DNA. In particular, one protein
called HU, which is the most abundant protein in the nucleoid, works with an enzyme called topoisomerase I to bind DNA and introduce sharp bends in the
chromosome, generating the tension necessary for negative supercoiling. Recent studies have also shown that other proteins, including integration host
factor (IHF), can bind to specific sequences within the genome and introduce additional bends (Rice et al., 1996). The folded DNA is then organized into a
variety of conformations (Sinden & Pettijohn, 1981) that are supercoiled and wound around tetramers of the HU  protein, much like eukaryotic
chromosomes are wrapped around histones (Murphy & Zimmerman, 1997).
Once the prokaryotic genome has been condensed, DNA topoisomerase I, DNA gyrase, and other proteins help maintain the supercoils. One of these
maintenance proteins, H-NS, plays an active role in transcription by modulating the expression of the genes involved in the response to environmental
stimuli. Another maintenance protein, factor for inversion stimulation (FIS), is abundant during exponential growth and regulates the expression of more
than 231 genes, including DNA topoisomerase I (Bradley et al., 2007).

Accessing Supercoiled Genes


Supercoiling explains how chromosomes fit into a small corner of the cell, but how do the proteins involved in replication and transcription access the
thousands of genes in prokaryotic chromosomes when everything is packaged together so tightly? It has been determined that prokaryotic DNA replication
occurs at a rate of 1,000 nucleotides per second, and prokaryotic transcription occurs at a rate of about 40 nucleotides per second (Lewin, 2007),
so bacteria must have highly efficient methods of accessing their DNA strands. But how?
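To put these rates in perspective, a rough back-of-the-envelope calculation is sketched below in Python. Only the two rates come from the text. The genome size (about 4.6 million base pairs), the two replication forks moving in opposite directions from a single origin, the reading of the replication rate as a per-fork figure, and the 1,000-nucleotide gene length are illustrative assumptions.

```python
# Rates quoted in the text (Lewin, 2007)
REPLICATION_RATE = 1_000   # nucleotides per second (assumed here to be per fork)
TRANSCRIPTION_RATE = 40    # nucleotides per second

# Illustrative assumptions, not taken from the text
GENOME_BP = 4_600_000      # approximate size of the E. coli chromosome
FORKS = 2                  # bidirectional replication from one origin
GENE_BP = 1_000            # length of a hypothetical gene

# Time to copy the whole chromosome with both forks working in parallel
replication_seconds = GENOME_BP / (REPLICATION_RATE * FORKS)
replication_minutes = replication_seconds / 60

# Time for one RNA polymerase to transcribe the hypothetical gene
transcription_seconds = GENE_BP / TRANSCRIPTION_RATE

print(f"Replication: about {replication_minutes:.0f} minutes")
print(f"Transcription of one gene: {transcription_seconds:.0f} seconds")
```

Under these assumptions, copying the entire chromosome takes a little under 40 minutes, and transcribing a single gene takes about 25 seconds, which gives a sense of how continuously the packaged DNA must be accessed.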
Researchers have noted that the nucleoid usually appears as an irregularly shaped mass within the prokaryotic cell, but it becomes spherical when the cell
is treated with chemicals to inhibit transcription or translation. Moreover, during transcription, small regions of the chromosome can be seen to project from
the nucleoid into the cytoplasm (i.e., the interior of the cell), where they unwind and associate with ribosomes, thus allowing easy access by various
transcriptional proteins (Dürrenberger et al., 1988). These projections are thought to explain the mysterious shape of nucleoids during active growth. When
transcription is inhibited, however, the projections retreat into the nucleoid, forming the aforementioned spherical shape.
Because there is no nuclear membrane to separate prokaryotic DNA from the ribosomes within the cytoplasm, transcription and translation occur
simultaneously in these organisms. This is strikingly different from eukaryotic chromosomes, which are confined to the membrane-bound nucleus during
most of the cell cycle. In eukaryotes, transcription must be completed in the nucleus before the newly synthesized mRNA molecules can be transported to
the cytoplasm to undergo translation into proteins.

Variations in Prokaryotic Genome Structure

Figure 2: Deer tick. This black-legged tick (Ixodes sp.) carries the bacterium (Borrelia sp.) that causes Lyme disease.

Recently, it has become apparent that one size does not fit all when it comes to prokaryotic chromosome structure. While most prokaryotes, like  E. coli,
contain a single circular DNA molecule that makes up their entire genome, recent studies have indicated that some prokaryotes contain as many as
four linear or circular chromosomes. For example, Vibrio cholerae, the bacterium that causes cholera, contains two circular chromosomes. One of these
chromosomes contains the genes involved in metabolism and virulence, while the other contains the remaining essential genes (Trucksis et al., 1998). An
even more extreme example is provided by Borrelia burgdorferi, the bacterium that causes Lyme disease. This organism is transmitted through the bite of
deer ticks (Figure 2), and it contains up to 11 copies of a single linear chromosome (Ferdows & Barbour, 1989). Unlike  E. coli, Borrelia cannot supercoil its
linear chromosomes into a tight ball within the nucleoid; rather, these strands are diffused throughout the cell (Hinnebusch & Bendich, 1997).
Other organisms, such as Bacillus subtilis, form nucleoids that closely resemble those of E. coli, but they use different architectural proteins to do so.
Furthermore, the DNA molecules of Archaea, a taxonomic domain composed of single-celled, nonbacterial prokaryotes that share many similarities with
eukaryotes, can be negatively supercoiled, positively supercoiled, or not supercoiled at all. It is important to note that archaeans are the only group of
prokaryotes that use eukaryote-like histones, rather than the architectural proteins described above, to condense their DNA molecules (Sandman  et al.,
1990). The acquisition of histones by archaeans is thought to have paved the way for the evolution of larger and more complex eukaryotic cells (Minsky et
al., 1997).

Other DNA Differences Between Prokaryotes and Eukaryotes


Most prokaryotes reproduce asexually and are haploid, meaning that only a single copy of each gene is present. This makes it relatively easy to generate
mutations in the lab and study the resulting phenotypes. By contrast, eukaryotes that reproduce sexually generally contain multiple chromosomes and are
said to be diploid, because two copies of each gene exist—with one copy coming from each of an organism's parents.
Yet another difference between prokaryotes and eukaryotes is that prokaryotic cells often contain one or more plasmids (i.e., extrachromosomal DNA
molecules that are either linear or circular). These pieces of DNA differ from chromosomes in that they are typically smaller and encode nonessential
genes, such as those that aid growth in specific conditions or encode antibiotic resistance. Borrelia, for instance, contains more than 20 circular and linear
plasmids that encode genes responsible for infecting ticks and humans (Fraser et al., 1997). Plasmids are often much smaller than chromosomes (i.e., less
than 1,500 kilobases), and they replicate independently of the rest of the genome. However, some plasmids are capable of integrating into chromosomes
or moving from cell to cell.
Perhaps due to the space constraints of packing so many essential genes onto a single chromosome, prokaryotes can be highly efficient in terms of
genomic organization. Very little space is left between prokaryotic genes. As a result, noncoding sequences account for an average of 12% of the
prokaryotic genome, as opposed to upwards of 98% of the genetic material in eukaryotes (Ahnert  et al., 2008). Furthermore, unlike eukaryotic
chromosomes, most prokaryotic genomes are organized into polycistronic operons, or clusters of more than one coding region attached to a
single promoter, separated by only a few base pairs. The proteins encoded by each operon often collaborate on a single task, such as the metabolism of a
sugar into by-products that can be used for energy (Figure 3).

Figure 3: The prokaryotic lac operon. Three structural genes code for proteins involved in lactose import and metabolism in bacteria. The genes are
organized together in a cluster called the lac operon.
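
As an illustration only, the operon arrangement can be modeled as one promoter followed by an ordered list of coding regions that are transcribed together. In this Python sketch the class and field names are hypothetical and the promoter label is a placeholder. The gene names lacZ, lacY, and lacA are the standard names of the three structural genes, which the caption itself does not list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Operon:
    """Toy model of a polycistronic operon: one promoter, several coding regions."""
    name: str
    promoter: str        # placeholder label, not a real sequence
    genes: List[str]     # coding regions in the order they are transcribed

    def transcribe(self) -> str:
        """Return a label for the single mRNA that spans every coding region."""
        return "-".join(self.genes)


lac = Operon(name="lac", promoter="lac promoter", genes=["lacZ", "lacY", "lacA"])

# One transcript carries all three coding regions
print(lac.transcribe())
```

The point of the model is that a single transcription event yields one message covering every gene in the cluster, so the products are made together.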

The organization of prokaryotic DNA therefore differs from that of eukaryotes in several important ways. The most notable difference is the condensation
process that prokaryotic DNA molecules undergo in order to fit inside relatively small cells. Other differences, while not as dramatic, are summarized in
Table 1.

Table 1: Prokaryotic versus Eukaryotic Chromosomes

Prokaryotic Chromosomes | Eukaryotic Chromosomes
Many prokaryotes contain a single circular chromosome. | Eukaryotes contain multiple linear chromosomes.
Prokaryotic chromosomes are condensed in the nucleoid via DNA supercoiling and the binding of various architectural proteins. | Eukaryotic chromosomes are condensed in a membrane-bound nucleus via histones.
Because prokaryotic DNA can interact with the cytoplasm, transcription and translation occur simultaneously. | In eukaryotes, transcription occurs in the nucleus, and translation occurs in the cytoplasm.
Most prokaryotes contain only one copy of each gene (i.e., they are haploid). | Most eukaryotes contain two copies of each gene (i.e., they are diploid).
Most prokaryotic genomes are organized into polycistronic operons. | Some eukaryotic genomes are organized into operons, but most are not.
Nonessential prokaryotic genes are commonly encoded on extrachromosomal plasmids. | Extrachromosomal plasmids are not commonly present in eukaryotes.
Prokaryotic genomes are efficient and compact, containing little repetitive DNA. | Eukaryotes contain large amounts of noncoding and repetitive DNA.
RNA Functions
By: Suzanne Clancy, Ph.D. © 2008 Nature Education 

Citation: Clancy, S. (2008) RNA Functions. Nature Education 1(1):102

The central dogma of molecular biology suggests that the primary role of RNA is to convert the information stored in
DNA into proteins. In reality, there is much more to the RNA story.

RNA

The central dogma of molecular biology suggests that DNA maintains the information to encode all of our proteins, and that three different types
of RNA rather passively convert this code into polypeptides. Specifically, messenger RNA (mRNA) carries the protein blueprint from a cell's DNA to
its ribosomes, which are the "machines" that drive protein synthesis. Transfer RNA (tRNA) then carries the appropriate amino acids into the ribosome for
inclusion in the new protein. Meanwhile, the ribosomes themselves consist largely of ribosomal RNA (rRNA) molecules.
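
The flow of information described above can be sketched in a few lines of Python. This is a toy model, not a full implementation: the DNA sequence is invented, the function names are hypothetical, and the codon table contains only the four codons the example needs (AUG for methionine, UUU for phenylalanine, GGC for glycine, and the stop codon UAA).

```python
# Partial codon table; None marks a stop codon
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UAA": None}


def transcribe(coding_strand: str) -> str:
    """The mRNA matches the DNA coding strand, with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> list:
    """Read the mRNA three bases at a time until a stop codon is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue is None:      # stop codon: release the chain
            break
        peptide.append(residue)
    return peptide


mrna = transcribe("ATGTTTGGCTAA")   # invented 12-base coding sequence
peptide = translate(mrna)
print(mrna, peptide)
```

In the cell, the lookup performed by the dictionary is carried out by tRNAs, and the loop is carried out by the ribosome.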

However, in the half-century since the structure of DNA was first elaborated, scientists have learned that RNA does much more than simply play a role in
protein synthesis. For example, many types of RNA have been found to be catalytic; that is, they carry out biochemical reactions just like enzymes do.
Furthermore, many other varieties of RNA have been found to have complex regulatory roles in cells.
Thus, RNA molecules play numerous roles in both normal cellular processes and disease states. Generally, those RNA molecules that do not take the form
of mRNA are referred to as noncoding, because they do not encode proteins. The involvement of noncoding RNAs in many regulatory processes, their
abundance, and their diversity of functions have led to the hypothesis that an "RNA world" may have preceded the evolution of DNA and proteins (Gilbert,
1986).

Noncoding RNAs in Eukaryotes


In eukaryotes, noncoding RNA comes in several varieties, most prominently transfer RNA (tRNA) and ribosomal RNA (rRNA). As previously mentioned,
both tRNA and rRNA have long been known to be essential in the translation of mRNA to proteins. For instance, Francis Crick proposed the existence of
adaptor RNA molecules that were able to bind to the nucleotide code of mRNA, thereby facilitating the transfer of amino acids to
growing polypeptide chains. The work of Hoagland et al. (1958) indeed confirmed that a specific fraction of cellular RNA was covalently bound to amino
acids. Later, the fact that rRNA was found to be a structural component of ribosomes suggested that like tRNA, rRNA was also noncoding.
In addition to rRNA and tRNA, a number of other noncoding RNAs exist in eukaryotic cells. These molecules assist in many essential functions, which are
still being enumerated and defined. As a group, these RNAs are frequently referred to as small regulatory RNAs (sRNAs), and, in eukaryotes, they have
been further classified into a number of subcategories. Together, these various regulatory RNAs exert their effects through a combination
of complementary base pairing, complexing with proteins, and their own enzymatic activities.
Small Nuclear RNAs
One important subcategory of small regulatory RNAs consists of the molecules known as small nuclear RNAs (snRNAs). These molecules play a critical
role in gene regulation by way of RNA splicing. snRNAs are found in the nucleus and are typically tightly bound to proteins in complexes called snRNPs
(small nuclear ribonucleoproteins, sometimes pronounced "snurps"). The most abundant of these molecules are the U1, U2, U5, and U4/U6 particles,
which are involved in splicing pre-mRNA to give rise to mature mRNA.

MicroRNAs
Another topic of intense research interest is that of microRNAs (miRNAs), which are small regulatory RNAs that are approximately 22 to 26 nucleotides in
length. The existence of miRNAs and their functions in gene regulation were initially discovered in the nematode C. elegans (Lee et al., 1993; Wightman et
al., 1993). Since the time of their discovery, miRNAs have also been found in many other  species, including flies, mice, and humans. Several hundred
miRNAs have been identified thus far, and many more may exist (He & Hannon, 2004).
miRNAs have been shown to inhibit gene expression by repressing translation. For example, the miRNAs encoded by C. elegans, lin-4 and let-7, bind to
the 3' untranslated region of their target mRNAs, preventing functional proteins from being produced during certain stages of larval development. Most
miRNAs studied thus far appear to control gene expression by binding to target mRNAs through imperfect base pairing and subsequent inhibition of
translation, although some exceptions have been noted.
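
What "imperfect base pairing" means can be made concrete with a small Python sketch that scores how well a short RNA pairs with a target site. The sequences below are invented for illustration, the function name is hypothetical, and the model is deliberately simplified: it counts only Watson-Crick pairs, ignores G-U wobble pairs, and requires the two sequences to be the same length.

```python
# Watson-Crick pairs in RNA
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def paired_fraction(mirna: str, target_site: str) -> float:
    """Fraction of positions that pair when the miRNA (5'->3') is aligned
    antiparallel to an equal-length target site (also written 5'->3')."""
    if len(mirna) != len(target_site):
        raise ValueError("sequences must have equal length")
    matches = sum((m, t) in PAIRS for m, t in zip(mirna, reversed(target_site)))
    return matches / len(mirna)


mirna = "AUGGCUAACG"               # invented 10-nucleotide small RNA
perfect_site = "CGUUAGCCAU"        # exact reverse complement of the small RNA
imperfect_site = "CGAUAGGCAU"      # same site with two mismatches introduced

print(paired_fraction(mirna, perfect_site))
print(paired_fraction(mirna, imperfect_site))
```

The first site pairs at every position, while the second pairs at eight of ten positions, which is the kind of partial match the paragraph above describes.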
Additional studies indicate that miRNAs also play significant roles in cancer and other diseases. For example, the species miR-155 is enriched in B cells
derived from Burkitt's lymphoma, and its sequence also correlates with a known chromosomal translocation (exchange of DNA between chromosomes).
Small Interfering RNAs

Figure 1: The current model for the biogenesis and post-transcriptional suppression of microRNAs and small interfering RNAs.
The nascent pri-microRNA (pri-miRNA) transcripts are first processed into approximately 70-nucleotide pre-miRNAs by Drosha inside the nucleus. Pre-
miRNAs are transported to the cytoplasm by Exportin 5 and are processed into miRNA:miRNA* duplexes by Dicer. Dicer also processes long dsRNA
molecules into small interfering RNA (siRNA) duplexes. Only one strand of the miRNA:miRNA* duplex or the siRNA duplex is preferentially assembled into
the RNA-induced silencing complex (RISC), which subsequently acts on its target by translational repression or mRNA cleavage, depending, at least in
part, on the level of complementarity between the small RNA and its target. ORF, open reading frame.

Small interfering RNAs (siRNAs) are yet another class of small RNAs. Although these molecules are only 21 to 25 base pairs in length, they also work to
inhibit gene expression. Specifically, one strand of a double-stranded siRNA molecule can be incorporated into a complex called RISC. This RNA-containing
complex can then silence an mRNA molecule that has a sequence complementary to its RNA component, either by repressing its translation or by cleaving it.
siRNAs were first defined by their participation in RNA interference (RNAi). They may have evolved as a defense mechanism against double-stranded RNA
viruses. siRNAs are derived from longer transcripts in a process similar to that by which miRNAs are derived, and processing of both types of RNA involves
the same enzyme, Dicer (Figure 1). The two classes appear to be distinguished by their mechanisms of repression, but exceptions have been found in
which siRNAs exhibit behavior more typical of miRNAs, and vice versa (He & Hannon, 2004).
Small Nucleolar RNAs
Inside the eukaryotic nucleus, the nucleolus is the structure where rRNA processing and ribosomal assembly take place. Molecules called small nucleolar
RNAs (snoRNAs) were isolated from nucleolar extracts because of their abundance in this structure. These molecules function to process rRNA molecules,
often resulting in the methylation and pseudouridylation of specific nucleosides. These modifications are mediated by one of two classes of snoRNAs: the
C/D box or H/ACA box families, which generally mediate the addition of methyl groups or the isomerization of uridine in immature rRNA molecules,
respectively.

Noncoding RNAs in Prokaryotes


Eukaryotes have not cornered the market on noncoding RNAs with specific regulatory functions, however: Bacteria also possess a class of small regulatory
RNAs. Bacterial sRNAs are involved in processes ranging from virulence to the transition from growth to the stationary phase, which occurs when a
bacterium encounters a situation such as nutrient deprivation.
One example of a bacterial sRNA is the 6S RNA found within Escherichia coli; this molecule has been well characterized, with its initial sequencing
occurring in 1980. 6S RNA is conserved across many bacterial species, indicating its important role in gene regulation. This RNA has been shown to affect
the activity of RNA polymerase (RNAP), the molecule that transcribes messenger RNA from DNA. 6S RNA inhibits this activity by binding to a subunit of
the polymerase that stimulates transcription during growth. Through this mechanism, 6S RNA inhibits the expression of genes that drive active growth and
helps cells enter a stationary phase (Jabri, 2005).

Riboswitches
Gene regulation in both prokaryotes and eukaryotes is also affected by RNA regulatory elements, called riboswitches (or RNA switches). Riboswitches are
RNA sensors that detect and respond to environmental or metabolic cues and affect gene expression accordingly.
A simple example of this group is the RNA thermosensor found in the virulence genes of the bacterial pathogen Listeria monocytogenes. When this
bacterium invades a host, the elevated temperature inside the host's body melts the secondary structure of a segment in the 5' untranslated region of the
mRNA produced by the bacterial prfA gene. As a result of this alteration in secondary structure, a ribosome-binding site is exposed, and translation of
protein can take place (Figure 2). Additional riboswitches have been shown to respond to heat and cold shocks in a variety of organisms, as well as to
regulate synthesis of metabolites such as sugars and amino acids (Serganov & Patel, 2007). It is important to note that, although riboswitches seem to be
more prevalent in prokaryotes, many have also been found in eukaryotic cells.

Figure 2: Gene regulation by RNA switches. RNA regions that are involved in gene expression switching are shown in the same color. a) Translation
activation of virulence genes in the pathogen Listeria monocytogenes. An increase in temperature melts the secondary structure around the ribosome
binding site (RBS) and start codon, allowing ribosome binding and translation initiation. b) Upregulation of an  Escherichia coli gene by the DsrA antisense
short RNA (sRNA). DsrA RNA pairs with the translational operator of the rpoS gene using two sequences (colored blue and light blue) located within
helices 1 and 2. This base pairing exposes translation initiation signals for ribosome binding and increases mRNA stability.

Catalytic RNA
RNAs with enzymatic (specifically, catalytic) activity, such as the self-splicing molecules, are commonly referred to as ribozymes. Ribozymes have roles
in replication, mRNA processing, and splicing. By definition, these molecules can initiate their activities without the assistance of additional protein
components, although they are often more efficient in vivo (Serganov & Patel, 2007).

Significance of Noncoding RNAs


Forms of noncoding RNAs with novel functions continue to be discovered even today. This rich complexity and diversity of RNA forms and activities in both
prokaryotes and eukaryotes lends credence to the so-called "RNA world" hypothesis; this hypothesis states that RNA may have evolved prior to DNA and
protein, and it may have played the roles of both of these molecules in the earliest life-forms. The fact that some RNAs have both coding and catalytic
capacity without the need for a protein-based enzyme makes such a hypothesis viable. Nevertheless, it remains unclear whether today's RNA molecules
with catalytic properties are remnants of the evolutionary past, or whether they have more recent origins. The ongoing discovery of new small regulatory
RNA molecules suggests that additional functions may yet be uncovered. Thus, the full contribution of RNA to the life of the cell may still be unknown.

Summary
Ever since the central dogma was first proposed in the 1950s, RNA's role in protein synthesis has been widely appreciated. Today, scientists and lay
people alike know that mRNA is the product of transcription, that tRNA is essential to the process of translation, and that rRNA makes up the
ribosomes in which translation takes place. What far fewer people realize is that RNA is also responsible for numerous other tasks. For example, within
eukaryotes, noncoding RNAs help regulate gene expression and modify other types of RNA. Similarly, in prokaryotes, these molecules are involved in a
wide range of processes, from virulence to regulation of bacterial growth. New forms and uses of noncoding RNAs continue to be discovered, and the
diverse nature of these molecules has led many researchers to believe that RNA may evolutionarily predate both DNA and protein. However, much more
work remains to be completed before this theory can be conclusively confirmed or denied, as well as before scientists fully understand the diverse nature of
the RNA molecule.
RNA Splicing: Introns, Exons and Spliceosome
By: Suzanne Clancy, Ph.D. © 2008 Nature Education 

Citation: Clancy, S. (2008) RNA splicing: introns, exons and spliceosome. Nature Education 1(1):31

What's the difference between mRNA and pre-mRNA? It's all about splicing of introns. See how one RNA sequence can
exist in nearly 40,000 different forms.
For most eukaryotic genes (and some prokaryotic ones), the initial RNA that is transcribed from a gene's DNA template must be processed before it
becomes a mature messenger RNA (mRNA) that can direct the synthesis of protein. One of the steps in this processing, called RNA splicing, involves the
removal or "splicing out" of certain sequences referred to as intervening sequences, or introns. The final mRNA thus consists of the remaining sequences,
called exons, which are connected to one another through the splicing process. RNA splicing was initially discovered in the 1970s, overturning years of
thought in the field of gene expression.
Early Studies in Bacteria
Gene regulation was first studied in detail in relatively simple bacterial systems. Most bacterial RNA transcripts do not undergo splicing; these
transcripts are said to be colinear, with DNA directly encoding them. In other words, there is a one-to-one correspondence of bases between the gene and
the mRNA transcribed from the gene (excepting 5′ and 3′ noncoding regions). However, in 1977, several groups of researchers who were working with
adenoviruses that infect and replicate in mammalian cells obtained some surprising results. These scientists identified a series of RNA molecules that they
termed "mosaics," each of which contained sequences from noncontiguous sites in the viral genome (Berget et al., 1977; Chow et al., 1977). These
mosaics were found late in viral infection. Studies of early infection revealed long primary RNA transcripts that contained all of the sequences from the late
RNAs, as well as what came to be called the intervening sequences (introns). 
Subsequent to the adenoviral discovery, introns were found in many other viral and eukaryotic genes, including those for hemoglobin and immunoglobulin
(Darnell, 1978). Splicing of RNA transcripts was then observed in several in vitro systems derived from eukaryotic cells, including removal of introns
from transfer RNA in yeast cell-free extracts (Knapp et al., 1978). These observations solidified the hypothesis that splicing of large initial transcripts did, in
fact, yield the mature mRNA. Other hypotheses proposed that the DNA template in some way looped or assumed a secondary structure that
allowed transcription from noncontiguous regions (Darnell, 1978).

How Splicing Occurs

Figure 1: Pre-mRNA splicing. Splicing of a pre-mRNA molecule occurs in several steps that are catalyzed by small nuclear ribonucleoproteins (snRNPs).
After the U1 snRNP binds to the 5′ splice site, the 5′ end of the intron base pairs with the downstream branch sequence, forming a lariat. The 3′ end of the
exon is cut and joined to the branch site by a hydroxyl (OH) group at the 3′ end of the exon that attacks the phosphodiester bond at the 3′ splice site. As a
result, the exons (L1 and L2) are covalently bound, and the lariat containing the intron is released.

The biochemical mechanism by which splicing occurs has been studied in a number of systems and is now fairly well characterized. Introns are removed
from primary transcripts by cleavage at conserved sequences called splice sites. These sites are found at the 5′ and 3′ ends of introns. Most commonly, the
RNA sequence that is removed begins with the dinucleotide GU at its 5′ end, and ends with AG at its 3′ end. These consensus sequences are known to be
critical, because changing one of the conserved nucleotides results in inhibition of splicing. Another important sequence occurs at what is called
the branch point, located anywhere from 18 to 40 nucleotides upstream from the 3′ end of an intron. The branch point always contains an adenine, but it is
otherwise loosely conserved. A typical sequence is YNYYRAY, where Y indicates a pyrimidine, N denotes any nucleotide, R denotes any purine, and A
denotes adenine. Rarely, alternate splice site sequences are found that begin with the dinucleotide AU and end with AC; these are spliced through a similar
mechanism.
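The sequence features just described lend themselves to a simple check. The Python sketch below tests whether an intron begins with GU, ends with AG, and contains a YNYYRAY branch point whose adenine lies 18 to 40 nucleotides upstream of the last base of the intron. The function name, the way distance is counted, and the toy intron are assumptions made for illustration. Real splice site recognition depends on far more context than this.

```python
import re

# Shorthand from the text: Y = pyrimidine (C or U), R = purine (A or G), N = any base.
# The lookahead allows overlapping candidate sites to be found.
BRANCH_POINT = re.compile(r"(?=[CU][ACGU][CU][CU][AG]A[CU])")


def check_intron(intron: str, min_dist: int = 18, max_dist: int = 40) -> dict:
    """Check an intron (RNA, written 5'->3') against the consensus features."""
    last = len(intron) - 1
    # The branch-point adenine is the sixth base of each YNYYRAY match.
    distances = [last - (m.start() + 5) for m in BRANCH_POINT.finditer(intron)]
    return {
        "starts_with_GU": intron.startswith("GU"),
        "ends_with_AG": intron.endswith("AG"),
        "branch_distances": [d for d in distances if min_dist <= d <= max_dist],
    }


# Invented intron: GU, filler, one branch point (UACUAAC), filler, AG
toy_intron = "GU" + "A" * 20 + "UACUAAC" + "U" * 20 + "AG"
result = check_intron(toy_intron)
print(result)
```

For the toy intron, all three checks pass, with the branch-point adenine 23 positions upstream of the final base.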
Splicing occurs in several steps and is catalyzed by small nuclear ribonucleoproteins (snRNPs, commonly pronounced "snurps"). First, the pre-mRNA is
cleaved at the 5′ end of the intron following the attachment of a snRNP called U1 to its complementary sequence within the intron. The cut end then
attaches to the conserved branch point region downstream through pairing of guanine and adenine nucleotides from the 5′ end and the branch point,
respectively, to form a looped structure known as a lariat (Figure 1). The bonding of the guanine and adenine bases takes place via a chemical reaction
known as transesterification, in which a hydroxyl (OH) group on a carbon atom of the adenine "attacks" the bond of the guanine nucleotide at the splice
site. The guanine residue is thus cleaved from the RNA strand and forms a new bond with the adenine.
Next, the snRNPs U2 and U4/U6 appear to contribute to positioning of the 5′ end and the branch point in proximity. With the participation of U5, the 3′ end
of the intron is brought into proximity, cut, and joined to the 5′ end. This step occurs by transesterification; in this case, an OH group at the 3′ end of
the exon attacks the phosphodiester bond at the 3′ splice site. The adjoining exons are covalently bound, and the resulting lariat is released with U2, U5,
and U6 bound to it.
In addition to consensus sequences at their splice sites, eukaryotic genes with long introns also contain exonic splicing enhancers (ESEs). These
sequences, which help position the splicing apparatus, are found in the exons of genes and bind proteins that help recruit splicing machinery to the correct
site. Most splicing occurs between exons on a single RNA transcript, but occasionally trans-splicing occurs, in which exons on different pre-mRNAs are
ligated together.
The splicing process occurs in cellular machines called spliceosomes, in which the snRNPs are found along with additional proteins. The primary variety of
spliceosome is one of the most plentiful structures in the cell, and recently, a secondary type of spliceosome has been identified that processes a minor
category of introns. These introns are referred to as U12-type introns because they depend upon the action of a snRNP called U12 (the common introns
described above are called U2-type introns). The role of U12-type introns is not yet defined, but their persistence throughout  evolution and conservation
between homologous genes of widely divergent species suggests an important functional basis (Patel & Steitz, 2003).

Self-Splicing and Alternative Splicing

Figure 2: A schematic representation of alternative splicing. Alternative splicing refers to the process by which a given gene is spliced into more than
one type of mRNA molecule.

Some RNA molecules have the capacity to splice themselves; the initial discovery of this self-splicing ability in the protozoan  Tetrahymena thermophila was
recognized with the Nobel Prize in 1989. The self-splicing introns found in T. thermophila are now referred to as Group I introns; this class also includes
other protozoan ribosomal RNA genes, some fungal mitochondrial genes, and some phage genes. Group I introns all fold into a complex secondary
structure with nine loops and employ transesterification reactions as described above. On the other hand, Group II self-splicing introns are found in
mitochondrial genes and are excised by a mechanism that bears similarities to pre-mRNA splicing, including the production of lariats. For this reason, it has
been proposed that perhaps pre-mRNA introns and splicing mechanisms evolved from the Group II introns.
Early in the course of splicing research, yet another surprising discovery was made; specifically, researchers noticed that not only was pre-mRNA
punctuated by introns that needed to be excised, but also that alternative patterns of splicing within a single pre-mRNA molecule could yield different
functional mRNAs (Figure 2; Berget et al., 1977). The first example of alternative splicing was defined in the adenovirus in 1977 and demonstrated that one
pre-mRNA molecule could be spliced at different junctions to result in a variety of mature mRNA molecules, each containing different combinations of
exons.
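The idea that one pre-mRNA can give rise to several mature mRNAs can be sketched in Python as follows. The exon sequences and function names are invented, and the model makes a simplifying assumption that is not in the text: the first and last exons are always kept, and any subset of the internal exons may be skipped.

```python
from itertools import combinations


def splice(exons, keep):
    """Join the selected exons, in their original order, into a mature mRNA."""
    return "".join(exons[i] for i in sorted(keep))


def exon_skipping_isoforms(exons):
    """Enumerate mRNAs that keep the first and last exon plus any subset
    of the internal exons (a simplified model of alternative splicing)."""
    internal = range(1, len(exons) - 1)
    isoforms = []
    for count in range(len(exons) - 1):          # 0 .. number of internal exons
        for subset in combinations(internal, count):
            isoforms.append(splice(exons, [0, *subset, len(exons) - 1]))
    return isoforms


exons = ["AUGGCC", "GAUUAC", "CCGGAA", "UUGUAA"]   # hypothetical exon sequences
isoforms = exon_skipping_isoforms(exons)
for mrna in isoforms:
    print(mrna)
```

With two internal exons that can each be included or skipped, this toy gene yields four distinct mRNAs from a single pre-mRNA.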
Shortly afterward, alternative splicing was found to occur in cellular genes as well, with the first example identified in the  IgM gene, a member of the
immunoglobulin superfamily (Early et al., 1980). Another example of a gene with an impressive number of alternative splicing patterns is the Dscam gene
from Drosophila, which is involved in guiding embryonic nerves to their targets during formation of the fly's nervous system. Examination of
the Dscam sequence reveals such a large number of introns that differential splicing could, in theory, create a staggering 38,000 different mRNAs. This
ability to create so many mRNAs may provide the diversity necessary for forming a complex structure such as the nervous system (Schmucker  et al.,
2000). In fact, the existence of multiple mRNA transcripts within single genes may account for the complexity of some organisms, such as humans, that
have relatively few genes (approximately 20,000). For example, work from Wang et al. (2008) suggests that more than 90% of human genes are
alternatively spliced.
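
The figure of roughly 38,000 Dscam mRNAs follows from simple multiplication. The Python sketch below uses the commonly cited numbers of mutually exclusive alternatives in each of the four variable exon clusters. These counts are not given in the text above and are included here as an assumption, so they should be checked against Schmucker et al. (2000).

```python
from math import prod

# Commonly cited number of alternative versions of each variable exon
# (assumed values, not taken from the text above)
dscam_alternatives = {
    "exon 4": 12,
    "exon 6": 48,
    "exon 9": 33,
    "exon 17": 2,
}

# One version of each variable exon is chosen per mRNA, so the counts multiply
possible_mrnas = prod(dscam_alternatives.values())
print(possible_mrnas)
```

Because each mRNA contains one choice from every cluster, the choices multiply, giving 38,016 possible combinations.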

The Past and Future of Introns


The existence of introns and differential splicing helps explain how new genes are created during evolution. Splicing makes genes more "modular,"
allowing new combinations of exons to be created during evolution. Furthermore, new exons can be inserted into old introns, creating new proteins without
disrupting the function of the old gene.
Our knowledge of RNA splicing is quite new. Nonetheless, because nearly all eukaryotes have introns and share mechanisms of RNA splicing, splicing
itself must be quite ancient. Proponents of the "intron-early" theory suggest that all organisms (including prokaryotes) at one time had introns in their
genome but subsequently lost these elements, while "intron-late" supporters believe that the restriction of introns to eukaryotes suggests a more recent
introduction (Roy & Gilbert, 2006). There is no apparent pattern in which eukaryotes have introns, and that makes it difficult for researchers to make
predictions about how introns were gained or lost through evolution. What is clear, however, is that introns and splicing have played a significant
role in evolution, and scientists are only beginning to discover the nature of that role.
