Molecular Biology Notes

Molecular Biology (BOT-601)
BS-Botany-7th semester
Recommended by: Dr. Muther Mansoor Qaisrani
DNA Is a Structure That Encodes Biological Information
What do a human, a rose, and a bacterium have in common? Each of these organisms,
along with every other organism on Earth, contains the molecular instructions for life,
called deoxyribonucleic acid, or DNA. Encoded within this DNA are the directions for traits as
diverse as the color of a person's eyes, the scent of a rose, and the way in which bacteria
infect a lung cell.
DNA is found in nearly all living cells. However, its exact location within a cell
depends on whether that cell possesses a special membrane-bound organelle called a nucleus.
Organisms composed of cells that contain nuclei are classified as eukaryotes, whereas
organisms composed of cells that lack nuclei are classified as prokaryotes. In eukaryotes,
most of the DNA is housed within the nucleus, but in prokaryotes, which have no nucleus,
DNA is located directly within the cellular cytoplasm.
But what, exactly, is DNA? In short, DNA is a complex molecule that consists of
many components, a portion of which is passed from parent organisms to their offspring
during the process of reproduction. Although each organism's DNA is unique, all DNA is
composed of the same four nitrogen-containing building blocks. So how does DNA differ from
organism to organism? It is simply the order in which these smaller molecules are arranged
that differs among individuals. In turn, this pattern of arrangement ultimately determines each
organism's unique characteristics, thanks to another set of molecules that "read" the pattern
and stimulate the chemical and physical processes it calls for.
Composed by:
LUBABA KOMAL Page 1
What components make up DNA?
Figure 1: A single nucleotide contains a nitrogenous base (red), a deoxyribose sugar molecule
(gray), and a phosphate group attached to the 5' side of the sugar (indicated by light gray).
Opposite to the 5' side of the sugar molecule is the 3' side (dark gray), which has a free
hydroxyl group attached (not shown).
At the most basic level, all DNA is composed of a series of smaller molecules
called nucleotides. In turn, each nucleotide is itself made up of three primary components: a
nitrogen-containing region known as a nitrogenous base, a carbon-based sugar molecule
called deoxyribose, and a phosphorus-containing region known as a phosphate group attached
to the sugar molecule (Figure 1). There are four different DNA nucleotides, each defined by a
specific nitrogenous base: adenine (often abbreviated "A" in science
writing), thymine (abbreviated "T"), guanine (abbreviated "G"), and cytosine (abbreviated
"C") (Figure 2).
Figure 2: The four nitrogenous bases that compose DNA nucleotides are shown in bright
colors: adenine (A, green), thymine (T, red), cytosine (C, orange), and guanine (G, blue).
Although nucleotides derive their names from the nitrogenous bases they contain,
they owe much of their structure and bonding capabilities to their deoxyribose molecule. This
sugar contains five carbon atoms, four of which form a ring together with an oxygen atom,
and each carbon is referred to by a number followed by the prime symbol ('). Of
these carbons, the 5' carbon atom is particularly notable, because it is the site at which the
phosphate group is attached to the nucleotide. Appropriately, the area surrounding this carbon
atom is known as the 5' end of the nucleotide. Opposite the 5' carbon, on the other side of the
deoxyribose ring, is the 3' carbon, which carries a hydroxyl group rather than a phosphate group.
This portion of the nucleotide is typically referred to as the 3' end (Figure 1). When nucleotides
join together in a series, they form a structure known as a polynucleotide. At each point of
juncture within a polynucleotide, the 5' end of one nucleotide attaches to the 3' end of the
adjacent nucleotide through a connection called a phosphodiester bond (Figure 3). It is this
alternating sugar-phosphate arrangement that forms the "backbone" of a DNA molecule.
Figure 3: All polynucleotides contain an alternating sugar-phosphate backbone. This
backbone is formed when the 3' end (dark gray) of one nucleotide attaches to the 5' phosphate
end (light gray) of an adjacent nucleotide by way of a phosphodiester bond.
How is the DNA strand organized?

Although DNA can exist as a single-stranded polynucleotide, it assumes
its most stable form when double stranded. Double-stranded DNA consists of two
polynucleotides that are arranged such that the nitrogenous bases within one polynucleotide
are attached to the nitrogenous bases within another polynucleotide by way of special
chemical bonds called hydrogen bonds. This base-to-base bonding is not random; rather, each
A in one strand always pairs with a T in the other strand, and each C always pairs with a G.
The double-stranded DNA that results from this pattern of bonding looks much like a ladder
with sugar-phosphate side supports and base-pair rungs.
Note that because the two polynucleotides that make up double-stranded DNA
are "upside down" relative to each other, their sugar-phosphate backbones are anti-parallel, or
arranged in opposite orientations. This means that one strand's sugar-phosphate chain runs in
the 5' to 3' direction, whereas the other's runs in the 3' to 5' direction (Figure 4). It's also
critical to understand that the specific sequence of A, T, C, and G nucleotides within an
organism's DNA is unique to that individual, and it is this sequence that controls the
operations not only within a particular cell but also within the organism as a whole.
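The pairing and anti-parallel rules above can be illustrated with a short Python sketch. It assumes a strand is written as a plain string of the letters A, T, C, and G, read 5' to 3'; the function names `complement` and `reverse_complement` are illustrative, not taken from any particular library.

```python
# Base-pairing rules from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base partner, aligned under the input (so it reads 3'->5')."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written 5'->3'.

    Because the two strands are anti-parallel, the complement must be
    reversed before it also reads in the 5'->3' direction.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    top = "ATGCGT"  # hypothetical strand, written 5'->3'
    print(complement(top))          # TACGCA (aligned under top, runs 3'->5')
    print(reverse_complement(top))  # ACGCAT (the same partner strand, written 5'->3')
```

Applying `reverse_complement` twice returns the original strand, which mirrors the fact that each strand fully specifies its partner.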
Figure 4: Double-stranded DNA consists of two polynucleotide chains whose nitrogenous
bases are connected by hydrogen bonds. Within this arrangement, each strand mirrors the
other as a result of the anti-parallel orientation of the sugar-phosphate backbones, as well as
the complementary nature of the A-T and C-G base pairing.
Figure 5: Rosalind Franklin's X-ray diffraction image of DNA. Images like this one enabled
the precise calculation of molecular distances within the double helix.
Beyond the ladder-like structure described above, another key characteristic of
double-stranded DNA is its unique three-dimensional shape. The first photographic evidence
of this shape was obtained in 1952, when scientist Rosalind Franklin used a process called
X-ray diffraction to capture images of DNA molecules (Figure 5). Although the black lines in
these photos look relatively sparse, Dr. Franklin interpreted them as representing distances
between the nucleotides that were arranged in a spiral shape called a helix.
Around the same time, researchers James Watson and Francis Crick were
pursuing a definitive model for the stable structure of DNA inside cell nuclei. Watson and
Crick ultimately used Franklin's images, along with their own evidence for the
double-stranded nature of DNA, to argue that DNA actually takes the form of a double helix, a
ladder-like structure that is twisted along its entire length (Figure 6). Franklin, Watson, and
Crick all published articles describing their related findings in the same issue of Nature in
1953.
Figure 6: The double helix looks like a twisted ladder.
How is DNA packaged inside cells?
Figure 7: To better fit within the cell, long pieces of double-stranded DNA are tightly packed
into structures called chromosomes.
Most cells are incredibly small, and a single human body consists of tens of trillions of
them. Yet, if all of the DNA within just one of these cells were
arranged into a single straight piece, that DNA would be nearly two meters long! So, how can
this much DNA be made to fit within a cell? The answer to this question lies in the process
known as DNA packaging, which is the phenomenon of fitting DNA into dense, compact
forms (Figure 7).
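The "nearly two meters" figure can be checked with simple arithmetic. This sketch assumes, beyond what the notes state, that B-form DNA rises about 0.34 nm per base pair, and it uses the figure of 6 billion base pairs per diploid human cell quoted later in these notes.

```python
# Assumption (not stated in the notes): B-form DNA rises about 0.34 nm per base pair.
RISE_PER_BP_M = 0.34e-9        # metres per base pair (approximate)
DIPLOID_BP = 6_000_000_000     # base pairs per diploid human cell (approximate)


def stretched_length_m(base_pairs: int, rise_m: float = RISE_PER_BP_M) -> float:
    """Length in metres if the DNA were laid out as one straight double helix."""
    return base_pairs * rise_m


length = stretched_length_m(DIPLOID_BP)
print(f"{length:.2f} m")  # 2.04 m, i.e. "nearly two meters"
```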
DNA Damage & Repair: Mechanisms for Maintaining DNA Integrity
DNA integrity is always under attack from environmental agents like skin cancer-causing UV
rays. How do DNA repair mechanisms detect and repair damaged DNA, and what happens
when they fail?
Because DNA is the storehouse of genetic information in each living cell, its
integrity and stability are essential to life. DNA, however, is not inert; rather, it is a chemical
entity subject to assault from the environment, and any resulting damage, if not repaired, will
lead to mutation and possibly disease. Perhaps the best-known example of the link between
environmentally induced DNA damage and disease is that of skin cancer, which can be caused
by excessive exposure to UV radiation in the form of sunlight (and, to a lesser degree,
tanning beds). Another example is the damage caused by tobacco smoke, which can lead to
mutations in lung cells and subsequent cancer of the lung. Beyond environmental agents,
DNA is also subject to oxidative damage from by-products of metabolism, such as free
radicals. In fact, it has been estimated that an individual cell can suffer up to one million
DNA changes per day (Lodish et al., 2005).
In addition to genetic insults caused by the environment, the very process of
DNA replication during cell division is prone to error. The rate at which DNA
polymerase adds incorrect nucleotides during DNA replication is a major factor in
determining the spontaneous mutation rate in an organism. While a
"proofreading" enzyme normally recognizes and corrects many of these errors, some
mutations survive this process. Estimates of the frequency at which human DNA undergoes
lasting, uncorrected errors range from $1 \times 10^{-4}$ to $1 \times 10^{-6}$ mutations per gamete for a
given gene. A rate of $1 \times 10^{-6}$ means that a scientist would expect to find one mutation at a
specific locus per one million gametes. Mutation rates in other organisms are often much
lower (Table 1).
One way scientists are able to estimate mutation rates is by considering the rate
of new dominant mutations found at different loci. For example, by examining the number of
individuals in a given population who were diagnosed with neurofibromatosis (NF1, a
dominant disorder that often arises from a spontaneous, or noninherited, mutation), scientists
determined that the spontaneous mutation rate of the gene responsible for this disease
averaged $1 \times 10^{-4}$ mutations per gamete (Crowe et al., 1956). Other researchers have found
that the mutation rates of other genes, like that for Huntington's disease, are significantly
lower than the rate for NF1. The fact that investigators have reported different mutation rates
for different genes suggests that certain loci are more prone to damage or error than others.
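One common way to turn such counts into a rate, often called the direct method, divides the number of new (sporadic) cases by twice the number of births, because each child arises from two gametes. This formula is standard textbook arithmetic rather than something stated in these notes; the sketch assumes complete penetrance and that each sporadic case reflects exactly one new mutation, and the survey counts are hypothetical.

```python
def direct_mutation_rate(new_cases: int, births: int) -> float:
    """Estimate mutations per gamete at one dominant locus.

    `births` children sample 2 * births gametes, and each sporadic
    (noninherited) case of the disorder is counted as one new mutation.
    """
    if births <= 0:
        raise ValueError("births must be positive")
    return new_cases / (2 * births)


# Hypothetical survey numbers, chosen only to illustrate the arithmetic.
rate = direct_mutation_rate(new_cases=20, births=100_000)
print(f"{rate:.0e} mutations per gamete")  # 1e-04 mutations per gamete
```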
Figure 1
DNA repair processes exist in both prokaryotic and eukaryotic organisms, and
many of the proteins involved have been highly conserved throughout evolution. In fact, cells
have evolved a number of mechanisms to detect and repair the various types of damage that
can occur to DNA, whether this damage is caused by the environment or by errors
in replication. Because DNA is a molecule that plays an active and critical role in cell
division, control of DNA repair is closely tied to regulation of the cell cycle. (Recall that cells
transit through a cycle involving the G1, S, G2, and M phases, with DNA replication occurring
in the S phase and mitosis in the M phase.) During the cell cycle, checkpoint mechanisms
ensure that a cell's DNA is intact before permitting DNA replication and cell division to
occur. Failures in these checkpoints can lead to an accumulation of damage, which in turn
leads to mutations.
Defects in DNA repair underlie a number of human genetic diseases that affect a
wide variety of body systems but share a group of common traits, most notably a
predisposition to cancer (Table 2). These disorders include ataxia-telangiectasia (AT), a
degenerative motor condition caused by failure to repair oxidative damage in the cerebellum,
and xeroderma pigmentosum (XP), a condition characterized by sensitivity to sunlight and
linked to a defect in an important ultraviolet (UV) damage repair pathway. In addition, a
number of genes that have been implicated in cancer, such as the RAD group, have also been
determined to encode proteins critical for DNA damage repair.
UV Damage, Nucleotide Excision Repair, and Photoreactivation
Figure 3
Figure 2
As previously mentioned, one important DNA damage response (DDR) is
triggered by exposure to UV light. Of the three categories of solar UV radiation, only UV-A
and UV-B are able to penetrate Earth's atmosphere. Thus, these two types of UV radiation are
of greatest concern to humans, especially where depletion of the ozone layer allows
higher levels of this radiation to reach the planet's surface.
UV radiation causes two classes of DNA lesions: cyclobutane pyrimidine dimers
(CPDs, Figure 1) and 6-4 photoproducts (6-4 PPs, Figure 2). Both of these lesions distort
DNA's structure, introducing bends or kinks and thereby impeding transcription and
replication. Relatively flexible areas of the DNA double helix are most susceptible to
damage. In fact, one "hot spot" for UV-induced damage is found within a commonly
mutated tumor suppressor gene, the p53 gene.
Bacteria and several other organisms also possess another mechanism to repair UV
damage called photoreactivation. This method is often referred to as "light repair," because it
is dependent on the presence of light energy. (In comparison, nucleotide excision repair, or
NER, and most other repair mechanisms are frequently referred to as "dark repair," as they do
not require light as an energy source.) During photoreactivation, an enzyme called photolyase
binds pyrimidine dimer lesions; in addition, a second molecule known as a chromophore converts
light energy into the chemical energy required to directly revert the affected area of DNA to its
undamaged form. Photolyases are found in numerous organisms, including fungi, plants,
invertebrates such as fruit flies, and vertebrates including frogs. They do not appear to exist
in humans, however (Sinha & Hader, 2002).
Additional DNA Repair Mechanisms
Figure 4
NER and photoreactivation are not the only methods of DNA repair. For
instance, base excision repair (BER) is the predominant mechanism that handles the
spontaneous DNA damage caused by free radicals and other reactive species generated by
metabolism. Bases can become oxidized, alkylated, or hydrolyzed through interactions with
these agents. For example, methyl ($\mathrm{CH_3}$) groups are frequently added to guanine to
form 7-methylguanine; alternatively, purine bases may be lost. All such changes result in
abnormal bases that must be removed and replaced. Thus, enzymes known as DNA
glycosylases remove damaged bases by literally cutting them out of the DNA strand
through cleavage of the covalent bonds between the bases and the sugar-phosphate
backbone. The resulting gap is then filled by a specialized repair polymerase and sealed by
DNA ligase. Many different glycosylases are found in cells, and each is specific to certain
types of base alterations.
Yet another form of DNA damage is double-strand breaks, which are caused by
ionizing radiation, including gamma rays and X-rays. These breaks are highly deleterious. In
addition to interfering with transcription or replication, they can lead to chromosomal
rearrangements, in which pieces of one chromosome become attached to another
chromosome. Genes are disrupted in this process, leading to hybrid proteins or inappropriate
activation of genes. A number of cancers are associated with such rearrangements.
Double-strand breaks are repaired through one of two mechanisms: nonhomologous end joining
(NHEJ) or homologous recombination repair (HRR). In NHEJ, an enzyme called DNA
ligase IV uses overhanging pieces of DNA adjacent to the break to join and fill in the ends.
Additional errors can be introduced during this process, and NHEJ is the pathway a cell must
rely on when it has not yet completely replicated its DNA in preparation for division. In
contrast, during HRR, a homologous DNA molecule (the homologous chromosome or, after
replication, the sister chromatid) is used as a template for repair.
Mutations in an organism's DNA are a part of life. Our genetic code is exposed to
a variety of insults that threaten its integrity. However, a rigorous system of checks and balances is
in place through the DNA repair machinery. The errors that slip through the cracks may
sometimes be associated with disease, but they are also a source of variation that is acted
upon by longer-term processes, such as evolution and natural selection.
DNA Replication and Causes of Mutation

DNA replication is a truly amazing biological phenomenon. Consider the countless
number of times that your cells divide to make you who you are—not just
during development, but even now, as a fully mature adult. Then consider that every time a
human cell divides and its DNA replicates, it has to copy and transmit the exact same
sequence of 3 billion nucleotides to its daughter cells. Finally, consider the fact that in life
(literally), nothing is perfect. While most DNA replicates with fairly high fidelity, mistakes
do happen, with polymerase enzymes sometimes inserting the wrong nucleotide or too many
or too few nucleotides into a sequence. Fortunately, most of these mistakes are fixed through
various DNA repair processes. Repair enzymes recognize structural imperfections between
improperly paired nucleotides, cutting out the wrong ones and putting the right ones in their
place. But some replication errors make it past these mechanisms, thus becoming permanent
mutations. These altered nucleotide sequences can then be passed down from one cellular
generation to the next, and if they occur in cells that give rise to gametes, they can even be
transmitted to subsequent organismal generations. Moreover, when the genes for the DNA
repair enzymes themselves become mutated, mistakes begin accumulating at a much higher
rate. In eukaryotes, such mutations can lead to cancer.
Errors Are a Natural Part of DNA Replication

After James Watson and Francis Crick published their model of the double-helix structure of
DNA in 1953, biologists initially speculated that most replication errors were caused by what
are called tautomeric shifts. Both the purine and pyrimidine bases in DNA exist in different
chemical forms, or tautomers, in which the protons occupy different positions in
the molecule (Figure 1). The Watson-Crick model required that the nucleotide bases be in
their more common "keto" form (Watson & Crick, 1953). Scientists believed that if and when
a nucleotide base shifted into its rarer tautomeric form (the "imino" or "enol" form), a likely
result would be base-pair mismatching. But evidence for these types of tautomeric shifts
remains sparse.
Figure 1: Tautomeric shifts in nucleotide bases.
The purine and pyrimidine bases in DNA exist in two different tautomers, or
chemical forms. (A) Nucleotide bases shift from their common “keto” form to their rarer,
tautomeric “enol” form. (B) In common base pair arrangements, the common form of
thymine (T) binds with the common form of adenine (A), and the common form of cytosine
(C) binds with the common form of guanine (G). (C) Rare base-pairing arrangements result
when one nucleotide in a base pair is the rare form instead of the common form. Here, the
rare form of cytosine binds to the common form of adenine instead of guanine. The rare form
of guanine binds to the common form of thymine instead of cytosine.
Today, scientists suspect that most DNA replication errors are caused by
mispairings of a different nature: either between different but nontautomeric chemical forms
of bases (e.g., bases with an extra proton, which can still bind but often with a mismatched
nucleotide, such as a protonated A pairing with a C instead of a T) or between "normal" bases
that nonetheless bond inappropriately (e.g., a G pairing with a T instead of a C) because of a
slight shift in position of the nucleotides in space (Figure 2). This type of mispairing is known as wobble. It
occurs because the DNA double helix is flexible and able to accommodate slightly misshaped
pairings (Crick, 1966).
Figure 2: Wobble in mismatched nucleotide base pairs.
A shift in the position of nucleotides causes a wobble between a normal thymine and normal
guanine. An additional proton on adenine causes a wobble in an adenine-cytosine base pair.
Replication errors can also involve insertions or deletions of nucleotide bases that
occur during a process called strand slippage. Sometimes, a newly synthesized strand loops
out a bit, resulting in the addition of an extra nucleotide base (Figure 3). Other times,
the template strand loops out a bit, resulting in the omission, or deletion, of a nucleotide base
in the newly synthesized, or primer, strand. Regions of DNA containing many copies of small
repeated sequences are particularly prone to this type of error.
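The link between repeats and slippage can be made concrete with a toy model. The sketch below is an illustration, not a mechanistic simulation: it finds runs of a single repeated base (the minimum run length of 4 is an arbitrary, hypothetical cutoff) and then models one slippage event as the gain or loss of one copy of that base. Real tandem repeats can also have repeat units longer than one base, which this sketch ignores.

```python
from itertools import groupby


def homopolymer_runs(seq: str, min_len: int = 4) -> list[tuple[int, str, int]]:
    """Return (start, base, length) for each run of one repeated base at least min_len long."""
    runs, pos = [], 0
    for base, group in groupby(seq):
        n = len(list(group))
        if n >= min_len:
            runs.append((pos, base, n))
        pos += n
    return runs


def slip(seq: str, start: int, length: int, insertion: bool) -> str:
    """Model one slippage event in a run: gain (insertion) or lose (deletion) one base."""
    end = start + length
    run = seq[start:end]
    new_run = run + run[0] if insertion else run[:-1]
    return seq[:start] + new_run + seq[end:]


template = "GATTTTTCA"  # hypothetical sequence containing a run of five T's
for start, base, n in homopolymer_runs(template):
    print(slip(template, start, n, insertion=True))   # GATTTTTTCA (new strand looped out)
    print(slip(template, start, n, insertion=False))  # GATTTTCA (template looped out)
```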
Fixing Mistakes in DNA Replication
DNA polymerase enzymes are amazingly particular with respect to their choice of
nucleotides during DNA synthesis, ensuring that the bases added to a growing strand are
correctly paired with their complements on the template strand (i.e., A's with T's, and C's
with G's). Nonetheless, these enzymes do make mistakes at a rate of about 1 per
100,000 nucleotides. That might not seem like much, until you consider how much DNA
a cell has. In humans, with our 6 billion base pairs in each diploid cell, replication adds about
12 billion nucleotides (one new strand for each of the two parental strands), which would
amount to about 120,000 mistakes every time a cell divides!
Fortunately, cells have evolved highly sophisticated means of fixing most, but not all,
of those mistakes. Some of the mistakes are corrected immediately during replication through
a process known as proofreading, and some are corrected after replication in a process
called mismatch repair. When an incorrect nucleotide is added to the growing strand,
replication is stalled by the fact that the nucleotide's exposed 3′-OH group is in the "wrong"
position. (Recall that new nucleotides are added to the growing strand during replication by
means of their 5′-phosphate group binding to the 3′-OH group of the previous nucleotide on
the strand.) During proofreading, DNA polymerase enzymes recognize this and replace the
incorrectly inserted nucleotide so that replication can continue. Proofreading fixes about 99%
of these types of errors, but that's still not good enough for normal cell functioning.
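The error arithmetic above, followed by the proofreading step, can be written out as follows. This sketch treats the approximate figures in the text ("about 1 per 100,000" and "about 99%") as exact and does not model the further reduction from mismatch repair.

```python
DIPLOID_BP = 6_000_000_000   # base pairs per diploid human cell (figure from the text)
NT_PER_ERROR = 100_000       # polymerase makes about 1 error per 100,000 nucleotides
PERCENT_FIXED = 99           # proofreading corrects about 99% of those errors

# Each of the two parental strands gets a new partner strand.
nucleotides_added = 2 * DIPLOID_BP
raw_errors = nucleotides_added // NT_PER_ERROR                      # before proofreading
after_proofreading = raw_errors * (100 - PERCENT_FIXED) // 100      # still uncorrected

print(raw_errors)          # 120000
print(after_proofreading)  # 1200
```

Even after proofreading, on the order of a thousand errors per division would remain, which is why the mismatch repair step described next matters.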
After replication, mismatch repair reduces the final error rate even further. Incorrectly paired
nucleotides cause deformities in the secondary structure of the final DNA molecule. During
mismatch repair, enzymes recognize and fix these deformities by removing the incorrectly
paired nucleotide and replacing it with the correct nucleotide.
When Replication Errors Become Mutations

Incorrectly paired nucleotides that still remain following mismatch repair
become permanent mutations after the next cell division. This is because once such mistakes
are established, the cell no longer recognizes them as errors. Consider the case of
wobble-induced replication errors. When these mistakes are not corrected, the incorrectly
sequenced DNA strand serves as a template for future replication events, so every copy made
from it carries the wrong base pairing. For instance, in the lower half of Figure 2, the original
DNA had a C-G pair; then, during replication, cytosine (C) is incorrectly matched to adenine (A)
because of wobble. In this example, wobble occurs because A carries an extra proton. In the next
round of cell division, the double strand with the C-A pairing would separate during
replication, each strand serving as a template for synthesis of a new DNA molecule. At that
particular spot, C would pair with G, forming a double helix with the same sequence as its
original (i.e., before the wobble occurred), but A would pair with T, forming a new DNA
molecule with an A-T pair in place of the original C-G pair. This type of mutation is known
as a base, or base-pair, substitution. Base substitutions involving replacement of one purine
by another or one pyrimidine by another (e.g., an A replaced by a G, or a C replaced by a T)
are known as transitions; the replacement of a purine by a pyrimidine, or vice versa (e.g., an
A replaced by a T), is called a transversion.
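The purine/pyrimidine rule for classifying substitutions can be expressed as a small function. This is a minimal sketch; the name `substitution_type` is hypothetical, and it only handles single-base changes among A, C, G, and T.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(original: str, new: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    original, new = original.upper(), new.upper()
    if original not in PURINES | PYRIMIDINES or new not in PURINES | PYRIMIDINES:
        raise ValueError("bases must be A, C, G, or T")
    if original == new:
        raise ValueError("not a substitution: the bases are identical")
    # Transition: both bases belong to the same chemical class.
    same_class = {original, new} <= PURINES or {original, new} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(substitution_type("G", "A"))  # transition (purine -> purine)
print(substitution_type("C", "T"))  # transition (pyrimidine -> pyrimidine)
print(substitution_type("T", "A"))  # transversion (pyrimidine -> purine)
```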
Likewise, when strand-slippage replication errors are not corrected, they
become insertion and deletion mutations. Much of the early research on strand-slippage
mutations was conducted by George Streisinger in the 1960s and 1970s. Streisinger, a
professor at the University of Oregon and a fish hobbyist, is known by some as the "founding
father of zebrafish research." However, he is also known for his work with phage T4, a
bacterial virus. Streisinger used this virus to show that most nucleotide insertion and deletion
mutations occur in areas of DNA that contain many repeated sequences (also called tandem
repeats), and he formulated the strand-slippage hypothesis to explain why this was the case
(Streisinger et al., 1966). (In Figure 3, notice the series of repeat T's on the template strand
where the slippage has occurred.) When slippage takes place, the presence of nearby
duplicate bases stabilizes the slippage so that replication can proceed. During the next round
of replication, when the two strands separate, the insertion or deletion carried by the newly
synthesized strand will be perpetuated as a permanent mutation. Scientists have
collected enough evidence to confirm Streisinger's strand-slippage hypothesis, and this type
of mutagenesis remains an active field of scientific research.
Figure 3: Strand slippage during DNA replication.
When strand slippage occurs during DNA replication, a DNA strand may loop out, resulting
in the addition or deletion of a nucleotide on the newly synthesized strand.
Although most mutations are believed to be caused by replication errors, they can
also be caused by various environmentally induced and spontaneous changes to DNA that
occur prior to replication but are perpetuated in the same way as unfixed replication errors.
As with replication errors, most environmentally induced DNA damage is repaired, resulting
in fewer than 1 out of every 1,000 chemically induced lesions actually becoming permanent
mutations. The same is true of so-called spontaneous mutations. "Spontaneous" refers to the
fact that the changes occur in the absence of chemical, radiation, or other environmental
damage. Rather, they are usually caused by normal chemical reactions that go on in cells,
such as hydrolysis. These types of errors include depurination, which occurs when the bond
connecting a purine to its deoxyribose sugar is broken by a molecule of water, resulting in a
purine-free nucleotide that can't act as a template during DNA replication, and deamination,
which results in the loss of an amino group from a nucleotide, again by reaction with water.
Again, most of these spontaneous errors are corrected by DNA repair processes. But if they
are not, an incorrect nucleotide may be added to the newly synthesized strand opposite the
damaged site, and that change can become a permanent mutation.
Even Low Mutation Rates Can Be Cause for Concern
Mutation rates vary substantially among taxa, and even among different parts of
the genome in a single organism. Scientists have reported mutation rates as low as 1 mistake
per 100 million ($10^{-8}$) to 1 billion ($10^{-9}$) nucleotides, mostly in bacteria, and as high as 1
mistake per 100 ($10^{-2}$) to 1,000 ($10^{-3}$) nucleotides, the latter in a group of error-prone
polymerase genes in humans (Johnson et al., 2000).
Even mutation rates as low as $10^{-10}$ can accumulate quickly over time,
particularly in rapidly reproducing organisms like bacteria. This is one reason why antibiotic
resistance is such an important public health problem; after all, mutations that accumulate in
a population of bacteria provide ample genetic variation with which to adapt (or respond) to
the natural selection pressures imposed by antibacterial drugs (Smolinski et al., 2003).
Take E. coli, for example. The genome of this common intestinal bacterium has about 4.2
million base pairs, or 8.4 million bases. Assuming a mutation rate of $10^{-9}$ per base (i.e.,
midway, on a logarithmic scale, between reported estimates of $10^{-8}$ and $10^{-10}$), every time
E. coli divides, each daughter cell will have, on average, 0.0084 new mutations
($8.4 \times 10^{6} \times 10^{-9} = 0.0084$). Put another way, approximately 1% of bacterial cells
will contain a new mutation. That may not seem like much. However, because bacteria can
divide as rapidly as twice per hour, a single bacterium can grow into a colony of 1 million cells
in only about 10 hours ($2^{20} = 1{,}048{,}576$). At that point, on the order of 10,000 of these
bacteria ($2^{20} \times 0.0084 \approx 8{,}800$) will have accumulated at least one mutation. As
the number of bacteria carrying different mutations increases, so too does the likelihood that
at least one of them will develop a drug-resistant phenotype.
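The E. coli estimate can be reproduced step by step. Like the text, this rough sketch uses 8.4 million bases and a rate of $10^{-9}$ per base, treats the small average number of new mutations per cell as roughly the fraction of cells carrying one, and ignores mutations inherited from earlier divisions, so the final count is an order-of-magnitude figure only.

```python
GENOME_BASES = 8_400_000   # 4.2 million bp counted as 8.4 million bases (text's figures)
RATE_PER_BASE = 1e-9       # assumed mutations per base per division (text's midpoint)
DIVISIONS = 20             # two divisions per hour for about 10 hours

per_division = GENOME_BASES * RATE_PER_BASE   # mean new mutations per daughter cell
cells = 2 ** DIVISIONS                        # cells descended from one founder
expected_mutants = cells * per_division       # rough count of cells with a new mutation

print(f"{per_division:.4f}")    # 0.0084
print(cells)                    # 1048576
print(round(expected_mutants))  # 8808, i.e. on the order of 10,000
```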
Likewise, in eukaryotes, cells accumulate mutations as they divide. In humans, if
enough somatic mutations (i.e., mutations in body cells rather than sperm or egg cells)
accumulate over the course of a person's lifetime, the end result could be cancer. Or, less
frequently, some cancer mutations are inherited from one or both parents; these are often
referred to as germ-line mutations. One of the first cancer-associated somatic mutations was
discovered in 1982, when researchers found that a mutated HRAS gene was associated with
bladder cancer (Reddy et al., 1982). HRAS encodes a protein that helps regulate cell
division. Since then, scientists have identified several hundred additional "cancer genes."
Some of them, like the handful of germ-line mutations associated with a form of colorectal
cancer known as hereditary nonpolyposis colorectal cancer (HNPCC), play crucial roles in
DNA repair (Wijnen et al., 1998).
Of course, not all mutations are "bad." But because so many mutations can
cause cancer, DNA repair is obviously a crucially important property of eukaryotic cells.
However, too much of a good thing can be dangerous. If DNA repair were perfect and no
mutations ever accumulated, there would be no genetic variation—and this variation serves
as the raw material for evolution. Successful organisms have thus evolved the means to repair
their DNA efficiently but not too efficiently, leaving just enough genetic variability for
evolution to continue.
What is a Gene? Colinearity and Transcription Units

In the early part of the twentieth century, scientists knew what genes did, but they
did not know what they were. Francis Crick, one of the codiscoverers of the three-
dimensional double helical structure of DNA, was among the first to propose that a gene was
a linear sequence of nucleotides and that each gene encoded a single protein. Crick called this
proposal the sequence hypothesis (Crick, 1958); other scientists have since referred to it as
the genes-on-a-string hypothesis. In Crick's words, this hypothesis "assumes that the
specificity of a piece of nucleic acid is expressed solely by the sequence of its bases, and this
sequence is a (simple) code for the amino acid sequence of a particular protein." Crick freely
admitted that his hypothesis was just that: a hypothesis "for which proof is completely
lacking." However, in an effort to rationalize his speculation, Crick cited some experimental
work with bacteriophages that had been conducted by American molecular biologist Seymour
Benzer. Benzer's work demonstrated that, in Crick's words, "a functional gene consists of
many sites arranged strictly in a linear order" (Crick, 1958; italics original).
Today, scientists no longer speak of the sequence hypothesis. Instead, the
notion that nucleotide sequences (genes) directly dictate amino acid sequences is known
as colinearity (Figure 1). Scientists have confirmed that colinearity is a regular occurrence
among many viruses, like the ones Benzer studied, as well as among bacteria. However, it
turns out that colinearity is the exception, not the rule, in eukaryotic genomes.
Figure 1: The colinearity of nucleotide and amino acid sequences.

Colinearity is the concept that nucleotide sequences in genes dictate amino acid
sequences in proteins.
Alternatives to Colinearity
Figure 2
One of the first clues that the colinearity of DNA and amino acid sequences is not
as simple as what Crick had proposed was the discovery of RNA splicing in the 1970s. Using
common cold viruses as their experimental systems, English molecular chemist Richard
Roberts and American molecular biologist Philip Sharp independently discovered that genes
can be split into several segments along the genome (Berget et al., 1977; Chow et al., 1977).
Then, using electron microscopy, both scientists observed that a single messenger RNA
(mRNA) molecule hybridized not to a single stretch of DNA but to four or more
discontinuous DNA segments (Figure 2).
Roberts and Sharp also noted that the RNA copied from these genes is cut apart and then
rejoined at specific points before it is used for protein synthesis. Specifically, the sections of
a gene that remain in the mature mRNA (most of which encode protein) are known as exons,
and the noncoding sections interspersed among the exons are known as introns. During
splicing, which occurs after transcription (i.e., the synthesis of RNA from a DNA template),
the introns are removed and the exons are joined, or spliced together.
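The cut-and-join step described above can be illustrated with a short Python sketch. It assumes the exon boundaries are already known (in cells they are recognized by the splicing machinery, which this toy function does not model); the sequence, the coordinates, and the `splice` helper are all hypothetical.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a pre-mRNA, discarding the introns between them.

    `exons` is a list of (start, end) positions, 0-based and end-exclusive,
    given in 5'-to-3' order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical pre-mRNA: exons in upper case, introns in lower case
# (each intron here begins with GU and ends with AG, as most introns do).
pre_mrna = "AUGGCC" + "guaaag" + "UUCGAA" + "guaggcag" + "UAA"
exons = [(0, 6), (12, 18), (26, 29)]

mature = splice(pre_mrna, exons)
print(mature)  # AUGGCCUUCGAAUAA
```

The two introns (14 nucleotides in total) are dropped, leaving a 15-nucleotide mature sequence made only of exons.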
Roberts's and Sharp's findings not only raised serious doubts about the concept
of a gene as a continuous, clearly demarcated segment of DNA, but they also led to a flurry
of research activity, with scientists curious about whether the same was true in other species.
As other researchers were quick to discover, discontinuous gene structure and splicing
during RNA processing are the norm, not the exception, in most eukaryotes. Some vertebrate
genes contain 50 or more exons, and exons often make up only a small portion of the
transcribed region of a gene. For example, in one early splicing study that involved
examination of the intron-exon pattern of a chicken ovalbumin gene, Stein et al. (1980)
measured eight exons ranging in length from 20 to 181 base pairs and seven introns ranging
in length from 264 to 1,150 base pairs. Since that study, scientists have detected introns as
long as 50,000 base pairs or more in some species.
Figure 3
The final protein products encoded by any given intron-exon sequence also
vary in structure, depending on which exons are spliced back together during RNA
processing. This so-called "alternative splicing" is illustrated in Figure 3. Scientists have also
since learned that eukaryotic cells have evolved another "alternative" mRNA processing
pathway: the use of multiple 3' cleavage sites in a single exon. (Every intron has a 5' and 3'
splice site.) As illustrated in Figure 3, the end result is the same as with alternative splicing:
different mRNA molecules are produced from a single protein-coding gene. Clearly, contrary
to the conventional notion of a single gene encoding a single protein, a single continuous
stretch of DNA can encode multiple mRNA molecules and, ultimately, multiple protein
products.
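One way to see how a single stretch of DNA can yield several mRNAs is to enumerate exon combinations. The Python sketch below uses a simplified "exon skipping" model, which is an assumption: the first and last exons are always kept and each internal exon may be included or skipped. Real alternative splicing is regulated and far more constrained, and this model ignores alternative 3' cleavage sites. Exon names and the helper function are hypothetical.

```python
from itertools import combinations


def exon_skipping_isoforms(exons):
    """List the mRNAs possible if the first and last exons are always kept and
    each internal exon may be either included or skipped."""
    first, *internal, last = exons
    isoforms = []
    # Keep k internal exons, from all of them down to none; combinations()
    # preserves the original 5'-to-3' order of the exons it picks.
    for k in range(len(internal), -1, -1):
        for kept in combinations(internal, k):
            isoforms.append([first, *kept, last])
    return isoforms


for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"]):
    print("-".join(isoform))
```

With two internal exons, the sketch yields 2 × 2 = 4 possible mRNAs: E1-E2-E3-E4, E1-E2-E4, E1-E3-E4, and E1-E4.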
Transcription Units Instead of Genes
Given the vast quantity of DNA that appears to have little protein-encoding power and the
fact that so much of this DNA resides right in the middle of functional genes (as introns),
some scientists prefer to think in terms of "transcription units" rather than "genes." A
transcription unit is a linear sequence of DNA that extends from a transcription start site to a
transcription stop site (Figure 4).
Figure 4
The promoter, a DNA sequence that lies upstream of the RNA coding region,
serves as an indicator of where and in which direction transcription should proceed. The
promoter is not actually transcribed; its role is purely regulatory. While promoters vary
tremendously among eukaryotes, there are some common features. For example, most
promoters lie immediately upstream of the transcription unit (transcription proceeds in an
upstream-to-downstream direction), and many contain what is known as a TATA box; this is a
sequence that is recognized and bound by a so-called TATA binding protein. The TATA
binding protein helps position the RNA polymerase machinery and initiate transcription.
Some promoters work in concert with other types of regulatory sequences known as
enhancers, which sometimes lie several kilobases further upstream or downstream from the
coding sequence itself, or even within introns. These two sequences are able to interact
because of the way DNA molecules bend in space, enabling sections that would otherwise be
very far from each other to interact (via DNA-binding proteins). Enhancer regions serve as
binding sites for proteins known as activators (Figure 5). The proteins that bind to promoters
to regulate transcription are called transcription factors. The RNA coding region, the main
component of the transcription unit, contains the actual exons and introns. The terminator, a
sequence of nucleotides at the end of the transcription unit, is transcribed along with the RNA
coding region. The terminator serves as a speed bump of sorts; transcription stops only after
this region has been transcribed.
Scientists have recently discovered that some mRNA molecules are coded
by exons from multiple transcription units through a process known as trans-splicing. In fact,
in 2005, a European group of researchers estimated that about 4% to 5% of tandem
transcription units (i.e., distinct but adjacent transcription units) in humans are transcribed
together to create single "chimeric" mRNA molecules (Parra et al., 2005). Scientists are not
sure how this occurs. Some speculate that transcription overrides the first transcription
terminator and doesn't stop until it reaches the second termination site; others suspect that
both transcripts are formed independently and then spliced together to form the chimeric
mRNA molecule.
Figure 5: The promoter during transcription initiation.

In preparation for the transcription process, RNA polymerase is positioned on
DNA with the help of TATA binding proteins. TATA binding proteins bind the TATA box, a
DNA sequence that comprises part of the promoter.
Delineating Gene Regions

It seems that the more scientists learn about the genome and gene
expression, the less they are able to identify the point along a stretch of nucleotides at
which a single gene actually begins and ends; indeed, it appears increasingly difficult to
determine whether genes even have discrete nucleotide start and stop points. This
complexity continues to make it difficult for scientists to agree on exactly what a gene is. At
the very least, scientists now know that Crick's original sequence hypothesis was overly
simplistic for eukaryotes. Genes are not linear sequences of DNA that directly correspond
one-to-one with their protein counterparts.
Moreover, scientists now know that not all transcribed RNA molecules, or
transcripts, end up being translated into protein products. For example, in a study of the
mouse genome, researchers found that as much as 63% of the genome is transcribed but only
about 1% to 2% codes for functional protein products (FANTOM Consortium et al.,
2005). So not only is the notion of colinearity overly simplistic, but so too is the notion that
all genes encode proteins. Many encode other types of molecules, like tRNA and rRNA, that
have important known cellular functions. Other non-protein-coding RNAs work to regulate
gene expression at multiple levels, and still other transcripts have unknown functions.
Major Molecular Events of DNA Replication

Scientists have devoted decades of effort to understanding how deoxyribonucleic
acid (DNA) replicates itself. In simple terms, replication involves use of an existing strand of
DNA as a template for the synthesis of a new, identical strand. American enzymologist and
Nobel Prize winner Arthur Kornberg compared this process to a tape recording of
instructions for performing a task: "[E]xact copies can be made from it, as from a tape
recording, so that this information can be used again and elsewhere in time and space"
(Kornberg, 1960).
In reality, the process of replication is far more complex than suggested by
Kornberg's analogy. Researchers typically utilize simple bacterial cells in their experiments,
but they still do not have all the answers, particularly when it comes to eukaryotic replication.
Nonetheless, scientists are familiar with the basic steps in the replication process, and they
continue to rely on this information as the basis for further research and experimentation.
The Molecular Machinery of Bacterial DNA Replication
A typical bacterial cell has anywhere from about 1 million to 4 million base pairs
of DNA, compared to the 3 billion base pairs in the genome of the common house mouse
(Mus musculus). Still, even in bacteria, with their smaller genomes, DNA replication involves
an incredibly sophisticated, highly coordinated series of molecular events. These events are
divided into four major stages: initiation, unwinding, primer synthesis, and elongation.
Initiation and Unwinding
Initiation
During initiation, so-called initiator proteins bind to the replication origin, a specific sequence of
nucleotides known as oriC. This binding triggers events that unwind the DNA double helix into two
single-stranded DNA molecules. Several groups of proteins are involved in this unwinding (Figure 1).
For example, the DNA helicases are responsible for breaking the hydrogen bonds that join
the complementary nucleotide bases to each other; these hydrogen bonds are an essential feature
of James Watson and Francis Crick's three-dimensional DNA model. Because the newly unwound
single strands have a tendency to rejoin, another group of proteins, the single-strand-binding
proteins, keep the single strands stable until elongation begins. A third family of proteins, the
topoisomerases, reduce some of the torsional strain caused by the unwinding of the double helix.
Figure 1: Facilitation of DNA unwinding.

During DNA replication, several proteins facilitate the unwinding of the DNA
double helix into two single strands. Topoisomerases (red) reduce torsional strain caused by
the unwinding of the DNA double helix; DNA helicase (yellow) breaks hydrogen bonds
between complementary base-pairs; single-strand binding proteins (SSBs) stabilize the
separated strands and prevent them from rejoining.
As previously mentioned, the location at which a DNA strand begins to unwind
into two separate single strands is known as the origin of replication. As shown in Figure 1,
when the double helix unwinds, replication proceeds along the two single strands at the same
time but in opposite directions (i.e., left to right on one strand, and right to left on the other).
This forms two replication forks that move along the DNA, replicating as they go.
Primer Synthesis
Primer synthesis marks the beginning of the actual synthesis of the new
DNA molecule. Primers are short stretches of nucleotides (about 10 to 12 bases in length)
synthesized by an RNA polymerase enzyme called primase. Primers are required because
DNA polymerases, the enzymes responsible for the actual addition of nucleotides to the new
DNA strand, can only add deoxyribonucleotides to the 3’-OH group of an existing chain and
cannot begin synthesis de novo. Primase, on the other hand, can add ribonucleotides de novo.
Later, after elongation is complete, the primer is removed and replaced with DNA
nucleotides.
Elongation
Finally, elongation (the addition of nucleotides to the new DNA strand) begins
after the primer has been added. Synthesis of the growing strand involves adding
nucleotides, one by one, in the exact order specified by the original (template) strand.
Recall that one of the key features of the Watson-Crick DNA model is that adenine is
always paired with thymine and cytosine is always paired with guanine. So, for example,
if the original strand reads A-G-C-T, the new strand will read T-C-G-A.
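The base-pairing rule above can be expressed as a small Python sketch. `complement` pairs bases position by position, as in the A-G-C-T example; `reverse_complement` additionally reverses the result so that the new strand reads 5'-to-3', reflecting the antiparallel orientation of the two strands. The function names and example sequences are illustrative.

```python
# Watson-Crick base-pairing rules: A pairs with T, C pairs with G.
PAIRS = str.maketrans("ACGT", "TGCA")


def complement(template):
    """Return the base that pairs with each template base, position by position."""
    return template.upper().translate(PAIRS)


def reverse_complement(template):
    """Return the new strand written 5'-to-3'; because the strands are
    antiparallel, this is the complement read in reverse."""
    return complement(template)[::-1]


print(complement("AGCT"))            # TCGA
print(reverse_complement("ATGCGT"))  # ACGCAT
```

Applying `complement` twice returns the original sequence, which is why either strand can serve as a template for the other.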
DNA is always synthesized in the 5'-to-3' direction, meaning that nucleotides are
added only to the 3' end of the growing strand. As shown in Figure 2, the 5'-phosphate group
of the new nucleotide binds to the 3'-OH group of the last nucleotide of the growing strand.
Scientists have yet to identify a polymerase that can add bases to the 5' ends of DNA strands.
Figure 2: New DNA is synthesized from deoxyribonucleoside triphosphates (dNTPs).

(A) A deoxyribonucleoside triphosphate (dNTP). (B) During DNA replication, the 3'-OH
group of the last nucleotide on the new strand attacks the 5'-phosphate group of the incoming
dNTP. Two phosphates are cleaved off. (C) A phosphodiester bond forms between the two
nucleotides, and phosphate ions are released.
The Discovery of DNA Polymerase

While studying E. coli bacteria, enzymologist Arthur Kornberg discovered that
DNA polymerases catalyze DNA synthesis. Kornberg's experiment involved mixing all of the
basic "ingredients" necessary for E. coli DNA synthesis in a test tube, including
nucleotides, E. coli extract, and ATP, and then purifying and testing the enzymes involved.
Using this method, Kornberg not only discovered DNA polymerases, but he also performed
some of the initial work demonstrating how enzymes add new nucleotides to growing DNA
chains (Kornberg, 1959).
Scientists have since identified a total of five different DNA polymerases in E. coli,
each with a specialized role. For example, DNA polymerase III does most of the elongation
work, adding nucleotides one by one to the 3' end of the new and growing single strand.
Other enzymes, including DNA polymerase I and RNase H, are responsible for removing the
RNA primer after DNA polymerase III has begun its work, replacing it with DNA
nucleotides (Ogawa & Okazaki, 1984). When these enzymes finish, they leave a nick
between the section of DNA that was formerly the primer and the elongated section of DNA.
Another enzyme, DNA ligase, then seals the nick by forming a phosphodiester bond between
the two adjacent nucleotides.
DNA Polymerase Only Moves in One Direction
After a primer is synthesized on a strand of DNA and the DNA strands unwind,
synthesis and elongation can proceed in only one direction. As previously mentioned, DNA
polymerase can only add to the 3' end, so the 5' end of the primer remains unaltered.
Consequently, synthesis proceeds immediately only along the so-called leading strand. This
immediate replication is known as continuous replication. The other strand (in the 5' direction
from the primer) is called the lagging strand, and replication along it is called discontinuous
replication. The double helix has to unwind a bit before the synthesis of another primer can
be initiated further up on the lagging strand. Synthesis can then occur from the 3' end of that
new primer. Next, the double helix unwinds a bit more, and another spurt of replication
proceeds. As a result, replication along the lagging strand can only proceed in short,
discontinuous spurts (Figure 3).
Figure 3: Replication of the leading DNA strand is continuous, while replication along the
lagging strand is discontinuous.
After a short length of the DNA has been unwound, synthesis must proceed in the 5'
to 3' direction; that is, in the direction opposite that of the unwinding.
The fragments of newly synthesized DNA along the lagging strand are called Okazaki
fragments, named in honor of their discoverer, Japanese molecular biologist Reiji Okazaki.
Okazaki and his colleagues made their discovery by conducting what is known as a pulse-
chase experiment, which involved exposing replicating DNA to a short "pulse" of isotope-
labeled nucleotides and then varying the length of time that the cells would be exposed to
nonlabeled nucleotides. This later period is called the "chase" (Okazaki et al., 1968). The
labeled nucleotides were incorporated into growing DNA molecules only during the initial
few seconds of the pulse; thereafter, only nonlabeled nucleotides were incorporated during
the chase. The scientists then centrifuged the newly synthesized DNA and observed that the
shorter chases resulted in most of the radioactivity appearing in "slow" DNA. The
sedimentation rate was determined by size: smaller fragments sedimented more slowly than
larger fragments because of their lower mass. As the investigators increased the length of
the chases, radioactivity in the "fast" DNA increased with little or no increase of radioactivity
in the slow DNA. The researchers correctly interpreted these observations to mean that, with
short chases, only very small fragments of DNA were being synthesized along the lagging
strand. As the chases increased in length, giving DNA more time to replicate, the lagging
strand fragments started integrating into longer, heavier, more rapidly sedimenting DNA
strands. Today, scientists know that the Okazaki fragments of bacterial DNA are typically
between 1,000 and 2,000 nucleotides long, whereas in eukaryotic cells, they are only about
100 to 200 nucleotides long.
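Because each Okazaki fragment needs its own primer, fragment size sets how many priming events the lagging strand requires. This Python sketch applies the fragment lengths quoted above to a hypothetical 100,000-nucleotide stretch of lagging strand, assuming equal-sized fragments; the helper name is illustrative.

```python
import math


def fragment_count(lagging_length_nt, fragment_nt):
    """Number of Okazaki fragments (and hence primers) needed to copy a
    lagging strand of the given length, assuming equal-sized fragments."""
    return math.ceil(lagging_length_nt / fragment_nt)


LENGTH_NT = 100_000  # hypothetical stretch of lagging strand
SIZES = {"bacteria": (1_000, 2_000), "eukaryotes": (100, 200)}  # nt, from the text

for label, (shortest, longest) in SIZES.items():
    fewest = fragment_count(LENGTH_NT, longest)
    most = fragment_count(LENGTH_NT, shortest)
    print(f"{label}: {fewest} to {most} fragments")
```

Under these assumptions, the same stretch needs 50 to 100 fragments in bacteria but 500 to 1000 in eukaryotes, ten times as many priming events.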
The Challenges of Eukaryotic Replication
Bacterial and eukaryotic cells share many of the same basic features of replication;
for instance, initiation requires a primer, elongation is always in the 5'-to-3' direction, and
replication is always continuous along the leading strand and discontinuous along the lagging
strand. But there are also important differences between bacterial and eukaryotic replication,
some of which biologists are still actively researching in an effort to better understand the
molecular details. One difference is that eukaryotic replication is characterized by many
replication origins (often thousands), not just one, and the sequences of the replication origins
vary widely among species. In bacteria, by contrast, the replication origin (oriC) varies in
length (from about 200 to 1,000 base pairs) and in sequence, except among closely related
organisms, but each bacterial chromosome nonetheless has just a single replication origin
(Mackiewicz et al., 2004).
Eukaryotic replication also utilizes a different set of DNA polymerase enzymes
(e.g., DNA polymerase δ and DNA polymerase ε instead of DNA polymerase III). Scientists
are still studying the roles of the 13 eukaryotic polymerases discovered to date. In addition, in
eukaryotes, the DNA template is compacted by the way it winds around proteins
called histones. This DNA-histone complex, called a nucleosome, poses a unique challenge
both for the cell and for scientists investigating the molecular details of eukaryotic
replication. What happens to nucleosomes during DNA replication? Scientists know from
electron micrograph studies that nucleosome reassembly happens very quickly after
replication (the reassembled nucleosomes are visible in the electron micrograph images), but
they still do not know how this happens (Annunziato, 2005).
Also, whereas bacterial chromosomes are circular, eukaryotic chromosomes
are linear. During circular DNA replication, the excised primer is readily replaced by
nucleotides, leaving no gap in the newly synthesized DNA. In contrast, in linear DNA
replication, there is always a small gap left at the very end of the chromosome because of the
lack of a 3'-OH group for replacement nucleotides to bind. (As mentioned, DNA synthesis
can proceed only in the 5'-to-3' direction.) If there were no way to fill this gap, the DNA
molecule would get shorter and shorter with every generation. However, the ends of linear
chromosomes—the telomeres—have several properties that prevent this.
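To see why an unfilled end gap matters, the Python sketch below counts how many rounds of replication a chromosome end could tolerate if it lost a fixed number of nucleotides each time. Both numbers are hypothetical, and the model deliberately leaves out the telomere properties that, as the text notes, prevent this shortening in real cells.

```python
def rounds_until_depleted(telomere_nt, loss_per_round_nt):
    """Count replication rounds before a chromosome-end buffer is used up,
    assuming a fixed loss per round and no mechanism that restores the ends."""
    if loss_per_round_nt <= 0:
        raise ValueError("loss per round must be positive")
    rounds = 0
    while telomere_nt >= loss_per_round_nt:
        telomere_nt -= loss_per_round_nt
        rounds += 1
    return rounds


# Hypothetical values: a 10,000-nt end buffer losing 100 nt per round.
print(rounds_until_depleted(10_000, 100))  # 100
```

After the buffer is exhausted, any further loss in this model would eat into the genes near the chromosome end.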
In eukaryotes, DNA replication occurs during the S phase of the cell cycle. In E. coli, the
entire genome is replicated in just 40 minutes, at a pace of approximately 1,000 nucleotides
per second at each replication fork. Eukaryotic replication forks move much more slowly:
about 40 nucleotides per second. The coordination of the protein complexes required for the
steps of replication and the
speed at which replication must occur in order for cells to divide are impressive, especially
considering that enzymes are also proofreading, which leaves very few errors behind.
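The 40-minute figure for E. coli is consistent with simple arithmetic if two forks leave the single origin and each moves at about 1,000 nucleotides per second. The Python sketch below assumes an E. coli genome of roughly 4.6 million base pairs, treats the text's rates as per-fork rates, and assumes that every origin fires at once and that origins are evenly spaced; the 100-million-bp eukaryotic chromosome is hypothetical. It also shows why the many origins of eukaryotic chromosomes matter.

```python
def replication_time_s(length_bp, fork_rate_nt_per_s, origins=1):
    """Time to copy a DNA molecule when every origin fires at once and sends
    out two forks that move at the same rate (a simplifying assumption)."""
    return length_bp / (2 * origins * fork_rate_nt_per_s)


# E. coli: one origin, ~4.6 million bp (assumed genome size), ~1,000 nt/s per fork.
print(f"{replication_time_s(4_600_000, 1_000):,.0f} s")  # 2,300 s (about 38 min)

# Hypothetical 100-million-bp eukaryotic chromosome at ~40 nt/s per fork.
print(f"{replication_time_s(100_000_000, 40):,.0f} s")         # 1,250,000 s
print(f"{replication_time_s(100_000_000, 40, 1_000):,.0f} s")  # 1,250 s (about 21 min)
```

Under these assumptions, a single origin would leave the hypothetical eukaryotic chromosome replicating for more than two weeks, while 1,000 origins bring the time down to about 21 minutes.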
Summary
The study of DNA replication started almost as soon as the structure of DNA was
elucidated, and it continues to this day. Currently, the stages of initiation, unwinding, primer
synthesis, and elongation are understood in the most basic sense, but many questions remain
unanswered, particularly when it comes to replication of the eukaryotic genome. Scientists
have devoted decades to the study of replication, and researchers such as Kornberg and
Okazaki have made a number of important breakthroughs. Nonetheless, much remains to be
learned about replication, including how errors in this process contribute to human disease.
DNA Transcription
The genetic code is frequently referred to as a "blueprint" because it contains the
instructions a cell requires in order to sustain itself. We now know that there is more to these
instructions than simply the sequence of letters in the nucleotide code, however. For example,
vast amounts of evidence demonstrate that this code is the basis for the production of various
molecules, including RNA and protein. Research has also shown that the instructions stored
within DNA are "read" in two steps: transcription and translation. In transcription, a portion
of the double-stranded DNA template gives rise to a single-stranded RNA molecule. In some
cases, the RNA molecule itself is a "finished product" that serves some
important function within the cell. Often, however, transcription of an RNA molecule is
followed by a translation step, which ultimately results in the production of a protein
molecule.
Visualizing Transcription
Figure 1
The process of transcription can be visualized by electron microscopy (Figure 1); in
fact, it was first observed using this method in 1970. In these early electron micrographs, the
DNA molecules appear as "trunks," with many RNA "branches" extending out from them.
When DNase and RNase (enzymes that degrade DNA and RNA, respectively) were added
to the molecules, DNase eliminated the trunk structures, while RNase wiped out the
branches.
DNA is double-stranded, but only one strand serves as a template for transcription at
any given time. This template strand is called the noncoding strand. The nontemplate
strand is referred to as the coding strand because its sequence will be the same as that of the
new RNA molecule, except that the RNA contains uracil (U) in place of thymine (T). In most
organisms, the strand of DNA that serves as the template for
one gene may be the nontemplate strand for other genes within the same chromosome.
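The relationship between the template strand, the coding strand, and the RNA can be checked with a short Python sketch. The sequences and helper names are hypothetical; the template is written 3'-to-5' so that it lines up base by base with the RNA it produces, which is written 5'-to-3'.

```python
# RNA pairing rules for transcription: each template base specifies one RNA
# base, with uracil (U) taking the place of thymine (T).
DNA_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe(template_3_to_5):
    """Return the RNA (5'->3') made from a template strand written 3'->5'."""
    return template_3_to_5.upper().translate(DNA_TO_RNA)


def coding_to_rna(coding_5_to_3):
    """The coding strand has the same sequence as the RNA, with T read as U."""
    return coding_5_to_3.upper().replace("T", "U")


template = "TACGGATTC"  # hypothetical template strand, written 3'->5'
coding = "ATGCCTAAG"    # its complementary coding strand, written 5'->3'
rna = transcribe(template)
print(rna)                           # AUGCCUAAG
print(rna == coding_to_rna(coding))  # True
```

The RNA is complementary to the template strand and therefore identical in sequence to the coding strand, apart from U replacing T.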
The Transcription Process

The process of transcription begins when an enzyme called RNA
polymerase (RNA pol) attaches to the template DNA strand and begins to catalyze
production of complementary RNA. Polymerases are large enzymes composed of
approximately a dozen subunits, and when active on DNA, they are also typically complexed
with other factors. In many cases, these factors signal which gene is to be transcribed.
Three different types of RNA polymerase exist in eukaryotic cells,
whereas bacteria have only one. In eukaryotes, RNA pol I transcribes the genes that encode
most of the ribosomal RNAs (rRNAs), and RNA pol III transcribes the genes for one small
rRNA, plus the transfer RNAs that play a key role in the translation process, as well as other
small regulatory RNA molecules. Thus, it is RNA pol II that transcribes the messenger
RNAs, which serve as the templates for production of protein molecules.
Transcription Initiation
Figure 3
Figure 2
The first step in transcription is initiation, when the RNA pol binds to the
DNA upstream (5′) of the gene at a specialized sequence called a promoter (Figure 2a). In
bacteria, promoters are usually composed of three sequence elements, whereas in eukaryotes,
there are as many as seven elements.
In prokaryotes, most genes have a sequence called the Pribnow box, with
the consensus sequence TATAAT positioned about ten base pairs upstream of the site that
serves as the location of transcription initiation. Not all Pribnow boxes have this exact
nucleotide sequence; these nucleotides are simply the most common ones found at each site.
Although substitutions do occur, each box nonetheless resembles this consensus fairly
closely. Many genes also have the consensus sequence TTGACA at a position 35 bases
upstream of the start site, and some have what is called an upstream element, which is an A-T
rich region 40 to 60 nucleotides upstream that enhances the rate of transcription (Figure 3). In
any case, the RNA pol "core enzyme" associates with another subunit, called the sigma
subunit, to form a holoenzyme that binds the promoter and unwinds the DNA double helix in
order to facilitate access to the gene. The sigma subunit confers promoter specificity on RNA
polymerase; that is, it is responsible for telling RNA polymerase where to bind. There are a
number of different sigma subunits that bind to different promoters and therefore assist in
turning genes on and off as conditions change.
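Matching a sequence against a consensus, as described above, amounts to counting mismatches in a sliding window. This Python sketch does that for a hypothetical bacterial promoter, using TATAAT for the Pribnow box and TTGACA, the commonly cited -35 consensus. The promoter sequence, the G/C filler, and the helper names are illustrative assumptions; real promoter recognition by sigma factors also depends on element spacing and flanking sequence.

```python
def mismatches(window, consensus):
    """Count positions at which a window differs from the consensus."""
    return sum(a != b for a, b in zip(window, consensus))


def best_match(seq, consensus):
    """Slide the consensus along seq; return (start index, mismatches) of the
    closest window (the leftmost one if there is a tie)."""
    k = len(consensus)
    score, pos = min(
        (mismatches(seq[i:i + k], consensus), i) for i in range(len(seq) - k + 1)
    )
    return pos, score


# Hypothetical bacterial promoter region, 5'->3', ending just before the
# transcription start site. Filler bases are G/C so the boxes stand out.
upstream = "GCGCGC" + "TTGACA" + "GCGCGCGCGCGCGCGCG" + "TATGAT" + "GCGCGC"
for name, consensus in [("-35 box", "TTGACA"), ("Pribnow box", "TATAAT")]:
    pos, mm = best_match(upstream, consensus)
    distance = len(upstream) - pos  # -1 is the base just before the start site
    print(f"{name}: {upstream[pos:pos + 6]} at -{distance}, {mm} mismatch(es)")
```

Here the -35 box (TTGACA) matches exactly and starts at -35, while the best Pribnow-box window (TATGAT, starting at -12) differs from the consensus at one position, mirroring the point that real boxes resemble the consensus without always matching it.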
Eukaryotic promoters are more complex than their prokaryotic counterparts, in part
because eukaryotes have the aforementioned three classes of RNA polymerase that transcribe
different sets of genes. Many eukaryotic genes also possess enhancer sequences, which can
be found at considerable distances from the genes they affect. Enhancer sequences control
gene activation by binding with activator proteins and altering the 3-D structure of the DNA
to help "attract" RNA pol II, thus regulating transcription. Because eukaryotic DNA is tightly
packaged as chromatin, transcription also requires a number of specialized proteins that help
make the template strand accessible.
In eukaryotes, the "core" promoter for a gene transcribed by pol II is most often
found immediately upstream (5′) of the start site of the gene. Many pol II genes have a TATA
box (consensus sequence TATAAA) 25 to 35 bases upstream of the initiation site, which
affects the transcription rate and determines the location of the start site. Eukaryotic RNA
polymerases use a number of essential cofactors (collectively called general transcription
factors), and one of these, TFIID, recognizes the TATA box and ensures that the correct start
site is used. Another cofactor, TFIIB, recognizes a different common consensus sequence,
G/C G/C G/A C G C C, approximately 38 to 32 bases upstream (Figure 4).
Figure 4: Eukaryotic core promoter region.
In eukaryotes, genes transcribed into RNA transcripts by the enzyme RNA
polymerase II are controlled by a core promoter. A core promoter consists of a transcription
start site, a TATA box (at the -25 region), and a TFIIB recognition element (at the -35
region).
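Core-promoter elements like these can be located with pattern matching once a consensus is written with IUPAC ambiguity codes (W = A or T, S = G or C, R = A or G). This Python sketch assumes the commonly cited consensus TATAWAW for the TATA box and SSRCGCC for the TFIIB recognition element; the promoter sequence and the `find_element` helper are hypothetical, and real promoters often match these elements only partially or lack them.

```python
import re
from typing import Optional

# IUPAC nucleotide codes used by the two consensus sequences below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "S": "[CG]", "R": "[AG]"}


def find_element(upstream: str, consensus: str) -> Optional[int]:
    """Position of the first match relative to the start site, where -1 is
    the base just upstream of it; None if the consensus is absent."""
    pattern = "".join(IUPAC[code] for code in consensus)
    match = re.search(pattern, upstream)
    return None if match is None else match.start() - len(upstream)


# Hypothetical 38-nt region ending just before a pol II transcription start site.
upstream = "GGGCGCC" + "TATAAAA" + "GCA" * 8
print("TFIIB recognition element:", find_element(upstream, "SSRCGCC"))  # -38
print("TATA box:", find_element(upstream, "TATAWAW"))                   # -31
```

In this made-up example the TFIIB recognition element starts at -38 and the TATA box at -31, placing both within the ranges given in the text.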
The terms "strong" and "weak" are often used to describe promoters and
enhancers, according to their effects on transcription rates and thereby on gene expression.
Alteration of promoter strength can have deleterious effects upon a cell, often resulting
in disease. For example, some tumor-promoting viruses transform healthy cells by inserting
strong promoters in the vicinity of growth-stimulating genes, while translocations in
some cancer cells place genes that should be "turned off" in the proximity of strong
promoters or enhancers.
Enhancer sequences do what their name suggests: They act to enhance the rate at
which genes are transcribed, and their effects can be quite powerful. Enhancers can be
thousands of nucleotides away from the promoters with which they interact, but they are
brought into proximity by the looping of DNA. This looping is the result of interactions
between the proteins bound to the enhancer and those bound to the promoter. The proteins
that facilitate this looping are called activators, while those that inhibit it are called
repressors.
Transcription of eukaryotic genes by polymerases I and III is initiated in a similar
manner, but the promoter sequences and transcriptional activator proteins vary.
Strand Elongation
Once transcription is initiated, the DNA double helix unwinds and RNA polymerase
reads the template strand, adding nucleotides to the 3′ end of the growing chain (Figure 2b).
At a temperature of 37 degrees Celsius, new nucleotides are added at an estimated rate of
about 42-54 nucleotides per second in bacteria (Dennis & Bremer, 1974), while eukaryotes
proceed at a much slower pace of approximately 22-25 nucleotides per second (Izban & Luse,
1992).
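These elongation rates translate into transcription times by simple division. The Python sketch below assumes a hypothetical 10,000-nucleotide transcribed region and a constant rate with no pausing, which real polymerases do not maintain; the rate ranges are the ones quoted above.

```python
def transcription_time_s(length_nt, rate_nt_per_s):
    """Seconds to transcribe a region at a constant elongation rate."""
    return length_nt / rate_nt_per_s


LENGTH_NT = 10_000  # hypothetical transcribed region
RATES = {"bacteria": (42, 54), "eukaryotes": (22, 25)}  # nt/s, from the text

for label, (slow, fast) in RATES.items():
    quickest = transcription_time_s(LENGTH_NT, fast) / 60
    longest = transcription_time_s(LENGTH_NT, slow) / 60
    print(f"{label}: {quickest:.1f} to {longest:.1f} min")
```

Under these assumptions, a bacterial polymerase would need about 3.1 to 4.0 minutes for this region and a eukaryotic one about 6.7 to 7.6 minutes.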
Transcription Termination
Figure 5: Rho-independent termination in bacteria.
Inverted repeat sequences at the end of a gene allow folding of the newly
transcribed RNA sequence into a hairpin loop. This terminates transcription and stimulates
release of the mRNA strand from the transcription machinery.
Terminator sequences are found near the 3′ ends of genes (Figure 2c). Bacteria possess two
types of these sequences. In rho-independent terminators, inverted
repeat sequences are transcribed; they can then fold back on themselves in hairpin loops,
causing RNA pol to pause and resulting in release of the transcript (Figure 5). On the other
hand, rho-dependent terminators make use of a factor called rho, which actively unwinds the
DNA-RNA hybrid formed during transcription, thereby releasing the newly synthesized
RNA.
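The hairpin of a rho-independent terminator forms because an inverted repeat in the RNA can pair with itself. The Python sketch below searches a transcript for such a repeat. It is a toy model under stated assumptions: only perfect A-U and G-C pairs count, and the stem length (6 nt) and loop range (3 to 8 nt) are hypothetical parameters; practical terminator prediction also weighs folding energy. The example places a run of U's after the stem, a common feature of rho-independent terminators.

```python
# Watson-Crick pairs in RNA: A-U and G-C (G-U wobble pairs are ignored here).
RNA_PAIR = str.maketrans("AUGC", "UACG")


def reverse_complement_rna(seq):
    """Sequence that would base-pair with `seq` in an antiparallel stem."""
    return seq.translate(RNA_PAIR)[::-1]


def find_hairpin(rna, stem=6, min_loop=3, max_loop=8):
    """Return (stem_start, loop_length) for the first perfect inverted repeat
    that could fold into a hairpin, or None if there is none."""
    for i in range(len(rna) - 2 * stem - min_loop + 1):
        partner = reverse_complement_rna(rna[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # where the second arm of the stem would start
            if rna[j:j + stem] == partner:
                return i, loop
    return None


# Hypothetical 3' end of a transcript: G-C-rich inverted repeat, then U's.
rna = "AAUCC" + "GCCGCC" + "UUCG" + "GGCGGC" + "UUUUUUU"
hit = find_hairpin(rna)
print(hit)  # (5, 4)
start, loop = hit
print(rna[start + 2 * 6 + loop:])  # UUUUUUU
```

The stem GCCGCC pairs with GGCGGC across a 4-nucleotide loop, and the U-rich stretch that follows is where the transcript would be released.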
In eukaryotes, termination of transcription occurs by different processes, depending
upon the exact polymerase utilized. For pol I genes, transcription is stopped using a
termination factor, through a mechanism similar to rho-dependent termination in bacteria.
Transcription of pol III genes ends after transcribing a termination sequence that includes a
polyuracil stretch, by a mechanism resembling rho-independent prokaryotic termination.
Termination of pol II transcripts, however, is more complex.
Transcription of pol II genes can continue for hundreds or even thousands of
nucleotides beyond the end of the sequence that will form the mature mRNA. The RNA strand
is then cleaved by a complex that appears to associate with the polymerase. Cleavage seems
to be coupled with termination of transcription and occurs at a consensus sequence. Mature
pol II mRNAs are
polyadenylated at the 3′-end, resulting in a poly(A) tail; this process follows cleavage and is
also coordinated with termination.
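The cleave-then-polyadenylate sequence of events can be sketched in Python. In RNA, the AATAAA signal mentioned below reads AAUAAA. The sketch assumes cleavage a fixed 20 nucleotides past the signal (in mammalian cells cleavage typically falls roughly 10 to 30 nucleotides downstream) and adds a short 10-nucleotide tail for display, whereas real tails are much longer; the transcript and helper name are hypothetical.

```python
from typing import Optional


def cleave_and_polyadenylate(rna: str, signal: str = "AAUAAA",
                             offset: int = 20, tail_length: int = 10) -> Optional[str]:
    """Cut the transcript `offset` nucleotides past the end of the first
    poly(A) signal and add a poly(A) tail; return None if no signal is found."""
    site = rna.find(signal)
    if site == -1:
        return None  # no signal: no cleavage or polyadenylation in this model
    cut = site + len(signal) + offset
    return rna[:cut] + "A" * tail_length


# Hypothetical pol II transcript that ran past the poly(A) signal.
rna = "AUGGCUUAAGCAUCG" + "AAUAAA" + "CUGCAUGCAU" * 2 + "GUGUGUUGUGUU"
mature = cleave_and_polyadenylate(rna)
print(mature)
print(cleave_and_polyadenylate(rna.replace("AAUAAA", "AAGAAA")))  # None
```

The first call drops the 12 nucleotides transcribed beyond the cleavage point and appends ten A's. The second call mimics a mutated signal: with no AAUAAA present, this model neither cleaves nor polyadenylates, echoing the mutation experiments described next.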
Both polyadenylation and termination make use of the same consensus sequence,
and the interdependence of the processes was demonstrated in the late 1980s by work from
several groups. One group of scientists working with mouse globin genes showed that
introducing mutations into the consensus sequence AATAAA, known to be necessary for
poly(A) addition, inhibited both polyadenylation and transcription termination. They
measured the extent of termination by hybridizing transcripts with the different poly(A)
consensus sequence mutants with wild-type transcripts, and they were able to see a decrease
in the signal of hybridization, suggesting that proper termination was inhibited. They
therefore concluded that polyadenylation was necessary for termination (Logan et al., 1987).
Another group obtained similar results using a monkey viral system, SV40 (simian virus 40).
They introduced mutations into a poly(A) site, which caused mRNAs to accumulate to levels
far above wild type (Connelly & Manley, 1988).
The exact relationship between cleavage and termination remains to be determined.
One model supposes that cleavage itself triggers termination; another proposes that
polymerase activity is affected when passing through the consensus sequence at the cleavage
site, perhaps through changes in associated transcriptional activation factors. Thus, research
in the area of prokaryotic and eukaryotic transcription is still focused on unraveling the
molecular details of this complex process, details that will allow us to better understand how
genes are transcribed and silenced.
Gene expression is linked to RNA transcription, which cannot happen without RNA
polymerase. Beyond this shared requirement, however, gene expression differs considerably
between prokaryotes and eukaryotes.
Every nucleated, diploid cell in the body contains the same DNA, or genome, yet
different cells appear committed to different specialized tasks. For example, kidney cells
absorb sodium, while pancreatic cells produce insulin. How is this possible? The answer lies
in differential use of the genome; in other words, different cells within the body express
different portions of their DNA. This process, which begins with the transcription of DNA
into RNA, ultimately leads to changes in cell function. Changes in transcription are thus a
fundamental means by which cell function is regulated across species. In fact, even single-
celled organisms, such as bacteria, regulate gene transcription depending on cues in their
environments. Therefore, understanding how transcription is regulated is fundamental to
deciphering the mysteries of the genome.
Central to the process of transcription is the complex of proteins known as the
RNA polymerases. RNA polymerases have been found in all species, but the number and
composition of these proteins vary across taxa. For instance, bacteria contain a single type of
RNA polymerase, while eukaryotes (multicellular organisms and yeasts) contain three main
types. In spite of these differences, there are striking similarities among
transcriptional mechanisms. For example, all species require a mechanism by which
transcription can be regulated in order to achieve spatial and temporal changes in gene
expression. In order to fully understand what this means, it is first necessary to examine the
mechanisms of RNA transcription in more detail.
Transcription: An Overview
In all species, transcription begins with the binding of the RNA
polymerase complex (or holoenzyme) to a special DNA sequence at the beginning of the
gene known as the promoter. Activation of the RNA polymerase complex enables
transcription initiation, and this is followed by elongation of the transcript. In turn, transcript
elongation leads to clearing of the promoter, and the transcription process can begin yet
again. Transcription can thus be regulated at two levels: the promoter level (cis regulation)
and the polymerase level (trans regulation). These elements differ among bacteria and
eukaryotes.
Transcription in Bacteria
In bacteria, all transcription is performed by a single type of RNA polymerase.
This polymerase contains four catalytic subunits and a single regulatory subunit known as
sigma (σ). Interestingly, several distinct sigma factors have been identified, and each of these
oversees transcription of a unique set of genes. Sigma factors are thus discriminatory, as each
binds a distinct set of promoter sequences.
A striking example of the specialization of sigma factors for different gene
promoters is provided by bacterial sporulation in the species Bacillus subtilis. This bacterium
exists in two states: vegetative (growing) and sporulating. Genes involved in spore formation
are not normally expressed during vegetative growth. Remarkably, expression of a gene
encoding a novel sigma factor turns on the first genes for sporulation. Subsequent expression
of different sigma factors then turns on new sets of genes needed later in the sporulation
process (Losick & Stragier, 1992). Each of these sigma factors recognizes the promoters of the
genes in its group, not those "seen" by other sigma factors. This simple example illustrates
how transcription can be regulated in both cis and trans to cause changes in cell function.
Therefore, while bacteria accomplish transcription of all genes using a single kind of RNA
polymerase, the use of different sigma factor subunits provides an extra level of control.
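The cis/trans logic of sigma-factor switching can be sketched as a lookup: each gene's promoter (the cis element) is recognized by one sigma factor, and a gene is transcribed only when that sigma factor (the trans factor) is present. All gene and sigma-factor names below are hypothetical placeholders, not real B. subtilis genes.

```python
# Minimal sketch of sigma-factor specificity (hypothetical gene and sigma names).
# cis: each promoter class is recognized by exactly one sigma factor.
# trans: only sigma factors present in the cell can direct transcription.

PROMOTER_CLASS = {           # gene -> sigma factor that recognizes its promoter
    "growth_gene_1": "sigma_veg",
    "growth_gene_2": "sigma_veg",
    "early_spore_gene": "sigma_spo_early",
    "late_spore_gene": "sigma_spo_late",
}

def expressed_genes(active_sigmas: set[str]) -> set[str]:
    """Genes whose promoters are recognized by at least one active sigma factor."""
    return {gene for gene, sigma in PROMOTER_CLASS.items() if sigma in active_sigmas}

vegetative = expressed_genes({"sigma_veg"})
early_sporulation = expressed_genes({"sigma_veg", "sigma_spo_early"})
print(sorted(vegetative))         # ['growth_gene_1', 'growth_gene_2']
print(sorted(early_sporulation))  # ['early_spore_gene', 'growth_gene_1', 'growth_gene_2']
```

Switching on one new sigma factor adds a whole group of genes at once, while genes "seen" only by a later sigma factor stay silent.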
Transcription in Eukaryotes
Figure 1
Eukaryotic cells are more complex than bacteria in many ways, including in terms
of transcription. Specifically, in eukaryotes, transcription is achieved by three different types
of RNA polymerase (RNA pol I-III). These polymerases differ in the number and type of
subunits they contain, as well as the class of RNAs they transcribe; that is, RNA pol I
transcribes ribosomal RNAs (rRNAs), RNA pol II transcribes RNAs that will become
messenger RNAs (mRNAs) and also small regulatory RNAs, and RNA pol III transcribes
small RNAs such as transfer RNAs (tRNAs).
Because RNA pol II transcribes protein-encoding genes, it has been of
particular importance to scientists who study the regulation of eukaryotic gene expression,
and its function is well understood. For example, researchers know that RNA pol II is
recruited, with the help of general transcription factors, to a DNA sequence within the
promoter of many genes, known as the TATA box, to initiate transcription. Together with
other common motifs (short recognition sequences in the DNA),
these elements constitute the core promoter. However, changes in RNA pol II affinity and,
therefore, gene expression can be influenced by surrounding DNA sequences (enhancers),
which in turn recruit transcription factors. While these properties of transcription regulation
are very important, they remain an area of active research.
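To make the idea of a core-promoter motif concrete, the sketch below searches a sequence with a regular expression for TATAWAW (W = A or T), a commonly cited TATA-box consensus. The consensus pattern, the function name `find_tata_boxes`, and the example sequence are assumptions for illustration; real promoter recognition depends on much more than an exact motif match.

```python
import re

# TATAWAW, where W means A or T, is a commonly cited TATA-box consensus.
# Exact consensus definitions vary between sources; treat this pattern as an assumption.
TATA_BOX = re.compile(r"TATA[AT]A[AT]")

def find_tata_boxes(promoter: str) -> list[tuple[int, str]]:
    """Return (0-based position, matched sequence) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in TATA_BOX.finditer(promoter.upper())]

# Invented upstream sequence for illustration only.
upstream = "GGCCAGGCTATAAAAGGCGCGCCGTCAGT"
print(find_tata_boxes(upstream))  # [(8, 'TATAAAA')]
```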
Interestingly, RNA pol II is especially sensitive to amatoxins, such as α-amanitin
of the extremely toxic Amanita genus of mushrooms (Weiland, 1968), a fact that
researchers have been able to exploit for the purposes of polymerase studies (although
recreational mushroom hunters should beware!). Thus, while eukaryotic transcription is far
more complex than bacterial transcription, the main difference between the two types of
transcription lies in RNA polymerase.
Translation: DNA to mRNA to Protein

How does the cell convert DNA into working proteins? The process of translation can be
seen as the decoding of instructions for making proteins; it involves mRNA, produced by
transcription, as well as tRNA.
The genes in DNA encode protein molecules, which are the "workhorses" of
the cell, carrying out most of the functions necessary for life. For example, enzymes, including
those that metabolize nutrients and synthesize new cellular constituents, as well as DNA
polymerases and other enzymes that make copies of DNA during cell division, are all
proteins.
In the simplest sense, expressing a gene means manufacturing its
corresponding protein, and this multilayered process has two major steps. In the first step, the
information in DNA is transferred to a messenger RNA (mRNA) molecule by way of a
process called transcription. During transcription, the DNA of a gene serves as a template
for complementary base-pairing, and an enzyme called RNA polymerase II catalyzes the
formation of a pre-mRNA molecule, which is then processed to form mature mRNA (Figure
1). The resulting mRNA is a single-stranded copy of the gene, which next must be translated
into a protein molecule.
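Complementary base-pairing during transcription can be sketched in a few lines. This minimal example assumes the template strand is written 3'-to-5' so that the mRNA reads 5'-to-3' position by position, and it ignores pre-mRNA processing such as splicing; the function name `transcribe` is hypothetical.

```python
# Minimal sketch: build an mRNA from a DNA template strand by complementary base-pairing.
# The template is given 3'->5' so the mRNA comes out 5'->3' position by position.
# Splicing and other pre-mRNA processing are deliberately ignored here.

DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Return the 5'->3' RNA complementary to a 3'->5' DNA template strand."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in template_3_to_5.upper())

template = "TACGGCATT"       # written 3'->5'
print(transcribe(template))  # AUGCCGUAA
```

Note that uracil (U) takes the place of thymine (T) wherever the template carries an adenine.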
Figure 1: A gene is expressed through the processes of transcription and translation.
During transcription, the enzyme RNA polymerase (green) uses DNA as a
template to produce a pre-mRNA transcript (pink). The pre-mRNA is processed to form a
mature mRNA molecule that can be translated to build the protein molecule (polypeptide)
encoded by the original gene.
During translation, which is the second major step in gene expression, the
mRNA is "read" according to the genetic code, which relates the DNA sequence to the amino
acid sequence in proteins (Figure 2). Each group of three bases in mRNA constitutes
a codon, and each codon specifies a particular amino acid (hence, it is a triplet code). The
mRNA sequence is thus used as a template to assemble—in order—the chain of amino acids
that form a protein.
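Reading the triplet code can be sketched with a codon table. The sketch below builds the standard genetic code from the compact 64-letter encoding used for NCBI translation table 1 (with T written as U), then reads an mRNA three bases at a time from its first base until it reaches a stop codon. The function and variable names are illustrative assumptions.

```python
from itertools import product

# Standard genetic code from the compact 64-letter string of NCBI translation
# table 1 (base order U, C, A, G at each position). '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time from its first base, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("AUGGCUAAAUGA"))  # MAK (Met-Ala-Lys, then the UGA stop)
```

Because the table maps 64 codons onto 20 amino acids plus stop, several codons share the same letter, which is the redundancy shown in Figure 2.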
Figure 2: The amino acids specified by each mRNA codon. Multiple codons can code for the
same amino acid.
The codons are written 5' to 3', as they appear in the mRNA. AUG is an initiation codon;
UAA, UAG, and UGA are termination (stop) codons.
Where Does Translation Occur?

But where does translation take place within a cell? What individual substeps are a
part of this process? And does translation differ between prokaryotes and eukaryotes? The
answers to questions such as these reveal a great deal about the essential similarities between
all species.
Within all cells, the translation machinery resides within a specialized organelle called
the ribosome. In eukaryotes, mature mRNA molecules must leave the nucleus and travel to
the cytoplasm, where the ribosomes are located. On the other hand, in prokaryotic organisms,
ribosomes can attach to mRNA while it is still being transcribed. In this situation, translation
begins at the 5' end of the mRNA while the 3' end is still attached to DNA.
In all types of cells, the ribosome is composed of two subunits, a large subunit and a small
subunit. In bacteria these are the 50S and 30S subunits; eukaryotic ribosomes are larger, with
60S and 40S subunits (S, for svedberg unit, is a measure of sedimentation velocity, which
depends on a particle's mass and shape). Each subunit exists separately in the cytoplasm, but
the two join together on the mRNA molecule. The ribosomal subunits contain proteins and
specialized RNA molecules called ribosomal RNA (rRNA). A second class of RNA, transfer
RNA (tRNA), carries amino acids to the ribosome. The tRNA molecules are adaptor molecules:
they have one end that can read the triplet code in the mRNA through complementary
base-pairing, and another end that attaches to a specific amino acid (Chapeville et al., 1962;
Grunberger et al., 1969). The idea that tRNA was an adaptor molecule was first proposed by
Francis Crick, co-discoverer of DNA structure, who did much of the key work in deciphering
the genetic code (Crick, 1958).
Within the ribosome, the mRNA and aminoacyl-tRNA complexes are held
together closely, which facilitates base-pairing. The rRNA catalyzes the attachment of each
new amino acid to the growing chain.
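The adaptor role of tRNA comes down to antiparallel base-pairing: an anticodon, written 5'-to-3', is the reverse complement of the codon it reads. This minimal sketch ignores wobble pairing at the third codon position, which in real cells lets one tRNA read several codons; the helper names `anticodon_for` and `pairs` are hypothetical.

```python
# Minimal sketch of the tRNA "adaptor" idea: an anticodon pairs antiparallel
# with its codon, so (written 5'->3') it is the reverse complement of the codon.
# Wobble pairing at the third codon position is ignored in this sketch.

RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_for(codon: str) -> str:
    """Return the anticodon, written 5'->3', that pairs with a 5'->3' codon."""
    return "".join(RNA_PAIR[base] for base in reversed(codon.upper()))

def pairs(codon: str, anticodon: str) -> bool:
    """True if the two 5'->3' triplets are perfectly complementary (antiparallel)."""
    return anticodon_for(codon) == anticodon.upper()

print(anticodon_for("AUG"))  # CAU
print(pairs("GCU", "AGC"))   # True
```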
The Beginning of mRNA Is Not Translated

Interestingly, not all regions of an mRNA molecule correspond to particular
amino acids. In particular, there is an area near the 5' end of the molecule that is known as
the untranslated region (UTR) or leader sequence. This portion of mRNA is located between
the first nucleotide that is transcribed and the start codon (AUG) of the coding region, and it
does not affect the sequence of amino acids in a protein (Figure 3).
So, what is the purpose of the UTR? It turns out that the leader sequence is
important because it contains a ribosome-binding site. In bacteria, this site is known as the
Shine-Dalgarno box (AGGAGG), after scientists John Shine and Lynn Dalgarno, who first
characterized it. A similar site in vertebrates was characterized by Marilyn Kozak and is thus
known as the Kozak box. In bacterial mRNA, the 5' UTR is normally short; in human
mRNA, the median length of the 5' UTR is about 170 nucleotides. If the leader is long, it may
contain regulatory sequences, including binding sites for proteins, that can affect
the stability of the mRNA or the efficiency of its translation.
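Locating the leader can be sketched by splitting an mRNA at its first AUG and checking the leader for the AGGAGG Shine-Dalgarno box. This is a simplification under stated assumptions: real ribosomes select start codons using more context than "the first AUG", and natural Shine-Dalgarno sites often match AGGAGG only partially. The example mRNA and function names are invented.

```python
# Minimal sketch: split a bacterial-style mRNA into its 5' UTR (leader) and coding
# region at the first AUG, then look for the AGGAGG Shine-Dalgarno box in the leader.
# Assumption: "first AUG" stands in for the ribosome's real start-codon selection.

SHINE_DALGARNO = "AGGAGG"

def split_leader(mrna: str) -> tuple[str, str]:
    """Return (5' UTR, sequence from the start codon onward); no AUG means all leader."""
    start = mrna.find("AUG")
    if start == -1:
        return mrna, ""
    return mrna[:start], mrna[start:]

def has_shine_dalgarno(leader: str) -> bool:
    """Exact-match check for the consensus ribosome-binding site."""
    return SHINE_DALGARNO in leader

mrna = "GAUUCCUAGGAGGUUUACAUAUGGCUAAAUGA"
leader, coding = split_leader(mrna)
print(leader)                      # GAUUCCUAGGAGGUUUACAU
print(coding)                      # AUGGCUAAAUGA
print(has_shine_dalgarno(leader))  # True
```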
Figure 3: A DNA transcription unit.
A DNA transcription unit is composed, from its 3' to 5' end (read along the template strand),
of an RNA-coding region (pink rectangle) flanked by a promoter region (green rectangle) and a
terminator region (black rectangle). Regions to the left, or moving towards the 3' end, of the
transcription start site are considered "upstream"; regions to the right, or moving towards
the 5' end, of the transcription start site are considered "downstream."
Translation Begins After the Assembly of a Complex Structure
The translation of mRNA begins with the formation of a complex on the mRNA
(Figure 4). First, three initiation factor proteins (known as IF1, IF2, and IF3) bind to the small
subunit of the ribosome. This preinitiation complex and a methionine-carrying tRNA then
bind to the mRNA, near the AUG start codon, forming the initiation complex.
Figure 4: The translation initiation complex.
When translation begins, the small subunit of the ribosome and an initiator
tRNA molecule assemble on the mRNA transcript. The small subunit of the ribosome has
three binding sites: an amino acid site (A), a polypeptide site (P), and an exit site (E). The
initiator tRNA molecule carrying the amino acid methionine binds to the AUG start codon of
the mRNA transcript at the ribosome’s P site where it will become the first amino acid
incorporated into the growing polypeptide chain. Here, the initiator tRNA molecule is shown
binding after the small ribosomal subunit has assembled on the mRNA; the order in which
this occurs is unique to prokaryotic cells. In eukaryotes, the free initiator tRNA first binds the
small ribosomal subunit to form a complex. The complex then binds the mRNA transcript, so
that the tRNA and the small ribosomal subunit bind the mRNA simultaneously.
Although methionine (Met) is the first amino acid incorporated into any new
protein, it is not always the first amino acid in mature proteins—in many proteins, methionine
is removed after translation. In fact, if a large number of proteins are sequenced and
compared with their known gene sequences, methionine (or formylmethionine) is encoded at
the N-terminus of all of them, even though many mature proteins no longer carry it. However,
not all amino acids are equally likely to occur
second in the chain, and the second amino acid influences whether the initial methionine is
enzymatically removed. For example, many proteins begin with methionine followed by
alanine. In both prokaryotes and eukaryotes, these proteins have the methionine removed, so
that alanine becomes the N-terminal amino acid (Table 1). However, if the second amino acid
is lysine, which is also frequently the case, methionine is not removed (at least in the sample
proteins that have been studied thus far). These proteins therefore begin with methionine
followed by lysine (Flinta et al., 1986).
Table 1 shows the N-terminal sequences of proteins in prokaryotes and
eukaryotes, based on a sample of 170 prokaryotic and 120 eukaryotic proteins (Flinta et al.,
1986). In the table, M represents methionine, A represents alanine, K represents lysine, S
represents serine, and T represents threonine.
Table 1: N-Terminal Sequences of Proteins
N-Terminal Sequence    Prokaryotic Proteins with This Sequence (%)    Eukaryotic Proteins with This Sequence (%)
MA*                    28.24                                          19.17
MK**                   10.59                                           2.50
MS*                     9.41                                          11.67
MT*                     7.65                                           6.67
* Methionine was removed in all of these proteins.
** Methionine was not removed from any of these proteins.
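Table 1 can be encoded directly as a lookup from the second residue to whether the initiator methionine was removed. This sketch deliberately covers only the four second residues in the table and returns None for anything else, rather than guessing a general rule; the function name `mature_n_terminus` is hypothetical.

```python
from typing import Optional

# Only the four N-terminal pairs reported in Table 1 (Flinta et al., 1986) are encoded.
# Value: True if the initiator Met was removed in the sampled proteins, else False.
MET_REMOVED_BY_SECOND_RESIDUE = {"A": True, "S": True, "T": True, "K": False}

def mature_n_terminus(nascent: str) -> Optional[str]:
    """Return the expected mature sequence, or None if Table 1 does not cover the case."""
    if len(nascent) < 2 or nascent[0] != "M":
        raise ValueError("expected a nascent chain starting with M, at least two residues long")
    removed = MET_REMOVED_BY_SECOND_RESIDUE.get(nascent[1])
    if removed is None:
        return None  # second residue not in Table 1: no prediction
    return nascent[1:] if removed else nascent

print(mature_n_terminus("MAKL"))  # AKL
print(mature_n_terminus("MKAL"))  # MKAL
print(mature_n_terminus("MWAL"))  # None
```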
Once the initiation complex is formed on the mRNA, the large
ribosomal subunit binds to this complex, which causes the release of the IFs (initiation factors).
The assembled ribosome has three sites at which tRNA molecules can bind. The A
(amino acid) site is the location at which the aminoacyl-tRNA anticodon base-pairs with
the mRNA codon, ensuring that the correct amino acid is added to the
growing polypeptide chain. The P (polypeptide) site holds the tRNA that carries the growing
polypeptide chain; when a peptide bond forms, the chain is transferred from this tRNA to the
amino acid on the A-site tRNA. Finally, the E (exit) site is the
location at which the "empty" tRNA sits before being released back into the cytoplasm to
bind another amino acid and repeat the process. The initiator methionine tRNA is the only
aminoacyl-tRNA that enters the P site directly, during initiation, and the A site is aligned with the
second mRNA codon. The ribosome is thus ready to bind the second aminoacyl-tRNA at the
A site, which will be joined to the initiator methionine by the first peptide bond (Figure 5).
Figure 5: The large ribosomal subunit binds to the small ribosomal subunit to complete the
initiation complex.
The initiator tRNA molecule, carrying the methionine amino acid that will
serve as the first amino acid of the polypeptide chain, is bound to the P site on the ribosome.
The A site is aligned with the next codon, which will be bound by the anticodon of the next
incoming tRNA.
© 2013 Nature Education All rights reserved.
The Elongation Phase
Figure 6
The next phase in translation is known as the elongation phase (Figure 6).
During elongation, the ribosome moves along the mRNA in the 5'-to-3' direction, one codon at
a time, in a process called translocation that requires elongation factor G (EF-G). In each
cycle, the tRNA that corresponds to the next codon (the second codon, in the first cycle) binds
to the A site, a step that requires elongation factors (in E. coli, these are called EF-Tu and
EF-Ts), as well as guanosine triphosphate (GTP) as an energy source. Upon binding of the
tRNA-amino acid complex in the A site, GTP is hydrolyzed to guanosine diphosphate (GDP),
which is then released along with EF-Tu; EF-Ts recycles EF-Tu for the next round.
Next, a peptide bond between the now-adjacent amino acids is formed through a peptidyl
transferase activity. For many years, it was thought that a protein enzyme catalyzed this step,
but more recent evidence indicates that the transferase activity is a catalytic function of rRNA
(Pierce, 2000). After the peptide bond is formed, the ribosome translocates, so that the
now-empty tRNA moves from the P site to the E site and the tRNA carrying the growing chain
moves from the A site to the P site. The empty tRNA is then released to the cytoplasm to pick
up another amino acid. In addition, the A site is now empty and ready to receive the tRNA for
the next codon.
This process is repeated until all the codons in the mRNA have been read
by tRNA molecules, and the amino acids attached to the tRNAs have been linked together in
the growing polypeptide chain in the appropriate order. At this point, translation must be
terminated, and the nascent protein must be released from the mRNA and ribosome.
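The hand-off of the chain between sites during elongation can be sketched as a small state machine. This is a minimal model under several assumptions: one tRNA per site, no check of codon-anticodon pairing, and elongation factors and GTP represented only by comments. The class and method names are hypothetical.

```python
from typing import List, Optional

class Ribosome:
    """Hypothetical model tracking what occupies the A, P, and E sites."""

    def __init__(self, initiator: str = "M") -> None:
        self.p_site: List[str] = [initiator]      # chain on the P-site tRNA (initiator Met)
        self.a_site: Optional[List[str]] = None   # amino acid or chain on the A-site tRNA
        self.e_site_occupied = False              # an empty tRNA waiting to leave
        self.released_trnas = 0

    def bind_aminoacyl_trna(self, amino_acid: str) -> None:
        # In E. coli, EF-Tu (with GTP) delivers the aminoacyl-tRNA to the empty A site.
        if self.a_site is not None:
            raise RuntimeError("A site is already occupied")
        self.a_site = [amino_acid]

    def form_peptide_bond(self) -> None:
        # Peptidyl transferase (an rRNA activity): the chain on the P-site tRNA is
        # joined to the amino acid on the A-site tRNA, leaving the P-site tRNA empty.
        if self.a_site is None:
            raise RuntimeError("no aminoacyl-tRNA in the A site")
        self.a_site = self.p_site + self.a_site
        self.p_site = []

    def translocate(self) -> None:
        # EF-G-dependent shift by one codon: the empty tRNA moves P -> E (pushing out
        # any tRNA already in E), and the peptidyl-tRNA moves A -> P, freeing the A site.
        if self.e_site_occupied:
            self.released_trnas += 1
        self.e_site_occupied = True
        self.p_site, self.a_site = self.a_site, None

ribosome = Ribosome()
for amino_acid in "AK":
    ribosome.bind_aminoacyl_trna(amino_acid)
    ribosome.form_peptide_bond()
    ribosome.translocate()
print("".join(ribosome.p_site))  # MAK
```

Each pass through the loop is one elongation cycle: bind at A, form the peptide bond, translocate.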
Termination of Translation
There are three termination codons that are employed at the end of a protein-coding
sequence in mRNA: UAA, UAG, and UGA. Normally, no tRNAs recognize these codons. Instead,
one of several proteins, called release factors, binds at the A site and triggers release of the
completed polypeptide, followed by release of the mRNA and dissociation of the
ribosome.
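Termination amounts to reading in frame until one of the three stop codons appears. The minimal sketch below (function name `codons_until_stop` is hypothetical) splits a coding sequence that begins at AUG into sense codons and reports whether an in-frame stop codon was reached; it ignores special cases in which a stop codon is read as an amino acid.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def codons_until_stop(coding: str) -> tuple[list[str], bool]:
    """Split a coding sequence (starting at AUG) into sense codons up to the first stop.

    Returns (sense codons, True if an in-frame stop codon was reached).
    """
    sense = []
    for i in range(0, len(coding) - 2, 3):
        codon = coding[i:i + 3]
        if codon in STOP_CODONS:
            return sense, True  # a release factor, not a tRNA, would bind here
        sense.append(codon)
    return sense, False

print(codons_until_stop("AUGGCUAAAUGAGGG"))  # (['AUG', 'GCU', 'AAA'], True)
print(codons_until_stop("AUGGCU"))           # (['AUG', 'GCU'], False)
```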
Comparing Eukaryotic and Prokaryotic Translation

The translation process is very similar in prokaryotes and eukaryotes.
Although different elongation, initiation, and termination factors are used, the genetic code is
generally identical. As previously noted, in bacteria, transcription and translation take place
simultaneously, and mRNAs are relatively short-lived. In eukaryotes, however, mRNAs have
highly variable half-lives, are subject to modifications, and must exit the nucleus to be
translated; these multiple steps offer additional opportunities to regulate levels of protein
production, and thereby fine-tune gene expression.
An Evolutionary Perspective on Amino Acids
Amino acids and proteins

Amino acids were among the first organic molecules to appear on Earth. What are
they made of, and how have they evolved?
Amino acids play a central role in cellular metabolism, and organisms need
to synthesize most of them (Figure 1). Many of us become familiar with amino acids when
we first learn about translation, the synthesis of protein from the nucleic acid code in mRNA.
To date, scientists have discovered more than five hundred amino acids in nature, but only
twenty-two participate in translation. In 1943, Gordon, Martin, and Synge used partition
chromatography to separate and study constituents of proteins (Gordon, Martin, & Synge
1943), a major breakthrough that contributed to the rapid identification of the twenty amino
acids used in proteins by all living organisms. After this initial burst of discovery, two
additional amino acids, which are not used by all organisms, were added to the list:
selenocysteine (Bock 2000) and pyrrolysine (Srinivasan et al. 2002).
Aside from their role in composing proteins, amino acids have many
biologically important functions. They are also energy metabolites, and many of them are
essential nutrients. Amino acids and their derivatives can also function as chemical messengers
in communication between cells. For example, Arvid Carlsson discovered in 1957 that the amine
3-hydroxytyramine (dopamine) is not only a precursor in the synthesis of adrenaline from
tyrosine but also a key neurotransmitter. Certain amino acids, such as citrulline and
ornithine, which are intermediates in urea biosynthesis, are important intermediaries in
various pathways involving nitrogenous metabolism. Other amino acid derivatives are also
important in several pathways; for example, S-adenosylmethionine, which is derived from
methionine, acts as a universal methylating agent.
What follows is a discussion of amino acids, their biosynthesis, and the evolution of their
synthesis pathways, with a focus on tryptophan and lysine.
Figure 1: Major events in the evolution of amino acid synthesis
The way amino acids are synthesized has changed during the history of
Earth. The Hadean eon began with the formation of Earth. The subsequent
Archean eon (approximately 3,500 million years ago) is known as the age of bacteria and
archaea. During the Proterozoic eon, oxygen accumulated in Earth's atmosphere, and the
Phanerozoic eon coincides with the major diversification of animals, plants, and fungi.
In 1953, Miller and Urey attempted to re-create the conditions of primordial
Earth. In a flask, they combined ammonia, hydrogen, methane, and water vapor and exposed
the mixture to electrical sparks (Miller 1953). They found that new molecules were formed,
and among them they identified eleven standard amino acids. From this observation, they
posited that the first organisms likely arose in an environment similar to the one they
constructed in their flask, one rich in organic compounds, now widely described as the
primordial soup. This hypothesis has been extended to the claim that single-celled organisms
evolved within this soup and that, as the number of organisms increased, the organic
compounds were depleted. In this increasingly competitive environment, organisms that
were able to biosynthesize their own nutrients from simple elements had a great advantage
over those that could not. Today, the vast majority of organic compounds derive from living
organisms that break down and replenish the resources that sustain other organisms. And,
rather than emerging from an electrified primordial soup, amino acids now emerge from
biosynthetic enzymatic reactions.
What Is an Amino Acid Made Of?

As implied by the root of the word (amine), the key atom in amino
acid composition is nitrogen. The ultimate source of nitrogen for the biosynthesis of amino
acids is atmospheric nitrogen (N2), a nearly inert gas. However, to be metabolically useful,
atmospheric nitrogen must be reduced. This process, known as nitrogen fixation, occurs only
in certain types of bacteria and archaea. Even though nitrogen is one of the most prominent
chemical elements in living systems, N2 is almost unreactive (and very stable) because of its
triple bond (N≡N). This bond is extremely difficult to break because all three bonds between
the two nitrogen atoms must be broken before each atom can bond to other elements.
Nitrogenase is the only family of enzymes capable of breaking this bond (i.e., of carrying out
nitrogen fixation). These proteins use a collection of metal ions as the electron carriers that
are responsible for the reduction of N2 to NH3. All organisms can then use this reduced
nitrogen (NH3) to make amino acids. In humans, reduced nitrogen enters the physiological
system in dietary sources containing amino acids. All organisms contain the enzymes
glutamate dehydrogenase and glutamine synthetase, which convert ammonia to glutamate and
glutamine, respectively. Amino and amide groups from these two compounds can then be
transferred to other carbon backbones by transamination and transamidation reactions to make
amino acids. Interestingly, glutamine is the universal donor of amine groups for the formation
of many other amino acids as well as many biosynthetic products. Glutamine is also a key
metabolite for ammonia storage. All amino acids, with the exception of proline, have a
primary amino group (NH2) and a carboxylic acid (COOH) group. They are distinguished from
one another primarily by the side chains (R groups) attached to the central carbon atom.
Amino Acid Precursors and Biosynthesis Pathways

In the study of metabolism, a series of biochemical reactions for
compound synthesis or degradation is called a pathway. Amino acid synthesis can occur in a
variety of ways. For example, amino acids can be synthesized from precursor molecules by
simple steps. Alanine, aspartate, and glutamate are synthesized from the keto acids
pyruvate, oxaloacetate, and alpha-ketoglutarate, respectively, in a single transamination
step. Similarly, asparagine and glutamine are synthesized from aspartate and glutamate,
respectively, by an amidation reaction step. The synthesis of other amino acids requires more
steps; between one and thirteen biochemical reactions are necessary to produce the different
amino acids from their precursors of the central metabolism (Figure 2). The relative uses of
amino acid biosynthetic pathways vary widely among species because different synthesis
pathways have evolved to fulfill unique metabolic needs in different organisms. Some
pathways are present in certain organisms but absent in others. Therefore, experimental
results on amino acid metabolism obtained with model organisms may not apply to many
other organisms.
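The one-step routes described above can be written as a small precursor map and traced back to central metabolism. Only the five transamination and amidation steps named in the text are included; the multi-step pathways (up to thirteen reactions) are not modeled, and the function name `route_from_central_metabolism` is hypothetical.

```python
# One-step routes described in the text: transamination of a central-metabolism
# keto acid, or amidation of an amino acid. Multi-step pathways are not modeled.
ONE_STEP_ROUTES = {
    "alanine": ("pyruvate", "transamination"),
    "aspartate": ("oxaloacetate", "transamination"),
    "glutamate": ("alpha-ketoglutarate", "transamination"),
    "asparagine": ("aspartate", "amidation"),
    "glutamine": ("glutamate", "amidation"),
}

def route_from_central_metabolism(amino_acid: str) -> list[str]:
    """Trace back step by step until reaching a compound with no listed precursor."""
    steps = []
    current = amino_acid
    while current in ONE_STEP_ROUTES:
        precursor, reaction = ONE_STEP_ROUTES[current]
        steps.append(f"{precursor} --{reaction}--> {current}")
        current = precursor
    return list(reversed(steps))  # earliest step first

for step in route_from_central_metabolism("glutamine"):
    print(step)
# alpha-ketoglutarate --transamination--> glutamate
# glutamate --amidation--> glutamine
```

An amino acid absent from the map (lysine, for example) returns an empty route, which here simply means "not covered by the one-step routes", not that it cannot be synthesized.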
What Makes an Amino Acid Essential?

Not all organisms are capable of synthesizing all the amino acids, and
many amino acids are synthesized by pathways that are present only in certain plants and
bacteria. Mammals, for example, must obtain eight of twenty amino acids from their diets. This
requirement leads to a convention that divides amino acids into two categories: essential and
nonessential (for a given metabolism). Because of particular structural features, essential
amino acids cannot be synthesized by mammalian enzymes (Reeds 2000). Nonessential
amino acids, by contrast, can be synthesized by nearly all organisms. The loss of the ability to
synthesize essential amino acids likely emerged very early in evolution, because this
dependence on other organisms for a source of amino acids is shared by many
eukaryotes, not just mammals.
How do certain amino acids become essential for a given organism? Studies
in ecology and evolution give some clues. Organisms evolve under environmental
constraints, which are dynamic over time. If an amino acid is available for uptake,
the selective pressure to keep intact the genes responsible for its biosynthetic pathway might
be lowered, because the organism would not need to express these biosynthetic genes
constantly. Without this selective pressure, the biosynthetic route might be lost, or its genes
could accumulate mutations that lead to a diversification of the enzymes' functions. Following
this logic, amino acids that are essential for certain organisms might not be essential for other
organisms subjected to different selection pressures. For example, in 2000, Ishikawa and
colleagues completed the genome sequence of the endosymbiotic bacterium Buchnera, and in
it they found the genes for the biosynthetic pathways needed to synthesize the amino acids
that are essential for its symbiotic host, the aphid. Interestingly, the genes for synthesizing the
"nonessential" amino acids, which the host can make itself, are almost completely missing
(Shigenobu et al. 2000). In this way, Buchnera provides the host with some amino acids and
obtains the other amino acids from the host (Baumann 2005; Pal et al. 2006).
Tryptophan Synthesis: Only Created Once

Free-living bacteria synthesize tryptophan (Trp), which is an essential
amino acid for mammals, some plants, and lower eukaryotes. The Trp synthesis pathway
appears to be highly conserved. It is one of three pathways that produce aromatic amino acids
from chorismate (Figure 2, red pathway); the other two aromatic amino acids made this way
are phenylalanine and tyrosine. Trp biosynthetic enzymes are widely distributed across the
three domains of life (Xie et al. 2003). The genes that code for the enzymes in this pathway
likely evolved once, and they did so more recently than those for other amino acid synthesis
pathways. Researchers draw this conclusion because all organisms containing this Trp
synthesis pathway use homologous enzymes (Merino, Jensen, & Yanofsky 2008). As another
point of distinction, the Trp pathway is the most biochemically expensive of the amino acid
pathways, and for this reason it is expected to be tightly regulated.
Lysine Synthesis: Created Multiple Times

To date, scientists have discovered six different biosynthetic pathways
in different organisms that synthesize lysine. These pathways can be grouped into the
diaminopimelic acid (DAP) and aminoadipic acid (AAA) pathways (Figure 2, dark blue). The
DAP pathway synthesizes lysine (Lys) from aspartate and pyruvate. Most bacteria,
some archaea, fungi, algae, and plants use the DAP pathways. On the other hand, the AAA
pathways synthesize Lys from alpha-ketoglutarate and acetyl coenzyme A. Most fungi, some
algae, and some archaea use this route. Why do we observe this diversity, and why does it
occur particularly for Lys synthesis? Interestingly, the DAP pathways retain duplicated genes
from the biosynthesis of arginine, whereas the AAA pathways retain duplicated genes from
leucine biosynthesis (Figure 2), indicating that each of the pathways experienced at least
one duplication event during evolution (Hernandez-Montes et al. 2008; Velasco et al. 2002).
Fani and coworkers performed a comparative analysis of the sequences and phylogenetic
distribution of these synthesis enzymes, which suggested that the synthesis of leucine,
lysine, and arginine was initially carried out by the same set of versatile enzymes. Over time,
a series of gene duplication events and enzyme specializations gave rise to the distinct
pathways we know today. Which of the pathways appeared earlier is still debated.
Evidence supporting this hypothesis comes from a fascinating archaeon,
Pyrococcus horikoshii. This organism can synthesize leucine, lysine, and arginine,
yet its genome contains genes for only one pathway. This gap suggests that P.
horikoshii uses a mechanism similar to the ancestral one: versatile enzymes. Biochemical
experiments are needed to confirm that these enzymes can use multiple substrates and to
rule out the possibility that amino acid synthesis in this organism relies on enzymes not yet
identified.
Synthesis on the tRNA molecule

Selenocysteine (SeC) (Bock 2000) is a genetically encoded amino acid that is not
present in all organisms. Scientists have identified SeC in several archaeal, bacterial,
and eukaryotic species (including mammals). When present, SeC is usually confined to the
active sites of proteins involved in reduction-oxidation (redox) reactions. It is highly
reactive and has catalytic advantages over cysteine, but this same reactivity means that
free SeC in the cytoplasm could damage the cell. Because free SeC would be too dangerous,
cells do not maintain a pool of it. How, then, is this amino acid made available for protein
synthesis? The answer demonstrates the versatility of the synthesis strategies that organisms
deploy when forced to cope with unusual constraints. SeC is synthesized directly on its tRNA
before being used in protein synthesis. First, SeC-specific tRNA (tRNASec) is charged with
serine by seryl-tRNA synthetase, which acts in a somewhat promiscuous fashion, serylating
either tRNASer or tRNASec. Then, another enzyme converts the Ser to SeC by replacing its
hydroxyl (OH) group with a selenol (SeH) group, using selenophosphate as the selenium donor
(Figure 2, pink pathway).
This strategy avoids a free pool of SeC while still maintaining a supply of the
SeC-tRNASec needed for protein synthesis. Strictly speaking, this mechanism is not a
synthesis of a free amino acid but rather a synthesis of an aminoacyl-tRNA. However, this
tRNA-dependent strategy is not exclusive to
SeC; similar tRNA-dependent mechanisms have been described for asparagine, glutamine,
and cysteine. Because SeC appears in all three domains of life, scientists wonder whether
its tRNA-dependent synthesis is an ancestral mechanism for amino acid biosynthesis or
simply a coincidence arising from similar selection pressures.
Summary
Scientists now recognize twenty-two amino acids as the building blocks of proteins: the
twenty common ones and two more, selenocysteine and pyrrolysine. Amino acids have
several functions. Their primary function is to act as the monomer unit in protein synthesis.
They can also be used as substrates for biosynthetic reactions; the nucleotide bases and a
number of hormones and neurotransmitters are derived from amino acids. Amino acids can
be synthesized from glycolytic or Krebs cycle intermediates. The essential amino acids, those
that must be obtained from the diet, require more steps to be synthesized. Some amino acids are
synthesized on their corresponding tRNAs, after a precursor amino acid has been charged onto the
tRNA. We have discussed only two biosynthetic routes: the Trp pathway, which appears to have
evolved only once, and the Lys pathway, which seems to have evolved independently in different
lineages. Current evidence suggests that metabolic pathways evolved according to the
patchwork assembly model, which proposes that pathways originated through
the recruitment of generalist enzymes that could react with a wide range of substrates. The
study of the evolution of amino acid metabolism has helped us understand the evolution of
metabolism in general.
Nucleic Acids to Amino Acids: DNA Specifies Protein

Hidden within the genetic code lies the "triplet code," in which each series of three
nucleotides specifies a single amino acid. How did scientists discover and unlock this
amino acid code?
Once it was determined that messenger RNA (mRNA) serves as a copy of
chromosomal DNA and specifies the sequence of amino acids in proteins, the question of
how this process is actually carried out naturally followed. It had long been known that only
20 amino acids occur in naturally derived proteins. It was also known that there are only four
nucleotides in mRNA: adenine (A), uracil (U), guanine (G), and cytosine (C). Thus, 20 amino
acids are coded by only four unique bases in mRNA, but just how is this coding achieved?
The Codon
The discordance between the number of nucleic acid bases and the number of
amino acids immediately eliminates the possibility of a code of one base per amino acid. In
fact, even two nucleotides per amino acid (a doublet code) could not account for 20 amino
acids (with four bases and a doublet code, there would be only $4^2 = 16$ possible
combinations). Thus, the shortest combination of the four bases that could encode all 20
amino acids would be a triplet code. However, a triplet code produces $4^3 = 64$ possible
combinations, or codons, which introduces the problem of there being more than three times
as many codons as amino acids. Either these "extra" codons produce redundancy, with multiple
codons encoding the same amino acid, or there must instead be numerous dead-end codons that
are not linked to any amino acid.
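The counting argument above can be checked with a few lines of Python. This is a minimal sketch: the function names `codon_count` and `min_codon_length` are illustrative, and the only inputs are the four mRNA bases and the 20 amino acids named in the text.

```python
def codon_count(bases: int, length: int) -> int:
    """Number of distinct code words of a given length over an alphabet of `bases` letters."""
    return bases ** length


def min_codon_length(bases: int, amino_acids: int) -> int:
    """Shortest code-word length whose number of combinations covers every amino acid."""
    length = 1
    while codon_count(bases, length) < amino_acids:
        length += 1
    return length


for n in (1, 2, 3):
    print(f"{n}-base code: {codon_count(4, n)} combinations")

print("minimum codon length:", min_codon_length(4, 20))
```

A doublet code tops out at 16 combinations, so the loop only stops at length 3, where 64 combinations leave 44 more codons than amino acids.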
Preliminary evidence indicating that the genetic code was indeed a triplet code
came from an experiment by Francis Crick and Sydney Brenner (1961). This experiment
examined the effect of frameshift mutations on protein synthesis. Frameshift mutations are
much more disruptive to the encoded protein than simple base substitutions, because they
involve a base insertion or deletion, which changes the number of bases in a gene and shifts
the reading frame, so that every codon downstream of the mutation is read incorrectly. For
example, the mutagen proflavine causes frameshift mutations by inserting itself between DNA
bases. The presence of proflavine in a DNA molecule thus interferes with the molecule's
replication such that the resulting DNA copy has a base inserted or deleted.
Crick and Brenner showed that proflavine-mutated bacteriophages (viruses
that infect bacteria) with single-base insertion or deletion mutations did not produce
functional copies of the protein encoded by the mutated gene. These defective proteins arise
because translation downstream of the mutation proceeds in the wrong reading frame. Mutants
with two- or four-nucleotide insertions or deletions were also nonfunctional. However, some
mutant strains became functional again when they accumulated a total of three extra
nucleotides or when they were missing three nucleotides. Three insertions (or three
deletions) restore the original reading frame beyond the mutated region, so only a short
stretch of the protein is altered. This rescue effect provided compelling evidence that the
genetic code for one amino acid is indeed a three-base, or triplet, code.
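To see why one extra base wrecks the message while three separate extra bases restore it, the sketch below reads a made-up DNA sequence in triplets before and after insertions. The sequence, the insertion positions, the inserted base G, and the helper names `codons` and `insert_base` are all illustrative assumptions, not data from the Crick and Brenner experiment.

```python
def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive triplets from its first base, ignoring leftover bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def insert_base(seq: str, position: int, base: str = "G") -> str:
    """Return a copy of `seq` with one extra base inserted at `position`."""
    return seq[:position] + base + seq[position:]


# Hypothetical gene: one codon repeated so that frame errors stand out.
original = "CAT" * 6

# A single +1 insertion after the first codon.
plus_one = insert_base(original, 3)

# Three separate +1 insertions, applied from right to left so that each
# position refers to the original sequence.
plus_three = original
for pos in (9, 6, 3):
    plus_three = insert_base(plus_three, pos)

print(codons(original))    # ['CAT', 'CAT', 'CAT', 'CAT', 'CAT', 'CAT']
print(codons(plus_one))    # ['CAT', 'GCA', 'TCA', 'TCA', 'TCA', 'TCA']
print(codons(plus_three))  # ['CAT', 'GCA', 'TGC', 'ATG', 'CAT', 'CAT', 'CAT']
```

After one insertion every downstream triplet is wrong; after three, only the codons spanning the insertions are altered and the original CAT frame resumes, mirroring the rescue effect.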
Decoding the Genetic Code
Figure 1
Once the budding molecular biology community was convinced of the
triplet code, the race to decode which triplets specified which amino acids began. The
simplest way to decipher the code would be to start with an mRNA molecule of known
sequence, use it to direct the synthesis of a protein, and then determine the amino acid
sequence of the synthesized protein. Then, comparison of the original mRNA sequence with
the amino acid sequence of the synthesized protein could provide a means for directly
decoding the genetic code (Figure 1).
However, at the time when this decoding project was conducted, researchers
did not yet have the benefit of modern sequencing techniques. To circumvent this challenge,
Marshall W. Nirenberg and Heinrich J. Matthaei (1962) made their own simple, artificial
mRNA and identified the polypeptide product that was encoded by it. To do this, they used
the enzyme polynucleotide phosphorylase, which randomly joins together any RNA
nucleotides that it finds. Nirenberg and Matthaei began with the simplest codes possible.
Specifically, they added polynucleotide phosphorylase to a solution of pure uracil (U), such
that the enzyme would generate RNA molecules consisting entirely of a sequence of U's;
these molecules were known as poly(U) RNAs. Each poly(U) RNA thus contained a pure
series of UUU codons, assuming a triplet code. These poly(U) RNAs were added to 20 tubes
containing components for protein synthesis (ribosomes, activating enzymes, tRNAs, and
other factors). Each tube contained one of the 20 amino acids, which were radioactively
labeled. Of the 20 tubes, 19 failed to yield a radioactive polypeptide product. Only one tube,
the one that had been loaded with the labeled amino acid phenylalanine, yielded a product.
Nirenberg and Matthaei had therefore found that the UUU codon could be translated into the
amino acid phenylalanine. Similar experiments using poly(C) and poly(A) RNAs showed that
proline was encoded by the CCC codon, and lysine by the AAA codon.
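Under a triplet code, a homopolymer RNA can only ever present one codon to the ribosome, which is why each poly(N) RNA yielded a polypeptide of a single amino acid. The sketch below models this; the lookup table holds only the three assignments reported above, while the 12-base length and the helper names are arbitrary assumptions.

```python
# Assignments reported by the homopolymer experiments described above.
HOMOPOLYMER_RESULTS = {"UUU": "Phe", "CCC": "Pro", "AAA": "Lys"}


def make_homopolymer(base: str, length: int) -> str:
    """Model the poly(N) RNA that polynucleotide phosphorylase builds from a single base."""
    return base * length


def translate_homopolymer(rna: str) -> list[str]:
    """Read the RNA in consecutive triplets and look up each codon ("?" if unassigned)."""
    triplets = [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]
    return [HOMOPOLYMER_RESULTS.get(codon, "?") for codon in triplets]


for base in "UCA":
    rna = make_homopolymer(base, 12)
    print(rna, "->", "-".join(translate_homopolymer(rna)))
```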
Figure 2
In further experiments to decode the other codons, Nirenberg and his colleagues
made artificial RNAs containing defined proportions of two or three different bases. As
previously mentioned, polynucleotide phosphorylase joins nucleotides randomly; as a result,
these artificial RNAs contained random mixtures of the bases in proportion to the amounts of
bases mixed. Hence, the resulting products provided clues that the researchers could use to
deduce potential codon–amino acid relationships.
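Because the enzyme joins bases at random, the expected share of each triplet follows directly from the base proportions in the mix. This sketch assumes each position is filled independently; the 5:1 ratio of A to C is a hypothetical example, not one of the mixtures Nirenberg's group actually used.

```python
from itertools import product
from math import prod


def triplet_frequencies(fractions: dict[str, float]) -> dict[str, float]:
    """Expected share of each triplet in a randomly assembled RNA.

    Assumes every position is filled independently, with probability equal to
    that base's fraction of the nucleotide mix.
    """
    return {
        "".join(bases): prod(fractions[b] for b in bases)
        for bases in product(fractions, repeat=3)
    }


# Hypothetical mix: five parts A to one part C.
mix = {"A": 5 / 6, "C": 1 / 6}
freqs = triplet_frequencies(mix)
for codon, share in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{codon}: {share:.3f}")
```

In this mix AAA makes up about 58% of triplets, each codon with two A's and one C about 12%, each with one A and two C's about 2%, and CCC under 1%. Comparing how much of each amino acid is incorporated with these expected shares hints at a codon's base composition, though not at the order of its bases.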
For example, when A and C were mixed with polynucleotide phosphorylase,
the resulting RNA molecules contained eight different triplet codons: AAA, AAC, ACC,
ACA, CAA, CCA, CAC, and CCC. Proteins made from these random poly(AC) RNAs contained
only six amino acids: asparagine, glutamine, histidine, lysine, proline, and
threonine. Remember that previous experiments had already revealed that CCC and AAA
code for proline and lysine, respectively. Thus, the four newly incorporated amino acids
could only be encoded by AAC, ACC, ACA, CAA, CCA, and/or CAC. The random-sequence
approach narrowed down much of the code, but because it revealed only the base
composition of a codon and not the order of its bases, some work remained to be done.
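The elimination step above is simple set bookkeeping. This sketch enumerates every triplet a random poly(AC) RNA can contain and removes the two codons already assigned by the homopolymer experiments; the variable names are illustrative.

```python
from itertools import product

# Codons already assigned by the homopolymer experiments.
KNOWN = {"AAA": "Lys", "CCC": "Pro"}

# All triplets that a random poly(AC) RNA can contain (2 bases, 3 positions).
poly_ac_codons = ["".join(t) for t in product("AC", repeat=3)]

# Codons left to account for the four newly incorporated amino acids.
candidates = [c for c in poly_ac_codons if c not in KNOWN]

print(len(poly_ac_codons), poly_ac_codons)
print(candidates)
```

Six candidate codons for four amino acids means composition data alone cannot settle every assignment.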
In 1965, H. Gobind Khorana and his colleagues used another method to
further crack the genetic code. These researchers had the insight to employ chemically
synthesized RNA molecules of known repeating sequences rather than random sequences.
For example, an artificial mRNA of alternating guanine and uracil nucleotides
(GUGUGUGUGUGU) should be read in translation as two alternating codons, GUG and
UGU, thus encoding a protein of two alternating amino acids. Translation of the artificial
GUGU mRNA yielded a protein of alternating cysteine and valine residues. However, this
technique alone could not determine whether GUG or UGU encoded cysteine, for example.
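Why a repeating dinucleotide yields exactly two alternating codons can be checked by reading a GU repeat in each of the three possible frames. This is a sketch; the 12-base length and the helper names `repeat_rna` and `read_frame` are assumptions for illustration.

```python
def repeat_rna(unit: str, copies: int) -> str:
    """Build a synthetic RNA by repeating a short unit (e.g. "GU")."""
    return unit * copies


def read_frame(rna: str, frame: int) -> list[str]:
    """Read consecutive triplets starting at offset `frame` (0, 1, or 2)."""
    return [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]


rna = repeat_rna("GU", 6)  # GUGUGUGUGUGU
for frame in range(3):
    print(frame, read_frame(rna, frame))
```

Whatever the starting frame, the message alternates GUG and UGU, which is why the alternating Cys/Val product could not, on its own, reveal which codon specified which amino acid.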
Next, Nirenberg and Philip Leder developed a technique using ribosome-bound
transfer RNAs (tRNAs). They showed that a short mRNA sequence, even a
single codon (three bases), could still bind to a ribosome, even if this short sequence was
incapable of directing protein synthesis. The ribosome-bound codon could then base pair with
a particular tRNA that carried the amino acid specified by the codon (Figure 2).
Nirenberg and Leder thus synthesized many short mRNAs with known codons.
They then added the mRNAs one by one to a mix of ribosomes and aminoacyl-tRNAs with
one amino acid radioactively labeled. Each mixture was then passed through a filter that
retained ribosomes: if the labeled aminoacyl-tRNA had bound to the short mRNA-like sequence and
the ribosome, it was trapped on the filter, whereas unbound aminoacyl-tRNAs passed through.
These experiments provided conclusive demonstrations of the particular aminoacyl-tRNA that
bound to each mRNA codon.
Degeneracy of the Amino Acid Code

Examination of the full table of codons enables one to immediately determine whether
the "extra" codons are associated with redundancy or dead-end codes (Figure 3). Note that
both possibilities occur in the code. Most amino acids are specified by two or more codons;
only methionine (AUG) and tryptophan (UGG) are each specified by a single codon. Note also
that the methionine codon AUG acts as the start signal for protein synthesis in an mRNA. Moreover,
the genetic code also includes stop codons, which do not code for any amino acid. The stop
codons serve as termination signals for translation. When a ribosome reaches a stop codon,
translation stops, and the polypeptide is released.
Figure 3: The amino acids specified by each mRNA codon. Multiple codons can code for
the same amino acid.
The codons are written 5' to 3', as they appear in the mRNA. AUG is an initiation codon;
UAA, UAG, and UGA are termination (stop) codons.
© 2014 Nature Education All rights reserved.
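The codon table in Figure 3 can be encoded compactly to check the degeneracy claims and the roles of the start and stop codons. This sketch assumes the common one-letter layout of the standard code (the same ordering used in NCBI's translation table 1, with "*" for stop); the `translate` helper is a simplification that starts at the first AUG and stops at the first in-frame stop codon, and the test mRNA is made up.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UUU, UUC, UUA, UUG, UCU, ... order;
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon (simplified model)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: release the polypeptide
            break
        protein.append(aa)
    return "".join(protein)


codons_per_aa = Counter(CODON_TABLE.values())
print(sorted(aa for aa, n in codons_per_aa.items() if n == 1))  # ['M', 'W']
print(codons_per_aa["*"])  # 3
print(translate("GGAUGUUUAAAUGGUAGCC"))  # MFKW
```

Only Met and Trp map to a single codon, three codons are stops, and the example message is read as AUG UUU AAA UGG followed by the UAG stop.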
