Topics
• Eukaryotic Gene Structure
• Chromosomal Organization of Genes and Noncoding DNA
• Transposable (Mobile) DNA Elements
Goals
• Learn how genes
encoded by complex
transcription units are
expressed.
• Learn the origin,
types, and functions
of DNA in higher
organisms.
• Learn the properties
of transposons and
their roles in gene
evolution.
(Figure: RxFISH-painted human chromosomes.)
1. Genome structure – Double
helix
1. Genome structure - Chromatin
DNA is packed into chromosomes in a
hierarchical way:
Conserved: About 5%
Biologically functional: ? (>5%)
1. Genome structure - Genes
Darwin (1809-1882) used the term “gemmule” to denote a microscopic
unit of inheritance. Major problem in his day: why do traits not
“blend out” by mixing?
(25-30%)
The Human β-globin Gene Family
The β-globin gene cluster on chromosome 11 is shown in Fig. 6.4a.
The β-globin genes are expressed at different stages of life. ε, Aγ,
and Gγ are expressed during different trimesters of fetal
development (next slide). β expression begins around birth and
continues throughout adult life. Fetal hemoglobin molecules made
with the ε and Gγ or Aγ polypeptides have a higher affinity for
O2 than maternal hemoglobin, facilitating O2 transfer to the fetus.
Epidermal growth
factor (EGF) domain
Gene Density in Genomic DNA
Higher eukaryotes contain far more noncoding DNA between
genes than bacteria and simple eukaryotes (Fig. 6.4). The region
of human genomic DNA containing the β-globin gene cluster
shown in the figure actually is a relatively "gene-rich" region of
human DNA. Some regions known as gene-poor "deserts" also
occur. Higher eukaryotes also contain a larger amount of intron
DNA. Although one-third of human DNA is transcribed into pre-
mRNA, 95% of that RNA is degraded after RNA splicing
reactions. On average, the typical exon is 50-200 bp in length,
while the median length of introns is 3.3 kb in human genes.
Human Genomic DNA: Tandemly
Repeated Genes
Tandemly repeated genes also are derived by gene duplication.
Unlike gene families, the sequences of these duplicated genes
are identical or strongly conserved. In addition, they commonly
are arranged in a head-to-tail fashion in tandem arrays over a
long stretch of DNA. rRNAs and snRNAs (used in splicing
reactions, Chap. 8) are representative of this group (Table
6.1). Multiple copies of these genes are needed due to the
requirement for vast amounts of these RNAs in the cell. tRNA
and histone genes are included in this category, but these
genes typically occur in clusters and not true tandem arrays.
Nonprotein-coding Genes in Human
Genomic DNA
Thousands of genes in the human genome encode functional RNAs (Table
6.2). The functions of several of these are covered in later chapters.
Repetitious DNA
Two main categories of repetitious DNA--simple-sequence DNA
and interspersed repeats--occur in eukaryotic genomes (Table
6.1). Interspersed repeats are more common and are derived
largely from transposons. Simple-sequence DNA is less prevalent,
accounting for ~ 6% of human genomic DNA. Simple-sequence DNA
is also known as satellite DNA, due to its formation of satellite
bands during cesium chloride density gradient ultracentrifugation.
The function of this DNA is mostly obscure. It is commonly found
at the centromere and telomere regions of chromosomes.
Properties of Satellite DNA
Satellite DNA is classified into 3 types
based on length. True satellite DNA
consists of 14-500 bp sequence units
that tandemly repeat over 20-100 kb
lengths of genomic DNA. Minisatellite
DNA consists of 15-100 bp sequence
units that tandemly repeat over 1-5 kb
stretches of DNA. Microsatellite DNA
consists of 1-13 bp units that can
repeat up to 150 times. Microsatellite
DNA is thought to originate from
“backward slippage” of a growing
daughter strand on its template strand
during DNA replication (Fig. 6.5). The
sequences of repeat units are highly
conserved, which suggests they perform
important functions. Each category of
satellite DNA contains a number of
different repeat sequences. Simple-
sequence DNAs can serve as DNA
markers due to variations in repeat
number. Satellite DNAs are exploited in
FISH (fluorescence in situ hybridization)
chromosome staining (Fig. 6.6).
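The microsatellite definition above (1-13 bp units repeated in tandem) can be sketched as a simple sequence scan. This is only an illustrative sketch: the function name, the minimum of 3 copies, and the example sequence are assumptions, not part of the slides.

```python
import re

def find_microsatellites(seq, max_unit=13, min_repeats=3):
    """Return (start, unit, copies) for each tandem repeat whose unit is 1..max_unit bp.

    min_repeats is an arbitrary illustrative cutoff, not a biological definition.
    """
    # Lazy unit match, so the shortest repeating unit is reported
    # (e.g. "CA" rather than "CACA").
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits

# Hypothetical sequence: a (CA)6 microsatellite between two flanking sequences.
example = "GGATT" + "CA" * 6 + "GTTAC"
print(find_microsatellites(example))
```

Variation in the reported copy number between individuals is what makes such loci usable as DNA markers.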
DNA Fingerprinting
DNA fingerprinting is a method for
identifying individuals based on their
minisatellite DNA (Fig. 6.7). It was
developed in the mid-1980s and is
widely used in forensics, paternity
analysis, and for research purposes.
In the method, minisatellite DNA
from a genomic DNA specimen is
amplified by PCR using primers that
bind to unique sequences flanking
minisatellite repeat units. Bands
corresponding to each minisatellite
locus then are separated on gels.
Although satellite DNA is highly
conserved in sequence, the number
of tandem copies at each locus is
highly variable between individuals.
This results from unequal crossing
over during formation of gametes in
meiosis. Due to the variation in the
number of repeats at each locus,
different individuals can be readily
distinguished based on banding
patterns.
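The reasoning behind the banding patterns can be sketched numerically: the PCR product from one allele is the flanking sequence plus (repeat unit length x copy number), so different copy numbers give different band sizes. The locus names, flank lengths, unit lengths, and copy numbers below are all hypothetical values chosen for illustration.

```python
def band_size(flank_bp, unit_bp, copies):
    """Length of the PCR product for one minisatellite allele."""
    return flank_bp + unit_bp * copies

def profile(loci, genotype):
    """Map each locus to its sorted, de-duplicated band sizes.

    loci:     locus name -> (total flanking bp amplified, repeat unit bp)
    genotype: locus name -> (copies on allele 1, copies on allele 2)
    """
    return {
        name: sorted({band_size(flank, unit, n) for n in genotype[name]})
        for name, (flank, unit) in loci.items()
    }

# Hypothetical loci and individuals.
loci = {"locusA": (120, 30), "locusB": (90, 16)}
person1 = {"locusA": (12, 20), "locusB": (35, 35)}  # homozygous at locusB
person2 = {"locusA": (12, 15), "locusB": (28, 41)}

print(profile(loci, person1))
print(profile(loci, person2))
```

A homozygous locus yields a single band, and two individuals who share one allele at a locus still differ in their overall profile.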
Chap. 6 Problem 3
Mobile DNA Elements
Mobile DNA elements are
grouped into two classes,
DNA transposons and
retrotransposons (Fig. 6.8).
DNA transposons move
directly as DNA via a "cut-
and-paste" mechanism.
Retrotransposons move via an
RNA intermediate and a
"copy-and-paste" mechanism,
wherein the original copy of
the transposon is preserved.
Retroviruses, like HIV,
formally are a subclass of
retrotransposons that can
move between cells because
they encode viral coat
proteins. DNA transposons
predominate in bacteria;
retrotransposons are more
prevalent in eukaryotes.
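The difference between the two mechanisms can be sketched with a toy genome modeled as a list of labeled segments. This is a deliberately simplified model (no enzymes, no RNA intermediate); the function and segment names are hypothetical.

```python
def cut_and_paste(genome, element, target):
    """DNA transposon: excise the element, then reinsert it at index `target`
    of the remaining genome. Copy number is unchanged."""
    g = list(genome)
    g.remove(element)          # raises ValueError if the element is absent
    g.insert(target, element)
    return g

def copy_and_paste(genome, element, target):
    """Retrotransposon: the original copy stays in place and a new copy is
    inserted at index `target`. Copy number increases by one."""
    g = list(genome)
    if element not in g:
        raise ValueError("element not present in genome")
    g.insert(target, element)
    return g

genome = ["geneA", "TE", "geneB", "geneC"]
moved = cut_and_paste(genome, "TE", 2)
copied = copy_and_paste(genome, "TE", 3)
print(moved, moved.count("TE"))
print(copied, copied.count("TE"))
```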
Genome structure – Transposable elements
TEs are “selfish genes” which when activated can insert copies of themselves into
the genome. When this happens in the germline, these insertions are
transmitted to the next generation.
The vast majority of TEs can be classified into four families, based on the mechanism
by which they copy themselves:
- LINEs (Long Interspersed Nuclear Elements, autonomous)
- SINEs (Short Interspersed Nuclear Elements, use LINE proteins for life cycle)
- LTR elements (Long Terminal Repeats; derived from retroviruses)
- DNA transposons (replicate without RNA intermediary)
Genome structure – Transposable elements
Histogram of TEs versus age shows the activity over time. Alus have been very
active, but recently things have quieted down in humans.
Mobile DNA in Prokaryotes
Bacteria contain DNA transposons called insertion sequences (Fig.
6.9). IS elements are 1-2 kb DNAs that transpose within the
bacterial genome to random locations. Transposition ("jumping") is
mediated by an encoded transposase protein. Insertion usually
causes gene inactivation and is harmful. Nonetheless, E. coli
encodes ~20 types of IS elements. They are tolerated in part
due to their low transposition rate (1 in 10^5 to 10^7 cells per
generation). This rate is set by the low rate of transcription of
the transposase gene. IS elements contain inverted repeat
sequences of ~50 bp at each end of the protein-coding region
that are crucial for transposition.
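An inverted repeat means that the sequence at one end of the element is the reverse complement of the sequence at the other end. A minimal check, assuming a toy element with a 6-bp repeat rather than the ~50 bp of a real IS element (all sequences here are made up):

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, C<->G, then reversed)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def has_terminal_inverted_repeats(element, repeat_len):
    """True if the last `repeat_len` bases are the reverse complement
    of the first `repeat_len` bases."""
    element = element.upper()
    if len(element) < 2 * repeat_len:
        return False
    return element[-repeat_len:] == reverse_complement(element[:repeat_len])

# Hypothetical toy element: left repeat + coding region + right repeat.
left = "GATTCA"
toy_is = left + "ATGAAATAG" + reverse_complement(left)
print(has_terminal_inverted_repeats(toy_is, 6))                           # inverted repeat
print(has_terminal_inverted_repeats(left + "ATGAAATAG" + left, 6))        # direct repeat
```

The second call uses a direct repeat (the same sequence in the same orientation at both ends), which is the arrangement found in LTRs rather than in IS elements.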
Mechanism of DNA Transposon Copy
Number Increase
About 3 x 10^5 copies of
full-length and truncated
DNA transposons occur in
human genomic DNA (3%
of DNA). Although DNA
transposons move via a
cut-and-paste mechanism,
their copy number in the
genome will increase if
they transpose during
DNA synthesis preceding
the first meiotic division
of gametogenesis (Fig.
6.11).
LTR Retrotransposons
Eukaryotic retrotransposons fall into two major groups--LTR
retrotransposons and non-LTR retrotransposons. Together, these
sequences account for 42% of human genomic DNA.
LTRs stand for long direct
terminal repeats. LTRs consist
of 250-600 bp direct repeat
sequences located at the ends
of the retrotransposon coding
region (Fig. 6.12). LTR
retrotransposons share many
features with retroviruses.
Both contain LTRs and encode
reverse transcriptase and
DNA integrase. However, LTR
retrotransposons lack coat
proteins that allow
retroviruses to move between
cells. Transposition occurs via
an RNA intermediate that is
transcribed from a promoter
in the left LTR (Fig. 6.13).
The primary transcript is
polyadenylated, forming the
retroviral genomic RNA.
Non-LTR Retrotransposons
Even more abundant in human genomic DNA are non-LTR
retrotransposon sequences. There are two main classes of non-LTR
retrotransposons, known as long interspersed elements (LINEs, ~6
kb), and short interspersed elements (SINEs, ~300 bp). LINEs
encode a reverse transcriptase (ORF2) needed for transposition
(Fig. 6.16), whereas SINEs do not. Instead SINEs are thought to
rely on LINE-encoded enzymes for transposition. LINEs are
grouped into L1, L2, and L3 families, of which only L1 is active
today. LINE sequences occur at ~9 x 10^5 copies per human
genome. SINEs occur at ~1.6 x 10^6 copies. The most abundant
SINE is the Alu element, so named because it contains a
recognition site for the AluI restriction enzyme. Alu elements were
important for gene duplications at the β-globin locus (Fig. 6.4).
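As a rough consistency check on these numbers, copy number times full element length gives an upper bound on the share of the genome each class could occupy. The sketch assumes a genome of ~3.2 x 10^9 bp (the figure quoted elsewhere in these slides) and treats every copy as full length, which is an overestimate.

```python
GENOME_BP = 3.2e9  # assumed human genome size in bp

def max_fraction(copies, full_length_bp, genome_bp=GENOME_BP):
    """Upper bound on genome fraction if every copy were full length."""
    return copies * full_length_bp / genome_bp

sine_bound = max_fraction(1.6e6, 300)   # SINEs: ~1.6 x 10^6 copies of ~300 bp
line_bound = max_fraction(9e5, 6000)    # LINEs: ~9 x 10^5 copies of ~6 kb

print(f"SINE upper bound: {sine_bound:.2%}")
print(f"LINE upper bound: {line_bound:.2%}")
```

The LINE bound comes out above 100% of the genome, which is only possible if most LINE copies are truncated rather than full length.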
Almost all transposable elements in mammals
fall into one of four classes
Short interspersed repetitive elements: SINEs
• Example: Alu repeats
– Most abundant repeated DNA in primates
– Short, about 300 bp
– About 1 million copies
– Likely derived from the gene for 7SL RNA
– Cause new mutations in humans
• They are retrotransposons
– DNA segments that move via an RNA intermediate.
• MIRs: Mammalian interspersed repeats
– SINEs found in all mammals
• Analogous short retrotransposons found in genomes of
all vertebrates.
Long interspersed repetitive elements: LINEs
• Moderately abundant, long repeats
– LINE1 family: most abundant
– Up to 7000 bp long
– About 50,000 copies
• Retrotransposons
– Encode reverse transcriptase and other enzymes required for
transposition
– No long terminal repeats (LTRs)
• Cause new mutations in humans
• Homologous repeats found in all mammals and many
other animals
Other common interspersed
repeated sequences in humans
• LTR-containing retrotransposons
– MaLR: mammalian, LTR retrotransposons
– Endogenous retroviruses
– MER4 (MEdium Reiterated repeat, family 4)
• Repeats that resemble DNA transposons
– MER1 and MER2
– Mariner repeats
– Were active early in mammalian evolution but are
now inactive
Exon Shuffling via Recombination Between
Homologous Interspersed Repeats
We previously have noted that gene evolution has involved exon
shuffling between protein-coding genes in the genome. A large
amount of shuffling has occurred due to the prevalence of
interspersed repeats in the genome. Due to sequence conservation
within these regions, crossover events can take place at these
sites (Fig. 6.18). This results in exon shuffling between
nonhomologous genes and the formation of new genes with new
combinations of protein domains. As illustrated in Fig. 6.2, such
events also have been important in exon and gene duplications.
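The crossover described above can be sketched with genes modeled as ordered lists of exons and repeats. This is a toy model under stated assumptions: the gene contents and the single shared "Alu" repeat are hypothetical, and real recombination involves sequence alignment rather than label matching.

```python
def recombine_at_repeat(gene1, gene2, repeat="Alu"):
    """Reciprocal crossover at the first copy of a shared interspersed repeat.

    Everything downstream of the repeat is exchanged between the two genes.
    Raises ValueError if either gene lacks the repeat.
    """
    i = gene1.index(repeat)
    j = gene2.index(repeat)
    new1 = gene1[: i + 1] + gene2[j + 1 :]
    new2 = gene2[: j + 1] + gene1[i + 1 :]
    return new1, new2

# Hypothetical nonhomologous genes that each carry an Alu repeat in an intron.
gene1 = ["exon1a", "Alu", "exon2a"]
gene2 = ["exon1b", "exon2b", "Alu", "exon3b"]

new1, new2 = recombine_at_repeat(gene1, gene2)
print(new1)
print(new2)
```

Each product combines exons from both parental genes, which is the sense in which the crossover shuffles exons and can create new combinations of protein domains.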
Exon Shuffling via Transposition
Exon shuffling can also occur via cut-and-paste transpositions
mediated by DNA transposons. The mechanism by which this
occurs is illustrated in Fig. 6.19a. It requires that two copies of
the transposon flank the target exon. Both DNA transposons and
the exon will move as one piece of DNA if the transposase
happens to cleave DNA at the left inverted repeat of the
upstream transposon and at the right inverted repeat of the
downstream transposon. Gene 1 ends up losing the exon, and Gene
2 acquires the exon.
Exon Shuffling via Transposition
Exons can move along with a LINE element when it transposes via
its copy-and-paste mechanism (Fig. 6.19b). When a LINE element
has a weak poly(A) signal, RNA polymerase II continues to
transcribe downstream, potentially through an exon. If this exon
has a strong poly(A) signal, then transcription stops and the RNA
is polyadenylated. Then following the mechanism in Fig. 6.17,
DNA encoding the exon and the LINE element can be incorporated
into another gene. The spliced mRNA produced from the acceptor
gene may contain the newly introduced exon. Exon shuffling is
supported by experimental evidence and the enormous amount of
interspersed repeat DNA in genomes. Over billions of years, it
has played a major role in evolution of genomes.
Genome
• The genome is all the DNA in a cell.
– All the DNA on all the chromosomes
– Includes genes, intergenic sequences, repeats
• Specifically, it is all the DNA in an organelle.
• Eukaryotes can have 2-3 genomes
– Nuclear genome
– Mitochondrial genome
– Plastid genome
• If not specified, “genome” usually refers to the
nuclear genome.
Genomics
• Genomics is the study of genomes, including large
chromosomal segments containing many genes.
• The initial phase of genomics aims to map and
sequence an initial set of entire genomes.
• Functional genomics aims to deduce information
about the function of DNA sequences.
– Should continue long after the initial genome sequences
have been completed.
Human genome
• 22 autosome pairs + 2
sex chromosomes
• 3 billion base pairs in the
haploid genome
• Where and what are the
30,000 to 40,000 genes?
• Is there anything else
interesting/important?
(From NCBI web site; photo from T. Ried,
Natl Human Genome Research Institute, NIH)
Components of the human
Genome
• Human genome has 3.2 billion base pairs of
DNA
• About 3% codes for proteins
• About 40-50% is repetitive, made by
(retro)transposition
• What is the function of the remaining 50%?
The Genomics Revolution
• Know (close to) all the genes in a genome, and
the sequence of the proteins they encode.
• BIOLOGY HAS BECOME A FINITE
SCIENCE
– Hypotheses have to conform to what is present, not
what you could imagine could happen.
• No longer look at just individual genes
– Examine whole genomes or systems of genes
Genomics, Genetics and
Biochemistry
• Genetics: study of inherited phenotypes
• Genomics: study of genomes
• Biochemistry: study of the chemistry of living
organisms and/or cells
• Revolution launched by full genome sequencing
– Many biological problems now have finite (albeit
complex) solutions.
– New era will see an even greater interaction among
these three disciplines
Finding the function of genes
• Genes were originally defined in terms of the
phenotypes of mutants
• Now we have sequences of lots of DNA
from a variety of organisms, so ...
• Which portions of DNA actually do some