BIO353.01 2022/23-1

Lecture 10: mRNA Splicing

The process of mRNA splicing refers to removal of intronic sequences from the original pre-mRNA that
is transcribed from a gene. As we will see, splicing offers distinct advantages for gene regulation and
genome evolution.

Slides 2/3
In the previous chapter we looked at transcription, the process of generating mRNA from the genomic
sequence of a gene. Essentially, the entire sequence from the transcription start site within the promoter
down to the polyadenylation signal is included in the primary transcript, which is also called pre-mRNA.
However, the sequences that code for a protein are interrupted by sequences that do not contain any
coding information and which are removed by the process of splicing. The removed parts of the primary
transcripts are called introns and the stretches of sequence that are spliced together and remain in the
mature mRNA are called exons. However, not all of the sequences that remain in mature eukaryotic
mRNAs have coding function. Typically, a stretch at the beginning and at the end of the transcript is
non-coding; these stretches are referred to as the 5’ and 3’ untranslated regions (UTRs) or non-coding regions (NCRs). The
sequence between the UTRs contains the coding sequence (CDS), which is now available in an
uninterrupted open reading frame (ORF).
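
The relationship between exons, introns, UTRs and the CDS can be made concrete with a minimal Python sketch. The sequences, the exon/intron layout and the function names are invented for illustration and are not taken from the slides. The sequence is written in the DNA alphabet (T instead of U), as is common for gene annotations.

```python
# Toy gene: all sequences are invented for illustration only.
EXONS = ["GGCACCATGGCT", "GAAGTG", "TAAGCTTCC"]
INTRONS = ["GTAAGTTTTTCTTTTAG", "GTATGTCTTTCCCTAG"]
STOP_CODONS = {"TAA", "TAG", "TGA"}


def build_pre_mrna(exons, introns):
    """Interleave exons and introns; return the transcript and the exon spans."""
    sequence, spans = "", []
    for i, exon in enumerate(exons):
        spans.append((len(sequence), len(sequence) + len(exon)))
        sequence += exon
        if i < len(introns):
            sequence += introns[i]
    return sequence, spans


def splice(pre_mrna, exon_spans):
    """Remove the introns by joining the exon spans in order."""
    return "".join(pre_mrna[start:end] for start, end in exon_spans)


def split_regions(mrna):
    """Split a mature mRNA into 5' UTR, CDS (start to stop codon) and 3' UTR."""
    start = mrna.find("ATG")
    if start == -1:
        raise ValueError("no start codon found")
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[:start], mrna[start:i + 3], mrna[i + 3:]
    raise ValueError("no in-frame stop codon found")


pre_mrna, exon_spans = build_pre_mrna(EXONS, INTRONS)
mature = splice(pre_mrna, exon_spans)
utr5, cds, utr3 = split_regions(mature)
print(len(pre_mrna), len(mature))  # the pre-mRNA is longer than the mature mRNA
print(utr5, cds, utr3)
```

In this toy case the open reading frame only becomes uninterrupted after the two introns have been removed.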

Slides 4 / 5 / 6
The presence of introns and the process of splicing were discovered by Philip Sharp and his co-workers
when they were trying to understand how sequences of the adenovirus behave in eukaryotic
cells. They used the R-looping technology to map viral transcripts to genomic DNA, largely to understand
the order of genes in the viral genome. When single-stranded RNA is hybridized to double-stranded
DNA with the same sequence, the RNA displaces one strand of DNA because RNA:DNA interactions
are stronger than DNA:DNA interactions. However, when they performed such an experiment for the
hexon gene transcript of the virus, they always observed unmatched RNA regions that did not hybridize
with the DNA and did not form R loops. In an attempt to find an explanation for this puzzling observation,
they used increasingly longer DNA sequences in their hybridization experiment.

Slides 7 / 8 / 9
When they finally hybridized the hexon transcript to a long stretch of single-stranded DNA, they observed
that the RNA:DNA hybridization product was interrupted by long DNA loops, which correspond
to sequences that are found in the DNA but which have been removed from the mature mRNA. The
sizes of the individual loops reflect the distance between the (exonic) segments that have been spliced
together.

Slides 10 / 11 / 12
The finding that DNA sequences of a gene are not found in the sequence of the mRNA transcripts from
the same gene puzzled scientists. An early question was, therefore, whether the missing sequences
were not transcribed or if they were initially transcribed and removed later. An experiment to discriminate
between these two possibilities was to hybridize nuclear RNA or cytoplasmic RNA to DNA of the same
gene. The idea is that mRNA is initially transcribed in the nucleus but then gets exported to the
cytoplasm for translation. If the introns of the gene were not transcribed, then there should not be any
difference between nuclear and cytoplasmic RNA. However, if the RNA is processed and the introns
are removed, the nuclear unprocessed RNAs should be longer. Indeed, nuclear RNA forms a single
displacement loop on double-stranded DNA, whereas cytoplasmic RNA forms alternating segments of
displaced and double-stranded DNA. The conclusion of the experiment is that the introns are initially
transcribed but removed during RNA processing.

Slide 13
Interestingly, the average number of introns increases with evolutionary complexity. Introns are very rare
in prokaryotes and lower single-celled eukaryotes but increase in number in higher-order species.
Human genes have on average about 8 introns per gene. This suggests that the presence of introns
somehow contributed to the evolution and/or the functionality of genomes.

Slide 14
A record breaker is the titin gene, which codes for a structural protein in skeletal muscle cells. The gene
contains a bewildering 362 introns that separate 363 exons.

Slide 15 / 16
Generally, exons are rather short with a uniform size distribution, while introns have sizes that can be
up to 10,000 times larger in exceptional cases. The average exon has a size of only 150 bp, while introns
are typically 10 to 100 times longer.

Slide 17
Intron size is not homogeneous and while it may vary across different species, some common patterns
exist. Typically, 5’-introns that are contained within the untranslated region are longer than introns within
the coding sequence. For human genes, introns in the 5’-UTR have a median size of 2 to 3 kb,
while they are only around 1.5 kb within the coding sequence.

Slide 18
Nevertheless, the genomic region spanned by introns can amount to a substantial fraction of the gene
sequence, reaching 99.4% for the human dystrophin gene. The processed mature mRNA is only 14 kb
in size but the pre-mRNA transcript is 2.2 Mb long. Thus, more than 99% of the transcript is discarded
during splicing. The consequence is that it takes RNA polymerase about 16 h to synthesize a single
transcript of the dystrophin gene instead of 5 min, which it would take if only the coding sequence were
transcribed.
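
The numbers for the dystrophin gene can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text. The elongation rate is derived from those figures and is an inference, not an independent measurement, and the variable names are made up for this example.

```python
# Figures quoted in the text for the human dystrophin gene.
PRE_MRNA_NT = 2_200_000      # primary transcript, about 2.2 Mb
MATURE_MRNA_NT = 14_000      # processed mRNA, about 14 kb
TRANSCRIPTION_HOURS = 16     # time to synthesize one primary transcript

# Fraction of the primary transcript that is removed by splicing.
fraction_removed = 1 - MATURE_MRNA_NT / PRE_MRNA_NT

# Elongation rate implied by the quoted numbers (an inference, not a measurement).
rate_nt_per_min = PRE_MRNA_NT / (TRANSCRIPTION_HOURS * 60)

# Time needed at that rate if only the mature mRNA length had to be transcribed.
minutes_without_introns = MATURE_MRNA_NT / rate_nt_per_min

print(f"removed by splicing: {fraction_removed:.1%}")
print(f"implied elongation rate: {rate_nt_per_min:.0f} nt/min")
print(f"time without introns: {minutes_without_introns:.1f} min")
```

With these inputs, about 99.4% of the transcript is removed, the implied rate is roughly 2,300 nucleotides per minute, and an intron-free transcript would take about 6 min, which is in the same range as the 5 min quoted above.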

Slide 19
Obviously, the presence of introns must have some advantage despite the sequence not being used for
anything. As we will see, introns allow for alternative splicing, i.e. combining different exons
to encode distinct proteins with different functions from the same gene. Introns can also
affect gene regulation and contribute to genome evolution through recombination and exon shuffling, as discussed later.

Slide 20
Somehow, the presence of introns (or the process of splicing) affects RNA transcription and protein
translation. In this experiment, sequences coding for different genes were heterologously expressed in
eukaryotic cells from DNA constructs that did or did not contain introns. One intron was contained within
the 5’-UTR and the other within the coding sequence. The effect of the manipulation was investigated
by Northern blot analysis of the RNA and by protein analysis.

Slide 21 / 22
As shown here for the insulin and some other genes, removal of either one of the two introns lowered
the amount of mRNA and expressed protein. The simultaneous removal of both introns had a dramatic
effect on RNA expression or stability and protein expression. A later lecture will explain this in more
detail; in brief, spliced exons are marked by exon-junction complexes that interact with ribosomes
and determine the stability of the mRNA and the usage of the mRNA by the ribosome for translation.

Slide 23
It is important that introns are spliced out precisely without shifting the frame of the interrupted coding
sequence. Thus, both the identity and the ends of the intron must be defined. When the sequences of
a large number of introns are compared, specific sequences can be identified that are common to all
introns. Essentially, the sequence at the beginning and the end of the intron complies with consensus
sequences in addition to another sequence stretch within the intron itself. The 5’-splice site (also referred
to as the splice donor site) often has the sequence AG/GT (where / marks the exon/intron boundary)
and the 3’-splice site (splice acceptor site) has the sequence CAG/G. The 3’-splice site typically is
preceded by a polypyrimidine tract (many pyrimidine bases, such as C and T nucleotides). In addition,
a branch site, which contains an important A base, can be found at some distance but close to the 3’-
splice site.
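
As a rough illustration of how these consensus sequences define an intron, the following Python sketch scans a sequence for the AG/GT and CAG/G motifs and measures the pyrimidine content upstream of the 3’-splice site. The toy sequence, the window size and the function names are assumptions made for this example. Real splice sites only approximate the consensus, so exact string matching is a deliberate simplification.

```python
import re

# Invented toy sequence: exon 1 + intron + exon 2 (DNA alphabet).
SEQ = "ATGGCAAG" + "GTAAGTCCTACTAACTTTTCCTTTTCCCAG" + "GAAGTGTAA"


def donor_boundaries(seq):
    """Positions where an intron could start: exon ...AG | GT... intron."""
    return [m.start() + 2 for m in re.finditer(r"(?=AGGT)", seq)]


def acceptor_boundaries(seq):
    """Positions where the next exon could start: intron ...CAG | G... exon."""
    return [m.start() + 3 for m in re.finditer(r"(?=CAGG)", seq)]


def pyrimidine_fraction(seq, acceptor, window=15):
    """Fraction of C/T in the window just upstream of the CAG at the acceptor."""
    tract = seq[max(0, acceptor - 3 - window):acceptor - 3]
    return sum(base in "CT" for base in tract) / len(tract) if tract else 0.0


donors = donor_boundaries(SEQ)
acceptors = acceptor_boundaries(SEQ)
intron = SEQ[donors[0]:acceptors[0]]
print(donors, acceptors)
print(intron)
print(round(pyrimidine_fraction(SEQ, acceptors[0]), 2))
```

The lookahead pattern `(?=...)` is used so that overlapping candidate sites would also be reported.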

Slides 24 - 26
Splicing is a coordinated process that follows a hierarchy of ordered steps:

(24) The first step is the attack of the first base (usually a G) of the intron at the 5’ splice site by the
branchpoint A.
(25) The branchpoint A forms an unusual 2’-5’ phosphodiester bond with the G at the beginning of
the intron. The intron is now covalently linked to itself in the shape of a lariat structure (a
noose-shaped rope structure that cowboys use to catch cattle). The reaction is also referred to as the
1st transesterification reaction. The 3’-end of the exon is now detached from the rest of the
mRNA and needs to be rejoined to the next exon.

(26) This occurs during the 2nd transesterification, which releases the intron lariat and joins the 5’
exon to the next exon downstream. In this case the free 3’-hydroxyl group of the exon attacks
the 3’-splice site to displace the intron during formation of the spliced product. If a gene contains
more than one intron, these steps will repeat independently at all introns.
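
The two transesterification steps can be summarized as a small bookkeeping model in Python. This is only a sketch of which pieces are joined and released in each step, not a chemical simulation. The sequences, the branchpoint position and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Lariat:
    intron: str        # intron sequence, starting with the G at the 5' splice site
    branch_index: int  # position of the branchpoint A within the intron


def first_transesterification(exon1, intron, exon2, branch_index):
    """Branchpoint A attacks the 5' splice site; exon 1 is released with a free 3'-OH."""
    if intron[branch_index] != "A":
        raise ValueError("the branchpoint must be an A")
    # The first intron base is now linked 2'-5' to the branchpoint A (lariat),
    # and the lariat is still attached to exon 2.
    return exon1, (Lariat(intron, branch_index), exon2)


def second_transesterification(free_exon1, intermediate):
    """The 3'-OH of exon 1 attacks the 3' splice site; the lariat intron is released."""
    lariat, exon2 = intermediate
    return free_exon1 + exon2, lariat


# Invented toy sequences; index 13 is an A chosen as the branchpoint for this example.
exon1, intron, exon2 = "ATGGCAAG", "GTAAGTCCTACTAACTTTTCCTTTTCCCAG", "GAAGTGTAA"
free_exon1, intermediate = first_transesterification(exon1, intron, exon2, 13)
mrna, lariat = second_transesterification(free_exon1, intermediate)
print(mrna)
print(lariat.intron[0], lariat.intron[lariat.branch_index])
```

For a gene with several introns, the same two calls would simply be repeated for each intron.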

Slides 27 / 28
The splicing reaction is catalyzed by a large RNA-protein complex, the spliceosome, which consists of
5 subunits and has almost the same size as the ribosome.

Slides 29 / 30
The spliceosomal subunits are ribonucleoproteins, meaning that they are composed of protein and
RNA components. The RNA components are the five small nuclear RNAs (= snRNAs) U1, U2, U4, U5,
and U6, which are complexed by proteins to form snRNPs (small nuclear ribonucleoproteins;
pronounced "snurps"). The snRNPs are named after the snRNA they contain; thus, there is the U1 snRNP
and the U6 snRNP and so on. These snRNPs mediate the recognition of the 5’- and 3’-splice sites and
the branchpoint A, in addition to bringing the reacting RNA nucleotides closer together and catalyzing
the transesterification reactions. The RNA components of the snRNPs interact with sequences in the
introns, while the proteins largely contribute to the geometry and stability of the complex.

Slides 31 - 34
The spliceosomal subunits contribute to different steps of the splicing reaction:

(31) The U1 snRNP recognizes the 5’-splice site. A short stretch of U1 snRNA is complementary to
the sequence at the beginning of the intron and can form base pairs with the 5’-splice site. This
interaction includes the G nucleotide at the beginning of the intron, which is attacked by the
branchpoint A during the 1st transesterification reaction. Thus, the G nucleotide would be
masked by the interaction with U1. Therefore, U1 is exchanged for U6, which also base pairs
with the 5’-splice site but not at all nucleotides, which makes the G nucleotide available.
(32) The branch site is initially recognized by an RNA-binding protein, the branch point-binding protein
(BBP) but is exchanged later for the U2 snRNP. U2 interacts with all bases of the branch site
except for the branch point A. This makes the A nucleotide bulge out a bit, which makes it
reactive to attack the 5’-splice site.
(33) Other regions of the snRNAs may also interact by base pairing to bring the different snRNPs
into the correct orientation within the complex.
(34) The interaction of the snRNAs with the intron sequence and among each other is summarized
here. Essentially, all of the important interactions are made between RNA and RNA and the
proteins only play a minor role in the reaction.

Slides 35 – 40
Let’s follow the stepwise events that occur during the splicing reaction:

(35) Initially, U1 recognizes the 5’-splice site and the branchpoint region is recognized by BBP.
(36) Later, BBP is exchanged for U2, which bulges out the A from the branch site and RNA:RNA
interactions between U1 and U2 bring the 5’-splice site and the branch site into close proximity.
(37) The 3’-splice site is recognized by U5, which also makes RNA:RNA interactions with U1 at this
point.
(38) U1 is replaced by U6, which enables the attack of the 5’-splice site by the branchpoint A. U1 is
recycled for another reaction at another intron.
(39) Following the 1st transesterification reaction, the complex undergoes an ATP-dependent
remodeling reaction, which brings the 5’- and the 3’-splice sites together.
(40) After the second transesterification reaction, the lariat intron and the spliced mRNA are released
from the complex.

Slide 41
In addition to introns that are processed by the spliceosome, self-splicing introns also exist. Of those,
the group II introns use a mechanism that is similar to the reaction catalyzed by the spliceosome. Group
I introns are different in the sense that they use a G nucleotide that is complexed by the secondary
structure of the intron’s RNA. Because the G is not a part of the intron sequence itself, group I introns
are released as linear molecules and not in the shape of a lariat.

Slides 42 / 43
Essentially, the group II introns contain RNA sequences that make similar interactions to those mediated
by the spliceosomal snRNPs. It is believed that the spliceosome developed from group II introns but that
the intron sequence was fragmented to give rise to the snRNAs.

Slides 44 / 45
The definition of the beginning and the end of the intron by the sequences of the 5’ and 3’-splice site
seems to be an easy concept but actually many such sequences can occur within a single intron, as
shown here for intron 5 of the alpha 3 subunit of the acetylcholine receptor. Thus, how does the splicing
process ensure that the right sequences are used?

Slide 46
A partial answer comes from the fact that splicing occurs concomitantly with transcription. While the mRNA
is still being synthesized, the early introns are already removed. The necessary splicing factors
(snRNPs) travel along the DNA template with the RNA pol II and are attached to the C-terminal domain.
Thus, as soon as an appropriate splice site has been transcribed, the spliceosomal subunits assemble
on the mRNA.

Slides 47 / 48
Additional specificity comes from sequences within the exons (or the intron) that either promote or inhibit
the binding of splicing factors to specific mRNA positions. Sequences within the exon that can function
as exonic splice enhancers are typically recognized by SR proteins. These SR proteins make additional
interactions with U1 or proteins bound to the 3’-splice site (protein 35) and favor the assembly of splicing
factors at sites flanking the exon but not within the intron itself.

Slide 49
SR proteins typically function as splicing activators and contain a domain that recognizes the exonic
splice enhancer sequences and a domain that is rich in arginine and serine residues (= RS domain),
which interacts with splicing factors. In addition, splicing repressors also exist, which are referred to as
heterogeneous nuclear ribonucleoproteins (hnRNPs). These proteins can bind RNA motifs but lack the RS
domain.

Slide 50
Splice activators can also bind to sequences within the intron (intronic splice enhancers) in addition to
exonic sequences. Similarly, splice repressors bind to exonic or intronic splice silencers to prevent
spliceosomal subunits from recognizing splice sequences.

Slide 51
There is not just a single mechanism by which splice repressors can mediate their effect. The main
principles are shown here. They can overlap with binding sites for splicing factors either cooperatively
(top) or by themselves. Some hnRNPs also promote secondary structures of the mRNA that are not
favorable for splicing factors to bind.

Slide 52
In addition to the spliceosome described earlier, an alternative spliceosome complex exists that has a
different splice site definition and is referred to as the minor spliceosome or the AT/AC spliceosome.
The major spliceosome uses GT/AG sequences.

Slide 53
Many gene transcripts can be alternatively spliced, meaning that different splice products containing
different exons can be generated from the same pre-mRNA. Consequently, different protein variants
that do or do not include specific domains are encoded by the same gene. In this example, the troponin
gene can give rise to two different protein variants (a, b) depending on which one of two alternative
exons (exon 3 or 4) is included in the spliced mRNA. The protein variants have different properties
and are specifically expressed in different muscle cells (smooth vs cardiac).

Slide 54
This slide summarizes the most common forms of alternative splicing. Consider a gene that contains 3
exons, which can be spliced together. In some cases, exon 2 is excluded during splicing (exon skipping).
Sometimes parts of or an entire intron can be included in the splice product. Lastly, as we have seen on
the previous slide, a 5’-exon can be spliced to either one of two similar downstream exons. The variety
can be higher if alternative promoters and polyadenylation signals are included.
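
The forms of alternative splicing listed above can be written down as simple list operations. The sketch below uses symbolic labels (E1, i1, ...) instead of real sequences. The labels and function names are invented for this example.

```python
def constitutive(exons):
    """All exons are joined in order."""
    return list(exons)


def skip_exon(exons, index):
    """Exon skipping: leave out the exon at the given position."""
    return [exon for i, exon in enumerate(exons) if i != index]


def retain_intron(exons, introns, index):
    """Intron retention: keep the intron that follows exon `index`."""
    if not 0 <= index < len(introns):
        raise IndexError("no such intron")
    product = []
    for i, exon in enumerate(exons):
        product.append(exon)
        if i == index:
            product.append(introns[i])
    return product


def mutually_exclusive(upstream, alternatives, downstream):
    """One of several alternative exons is chosen; returns one product per choice."""
    return [upstream + [alt] + downstream for alt in alternatives]


exons, introns = ["E1", "E2", "E3"], ["i1", "i2"]
print(constitutive(exons))
print(skip_exon(exons, 1))
print(retain_intron(exons, introns, 0))
print(mutually_exclusive(["E1", "E2"], ["E3", "E4"], ["E5"]))
```

Alternative promoters or polyadenylation signals would correspond to choosing a different first or last exon and are not modeled here.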

Slide 55
It is important to distinguish between constitutive and regulated alternative splicing. In some cells, all or
some of the alternative splice products are generated at the same time. Regulated splicing, in contrast, can
affect cell behavior or development under specific conditions or at specific time points.

Slide 56
Often alternative splicing occurs in different cell types. Thus, one cell type expresses one isoform,
whereas the other expresses the alternative isoform. SR and hnRNP proteins that regulate splicing and
exon selection are involved. In a simple scenario, one cell type may express a splice repressor, whereas
the other does not. Binding of the repressor to a regulated splice site would prevent splicing.

Slide 57
Alternatively, splicing may only occur in the presence of a splice activator that binds to a splice enhancer,
whereas splicing is suppressed in another cell type that does not express the required protein.

Slide 58
An interesting gene is the Drosophila DSCAM gene (Down syndrome cell adhesion molecule). The gene
contains multiple alternative variants for several of its exons. There are 12 different variants of exon 4, 48 of
exon 6, 33 of exon 9, and 2 of exon 17. The number of possible combinations is staggering and a total
of 38,016 different proteins with different properties can be encoded by a single gene.
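
The total follows from multiplying the number of choices at each variable exon, assuming that the choices are independent of each other. A minimal check in Python, using the counts quoted above (the dictionary name is made up for this example):

```python
from math import prod

# Number of alternative variants per variable exon, as quoted in the text.
DSCAM_ALTERNATIVES = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One variant is chosen per variable exon, so the counts multiply.
isoforms = prod(DSCAM_ALTERNATIVES.values())
print(isoforms)  # 38016
```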

Slide 59
DSCAM has two important functions in fruit flies. Similar to antibodies, which also have a high degree
of diversity due to recombination of gene segments, DSCAM can function as a receptor in the immune
system of the fly.

Slides 60 / 61
As the name implies, DSCAM can also function as a cell adhesion molecule in the nervous system.
Cells expressing the same DSCAM isoform avoid each other, including different branches of the same
cell, whereas cells expressing different isoforms tolerate each other.

Slide 62
The question then is how a specific exon is selected for splicing. Phylogenetic analysis of sequences
within intron 5 revealed a site that is conserved in different fly species. This site acts as a docking site
with which sequences upstream of each alternative exon 6 can interact.

Slides 63 / 64
Each alternative exon contains a selector site that can base pair with the docking site.

Slide 65 / 66
Thus, during splicing, the selector site of an exon can pair with the docking site, which results in the
selection of a given exon. The specificity of exon selection, i.e. whether exon 6.5 or exon 6.23 is used
for splicing, is not well understood but in part depends on SR and hnRNP proteins.

Slide 67
Splicing is a fundamental process, not only because it generates protein diversity, but it can also have
profound consequences for the entire organism. One such example is the sex specification in
Drosophila. Here splicing determines whether a fruit fly will be male or female.

Slide 68
The Drosophila genome contains 4 chromosome pairs, including X and Y sex chromosomes. As in
mammals, individuals with two X chromosomes are females, whereas flies with an X and a Y
chromosome are males. In mammals, sex is determined by a single region on the Y chromosome, which
changes the default state from female to male. However, in Drosophila, the sex is determined by the
ratio of autosomes to X chromosomes and the Y chromosome has no function in specifying the sex.

Slide 69
Sex in Drosophila is controlled by a splicing cascade, which starts with the sex-lethal gene (Sxl).
Expression of the gene is controlled by two alternative promoters, which are active during different
developmental time points. The promoter Pe (for establishment) is only active at early
developmental stages and only in females around the time of sex determination, whereas the Pm
promoter (for maintenance) is activated later. Expression from Pe is controlled by three genes, the
activators sis-a/b (for sisterless) and the inhibitor dpn (deadpan). The sis genes are found on the X
chromosome, while the dpn gene is located on autosomes. Thus, female flies with two X chromosomes
produce more sis than dpn proteins, which results in activation of Pe and expression of Sxl. Flies with
only one X chromosome make about an equal amount of sis and dpn protein and Pe is not activated.

Slide 70
The Sxl gene also contains an alternatively spliced exon (exon 3), which includes a regulated splice site
at the beginning of the exon as well as a STOP codon. Thus, when exon 3 is included in the mature
transcript, a truncated protein will be produced. However, initially, the Sxl gene is only transcribed in
female embryos from the Pe promoter. The first exon of the embryonic transcript (exon E) is spliced
directly to exon 4, which bypasses the stop codon contained in exon 3, and the Sxl protein is produced.
In males, the gene is not expressed and no protein is made.

Slide 71
In adult flies, Sxl is expressed from Pm and exon 1 downstream of Pm is spliced to exon 2, which can
either be spliced to exon 3 or exon 4. Whether alternative exon 3 is included in the splice product or not
depends on the presence of a splice repressor that binds to the regulated 3’-splice site at the end of
intron 2. Interestingly, the splice repressor is the Sxl protein itself. Thus, in female flies, exon 3 is skipped
and full-length Sxl protein is made. In males, which did not generate Sxl during the embryonic stage, no
Sxl protein is present and exon 3 is spliced into the transcript. Since exon 3 also contains a stop codon,
a truncated, functionless protein is made.

Slides 72 / 73 / 74
The Sxl gene is only the first of a cascade of genes that are all regulated by alternative splicing in male
and female cells.

(72) As described on the previous slides, only females produce a functional Sxl protein, which acts
as a splice repressor for exon 3.
(73) Sxl is also a splice repressor for the tra (transformer) gene similar to its function for the Sxl gene.
In the absence of the Sxl protein, a cryptic splice site in intron 1 is used, which adds additional
sequence to exon 2. This sequence again contains a stop codon. Thus, males produce a
truncated, functionless Tra protein, whereas the splice site is skipped in females and full-length
Tra protein is made.
(74) Tra is a splice activator of the dsx gene (doublesex) and promotes the inclusion of exon 4 into
female transcripts. Exon 4 also contains a polyA signal, thus a short but functional protein is
made in female cells, whereas a longer variant is made in males. Both sex-specific isoforms
have different target gene specificity and the female Dsx protein activates expression of female
genes and represses male genes. The male isoform represses expression of other female
genes.
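
The logic of the cascade can be condensed into a few boolean steps. The Python sketch below is a strong simplification: it assumes that Pe is activated when the ratio of X chromosomes to autosome sets reaches 1 (two X chromosomes with two autosome sets), and it treats every step as all-or-nothing. The function name and the returned keys are invented for this example.

```python
def sex_cascade(x_chromosomes, autosome_sets=2):
    """Follow the Sxl -> tra -> dsx splicing cascade for a given karyotype."""
    # Assumption: Pe is activated when X chromosomes : autosome sets is at least 1.
    pe_active = x_chromosomes / autosome_sets >= 1.0

    # Early Sxl protein (from Pe) represses the 3' splice site of exon 3 in Pm
    # transcripts, so exon 3 (with its stop codon) is skipped and Sxl stays functional.
    sxl_functional = pe_active
    sxl_exon3_included = not sxl_functional

    # Sxl also blocks the cryptic splice site in tra; without Sxl, Tra is truncated.
    tra_functional = sxl_functional

    # Tra promotes inclusion of dsx exon 4, which gives the female Dsx isoform.
    dsx_isoform = "female" if tra_functional else "male"

    return {
        "pe_active": pe_active,
        "sxl_exon3_included": sxl_exon3_included,
        "tra_functional": tra_functional,
        "dsx_isoform": dsx_isoform,
    }


print(sex_cascade(2))  # two X chromosomes
print(sex_cascade(1))  # one X chromosome
```

The sketch makes the key point visible: a single early decision at Pe is propagated through three successive splicing decisions.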

Slide 76
Proteins often are composed of multiple modular domains that serve different aspects of the function of
the protein. In this theoretical example, a membrane receptor may contain an extracellular
immunoglobulin domain, a transmembrane domain, and an intracellular kinase domain. Such a receptor
would be able to detect a specific interacting protein ligand on the extracellular surface and signal
intracellularly by phosphorylating downstream target proteins. Often, these protein domains fold
independently and are encoded within a single or a group of neighboring exons.

Slide 77
The fact that different exons encode all or part of a specific protein domain allows for the invention of
new proteins with new functions by bringing exons from different genes together into new combinations
through a process that is referred to as exon shuffling. In this example, the transmembrane domain of
the receptor protein on the previous slide has been combined with a bromo-domain of another protein
that binds specific DNA sequences and interacts with methylated histones in nucleosomes. The new
protein could, thus, recruit heterochromatin to the nuclear membrane. Such exon shuffling events have
occurred multiple times during evolution based on homologous recombination or transposition events.

Slides 78 - 81
It is believed that such exon shuffling events already occurred early in evolution. Modern day eukaryotic
introns are derived from self-splicing group II introns, which were mobile elements in ancestral cells.
Recombination between these mobile elements may have combined sequences coding for small
proteins or protein domains into a single new gene that now encoded a multifunctional complex protein.

Slide 82
One interesting example for exon shuffling can be found in the genes/proteins that participate in the
blood coagulation cascade. All of these proteins have related function in that they cleave other proteins
through a protease domain. However, the specific protein that each member cleaves is different. This
difference is, in part, due to different domain exons that have been reshuffled onto existing members of
the cascade.

Slide 83
Splicing brings together the relevant sequences of a primary transcript to establish the open reading
frame that is used during protein translation. Alternative splicing increases the variety of protein isoforms
that can be made from a single gene. Yet other, more stunning changes in the coding sequence of a
transcript may occur through the process of RNA editing.

Slide 84
Two independent RNA editing mechanisms exist. One is based on deamination of RNA bases, typically,
adenine and cytosine bases, which changes the meaning of the mRNA. Another mechanism inserts
additional uridine nucleotides into the mRNA based on a template sequence.

Slide 85
When we discussed mutations in DNA, we saw that deamination can change bases and affect base
pairing. Deamination of C results in U and deamination of A results in inosine. Since they base pair with
different nucleotides, the deaminated nucleotides are translated differently (recognition of codons by
tRNAs is also dependent on base pairing). The enzymes that deaminate C nucleotides are called
cytidine deaminases; the enzyme that deaminates A is called ADAR (adenosine deaminase acting on
RNA).

Slide 86
As you can see from the genetic code, changing a C into a U or an A into inosine, which is read like a G,
affects which amino acid is incorporated into the polypeptide synthesized from the mRNA.

Slide 87
One example where editing by cytidine deaminases plays an important role is in regulating the
expression of different apolipoprotein B (apoB) isoforms. The original mRNA transcribed from the APOB
gene has a CAA codon, which instructs the insertion of glutamine in the middle of the long apoB 100
protein. If the C is changed to a U, the sequence becomes UAA, which is a STOP codon in eukaryotes.
Thus, a truncated shorter apoB 48 protein is made.
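
The effect of the C-to-U edit can be illustrated with a toy translation in Python. The short mRNA below is invented and is not the real apoB sequence. Only the CAA-to-UAA change is taken from the text, and the codon table is limited to the codons that occur in the toy sequence.

```python
# Partial codon table: only the codons used in the invented toy mRNA.
CODON_TABLE = {"AUG": "Met", "GCU": "Ala", "CAA": "Gln", "GAA": "Glu", "UAA": "STOP"}


def deaminate_c_to_u(mrna, position):
    """Model cytidine deamination: replace the C at `position` with a U."""
    if mrna[position] != "C":
        raise ValueError("only a C can be deaminated to U")
    return mrna[:position] + "U" + mrna[position + 1:]


def translate(mrna):
    """Translate codon by codon until the first stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        peptide.append(amino_acid)
    return peptide


unedited = "AUGGCUCAAGAAUAA"              # ... CAA (Gln) ... UAA
edited = deaminate_c_to_u(unedited, 6)    # CAA becomes UAA
print(translate(unedited))
print(translate(edited))
```

The unedited toy transcript is read through to the final stop codon, whereas the edited one terminates at the new UAA, which mirrors the apoB 100 versus apoB 48 situation.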

Slide 88
ApoB proteins function in trafficking of lipoprotein particles. The slide is complicated, but fat and
cholesterol molecules absorbed in the intestine are supplied to the blood stream as chylomicrons, which
include apoB 48 and mark them for uptake by the liver. The liver then repackages these fat molecules
into VLDL lipoproteins that become LDL, both of which are decorated with apoB100 and taken up by
cells in the body. Thus, intestinal cells express apoB 48, while liver cells express apoB 100 from the
same gene but from differently edited mRNAs.

Slide 89
ADAR enzymes change A nucleotides into inosines (I), which are common in tRNAs.

Slides 90 / 91
Major targets of ADAR enzymes, however, appear to be certain ion channels and receptor channels in
nerve cells. The edited receptors have different biophysical or biochemical properties, which is important
for their function in the nervous system.

Slide 92
Just one example, the glutamate receptor subunit A2 contains a CAG codon that codes for glutamine.
Deamination of the middle A results in a codon that is read as CGG and instructs the insertion of
arginine. The consequence is that the channel is no longer permeable for calcium.
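
The recoding step can be spelled out in a short Python sketch: the edited A is written as I (inosine), and the codon is decoded as if the I were a G. The two-entry codon table and the function names are assumptions made for this example.

```python
# Only the two codons relevant for this example.
CODON_TABLE = {"CAG": "Gln", "CGG": "Arg"}


def edit_a_to_i(codon, position):
    """Model ADAR editing: replace the A at `position` with inosine (I)."""
    if codon[position] != "A":
        raise ValueError("only an A can be deaminated to inosine")
    return codon[:position] + "I" + codon[position + 1:]


def decode(codon):
    """Decode a codon, reading inosine as if it were a G."""
    return CODON_TABLE[codon.replace("I", "G")]


edited = edit_a_to_i("CAG", 1)
print("CAG", decode("CAG"))
print(edited, decode(edited))
```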

Slide 93
The insertion of U nucleotides is a bit more complicated than just the simple enzymatic conversion of
existing bases. An interesting example is the cytochrome c oxidase subunit II (coxII) gene. In total, 4 U nucleotides
are inserted, which establishes the open reading frame and the codons in the mRNA that is translated.
The editing depends on a small guide RNA that instructs where and how many Us are inserted.

Slide 94
The Cytochrome C oxidase gene is even more intensively edited. The sequence on the top is the
sequence that is encoded by the gene, the one underneath shows the edited mRNA with all of the inserted
Us.

Slide 95
The editing requires a guide RNA that base pairs with the mRNA but imperfectly. Thus, a mismatch region
is created that is cut by an endonuclease. Once opened, the unmatched As in the guide RNA are
complemented by Us.
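
The principle of guide-directed U insertion can be sketched in Python as a walk along the guide RNA: where the guide base pairs with the next mRNA base, that base is kept, and where an A in the guide has no partner, a U is inserted. The sequences are invented, the guide is written 3’ to 5’ so that it lines up with the mRNA, and only Watson-Crick pairs are considered, which is a simplification. The function name is hypothetical.

```python
# Watson-Crick pairs as (guide base, mRNA base).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def edit_with_guide(pre_mrna, guide_3to5):
    """Insert Us into the mRNA wherever an A in the guide is left unpaired."""
    edited, i = [], 0
    for guide_base in guide_3to5:
        if i < len(pre_mrna) and (guide_base, pre_mrna[i]) in PAIRS:
            edited.append(pre_mrna[i])   # the bases pair, keep the mRNA base
            i += 1
        elif guide_base == "A":
            edited.append("U")           # unmatched A in the guide: insert a U
        else:
            raise ValueError("guide and mRNA cannot be aligned")
    return "".join(edited) + pre_mrna[i:]


pre_edited = "GGAGCC"      # invented pre-edited mRNA, 5' to 3'
guide = "CCAAUCAGG"        # invented guide RNA, written 3' to 5'
edited = edit_with_guide(pre_edited, guide)
print(edited)
print(edited.count("U") - pre_edited.count("U"), "U nucleotides inserted")
```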

End of transcript !!! - (shf, 28.11.2022)
