
Docking (molecular)

From Wikipedia, the free encyclopedia


In the field of molecular modeling, docking is a method which predicts the
preferred orientation of one molecule to a second when bound to each other to
form a stable complex.[1] Knowledge of the preferred orientation in turn may be
used to predict the strength of association or binding affinity between two
molecules using, for example, scoring functions.

Schematic illustration of docking a small molecule ligand (green) to a protein
target (black) producing a stable complex.

Docking of a small molecule (green) into the crystal structure (PDB: 3SN6) of
the beta-2 adrenergic G-protein coupled receptor.
The associations between biologically relevant molecules such
as proteins, nucleic acids, carbohydrates, and lipids play a central role in signal
transduction. Furthermore, the relative orientation of the two interacting
partners may affect the type of signal produced (e.g., agonism vs antagonism).
Therefore, docking is useful for predicting both the strength and type of signal
produced.
Molecular docking is one of the most frequently used methods in structure-
based drug design, due to its ability to predict the binding-conformation
of small molecule ligands to the appropriate target binding site. Characterisation
of the binding behaviour plays an important role in rational design of drugs as
well as to elucidate fundamental biochemical processes.[2]
One can think of molecular docking as a problem of “lock-and-key”, in which
one wants to find the correct relative orientation of the “key” which will open
up the “lock” (where on the surface of the lock is the key hole, which direction
to turn the key after it is inserted, etc.). Here, the protein can be thought of as
the “lock” and the ligand can be thought of as a “key”. Molecular docking may
be defined as an optimization problem, which would describe the “best-fit”
orientation of a ligand that binds to a particular protein of interest. However,
since both the ligand and the protein are flexible, a “hand-in-glove” analogy is
more appropriate than “lock-and-key”.[3] During the course of the docking
process, the ligand and the protein adjust their conformation to achieve an
overall "best-fit" and this kind of conformational adjustment resulting in the
overall binding is referred to as "induced-fit".[4]
Molecular docking research focusses on computationally simulating
the molecular recognition process. It aims to achieve an optimized conformation
for both the protein and ligand and relative orientation between protein and
ligand such that the free energy of the overall system is minimized.

Docking approaches
Two approaches are particularly popular within the molecular docking
community. One approach uses a matching technique that describes the protein
and the ligand as complementary surfaces.[5][6][7] The second approach simulates
the actual docking process in which the ligand-protein pairwise interaction
energies are calculated.[8] Both approaches have significant advantages as well
as some limitations. These are outlined below.
Shape complementarity
Geometric matching/shape complementarity methods describe the protein and
ligand as a set of features that make them dockable.[9] These features may
include molecular surface / complementary surface descriptors. In this case, the
receptor’s molecular surface is described in terms of its solvent-accessible
surface area and the ligand’s molecular surface is described in terms of its
matching surface description. The complementarity between the two surfaces
amounts to the shape matching description that may help finding the
complementary pose of docking the target and the ligand molecules. Another
approach is to describe the hydrophobic features of the protein using turns in the
main-chain atoms. Yet another approach is to use a Fourier shape descriptor
technique.[10][11][12] Whereas the shape complementarity based approaches are
typically fast and robust, they cannot usually model the movements or dynamic
changes in the ligand/protein conformations accurately, although recent
developments allow these methods to investigate ligand flexibility. Shape
complementarity methods can quickly scan through several thousand ligands in
a matter of seconds and determine whether they can bind at the
protein’s active site, and they usually scale even to protein-protein
interactions. They are also much more amenable to pharmacophore based
approaches, since they use geometric descriptions of the ligands to find optimal
binding.
Simulation
Simulating the docking process as such is much more complicated. In this
approach, the protein and the ligand are separated by some physical distance,
and the ligand finds its position into the protein’s active site after a certain
number of “moves” in its conformational space. The moves incorporate rigid
body transformations such as translations and rotations, as well as internal
changes to the ligand’s structure including torsion angle rotations. Each of these
moves in the conformation space of the ligand induces a total energetic cost of
the system. Hence, the system's total energy is calculated after every move.
The obvious advantage of docking simulation is that ligand flexibility is easily
incorporated, whereas shape complementarity techniques must use ingenious
methods to incorporate flexibility in ligands. Also, simulation models reality
more accurately, whereas shape complementarity techniques are more of an abstraction.
Clearly, simulation is computationally expensive, having to explore a large
energy landscape. Grid-based techniques, optimization methods, and increased
computer speed have made docking simulation more realistic.
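As a rough illustration of the move-and-score loop described above, the following Python sketch applies random rigid-body moves to a ligand and recomputes the energy after every move. It is not taken from any docking program: the 2D coordinates, the `toy_energy` function, and the Metropolis acceptance rule with a fixed `temperature` are all assumptions made for this example, and torsional moves are left out for brevity.

```python
import math
import random


def toy_energy(ligand, pocket):
    """Hypothetical score: sum of squared distances between matched atoms (lower is better)."""
    return sum((lx - px) ** 2 + (ly - py) ** 2
               for (lx, ly), (px, py) in zip(ligand, pocket))


def rigid_move(ligand, rng, max_shift=0.5, max_turn=0.3):
    """One random rigid-body move in 2D: rotate about the centroid, then translate."""
    cx = sum(x for x, _ in ligand) / len(ligand)
    cy = sum(y for _, y in ligand) / len(ligand)
    angle = rng.uniform(-max_turn, max_turn)
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    moved = []
    for x, y in ligand:
        rx = cx + cos_a * (x - cx) - sin_a * (y - cy)
        ry = cy + sin_a * (x - cx) + cos_a * (y - cy)
        moved.append((rx + dx, ry + dy))
    return moved


def dock(ligand, pocket, steps=2000, temperature=0.1, seed=0):
    """Monte Carlo search; returns the lowest-energy pose seen and its energy."""
    rng = random.Random(seed)
    pose, energy = ligand, toy_energy(ligand, pocket)
    best_pose, best_energy = pose, energy
    for _ in range(steps):
        trial = rigid_move(pose, rng)
        trial_energy = toy_energy(trial, pocket)  # energy is recomputed after every move
        delta = trial_energy - energy
        # Metropolis rule (an assumption here): always accept downhill moves,
        # accept uphill moves with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            pose, energy = trial, trial_energy
            if energy < best_energy:
                best_pose, best_energy = pose, energy
    return best_pose, best_energy


# Made-up coordinates: the ligand is the pocket shape, rotated and shifted away.
pocket = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
ligand = [(5.0, 5.0), (5.0, 6.0), (4.0, 6.0)]
final_pose, final_energy = dock(ligand, pocket)
```

Real programs evaluate a physical energy function in three dimensions and usually precompute it on a grid, which is one reason grid-based techniques make simulation more practical.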

Mechanics of docking
Docking flow-chart overview
To perform a docking screen, the first requirement is a structure of the protein
of interest. Usually the structure has been determined using a biophysical
technique such as x-ray crystallography or NMR spectroscopy, but can also
derive from homology modeling construction. This protein structure and a
database of potential ligands serve as inputs to a docking program. The success
of a docking program depends on two components: the search algorithm and
the scoring function.
Search algorithm
Main article: Searching the conformational space for docking
The search space in theory consists of all possible orientations
and conformations of the protein paired with the ligand. However, in practice
with current computational resources, it is impossible to exhaustively explore
the search space—this would involve enumerating all possible distortions of
each molecule (molecules are dynamic and exist in an ensemble of
conformational states) and all possible rotational and translational orientations
of the ligand relative to the protein at a given level of granularity. Most docking
programs in use account for the whole conformational space of the ligand
(flexible ligand), and several attempt to model a flexible protein receptor. Each
"snapshot" of the pair is referred to as a pose.
A variety of conformational search strategies have been applied to the ligand
and to the receptor. These include:

• systematic or stochastic torsional searches about rotatable bonds
• molecular dynamics simulations
• genetic algorithms to "evolve" new low energy conformations, in which the
score of each pose acts as the fitness function used to select individuals for
the next iteration.
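The genetic-algorithm strategy in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheme of any particular docking program: a pose is reduced to a list of torsion angles, `toy_score` is a made-up fitness function, and the selection, one-point crossover, and Gaussian mutation settings are arbitrary choices.

```python
import math
import random


def toy_score(torsions):
    """Hypothetical pose score (lower is better); minimal when every torsion is 60 degrees."""
    return sum(1.0 - math.cos(math.radians(t - 60.0)) for t in torsions)


def evolve(n_torsions=4, pop_size=30, generations=40, seed=1):
    """Evolve torsion-angle vectors; requires n_torsions >= 2 for the crossover step."""
    rng = random.Random(seed)
    population = [[rng.uniform(-180.0, 180.0) for _ in range(n_torsions)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=toy_score)            # the score acts as the fitness function
        best_per_generation.append(toy_score(population[0]))
        parents = population[: pop_size // 2]     # selection: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_torsions)
            child = mother[:cut] + father[cut:]   # one-point crossover (new list)
            gene = rng.randrange(n_torsions)
            child[gene] += rng.gauss(0.0, 15.0)   # mutation of one torsion angle
            children.append(child)
        population = parents + children
    best = min(population, key=toy_score)
    return best, best_per_generation


best_pose, history = evolve()
```

Because the parents are carried over unchanged, the best score recorded in `history` can never get worse from one generation to the next.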
Ligand flexibility
Conformations of the ligand may be generated in the absence of the receptor
and subsequently docked[13] or conformations may be generated on-the-fly in
the presence of the receptor binding cavity,[14] or with full rotational flexibility
of every dihedral angle using fragment based docking.[15] Force field energy
evaluations are most often used to select energetically reasonable
conformations,[16] but knowledge-based methods have also been used.[17]
Receptor flexibility
Computational capacity has increased dramatically over the last decade making
possible the use of more sophisticated and computationally intensive methods in
computer-assisted drug design. However, dealing with receptor flexibility in
docking methodologies is still a thorny issue. The main reason behind this
difficulty is the large number of degrees of freedom that have to be considered
in this kind of calculation. Neglecting receptor flexibility, however, leads to
poor docking results in terms of binding pose prediction.[18]
Multiple static structures experimentally determined for the same protein in
different conformations are often used to emulate receptor
flexibility.[19] Alternatively, rotamer libraries of amino acid side chains that
surround the binding cavity may be searched to generate alternate but
energetically reasonable protein conformations.[20][21]
Scoring function
Main article: Scoring functions for docking
Docking programs generate a large number of potential ligand poses, of which
some can be immediately rejected due to clashes with the protein. The
remainder are evaluated using some scoring function, which takes a pose as
input and returns a number indicating the likelihood that the pose represents a
favorable binding interaction and ranks one ligand relative to another.
Most scoring functions are physics-based molecular mechanics force fields that
estimate the energy of the pose within the binding site. The various
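contributions to binding can be written as an additive equation:

\[
\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{solvent}} + \Delta G_{\mathrm{conf}} + \Delta G_{\mathrm{int}} + \Delta G_{\mathrm{rot}} + \Delta G_{\mathrm{t/r}} + \Delta G_{\mathrm{vib}}
\]

The components consist of solvent effects (solvent), conformational changes in
the protein and ligand (conf), free energy due to protein-ligand interactions
(int), internal rotations (rot), association energy of ligand and receptor to
form a single complex (t/r) and free energy due to changes in vibrational modes
(vib).[22] A low (negative) energy indicates a stable system and thus a likely
binding interaction.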
An alternative approach is to derive a knowledge-based statistical potential for
interactions from a large database of protein-ligand complexes, such as
the Protein Data Bank, and evaluate the fit of the pose according to this inferred
potential.
There are a large number of structures from X-ray crystallography for
complexes between proteins and high affinity ligands, but comparatively fewer
for low affinity ligands as the latter complexes tend to be less stable and
therefore more difficult to crystallize. Scoring functions trained with this data
can dock high affinity ligands correctly, but they will also give plausible docked
conformations for ligands that do not bind. This gives a large number of false
positive hits, i.e., ligands predicted to bind to the protein that actually don't
when placed together in a test tube.
One way to reduce the number of false positives is to recalculate the energy of
the top scoring poses using (potentially) more accurate but computationally
more intensive techniques such as Generalized Born or Poisson-
Boltzmann methods.[8]

Docking assessment
The interdependence between sampling and the scoring function affects how well
docking can predict plausible poses or binding affinities for novel compounds.
Thus, an assessment of a docking protocol is generally required (when
experimental data is available) to determine its predictive capability. Docking
assessment can be performed using different strategies, such as:

• docking accuracy (DA) calculation;
• the correlation between a docking score and the experimental response or
determination of the enrichment factor (EF);[23]
• the distance between an ion-binding moiety and the ion in the active site;
• the presence of induced-fit models.
Docking accuracy
Docking accuracy[24][25] represents one measure to quantify the fitness of a
docking program by rationalizing the ability to predict the right pose of a ligand
with respect to that experimentally observed.
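One common way to make this comparison concrete is the root-mean-square deviation (RMSD) between predicted and experimentally observed ligand atom positions. The sketch below assumes that both poses list the same atoms in the same order and share a coordinate frame. The 2.0 Å cutoff is a widely used convention rather than something the text specifies, and `success_rate` is a simple illustrative summary, not the formal DA definition of the cited references.

```python
import math


def pose_rmsd(predicted, reference):
    """RMSD between two poses given as matched lists of (x, y, z) atom coordinates."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared = sum((px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
                  for (px, py, pz), (rx, ry, rz) in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


def success_rate(rmsd_values, cutoff=2.0):
    """Fraction of docked ligands whose pose lies within `cutoff` (assumed in angstroms)."""
    if not rmsd_values:
        raise ValueError("at least one RMSD value is required")
    return sum(r <= cutoff for r in rmsd_values) / len(rmsd_values)


# Made-up coordinates: every atom of the predicted pose is shifted by 1 unit along x.
reference_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
predicted_pose = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
deviation = pose_rmsd(predicted_pose, reference_pose)
```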
Enrichment factor
Docking screens can also be evaluated by the enrichment of annotated ligands
(known binders) from among a large database of presumed non-binding
“decoy” molecules.[23] In this way, the success of a docking screen is evaluated
by its capacity to enrich the small number of known active compounds in the
top ranks of a screen from among a much greater number of decoy molecules in
the database. The area under the receiver operating characteristic (ROC) curve
is widely used to evaluate the performance of a screen.
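Both measures can be computed directly from a ranked hit list. The following sketch assumes the screen output is a list of booleans ordered from best to worst docking score, with `True` marking a known active and `False` a decoy. The function names, the default top fraction, and the example ranking are illustrative assumptions, and tied scores are not handled.

```python
def enrichment_factor(ranked_labels, fraction=0.1):
    """Ratio of the active rate in the top `fraction` of the list to the overall active rate."""
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, int(round(n_total * fraction)))
    hits_in_top = sum(ranked_labels[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)


def roc_auc(ranked_labels):
    """Area under the ROC curve: the chance that a random active outranks a random decoy."""
    n_actives = sum(ranked_labels)
    n_decoys = len(ranked_labels) - n_actives
    if n_actives == 0 or n_decoys == 0:
        raise ValueError("need at least one active and one decoy")
    actives_seen = 0
    correctly_ordered_pairs = 0
    for is_active in ranked_labels:
        if is_active:
            actives_seen += 1
        else:
            # every active already seen is ranked above this decoy
            correctly_ordered_pairs += actives_seen
    return correctly_ordered_pairs / (n_actives * n_decoys)


# Made-up ranking of 3 actives among 10 compounds, best score first.
ranking = [True, True, False, True, False, False, False, False, False, False]
ef_top20 = enrichment_factor(ranking, fraction=0.2)
auc = roc_auc(ranking)
```

An enrichment factor of 1 and an AUC of 0.5 both correspond to a ranking no better than random selection.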
Prospective
Resulting hits from docking screens are subjected to pharmacological validation
(e.g. IC50, affinity or potency measurements). Only prospective studies
constitute conclusive proof of the suitability of a technique for a particular
target.[26]
Benchmarking
The potential of docking programs to reproduce binding modes as determined
by X-ray crystallography can be assessed by a range of docking benchmark sets.
For small molecules, several benchmark data sets for docking and virtual
screening exist, e.g. the Astex Diverse Set consisting of high quality
protein−ligand X-ray crystal structures[27] or the Directory of Useful Decoys
(DUD) for evaluation of virtual screening performance.[23]
The potential of docking programs to reproduce peptide binding modes can be
assessed with Lessons for Efficiency Assessment of Docking and Scoring
(LEADS-PEP).[28]

Applications
A binding interaction between a small molecule ligand and an enzyme protein
may result in activation or inhibition of the enzyme. If the protein is a receptor,
ligand binding may result in agonism or antagonism. Docking is most
commonly used in the field of drug design — most drugs are
small organic molecules, and docking may be applied to:

• hit identification – docking combined with a scoring function can be used to
quickly screen large databases of potential drugs in silico to identify
molecules that are likely to bind to the protein target of interest (see virtual
screening).
• lead optimization – docking can be used to predict where and in which
relative orientation a ligand binds to a protein (also referred to as the binding
mode or pose). This information may in turn be used to design more potent
and selective analogs.
• bioremediation – protein-ligand docking can also be used to predict
pollutants that can be degraded by enzymes.

Targeting signal transduction:

With the advance in the molecular understanding of cancers and proliferative
disorders, new approaches to managing these diseases may become feasible. It
has been recognized that a key feature of these diseases is the pathological
alteration in the molecular machineries of signalling pathways. This
recognition, which began to emerge in the early 1980s, induced us to explore
the possibility of targeting the aberrant signalling pathways for disease
therapy.

Targeting signal transduction as a strategy to treat inflammatory diseases

Abstract

Inflammatory diseases are a major burden on humanity, despite recent
successes with biopharmaceuticals. Lack of responsiveness and resistance
to these drugs, delivery problems and cost of manufacture of
biopharmaceuticals mean that the search for new anti-inflammatory agents
continues. Progress in our understanding of inflammatory signalling
pathways has identified new targets, notably in pathways involving NF-κB,
p38 MAP kinase, T lymphocyte activation and JAK/STAT. Other targets
such as transcription factor complexes and components of pathways
activated by TNF, Toll-like receptors and Nod-like receptors also present
possibilities, and might show efficacy without being limited by effects on
host defence. The challenge is to place a value on one target relative to
another, and to devise strategies to modulate them.
Disruption of intracellular signalling:

Disruption of intracellular signalling by alterations in the cancer genome.
A simplified signaling pathway is depicted to highlight known examples of
bona fide oncogenes that are subjected to dysregulation by various mechanisms.
It is clear that a signaling pathway can be disrupted at multiple points, and a
variety of genomic and epigenomic alterations can contribute to this, ultimately
leading to cancer.
Novel treatment of excitotoxicity: targeted disruption of intracellular
signalling from glutamate receptor:

Glutamate signalling plays key physiological roles in excitatory
neurotransmission and CNS plasticity, but also mediates excitotoxicity, the
process responsible for triggering neurodegeneration through glutamate receptor
overactivation. Excitotoxicity is thought to be a key neurotoxic mechanism in
neurological disorders, including brain ischemia, CNS trauma and epilepsy.
However, treating excitotoxicity using glutamate receptor antagonists has not
proven clinically viable, necessitating more sophisticated approaches.
Increasing knowledge of the composition of the postsynaptic density at
glutamatergic synapses has allowed us to extend our understanding of the
molecular mechanisms of excitotoxicity and to dissect out the distinct signalling
pathways responsible for excitotoxic damage. Key molecules in these pathways
are physically linked to the cytoplasmic face of glutamate receptors by
scaffolding proteins that exhibit binding specificity for some receptors over
others. This imparts specificity to physiological and pathological glutamatergic
signalling. Recently, we have capitalized on this knowledge and, using targeted
peptides to selectively disrupt intracellular interactions linked to glutamate
receptors, have blocked excitotoxic signalling in neurones. This therapeutic
approach circumvents the negative consequences of blocking glutamate
receptors, and may be a practical strategy for treating neurological disorders that
involve excitotoxicity.
Central dogma of molecular biology
The central dogma of molecular biology is an explanation of the flow of
genetic information within a biological system. It was first stated by Francis
Crick in 1958[1]

“ The Central Dogma. This states that once ‘information’ has passed into
protein it cannot get out again. In more detail, the transfer of
information from nucleic acid to nucleic acid, or from nucleic acid to
protein may be possible, but transfer from protein to protein, or from
protein to nucleic acid is impossible. Information means here the precise
determination of sequence, either of bases in the nucleic acid or of
amino acid residues in the protein. ”
— Francis Crick, 1956
and re-stated in a Nature paper published in 1970:[2]

Information flow in biological systems

“ The central dogma of molecular biology deals with the detailed residue-
by-residue transfer of sequential information. It states that such
information cannot be transferred back from protein to either protein or
nucleic acid. ”
— Francis Crick
The central dogma has also been described as "DNA makes RNA and RNA
makes protein,"[3] originally termed the sequence hypothesis and made as a
positive statement by Crick. However, this simplification does not make it clear
that the central dogma as stated by Crick does not preclude the reverse flow of
information from RNA to DNA, only ruling out the flow from protein to RNA
or DNA. Crick's use of the word dogma was unconventional, and has been
controversial.
The dogma is a framework for understanding the transfer
of sequence information between information-carrying biopolymers, in the most
common or general case, in living organisms. There are 3 major classes of such
biopolymers: DNA and RNA (both nucleic acids), and protein. There are 3×3 =
9 conceivable direct transfers of information that can occur between these. The
dogma classes these into 3 groups of 3: 3 general transfers (believed to occur
normally in most cells), 3 special transfers (known to occur, but only under
specific conditions in case of some viruses or in a laboratory), and 3 unknown
transfers (believed never to occur). The general transfers describe the normal
flow of biological information: DNA can be copied to DNA (DNA replication),
DNA information can be copied into mRNA(transcription), and proteins can be
synthesized using the information in mRNA as a template (translation).[2]


Biological sequence information


Main article: Primary structure
The biopolymers that comprise DNA, RNA and (poly)peptides are linear
polymers (i.e.: each monomer is connected to at most two other monomers).
The sequence of their monomers effectively encodes information. The transfers
of information described by the central dogma ideally are
faithful, deterministic transfers, wherein one biopolymer's sequence is used as a
template for the construction of another biopolymer with a sequence that is
entirely dependent on the original biopolymer's sequence.

General transfers of biological sequential information

Table of the 3 classes of information transfer suggested by the dogma

General          Special          Unknown
DNA → DNA        RNA → DNA        protein → DNA
DNA → RNA        RNA → RNA        protein → RNA
RNA → protein    DNA → protein    protein → protein
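The 3×3 classification in the table can be written out as a small lookup. This is only an illustrative sketch; the names `POLYMERS`, `GENERAL`, `SPECIAL`, and `classify` are introduced here and do not come from the source.

```python
from itertools import product

POLYMERS = ("DNA", "RNA", "protein")

# (source, target) pairs, copied from the table above
GENERAL = {("DNA", "DNA"), ("DNA", "RNA"), ("RNA", "protein")}
SPECIAL = {("RNA", "DNA"), ("RNA", "RNA"), ("DNA", "protein")}


def classify(source, target):
    """Return the class of a direct sequence-information transfer."""
    if source not in POLYMERS or target not in POLYMERS:
        raise ValueError("source and target must be DNA, RNA or protein")
    if (source, target) in GENERAL:
        return "general"
    if (source, target) in SPECIAL:
        return "special"
    return "unknown"  # every transfer that starts from protein


# Enumerate all nine conceivable direct transfers.
TRANSFERS = {pair: classify(*pair) for pair in product(POLYMERS, repeat=2)}
```

Enumerating the pairs this way shows that the three "unknown" transfers are exactly those whose source is protein.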

DNA replication
Main article: DNA replication
In the sense that DNA replication must occur if genetic material is to be
provided for the progeny of any cell, whether somatic or reproductive, the
copying from DNA to DNA arguably is the fundamental step in the central
dogma. A complex group of proteins called the replisome performs the
replication of the information from the parent strand to the complementary
daughter strand.
The replisome comprises:

• a helicase that unwinds the superhelix as well as the double-stranded
DNA helix to create a replication fork
• SSB protein that binds the separated single strands to prevent them from
reassociating
• RNA primase that adds a complementary RNA primer to each template
strand as a starting point for replication
• DNA polymerase III that reads the existing template chain from its 3' end
to its 5' end and adds new complementary nucleotides from the 5' end to
the 3' end of the daughter chain
• DNA polymerase I that removes the RNA primers and replaces them with
DNA
• DNA ligase that joins adjacent Okazaki fragments with phosphodiester
bonds to produce a continuous chain
This process typically takes place during S phase of the cell cycle.
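At the level of sequence information, the outcome of replication is a daughter strand that is complementary and antiparallel to its template. The sketch below captures only that base-pairing rule and none of the enzymatic steps listed above; the function name and the example sequence are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def daughter_strand(template):
    """Given a template written 5'->3', return the daughter strand written 5'->3'.

    The template is read from its 3' end to its 5' end (hence `reversed`),
    while complementary bases are added to the daughter in the 5'->3' direction.
    """
    return "".join(COMPLEMENT[base] for base in reversed(template.upper()))


new_strand = daughter_strand("ATGC")  # "GCAT"
```

Applying the function twice returns the original template, which reflects the fact that each strand of the double helix fully specifies the other.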
Transcription
Main article: Transcription (genetics)
Transcription is the process by which the information contained in a section
of DNA is replicated in the form of a newly assembled piece of messenger
RNA (mRNA). Enzymes facilitating the process include RNA
polymerase and transcription factors. In eukaryotic cells the primary
transcript is pre-mRNA. Pre-mRNA must be processed for translation to
proceed. Processing includes the addition of a 5' cap and a poly-A tail to the
pre-mRNA chain, followed by splicing. Alternative splicing occurs when
appropriate, increasing the diversity of the proteins that any single mRNA
can produce. The product of the entire transcription process (that began with
the production of the pre-mRNA chain) is a mature mRNA chain.
Translation
Main article: Translation (genetics)
The mature mRNA finds its way to a ribosome, where it gets translated.
In prokaryotic cells, which have no nuclear compartment, the processes of
transcription and translation may be linked together without clear separation.
In eukaryotic cells, the site of transcription (the cell nucleus) is usually
separated from the site of translation (the cytoplasm), so the mRNA must be
transported out of the nucleus into the cytoplasm, where it can be bound by
ribosomes. The ribosome reads the mRNA triplet codons, usually beginning
with an AUG (adenine−uracil−guanine), or initiator methionine codon
downstream of the ribosome binding site. Complexes of initiation
factors and elongation factors bring aminoacylated transfer RNAs (tRNAs)
into the ribosome-mRNA complex, matching the codon in the mRNA to the
anti-codon on the tRNA. Each tRNA bears the appropriate amino
acid residue to add to the polypeptide chain being synthesised. As the amino
acids get linked into the growing peptide chain, the chain begins folding into
the correct conformation. Translation ends with a stop codon which may be
a UAA, UGA, or UAG triplet.
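The transcription and translation steps described above can be illustrated with a short sketch. It assumes the input is the coding (sense) strand of DNA written 5'→3', uses the standard genetic code with one-letter amino acid symbols, starts at the first AUG, and ignores ribosome binding sites, initiation and elongation factors, and mRNA processing. The function names and example sequences are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG ("*" marks a stop codon)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): amino_acid
               for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand):
    """DNA coding strand -> mRNA with the same sequence, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read triplet codons from the first AUG until a stop codon (UAA, UGA or UAG)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("GGATGGCCTAATT")
peptide = translate(mrna)
```

If the message runs out before a stop codon is reached, the sketch simply returns the residues read so far.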
The mRNA does not contain all the information for specifying the nature of
the mature protein. The nascent polypeptide chain released from the
ribosome commonly requires additional processing before the final product
emerges. For one thing, the correct folding process is complex and vitally
important. For most proteins it requires other chaperone proteins to control
the form of the product. Some proteins then excise internal segments from
their own peptide chains, splicing the free ends that border the gap; in such
processes the inside "discarded" sections are called inteins. Other proteins
must be split into multiple sections without splicing. Some polypeptide
chains need to be cross-linked, and others must be attached to cofactors such
as haem (heme) before they become functional.
Reverse transcription
Unusual flow of information highlighted in green
Main article: Reverse transcription
Reverse transcription is the transfer of information from RNA to DNA (the
reverse of normal transcription). This is known to occur in the case
of retroviruses, such as HIV, as well as in eukaryotes, in the case
of retrotransposons and telomere synthesis. It is the process by which genetic
information from RNA gets transcribed into new DNA.
RNA replication
RNA replication is the copying of one RNA to another. Many viruses replicate
this way. The enzymes that copy RNA to new RNA, called RNA-dependent
RNA polymerases, are also found in many eukaryotes where they are involved
in RNA silencing.[4]
RNA editing, in which an RNA sequence is altered by a complex of proteins
and a "guide RNA", could also be seen as an RNA-to-RNA transfer.
Direct translation from DNA to protein
Direct translation from DNA to protein has been demonstrated in a cell-free
system (i.e. in a test tube), using extracts from E. coli that contained ribosomes,
but not intact cells. These cell fragments could synthesize proteins from single-
stranded DNA templates isolated from other organisms (e.g., mouse or toad),
and neomycin was found to enhance this effect. However, it was unclear
whether this mechanism of translation corresponded specifically to the genetic
code.[5][6]

Transfers of information not explicitly covered in the theory
Posttranslational modification
Main article: Posttranslational modification
After protein amino acid sequences have been translated from nucleic acid
chains, they can be edited by appropriate enzymes. Although this is a form of
protein affecting protein sequence, not explicitly covered by the central dogma,
there are not many clear examples where the associated concepts of the two
fields have much to do with each other.
Inteins
Main article: Intein
An intein is a "parasitic" segment of a protein that is able to excise itself from
the chain of amino acids as they emerge from the ribosome and rejoin the
remaining portions with a peptide bond in such a manner that the main protein
"backbone" does not fall apart. This is a case of a protein changing its own
primary sequence from the sequence originally encoded by the DNA of a gene.
Additionally, most inteins contain a homing endonuclease or HEG domain
which is capable of finding a copy of the parent gene that does not include the
intein nucleotide sequence. On contact with the intein-free copy, the HEG
domain initiates the DNA double-stranded break repair mechanism. This
process causes the intein sequence to be copied from the original source gene to
the intein-free gene. This is an example of protein directly editing DNA
sequence, as well as increasing the sequence's heritable propagation.
Methylation
Main article: Epigenetics
Variation in methylation states of DNA can alter gene expression levels
significantly. Methylation variation usually occurs through the action of
DNA methylases. When the change is heritable, it is considered epigenetic.
When the change in information status is not heritable, it would be a somatic
epitype. The effective information content has been changed by means of the
actions of a protein or proteins on DNA, but the primary DNA sequence is not
altered.
Prions
Main article: Prion
Prions are proteins of particular amino acid sequences in particular
conformations. They propagate themselves in host cells by
making conformational changes in other molecules of protein with the same
amino acid sequence, but with a different conformation that is functionally
important to the cell. Once the protein has been transconformed to the prion
folding it changes function. In turn it can convey information into new cells and
reconfigure more functional molecules of that sequence into the alternate prion
form. In some types of prion in fungi this change is continuous and direct; the
information flow is Protein → Protein.
Some scientists such as Alain E. Bussard and Eugene Koonin have argued that
prion-mediated inheritance violates the central dogma of molecular
biology.[7][8] However, Rosalind Ridley in Molecular Pathology of the
Prions (2001) has written that "The prion hypothesis is not heretical to the
central dogma of molecular biology—that the information necessary to
manufacture proteins is encoded in the nucleotide sequence of nucleic acid—
because it does not claim that proteins replicate. Rather, it claims that there is a
source of information within protein molecules that contributes to their
biological function, and that this information can be passed on to other
molecules."[9]
RNA and Drug Discovery
Sheela Tandon & VK Vohra
S&T Knowledge Resource Centre, CDRI, Lucknow-226001

Central dogma vs drug designing:

Ribonucleic acid (RNA) is a macromolecule present in all living cells. It transports
genetic information from DNA to protein, determining the structure and function of the
cell, catalyzes chemical reactions, and can alter the expression of proteins, which may lead to
various diseases.
Living cells store their hereditary information in the form of double-stranded
deoxyribonucleic acid (DNA) molecules. The DNA in genomes does not direct protein
synthesis itself but instead uses RNA as intermediary molecules when the particular cell
needs a specific protein. The nucleotide sequence of the appropriate portion of the immensely
long DNA molecule in a chromosome is first copied into m-RNA, a process called
transcription. These m-RNA copies of segments of the DNA are used directly as templates to
direct the synthesis of the protein in a process called translation.
The flow of genetic information in cells is therefore from DNA to RNA to protein. All
cells, from bacteria to humans, express their genetic information in this way, a principle so
fundamental that it is termed the “Central Dogma” of molecular biology. The regulation of
gene expression, that is, how cells “know” to make the right proteins at the
right time in the right amounts, is the major focus of current research in molecular biology.
Despite the universality of the “Central Dogma”, there are important variations in the
way information flows from DNA to protein. Principal among these is that RNA transcripts
(pre m-RNA) in eukaryotic cells are subject to a series of processing steps in the nucleus
before the formation of mature m-RNA, which serves as template for protein synthesis.
The protein coding sequences in the eukaryotic genes are typically interrupted by non-
coding intervening sequences, discovered in 1977. This feature of eukaryotic genes came as a
surprise to scientists who had, until that time, been familiar only with bacterial genes,
which typically consist of continuous stretches of coding DNA that is directly transcribed into
m-RNA. In humans and other complex metazoans, the vast majority of protein-coding genes
contain many segments (introns) that are part of the primary transcript (pre m-RNA) but are
not included in mature m-RNA. The removal of introns and joining together of the sequences
(exons) included in the final mature m-RNA is accomplished by pre-m-RNA splicing.
The identification of exons and the execution of the splicing reaction are mediated by the
spliceosome, a molecular complex composed of five snRNPs (small nuclear ribonucleoproteins)
and a range of non-snRNP associated protein factors.
Alternative splicing is a process by which the exons of the pre-m-RNA transcripts
produced by transcription of a gene are reconnected in multiple ways. The resulting m-RNAs
may be translated into different protein isoforms; thus, a single gene may code for multiple
proteins. Alternative splicing occurs as a normal phenomenon in eukaryotes, where it greatly
increases the diversity of proteins that can be encoded by the genome. In humans, over 80%
of genes are alternatively spliced. There are numerous modes of alternative splicing observed,
of which the most common is exon skipping. In this mode, a particular exon may be included
in m-RNAs under some conditions or in particular tissues, and omitted from the m-RNA in
others.
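
Exon skipping can be pictured as choosing, for each skippable ("cassette") exon, whether it is included in the mature m-RNA. The Python sketch below enumerates the resulting isoforms; the function name, the exon sequences, and the choice of which exons are skippable are all hypothetical and serve only to illustrate the combinatorics.

```python
from itertools import product


def splice_isoforms(exons, cassette):
    """Enumerate mature m-RNAs produced by exon skipping.

    exons    -- exon sequences in their genomic order
    cassette -- indices of exons that may be included or skipped
    """
    optional = sorted(cassette)
    isoforms = []
    # Each cassette exon is independently included (True) or skipped (False).
    for choice in product([True, False], repeat=len(optional)):
        included = dict(zip(optional, choice))
        mrna = "".join(
            exon for i, exon in enumerate(exons) if included.get(i, True)
        )
        isoforms.append(mrna)
    return isoforms


# Hypothetical three-exon gene in which the middle exon is a cassette exon.
exons = ["AUGGCU", "GAAGAA", "UGGUAA"]
isoforms = splice_isoforms(exons, cassette={1})
```

With n cassette exons the sketch yields 2^n isoforms, which is one way to see how a single gene can code for multiple proteins. In cells, which isoforms actually appear depends on tissue and conditions, which this sketch does not model.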
The splicing of pre-m-RNA to m-RNA is a critical step in the expression of the majority
of mammalian genes. The spliceosome catalyzes the excision of intervening intron sequences
and the joining of the exon sequences. A typical human or mouse gene contains eight to ten
exons, which can be joined in different arrangements by alternative splicing (AS). Recent
computational studies have estimated that one- to two-thirds of human and mouse genes
contain at least one alternative exon. It is widely assumed that AS is a key step in the
generation of proteomic diversity in more complex organisms. AS can increase the coding
capacity of the genome without increasing the number of genes.
Alternative splicing is known to play numerous critical roles in regulatory pathways in
metazoans, including those controlling cell growth, cell death, differentiation and
development, and its mis-regulation has been implicated in many life-threatening human
diseases. Many human gene mutations affect the splicing pattern of that gene. For example, a
mutation in the sequence at an intron/exon junction that is recognized by the spliceosome
can cause this junction to be ignored. This causes splicing to occur to the next exon in line,
leaving out the exon next to the mutation. This exon skipping usually results in an m-RNA
that codes for a non-functional protein. Exon skipping and other errors in splicing are seen in
many human genetic diseases (Table 1).
Mutations that disrupt any of the components of RNPs (either RNA or proteins) or the
factors required for their assembly can be deleterious to cells and cause disease. Identifying
physiologically relevant and disease-relevant AS events, and determining where and when these
occur, what their specific roles are, and how they are regulated, is a priority research area.
In this post-genomic era of biological sciences, it is more imperative than ever to utilize
human DNA sequencing data in the process of drug design, which starts with target
identification and validation. For decades, the pharmaceutical industry has been designing
small molecules, peptides, and antibodies to inhibit clinically relevant human protein targets,
many of which were identified and validated in the pre-genomic era. However, for a
multitude of reasons, many clinically relevant human proteins are not druggable. Drug
researchers continue to search for novel therapeutic modalities that inhibit their targets with
greater potency and efficacy and that can be developed in less time and more cost-effectively. The most
recent mission has been to target non-protein biomolecules, the most common of which is
RNA, with inhibitory nucleic acids. However, this attempt is not a new one. The use of
antisense nucleic acids to inhibit protein translation from complementary, clinically relevant
RNA in human cells has been in existence for many years. Other therapeutic modalities in
this category include aptamers, ribozymes, and RNAi (mediated by a small inhibitory RNA molecule, or
siRNA).
There are a number of scientific and economic reasons for this shift in target
identification. RNA offers a unique way to get at many drug targets that are currently
undruggable but are very well validated. Some of the therapeutic approaches that use or target
RNAs are:

• Antisense RNA
• RNA interference
• Small Molecules
• RNA Aptamers
• micro-RNAs

Antisense RNA
Antisense therapy is a form of treatment for genetic disorders or infections. When the
genetic sequence of a particular gene is known to be causative of a particular disease, it is
possible to synthesize a strand of nucleic acid (DNA, RNA or a chemical analogue) that will
bind to the messenger RNA (m-RNA) produced by that gene and inactivate it, effectively
turning that gene "off". This is because m-RNA has to be single stranded for it to be
translated. Alternatively, the strand might be targeted to bind a splicing site on pre-m-RNA
and modify the exon content of an m-RNA.

Table 1: The Effects of Alternative Splicing on Disease

Each mutation is given as amino-acid change (nucleotide change, exon number). A dash marks an empty cell.

Disorder | Gene | Missense | Nonsense | Translationally silent
Acute intermittent porphyria | Porphobilinogen deaminase | - | - | R28R (C→G, 3)
Breast and ovarian cancer | BRCA1 | E139K (G→T, 18) | - | -
Carbohydrate-deficient glycoprotein type 1a | PMM2 | E139K (G→A, 5) | - | -
Cerebrotendinous xanthomatosis | Sterol-27-hydroxylase | - | E60X (G→T, 3) | G112G (G→T, 2)
Cystic fibrosis | CFTR | - | R75X (C→T, 3); R553X (C→T, 11); W1228X (G→A, 20) | -
Ehlers-Danlos syndrome type VI | Lysyl hydroxylase | - | Y511X (C→A, 14) | -
Fanconi anemia | FANCG | - | Q356X (C→T, 8) | -
Frontotemporal dementia (FTDP-17) | Tau | S305N (G→A, 10); N297K (T→G, 10) | - | L284L (T→C, 10); S305S (T→C, 10)
Hemophilia A | Factor VIII | - | E1978X (G→T, 19); R2116X (C→T, 22) | -
HPRT deficiency | Hypoxanthine phosphoribosyl transferase | G40V (G→T, 2); R48H (G→A, 3); A161E (C→A, 6); P184L (C→T, 8); D194Y (G→T, 8); E197K (G→A, 8); E197V (A→T, 8) | - | -
Leigh’s encephalomyelopathy | Pyruvate dehydrogenase E1α | - | - | G185G (A→G, 6)
Marfan syndrome | Fibrillin-1 | - | - | I2118I (C→T, 51)
Metachromatic leukodystrophy (juvenile form) | Arylsulfatase A | T409I (C→T, 8) | - | -
Neurofibromatosis type 1 | NF1 | - | R304X (C→T, 7); Q756X (C→T, 14); Y2264X (C→A, 37) | -
OCT deficiency | Ornithine carbamoyltransferase | L304F (G→T, 9) | - | -
Porphyria cutanea tarda | Uroporphyrinogen decarboxylase | - | - | E314E (G→A, 9)
Sandhoff disease | Hexosaminidase | P404L (C→T, 11) | - | -
Severe combined immunodeficiency | Adenosine deaminase | R142Q (G→A, 5) | R142X (C→T, 5) | -
Spinal muscular atrophy | SMN1 | - | W102X (G→A, 3) | -
Spinal muscular atrophy | SMN2 | - | - | F280F (C→T, 7)
Tyrosinemia type 1 | Fumarylacetoacetate hydrolase | Q279R (A→G, 8) | - | N232N (C→T, 8)

(Trends in Genetics, Vol. 18, No. 4, April 2002, p. 186)

This synthesized nucleic acid is termed an "anti-sense" oligonucleotide because its base
sequence is complementary to the gene's messenger RNA (m-RNA), which is called the
"sense" sequence.
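
Complementarity here means that the antisense strand is the reverse complement of the sense sequence, so the two can base-pair in antiparallel orientation. The Python sketch below computes an antisense RNA for a short m-RNA fragment; the function name and the example fragment are hypothetical, and the sketch ignores the chemical modifications used in real oligonucleotide drugs.

```python
# Watson-Crick pairing for RNA: A pairs with U, G pairs with C.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(sense_mrna):
    """Return the antisense strand (5'->3') for a sense m-RNA given 5'->3'."""
    # Complement each base, then reverse so the result also reads 5'->3'.
    return sense_mrna.upper().translate(COMPLEMENT)[::-1]


sense = "AUGGCC"                      # hypothetical m-RNA fragment
oligo = antisense_rna(sense)          # "GGCCAU"
assert antisense_rna(oligo) == sense  # taking the antisense twice restores the sense strand
```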
Antisense drugs are being researched to treat cancers (including lung cancer, colorectal
carcinoma, pancreatic carcinoma, malignant glioma and malignant melanoma), diabetes,
ALS, Duchenne muscular dystrophy, and diseases with an inflammatory component such as
asthma and arthritis. Most potential therapies have not yet produced significant clinical
results, though one antisense drug, fomivirsen, has been approved by the US FDA as a
treatment for cytomegalovirus retinitis.
RNA Interference
The capacity to selectively eliminate the m-RNA of a disease-causing allele, or to prevent
translation of a deleterious protein, by RNAi (RNA interference) presents a wide range of
targets for therapeutic modulation. RNAi relies on the base-pairing interaction of 21-23
nucleotide RNAs, a size sufficient to uniquely target an m-RNA or even a specific splice
variant, and provides a versatile and potent tool.
RNAi-based strategies are applicable to all diseases in which decreasing expression of an
RNA, whether from a mutant allele or an aberrantly expressed m-RNA, would have
therapeutic effects. Great progress has been made toward translating the expertise of RNAi
from an extensively used experimental tool to an effective and safe treatment. The main
challenges again are optimal delivery to the appropriate tissues and cells, avoiding the
cellular antiviral response to double-stranded RNA, and achieving the optimal balance of
high potency without off-target effects.
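
The idea that a short window can single out one m-RNA or one splice variant can be sketched as a simple search: slide a fixed-length window along the target and keep the windows that do not occur in any other transcript. The function name and the toy sequences below are hypothetical, and exact substring matching is a simplifying assumption; real siRNA design also has to consider partial matches, which are one source of off-target effects.

```python
def unique_target_sites(target, others, length=21):
    """Return (position, window) pairs found in target but in no other transcript."""
    sites = []
    for i in range(len(target) - length + 1):
        window = target[i:i + length]
        # Keep the window only if no other transcript contains it exactly.
        if not any(window in other for other in others):
            sites.append((i, window))
    return sites


# Toy example with a short window: the variant that includes a middle exon
# versus the variant that skips it (hypothetical sequences).
with_exon = "AUGGCUGAAGAAUGGUAA"
without_exon = "AUGGCUUGGUAA"
sites = unique_target_sites(with_exon, [without_exon], length=6)
```

In this toy case the surviving windows are those that overlap the included exon or its junctions, which is how a splice variant can be targeted specifically.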

Table 2: Selected RNA-based Therapies in Development

Company | Programme | Indication | Status

Antisense
Isis | ISIS 301012 | High cholesterol | Phase II
Isis | ISIS 113715 | Diabetes | Phase II
OncoGenex, Isis | OGX-011 | Cancer | Phase II
Eli Lilly, Isis | LY 2181308 | Cancer | Phase II
AVI BioPharma | Resten | Restenosis | Phase II
AVI BioPharma | AVI-5126 | CABG | Phase I/II
AVI BioPharma | AVI-4065 | Hepatitis C | Phase II
Topigen | TPI-ASM8 | Asthma | Phase I
Lorus Therapeutics | GTI-2040 | Renal cell carcinoma | Phase II

Aptamer
Archemix | ARC1779 | Acute coronary syndrome, percutaneous coronary intervention | Phase I
Antisoma, Archemix | AS 1411 | Renal cancer, acute myeloid leukaemia | Phase II

Small-interfering RNA
Opko Health | Bevasiranib (Cand5) | Wet AMD | Phase III
Allergan | AGN 211745 (Sirna-027) | Wet AMD | Phase II
Silence Therapeutics, Quark Biotech, Pfizer | RTP 801i | Wet AMD | Phase I
Alnylam | ALN-RSV01 | RSV infections | Phase II

RSV = respiratory syncytial virus; AMD = age-related macular degeneration

(Nature Reviews: Drug Discovery, Vol. 6, Nov. 2007, p. 863)

RNA as Target for Small Molecules


Alternative splicing is an attractive target for pharmacological intervention with small
molecules. Alternative splicing of most introns is strongly dependent on serine/arginine-rich
(SR) proteins and hnRNP proteins. Small molecules that affect their activities or their relative
amounts in the nucleus can profoundly modify splicing.
RNA Aptamers
Aptamers are nucleic acid species that have been engineered through repeated rounds
of in vitro selection, or SELEX (systematic evolution of ligands by exponential enrichment), to
bind to a specific target molecule. RNA aptamers have been shown to bind to proteins and
perturb their function with very high specificity and affinity, which gives them high potential for
use as therapeutic drugs and research tools. A system has been designed for in vitro selection
(SELEX) of an RNA aptamer to maximize its binding capacity for a specified protein. The aptamer
is then applied to a novel expression system that uses specific genetic constructs, designs and
promoters, along with transgenic techniques, to produce either mono- or multivalent aptamers
used as conditional alleles in vivo.

micro-RNA (mi-RNA)
mi-RNAs are believed to regulate the expression of approximately 30% of all human
genes. Thus, in contrast to antisense and RNAi, which target single genes, targeting mi-RNAs
has the potential of addressing whole disease pathways.
The normal function of the cell depends on accurate expression of various protein-coding
and non-coding RNAs. These RNAs participate in transcription and translation. The RNPs
are the functional forms of the corresponding RNAs and their normal activity depends on
both the specific composition and the precise arrangement of their protein constituents. As
there are numerous RNAs and a very large number of RNA-binding proteins, the biogenesis
of RNPs must be orchestrated with great fidelity. Disrupted functions of RNAs and RNPs are
the cause of numerous maladies.
Reversal of defective protein production, or restoration of normal protein production, can be achieved
more efficaciously by eliminating or redirecting the splicing of pre-m-RNA.
RNA-based strategies offer a series of novel therapeutic applications, including altered
processing of the target pre-m-RNA transcript, reprogramming of genetic defects through
m-RNA repair, and the targeted silencing of allele- or isoform-specific gene transcripts.
