Docking of a small molecule (green) into the crystal structure (PDB: 3SN6) of
the beta-2 adrenergic G-protein coupled receptor.
The associations between biologically relevant molecules such
as proteins, nucleic acids, carbohydrates, and lipids play a central role in signal
transduction. Furthermore, the relative orientation of the two interacting
partners may affect the type of signal produced (e.g., agonism vs antagonism).
Therefore, docking is useful for predicting both the strength and type of signal
produced.
Molecular docking is one of the most frequently used methods in structure-
based drug design, due to its ability to predict the binding-conformation
of small molecule ligands to the appropriate target binding site. Characterisation
of the binding behaviour plays an important role in the rational design of drugs as
well as in elucidating fundamental biochemical processes.[2]
One can think of molecular docking as a problem of “lock-and-key”, in which
one wants to find the correct relative orientation of the “key” which will open
up the “lock” (where on the surface of the lock is the key hole, which direction
to turn the key after it is inserted, etc.). Here, the protein can be thought of as
the “lock” and the ligand can be thought of as a “key”. Molecular docking may
be defined as an optimization problem, which would describe the “best-fit”
orientation of a ligand that binds to a particular protein of interest. However,
since both the ligand and the protein are flexible, a “hand-in-glove” analogy is
more appropriate than “lock-and-key”.[3] During the course of the docking
process, the ligand and the protein adjust their conformation to achieve an
overall "best-fit" and this kind of conformational adjustment resulting in the
overall binding is referred to as "induced-fit".[4]
Molecular docking research focusses on computationally simulating
the molecular recognition process. It aims to achieve an optimized conformation
for both the protein and ligand and relative orientation between protein and
ligand such that the free energy of the overall system is minimized.
Docking approaches
Two approaches are particularly popular within the molecular docking
community. One approach uses a matching technique that describes the protein
and the ligand as complementary surfaces.[5][6][7] The second approach simulates
the actual docking process in which the ligand-protein pairwise interaction
energies are calculated.[8] Both approaches have significant advantages as well
as some limitations. These are outlined below.
Shape complementarity
Geometric matching/ shape complementarity methods describe the protein and
ligand as a set of features that make them dockable.[9] These features may
include molecular surface / complementary surface descriptors. In this case, the
receptor’s molecular surface is described in terms of its solvent-accessible
surface area and the ligand’s molecular surface is described in terms of its
matching surface description. The complementarity between the two surfaces
amounts to a shape-matching description that may help in finding the
complementary pose for docking the target and the ligand molecules. Another
approach is to describe the hydrophobic features of the protein using turns in the
main-chain atoms. Yet another approach is to use a Fourier shape descriptor
technique.[10][11][12] Whereas the shape complementarity based approaches are
typically fast and robust, they cannot usually model the movements or dynamic
changes in the ligand/ protein conformations accurately, although recent
developments allow these methods to investigate ligand flexibility. Shape
complementarity methods can quickly scan through several thousand ligands in
a matter of seconds and actually figure out whether they can bind at the
protein’s active site, and are usually scalable to even protein-protein
interactions. They are also much more amenable to pharmacophore based
approaches, since they use geometric descriptions of the ligands to find optimal
binding.
Simulation
Simulating the docking process as such is much more complicated. In this
approach, the protein and the ligand are separated by some physical distance,
and the ligand finds its position into the protein’s active site after a certain
number of “moves” in its conformational space. The moves incorporate rigid
body transformations such as translations and rotations, as well as internal
changes to the ligand’s structure including torsion angle rotations. Each of these
moves in the conformation space of the ligand induces a total energetic cost of
the system. Hence, the system's total energy is calculated after every move.
The obvious advantage of docking simulation is that ligand flexibility is easily
incorporated, whereas shape complementarity techniques must use ingenious
methods to incorporate flexibility in ligands. Also, it more accurately models
reality, whereas shape complementarity techniques are more of an abstraction.
Clearly, simulation is computationally expensive, having to explore a large
energy landscape. Grid-based techniques, optimization methods, and increased
computer speed have made docking simulation more realistic.
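The move-and-score loop described above can be sketched as a small Metropolis Monte Carlo search. This is only a toy illustration under stated assumptions: atoms are 2D points, the ligand is rigid (no torsion moves), and the energy is a plain 12-6 Lennard-Jones sum. The function names, step sizes, and the `kT` value are hypothetical and are not taken from any docking program.

```python
import math
import random

def lj_energy(ligand, receptor, epsilon=1.0, sigma=1.0):
    """Sum of 12-6 Lennard-Jones terms over all ligand-receptor atom pairs."""
    total = 0.0
    for lx, ly in ligand:
        for rx, ry in receptor:
            r = max(math.hypot(lx - rx, ly - ry), 1e-6)  # avoid division by zero
            sr6 = (sigma / r) ** 6
            total += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return total

def apply_move(ligand, dx, dy, dtheta):
    """Rigid-body move: rotate about the ligand centroid, then translate."""
    cx = sum(x for x, _ in ligand) / len(ligand)
    cy = sum(y for _, y in ligand) / len(ligand)
    c, s = math.cos(dtheta), math.sin(dtheta)
    moved = []
    for x, y in ligand:
        px, py = x - cx, y - cy
        moved.append((cx + c * px - s * py + dx, cy + s * px + c * py + dy))
    return moved

def monte_carlo_dock(ligand, receptor, steps=2000, kT=0.5, seed=0):
    """Random rigid-body moves; the total energy is recomputed after every move."""
    rng = random.Random(seed)
    current = list(ligand)
    e_current = lj_energy(current, receptor)
    best, e_best = current, e_current
    for _ in range(steps):
        trial = apply_move(current, rng.gauss(0, 0.2), rng.gauss(0, 0.2), rng.gauss(0, 0.1))
        e_trial = lj_energy(trial, receptor)
        delta = e_trial - e_current
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best

receptor = [(0.0, 0.0), (1.2, 0.0), (0.6, 1.0)]
ligand = [(4.0, 4.0), (5.0, 4.0)]
pose, energy = monte_carlo_dock(ligand, receptor, steps=500)
```

Adding ligand flexibility would mean extending `apply_move` with torsion-angle changes, which is where the cost of exploring the energy landscape grows quickly.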
Mechanics of docking
Docking flow-chart overview
To perform a docking screen, the first requirement is a structure of the protein
of interest. Usually the structure has been determined using a biophysical
technique such as x-ray crystallography or NMR spectroscopy, but can also
derive from homology modeling construction. This protein structure and a
database of potential ligands serve as inputs to a docking program. The success
of a docking program depends on two components: the search algorithm and
the scoring function.
Search algorithm
The search space in theory consists of all possible orientations
and conformations of the protein paired with the ligand. However, in practice
with current computational resources, it is impossible to exhaustively explore
the search space—this would involve enumerating all possible distortions of
each molecule (molecules are dynamic and exist in an ensemble of
conformational states) and all possible rotational and translational orientations
of the ligand relative to the protein at a given level of granularity. Most docking
programs in use account for the whole conformational space of the ligand
(flexible ligand), and several attempt to model a flexible protein receptor. Each
"snapshot" of the pair is referred to as a pose.
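To give a feel for why exhaustive enumeration is impractical, the following back-of-the-envelope sketch counts the poses on a regular grid for a rigid receptor and a flexible ligand. The function name and the example granularity (20 translational steps per axis, 30-degree angular steps, 6 rotatable bonds) are illustrative assumptions, not values from the text.

```python
def count_poses(translation_steps, rotation_steps, n_torsions, torsion_steps):
    """Grid size for a rigid receptor and a flexible ligand.

    translation_steps: grid points per translational axis (x, y, z)
    rotation_steps:    sampled values per rotational degree of freedom (3 of them)
    n_torsions:        number of rotatable bonds in the ligand
    torsion_steps:     sampled values per torsion angle
    """
    rigid_body = translation_steps ** 3 * rotation_steps ** 3
    internal = torsion_steps ** n_torsions
    return rigid_body * internal

# 20 positions per axis, 30-degree steps (12 values) for rotations and torsions,
# and a ligand with 6 rotatable bonds.
n = count_poses(translation_steps=20, rotation_steps=12, n_torsions=6, torsion_steps=12)
print(f"{n:.2e} poses")  # about 4.13e+13
```

Even this coarse grid ignores receptor flexibility entirely; each additional rotatable bond multiplies the count by the number of torsion steps.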
A variety of conformational search strategies have been applied to the ligand
and to the receptor. These include:
Docking assessment
The interdependence between sampling and the scoring function affects the ability of a
docking protocol to predict plausible poses or binding affinities for novel compounds.
Thus, an assessment of a docking protocol is generally required (when
experimental data is available) to determine its predictive capability. Docking
assessment can be performed using different strategies, such as:
Applications
A binding interaction between a small molecule ligand and an enzyme protein
may result in activation or inhibition of the enzyme. If the protein is a receptor,
ligand binding may result in agonism or antagonism. Docking is most
commonly used in the field of drug design — most drugs are
small organic molecules, and docking may be applied to:
Abstract
“ The Central Dogma. This states that once ‘information’ has passed into
protein it cannot get out again. In more detail, the transfer of
information from nucleic acid to nucleic acid, or from nucleic acid to
protein may be possible, but transfer from protein to protein, or from
protein to nucleic acid is impossible. Information means here the precise
determination of sequence, either of bases in the nucleic acid or of
amino acid residues in the protein. ”
— Francis Crick, 1956
and re-stated in a Nature paper published in 1970:[2]
“ The central dogma of molecular biology deals with the detailed residue-
by-residue transfer of sequential information. It states that such
information cannot be transferred back from protein to either protein or
nucleic acid. ”
— Francis Crick
The central dogma has also been described as "DNA makes RNA and RNA
makes protein,"[3] originally termed the sequence hypothesis and made as a
positive statement by Crick. However, this simplification does not make it clear
that the central dogma as stated by Crick does not preclude the reverse flow of
information from RNA to DNA, only ruling out the flow from protein to RNA
or DNA. Crick's use of the word dogma was unconventional, and has been
controversial.
The dogma is a framework for understanding the transfer
of sequence information between information-carrying biopolymers, in the most
common or general case, in living organisms. There are 3 major classes of such
biopolymers: DNA and RNA (both nucleic acids), and protein. There are 3×3 =
9 conceivable direct transfers of information that can occur between these. The
dogma classes these into 3 groups of 3: 3 general transfers (believed to occur
normally in most cells), 3 special transfers (known to occur, but only under
specific conditions in case of some viruses or in a laboratory), and 3 unknown
transfers (believed never to occur). The general transfers describe the normal
flow of biological information: DNA can be copied to DNA (DNA replication),
DNA information can be copied into mRNA (transcription), and proteins can be
synthesized using the information in mRNA as a template (translation).[2]
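The 3×3 bookkeeping can be made concrete with a short enumeration. The three general transfers are those named above. The rule used for the other two groups is an inference from Crick's statements quoted earlier (no transfer out of protein), so every transfer that starts from protein is labelled unknown and the remaining three are labelled special. The names `POLYMERS`, `GENERAL`, and `classify` are illustrative.

```python
from itertools import product

POLYMERS = ("DNA", "RNA", "protein")

# General transfers named in the text: replication, transcription, translation.
GENERAL = {("DNA", "DNA"), ("DNA", "RNA"), ("RNA", "protein")}

def classify(source, target):
    """Assign one of the three groups to a source -> target transfer."""
    if (source, target) in GENERAL:
        return "general"
    if source == "protein":
        # Information that has passed into protein cannot get out again.
        return "unknown"
    return "special"

table = {pair: classify(*pair) for pair in product(POLYMERS, repeat=2)}

for (source, target), group in table.items():
    print(f"{source:>7} -> {target:<7} {group}")
```

Under this rule the special group comes out as RNA to DNA, RNA to RNA, and DNA to protein, with three entries in each group.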
DNA replication
In the sense that DNA replication must occur if genetic material is to be
provided for the progeny of any cell, whether somatic or reproductive, the
copying from DNA to DNA arguably is the fundamental step in the central
dogma. A complex group of proteins called the replisome performs the
replication of the information from the parent strand to the complementary
daughter strand.
The replisome comprises:
Ribonucleic acid (RNA) is a macromolecule present in all living cells. It carries
genetic information from DNA to protein, which determines the structure and function of the
cell; it also catalyzes chemical reactions and can alter the expression of proteins, which may lead to
various diseases.
Living cells store their hereditary information in the form of double-stranded
deoxyribonucleic acid (DNA) molecules. The DNA in genomes does not direct protein
synthesis itself but instead uses RNA as intermediary molecules when the particular cell
needs a specific protein. The nucleotide sequence of the appropriate portion of the immensely
long DNA molecule in a chromosome is first copied into m-RNA, a process called
transcription. These m-RNA copies of DNA segments are then used directly as templates to
direct the synthesis of the protein in a process called translation.
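The two steps can be illustrated with a minimal sketch. It assumes the input is the coding (sense) DNA strand, so transcription reduces to replacing T with U, and it uses the standard genetic code with one-letter amino acid symbols. The function names are illustrative, and real transcription and translation involve far more machinery than string operations.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def transcribe(coding_strand):
    """DNA coding strand -> m-RNA (same sequence with U in place of T)."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna):
    """Read codons from the first AUG until a stop codon or the end of the message."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = transcribe("ATGGCCTGGTAA")
print(mrna)             # AUGGCCUGGUAA
print(translate(mrna))  # MAW
```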
The flow of genetic information in cells is therefore from DNA to RNA to protein. All
cells, from bacteria to humans, express their genetic information in this way, a principle so
fundamental that it is termed the “Central Dogma” of molecular biology. The regulation of
gene expression, that is, how cells “know” to make the right proteins at the
right time in the right amounts, is a major focus of current research in molecular biology.
Despite the universality of the “Central Dogma”, there are important variations in the
way information flows from DNA to protein. Principal among these is that RNA transcripts
(pre m-RNA) in eukaryotic cells are subject to a series of processing steps in the nucleus
before the formation of mature m-RNA, which serves as template for protein synthesis.
The protein coding sequences in the eukaryotic genes are typically interrupted by non-
coding intervening sequences, discovered in 1977. This feature of eukaryotic genes came as a
surprise to scientists who had until that time been familiar only with bacterial genes,
typically consisting of continuous stretches of coding DNA that is directly transcribed into m-
RNA. In humans and other complex metazoans, the vast majority of protein-coding genes
contain many segments (introns) that are part of the primary transcript (pre m-RNA) but are
not included in mature m-RNA. The removal of introns and joining together of the sequences
(exons) included in the final mature m-RNA is accomplished by pre-m-RNA splicing.
The identification of exons and the execution of the splicing reaction are mediated by the
spliceosome, a molecular complex composed of five snRNPs (small nuclear ribonucleoproteins)
and a range of non-snRNP associated protein factors.
Alternative splicing is a process by which the exons of the pre-m-RNA transcripts
produced by transcription of a gene are reconnected in multiple ways. The resulting m-RNAs
may be translated into different protein isoforms; thus, a single gene may code for multiple
proteins. Alternative splicing occurs as a normal phenomenon in eukaryotes, where it greatly
increases the diversity of proteins that can be encoded by the genome. In humans, over 80%
of genes are alternatively spliced. There are numerous modes of alternative splicing observed,
of which the most common is exon skipping. In this mode, a particular exon may be included
in m-RNAs under some conditions or in particular tissues, and omitted from the m-RNA in
others.
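Exon skipping can be sketched with a toy pre-m-RNA in which exons are written in upper case and introns in lower case. The sequence, the exon coordinates, and the `splice` helper are all hypothetical; the point is only that one transcript yields different mature m-RNAs depending on which exons are joined.

```python
def splice(pre_mrna, exons, skip=()):
    """Join exon segments of a pre-m-RNA.

    exons: list of (start, end) half-open intervals, in transcript order
    skip:  indices of exons to leave out (exon skipping)
    """
    return "".join(pre_mrna[start:end]
                   for index, (start, end) in enumerate(exons)
                   if index not in skip)

# Toy transcript: three exons (upper case) separated by two introns (lower case).
pre_mrna = "AAAA" "guccag" "CCCC" "guuuag" "GGGG"
exons = [(0, 4), (10, 14), (20, 24)]

print(splice(pre_mrna, exons))            # AAAACCCCGGGG  (all exons included)
print(splice(pre_mrna, exons, skip={1}))  # AAAAGGGG      (middle exon skipped)
```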
The splicing of pre-m-RNA to m-RNA is a critical step in the expression of the majority
of mammalian genes. The spliceosome catalyzes the excision of intervening intron sequences
and joining of the exon sequences. A typical human and mouse gene contains eight to ten
exons, which can be joined in different arrangements by alternative splicing (AS). Recent
computational studies have estimated that one- to two-thirds of human and mouse genes
contain at least one alternative exon. It is widely assumed that AS is a key step in the
generation of proteomic diversity in more complex organisms. AS can increase the coding
capacity of the genome without increasing the number of genes.
Alternative splicing is known to play numerous critical roles in regulatory pathways in
metazoans, including those controlling cell growth, cell death, differentiation and
development, and its mis-regulation has been implicated in many life-threatening human
diseases. Many human gene mutations affect the splicing pattern of that gene. For example, a
mutation in the sequence at an intron/exon junction that is recognized by the spliceosomes
can cause this junction to be ignored. This causes splicing to occur to the next exon in line,
leaving out the exon next to the mutation. This exon skipping usually results in an m-RNA
that codes for a non-functional protein. Exon skipping and other errors in splicing are seen in
many human genetic diseases (Table 1).
Mutations that disrupt any of the components of RNPs, whether RNA or protein, or the
factors required for their assembly can be deleterious to cells and cause disease. Identifying
physiologically relevant and disease-relevant AS events and determining where and when these
occur, what their specific roles are, and how they are regulated is a priority research area.
In this post-genomic era of biological sciences, it is more imperative than ever to utilize
human DNA sequencing data in the process of drug design, which starts with target
identification and validation. For decades, the pharmaceutical industry has been designing
small molecules, peptides, and antibodies to inhibit clinically-relevant, human protein targets,
many of which were identified and validated in the pre-genomic era. However, for a
multitude of reasons, many clinically-relevant, human proteins are not druggable. Drug
researchers continue to search for novel therapeutic modalities that inhibit targets with greater
potency and efficacy, and that can be developed in less time and more cost-effectively. The most
recent mission has been to target non-protein biomolecules—the most common of which is
RNA—with inhibitory nucleic acids. However, this attempt is not a new one. The use of
antisense nucleic acids to inhibit protein translation from complementary, clinically-relevant
RNA in human cells has been in existence for many years. Other therapeutic modalities in
this category include aptamers, ribozymes, and RNAi (mediated by small interfering RNA, or
siRNA).
There are a number of scientific and economic reasons for this shift in target
identification. RNA offers a unique way to get at many drug targets that are currently
undruggable but very well validated. Some of the therapeutic approaches that use or target
RNAs are:
• Antisense RNA
• RNA interference
• Small Molecules
• RNA Aptamers
• micro-RNAs
Antisense RNA
Antisense therapy is a form of treatment for genetic disorders or infections. When the
genetic sequence of a particular gene is known to be causative of a particular disease, it is
possible to synthesize a strand of nucleic acid (DNA, RNA or a chemical analogue) that will
bind to the messenger RNA (m-RNA) produced by that gene and inactivate it, effectively
turning that gene "off". This is because m-RNA has to be single stranded for it to be
translated. Alternatively, the strand might be targeted to bind a splicing site on pre-m-RNA
and modify the exon content of an m-RNA.
This synthesized nucleic acid is termed an "anti-sense" oligonucleotide because its base
sequence is complementary to the gene's messenger RNA (m-RNA), which is called the
"sense" sequence.
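The complementarity between sense and antisense strands can be shown with a reverse-complement helper. This sketch assumes an RNA antisense strand and plain Watson-Crick pairing (A with U, G with C); the function name and the example target are illustrative, and real antisense design also weighs chemistry, accessibility, and specificity.

```python
# Watson-Crick pairing for RNA: A<->U, G<->C.
COMPLEMENT = str.maketrans("AUGC", "UACG")

def antisense_rna(sense_mrna):
    """Return the antisense strand, written 5'->3', for a 5'->3' sense m-RNA."""
    return sense_mrna.upper().translate(COMPLEMENT)[::-1]

target = "AUGGCC"
oligo = antisense_rna(target)
print(oligo)  # GGCCAU
```

Applying the function twice returns the original sequence, which is a quick sanity check that the two strands are mutually complementary.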
Antisense drugs are being researched to treat cancers (including lung cancer, colorectal
carcinoma, pancreatic carcinoma, malignant glioma and malignant melanoma), diabetes,
ALS, Duchenne muscular dystrophy and diseases such as asthma and arthritis with an
inflammatory component. Most potential therapies have not yet produced significant clinical
results, though one antisense drug, fomivirsen, has been approved by the US FDA as a
treatment for Cytomegalovirus retinitis.
RNA Interference
The capacity to selectively eliminate an m-RNA of a disease-causing allele or to prevent
translation of a deleterious protein by RNAi (RNA interference) presents a wide range of
targets for therapeutic modulation. RNAi
relies on the base pairing interaction of 21-23 nucleotide RNAs, a size sufficient to uniquely
target an m-RNA or even a specific splice variant, and provides a versatile and potent tool.
RNAi-based strategies are applicable to all diseases in which decreasing expression of an
RNA, whether from a mutant allele or an aberrantly expressed m-RNA, would have
therapeutic effects. Great progress has been made toward translating the expertise of RNAi
from an extensively used experimental tool to an effective and safe treatment. The main
challenges again are optimal delivery to the appropriate tissues and cells, avoiding the
cellular antiviral response to double-stranded RNA, and achieving the optimal balance of
high potency without off-target effects.
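A rough calculation suggests why 21 nucleotides can be enough to single out one m-RNA. The sketch below assumes independent, equally likely bases, which real transcripts do not obey, and the search-space size of 10^8 nucleotides is a placeholder for illustration rather than a measured transcriptome size.

```python
def expected_chance_matches(length_nt, search_space_nt):
    """Expected number of exact chance matches of one fixed sequence,
    assuming independent, equally likely bases at every position."""
    return search_space_nt / 4 ** length_nt

SEARCH_SPACE_NT = 10 ** 8  # hypothetical size of the expressed sequence pool

print(4 ** 21)                                       # 4398046511104 possible 21-mers
print(expected_chance_matches(21, SEARCH_SPACE_NT))  # far below 1
print(expected_chance_matches(10, SEARCH_SPACE_NT))  # well above 1 for a 10-mer
```

Exact-match counting understates the practical problem, since off-target effects can also arise from partial complementarity.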
Small-interfering RNA
micro-RNA (mi-RNA)
mi-RNAs are believed to regulate the expression of approximately 30% of all human
genes. Thus, in contrast to antisense and RNAi, which target single genes, targeting mi-RNAs
has the potential of addressing whole disease pathways.
The normal function of the cell depends on accurate expression of various protein-coding
and non-coding RNAs. These RNAs participate in transcription and translation. The RNPs
are the functional forms of the corresponding RNAs and their normal activity depends on
both the specific composition and the precise arrangement of their protein constituents. As
there are numerous RNAs and a very large number of RNA-binding proteins, the biogenesis
of RNPs must be orchestrated with great fidelity. Disrupted functions of RNAs and RNPs are
the cause of numerous maladies.
Reversal of defective protein or restoration of normal protein production can be achieved
more efficaciously by eliminating or redirecting the splicing of pre-m-RNA.
RNA-based strategies offer a series of novel therapeutic applications including altered
processing of the target pre-m-RNA transcript, reprogramming of genetic defects through
m-RNA repair, and the targeted silencing of allele- or isoform-specific gene transcripts.