
Abstract

There are several enzymes used for bioremediation of xenobiotics such as phenols,
anilines, etc. Their broad substrate specificity offers a wide opportunity for
screening pollutants in order to predict potential targets for degradation. However, there are
some concerns that the products of biodegradation may be more persistent or toxic than the
parent compound. The present study uses protein-ligand docking of the Franconibacter sp. DJ34
enzyme with different xenobiotics as a tool for this screening. Herein we report the modelled
structures of the Franconibacter sp. DJ34 enzyme and their validation using Root Mean Square Deviation
(RMSD) and Ramachandran plot analyses. Further, the molecular interactions of the
modelled enzyme with different xenobiotics were analysed using molecular docking
technique. The results of molecular docking showed that the long chain alkanes, medium
chain alkanes and short chain alkanes had the binding affinities of -10.7, -9.0 and -10.8
kcal/mol, respectively. A correlation between the binding affinity of the enzyme with
pollutants and the enzyme activity was also demonstrated. The results showed that the
binding affinities of this enzyme with xenobiotics could be used as a potential tool in
bioremediation. The results obtained could be used in virtual screening of pollutants based on
the molecular interactions with the substrate, and aid in developing systems biology models
of alkane monooxygenase for bioremediation applications. The structure of the alkane
monooxygenase enzyme was modelled using homology modelling. Two-dimensional
structures of pollutants were downloaded from NCBI PubChem and were
further converted into three-dimensional structures using Open Babel. Protein-ligand
docking was carried out using AutoDock Vina. Nearly 70 and 60% of the
selected datasets showed the best average score for aliphatic and aromatic
pollutants, respectively, suggesting that the enzyme might be able to oxidize these
pollutants. Moreover, in a few cases, such as anthracene, phenanthrene, etc., there are
experimental data to support this hypothesis. Similar work would be
helpful to find putative pollutants for other biodegradative enzymes.

Introduction

Petroleum hydrocarbons are important energy resources used by industry and in our daily life,
whose production contributes highly to environmental pollution. To control such risk,
bioremediation constitutes an environmentally friendly alternative technology that has been
established and applied. It constitutes the primary mechanism for the elimination of hydrocarbons
from contaminated sites by natural existing populations of microorganisms. Petroleum or crude oil is
a complex mixture of hydrocarbons. Annually, millions of tons of crude petroleum oil enter the
marine environment from either natural or artificial sources. Hydrocarbon-degrading bacteria are
able to assimilate and metabolize hydrocarbons present in petroleum. The effects of environmental
conditions on the microbial degradation of hydrocarbons and the effects of hydrocarbon
contamination on microbial communities are areas of great interest (Rahman et al. 2004).
Bioremediation is a strategy to utilize biological activities to the greatest extent possible for the rapid
elimination of environmental pollutants. Stimulation of the growth of indigenous microorganisms
(biostimulation) and inoculation with foreign oil-degrading bacteria are promising means of
accelerating the detoxification of a polluted site with minimal impact on the ecological
systems (Cappello et al. 2006). The growth of microorganisms on hydrocarbons presents particular
problems because hydrocarbons are immiscible in water. Many bacteria are able to emulsify
hydrocarbons in solution by producing surface active agents such as biosurfactants that increase the
adhesion of cells to the substrate. Biosurfactants reduce the surface tension by accumulating at the
interface of immiscible fluids, increasing the surface area of insoluble compounds, which leads to
increased bioavailability and subsequent biodegradation of the hydrocarbons (Batista et al. 2006).
Alkanes are major components of crude oil. Alkane hydroxylase is a key enzyme involved in alkane
degradation. This enzyme, which introduces an oxygen atom derived from molecular oxygen into the
alkane substrate, plays an important role in crude oil bioremediation (van Beilen et al. 2003). Alkane
hydroxylase genes are classified into three groups based on phylogenetic analysis. The group (I)
alkane hydroxylases, encoded by alk-B genes, catalyze the degradation of short-chain n-alkanes (C6–
C12). The group (II) alkane hydroxylases, encoded by alk-M genes, catalyze the degradation of
medium-chain n-alkanes (C8–C16), and the group (III) alkane hydroxylases, encoded by alk-B genes,
catalyze the degradation of long-chain n-alkanes (>C16) (Kohno et al. 2002). Examples of strains
capable of growing with n-alkanes as a sole carbon source include Alcanivorax borkumensis SK2,
which shows growth on C6 to C16 n-alkanes (Hara et al. 2004); Rhodococcus sp. Q15, which grows on
C8 to C32 (Whyte et al. 2002); and Acinetobacter sp. M-1, which can degrade C13 to C44
n-alkanes (Sakai et al. 1994). One well-studied system for aerobic n-alkane degradation is the alk system
of Pseudomonas putida GPo1 (Baptist et al. 1963; Eggink et al. 1987b), which is encoded by genes
found on the octane (OCT) plasmid (Chakrabarty et al. 1973; van Beilen et al. 2001). The first step of
n-alkane degradation by this system is catalyzed by AlkB, an integral membrane protein that carries
out a terminal hydroxylation of the n-alkane (Kok et al. 1989). The electrons needed to carry out this
step are delivered to AlkB via a rubredoxin reductase (AlkT) and two rubredoxins (AlkF and AlkG) (van
Beilen et al. 2002b). The resulting alcohol is further converted to a fatty acid via a pathway involving
an alcohol dehydrogenase (AlkJ), an aldehyde dehydrogenase (AlkH), and an acyl CoA synthetase
(AlkK), after which it enters the β-oxidation pathway (van Beilen et al. 2001). Enzyme systems
homologous to the alk system of P. putida GPo1 have been found in several bacterial species (van
Beilen et al. 2002b), and it has been shown that alkB homologues, generally named alkB or alkM,
sometimes occur as two or more paralogues within the same strain (Whyte et al. 2002; van Beilen et
al. 2004). For example, A. borkumensis carries two alkB homologues, alkB1 and alkB2, which have
been shown to play a role in the degradation of C6 to C12 n-alkanes (Hara et al. 2004). Recently, van
Beilen et al. (2005) showed that the substrate range of AlkB from P. putida GPo1 and AlkB1 from A.
borkumensis AP1 is determined by a specific amino acid in the respective proteins, W55 in the P.
putida AlkB and W58 in the A. borkumensis AlkB1. They showed that if these tryptophans were
changed to a less bulky residue, e.g., serine or cysteine, the enzymes could catalyze the
hydroxylation of n-alkanes with chain lengths of C14 and C16, whereas the wild-type enzymes could
only degrade n-alkanes shorter than C13 (van Beilen et al. 2005).

Mechanism for alkane degradation

CHEMOTAXIS TO LINEAR ALKANES

Chemotaxis facilitates the movement of microorganisms toward or away from chemical gradients in
the environment, and this process plays a role in biodegradation by bringing cells into contact with
degradation substrates (Parales and Harwood, 2002; Parales et al., 2008). Alkanes are sources of
carbon and energy for many bacterial species and have been shown to function as chemoattractants
for certain microorganisms. A bacterial Flavimonas oryzihabitans isolate that was obtained from soil
contaminated with gas oil was shown to be chemotactic to gas oil and hexadecane (Lanfranconi et
al., 2003). Similarly, Pseudomonas aeruginosa PAO1 is chemotactic to hexadecane (Smits et al.,
2003). The tlpS gene, which is located downstream of the alkane hydroxylase gene alkB1 in the PAO1
genome, is predicted to encode membrane-bound methyl-accepting chemotaxis proteins (MCP) that
may play a role in alkane chemotaxis (Smits et al., 2003), although no experimental evidence exists.
Similarly, the gene alkN is predicted to encode an MCP that could be involved in alkane chemotaxis
in P. putida GPo1 (van Beilen et al., 2001). Our recent investigation of the genome sequence of
Alcanivorax dieselolei B-5 (Lai et al., 2012) identified the alkane chemotaxis machinery of
Alcanivorax, which consists of eight cytoplasmic chemotaxis proteins that transmit signals from the
MCP proteins to the flagellar motors (Figure 1). This chemotaxis machinery is similar to that of
Escherichia coli (Parales and Ditty, 2010). However, further investigation is necessary to confirm the
mechanism of alkane chemotaxis in A. dieselolei B-5.

n-ALKANE UPTAKE IN BACTERIA

Although the genes and proteins that enable the passage of aromatic hydrocarbons across the
bacterial outer membrane have been identified (van den Berg, 2005; Mooney et al., 2006; Hearn et
al., 2008, 2009), the active transport mechanisms involved in alkane uptake remain unclear. A previous
review (Rojo, 2009) discussed the observation that direct uptake of alkane molecules from the
water phase is only possible for low molecular weight alkanes, which are sufficiently soluble to
facilitate efficient transport into cells. For medium- and long-chain n-alkanes, microorganisms may
gain access to these compounds by adhering to hydrocarbon droplets (which is facilitated by the
hydrophobic cell surface) or by surfactant-facilitated access, as reviewed by Rojo (2009). Surfactants
have been reported to increase the uptake and assimilation of alkanes, such as hexadecane, in liquid
culture (Beal and Betts, 2000; Noordman and Janssen, 2002), but their exact role in alkane uptake is
not fully understood. Bacteria that are capable of oil degradation usually produce and secrete
surfactants of diverse chemical nature that allow alkane emulsification (Yakimov et al., 1998; Peng et
al., 2007, 2008; Qiao and Shao, 2010; Shao, 2010). Based on our understanding of biosurfactant
structure and the mechanism of outer membrane transport, we speculate that biosurfactants may
be excluded from entering the cell and remain in the extracellular milieu. In P. putida, alkL in the alk
operon is postulated to play an important role in alkane transport into the cell (van Beilen et al.,
2004; Hearn et al., 2009). Transcriptome analysis of A. borkumensis SK2 revealed that the
alkane-induced gene blc, encoding the outer membrane lipoprotein Blc, might be involved in alkane uptake
because it contains a so-called lipocalin domain (Sabirova et al., 2011). When this domain contacts
organic solvents, a small hydrophobic pocket forms and catalyzes the transport of small hydrophobic
molecules. More recently, our genome analysis (Lai et al., 2012) and closer examination of A.
dieselolei B-5 indicated that three outer membrane proteins that belong to the long-chain fatty acid
transporter protein (FadL) family are involved in alkane transport (unpublished). The FadL homologs
are present in many bacteria that are involved in the biodegradation of xenobiotics (van den
Berg, 2005), which are usually hydrophobic and probably enter cells by a mechanism similar to that
employed for long-chain (LC) fatty acids by FadL in E. coli.

DEGRADATION PATHWAYS OF n-ALKANES

The initial terminal hydroxylation of n-alkanes can be carried out by enzymes that belong to different
families. Microorganisms degrading short-chain length alkanes (C2–C4, where the subscript indicates
the number of carbon atoms of the alkane molecule) have enzymes related to methane
monooxygenases (van Beilen and Funhoff, 2007). Strains degrading medium-chain length alkanes
(C5–C17) frequently contain soluble cytochrome P450s and integral membrane non-heme iron
monooxygenases, such as AlkB (Rojo, 2009; Austin and Groves, 2011). Interestingly, alkane
hydroxylases of long-chain length (LC-) alkanes (>C18) are unrelated to the above alkane
hydroxylases as characterized recently. One such hydroxylase, AlmA, is an LC-alkane monooxygenase
from Acinetobacter. A second hydroxylase is LadA, which is a thermophilic soluble LC-alkane
monooxygenase from Geobacillus (Feng et al., 2007; Throne-Holst et al., 2007; Wentzel et al., 2007).
The almA gene, which encodes a putative monooxygenase belonging to the flavin-binding family,
was identified from Acinetobacter sp. DSM 17874 (Throne-Holst et al., 2007; Wentzel et al., 2007).
This gene encodes the first experimentally confirmed enzyme that is involved in the metabolism of
LC n-alkanes of C32 and longer. We provided the first evidence that the AlmA of the genus
Alcanivorax functions as an LC-alkane hydroxylase, and found that the gene almA in both A.
hongdengensis A-11-3 and A. dieselolei B-5 strains was expressed at high levels to facilitate the efficient
degradation of LC n-alkanes (Liu et al., 2011; Wang and Shao, 2012a). The almA gene sequences were
present in several bacterial genera capable of LC n-alkane degradation, including Alcanivorax,
Marinobacter, Acinetobacter, and Parvibaculum (Wang and Shao, 2012b). In addition, similar genes
are found in other genera in GenBank, such as Oceanobacter sp. RED65, Ralstonia spp.,
Mycobacterium spp., Photorhabdus sp., Psychrobacter spp., and Nocardia farcinica IFM10152.
However, few of these genes have been functionally characterized. A unique LC-alkane hydroxylase
from the thermophilic bacterium Geobacillus thermodenitrificans NG80-2 has been characterized.
This enzyme is called LadA and oxidizes C15–C36 alkanes, generating the corresponding primary
alcohols (Feng et al., 2007). The LadA crystal structure has been determined, revealing that LadA
belongs to the bacterial luciferase family of two-component, flavin-dependent oxygenases (Li
et al., 2008). LadA is believed to oxidize alkanes by a mechanism similar to that of other flavoprotein
monooxygenases, and its ability to recognize and hydroxylate LC-alkanes most likely results from the
way in which it captures the alkane (Li et al., 2008). Therefore, the hydroxylases involved in LC-alkane
degradation appear to have evolved specifically, which is in contrast with other alkane
monooxygenases such as AlkB and P450. Interestingly, branched-chain alkanes are thought to be
more difficult to degrade than linear alkanes (Pirnik et al., 1974). However, Alcanivorax bacteria
efficiently degrade branched alkanes (Hara et al., 2003). In A. borkumensis SK2, isoprenoid
hydrocarbon (phytane) strongly induces P450 (a) and alkB2 (Schneiker et al., 2006). In a previous
report, we found that both pristane and phytane activate the expression of alkB1 and almA in A.
dieselolei B-5 (Liu et al., 2011). In A. hongdengensis A-11-3, we recently found that pristane
selectively activates the expression of alkB1, P450-3 and almA (Wang and Shao, 2012a). However,
the metabolic pathways that mediate this activity are poorly understood, although they may involve
the ω- or β-oxidation of the hydrocarbon molecule (Watkinson and Morgan, 1990).

REGULATION OF ALKANE-DEGRADATION PATHWAYS

The expression of the bacterial genes involved in alkane
assimilation is tightly regulated. Alkane-responsive regulators ensure that alkane degradation genes
are induced only in the presence of the appropriate hydrocarbons. Many microorganisms (Rojo,
2009; Austin and Groves, 2011) contain several sets of alkane degradation systems, each one being
active on a particular kind of alkane or being expressed under specific physiological conditions. In
these cases, the regulatory mechanisms should assure an appropriate differential expression of each
set of enzymes. The regulators that have been characterized belong to different families, including
LuxR/MalT, AraC/XylS, and other non-related families (Table 1).

GLOBAL REGULATION OF THE ALKANE DEGRADATION PATHWAY


The expression of alkane degradation pathway genes is often down-regulated by complex global
regulatory controls that ensure that the genes are expressed only under the appropriate
physiological conditions or in the absence of any preferred compounds (Rojo, 2009). Two global
regulatory networks exist. One network relies on the global regulatory protein Crc (Yuste and Rojo,
2001), while the other network receives information from cytochrome o ubiquinol oxidase (Cyo),
which is a component of the electron transport chain (Dinamarca et al., 2002, 2003). Crc is an
RNA-binding protein that interacts with the 5′ end of the alkS mRNA, inhibiting translation (Moreno et
al., 2007). A recent study further showed that Crc inhibits the induction of the alkane degradation
pathway by limiting not only the translation of their transcriptional activators but also that of genes
involved in the entire alkane degradation pathway in P. putida (Hernández-Arranz et al., 2013). In
addition, the results of this study suggest that Crc follows a multi-step strategy in many cases, targeting
uptake, transcription regulation, and/or the production of the associated pathways’ catabolic
enzymes (Hernández-Arranz et al., 2013). However, when cells grow in a minimal salt medium
containing succinate as the carbon source, the activity of Crc is low; instead, Cyo terminal oxidase
plays a key role in the global control that inhibits the induction of the alkane degradation genes
(Yuste and Rojo, 2001; Dinamarca et al., 2003). Cyo is one of the five terminal oxidases that have
been characterized in P. putida. Inactivation of the Cyo terminal oxidase partially relieves the
repression exerted on the alkane degradation pathway under several conditions, while inactivation
of any of the other four terminal oxidases does not (Dinamarca et al., 2002; Morales et al., 2006).
Cyo affects the expression of many other genes, and this enzyme has been proposed to be a
component of a global regulatory network that transmits information regarding the activity of the
electron transport chain to coordinate respiration and carbon metabolism (Petruschka et al., 2001;
Morales et al., 2006). The expression of the cyo genes encoding the subunits of Cyo terminal oxidase
varies depending on oxygen levels and carbon source, and there is a clear correlation between Cyo
levels and the extent of alkane degradation pathway repression (Dinamarca et al., 2003).
Materials and methods

Modelling Franconibacter sp. DJ34

BASIC PROTOCOL: MODELING OF Cronobacter sp. STRAIN DJ34, ISOLATED FROM CRUDE OIL-CONTAINING
SLUDGE FROM THE DULIAJAN OIL FIELDS, ASSAM, INDIA, BASED ON A SINGLE TEMPLATE USING MODELLER

MODELLER is a computer program for
comparative protein structure modeling (Sali and Blundell, 1993; Fiser et al., 2000). In the simplest
case, the input is an alignment of a sequence to be modeled with the template structures, the
atomic coordinates of the templates, and a simple script file. MODELLER then automatically
calculates a model containing all non-hydrogen atoms, within minutes on a modern PC and with no
user intervention. Apart from model building, MODELLER can perform additional auxiliary tasks,
including fold assignment, alignment of two protein sequences or their profiles (Marti-Renom et al.,
2004), multiple alignment of protein sequences and/or structures (Madhusudhan et al., 2006;
Madhusudhan et al., 2009), calculation of phylogenetic trees, and de novo modeling of loops in
protein structures (Fiser et al., 2000).

Necessary Resources

Hardware:

A computer running RedHat Linux (PC, Opteron or EM64T/Xeon64 systems) or other version of
Linux/Unix (x86/x86_64 Linux), Apple Mac OSX (10.6 or later), or Microsoft Windows (XP or later)

Software:

The MODELLER 9.13 program, downloaded and installed from
http://salilab.org/modeller/download_installation.html (see Support Protocol)

Files:

Sample files required to complete this protocol can be downloaded from
http://salilab.org/modeller/tutorial/basic-example.tar.gz (Unix/Linux) or
http://salilab.org/modeller/tutorial/basic-example.zip (Windows)

Background to Cronobacter sp. Strain DJ34: Very few strains of Cronobacter spp. have been
reported from hydrocarbon or industrial waste-contaminated habitats (3, 4). Recently, a Gram-negative,
facultatively anaerobic, hydrocarbon-degrading strain, Cronobacter sp. DJ34, was isolated from crude
oil-containing sludge in the Duliajan oil fields, Assam, India. The 16S rRNA gene sequence
of the isolate has been deposited in NCBI GenBank under the accession no. KM054665, which
showed 99% sequence similarity with Cronobacter pulveris strain E444 (accession no. EF059835).
Strain DJ34 showed multiple heavy metal resistances, growth under a wide range of pH,
temperature, and salinity conditions, biosurfactant production, and an ability to utilize various
electron acceptors during anaerobic growth.

Searching for structures related to dj34

Conversion of sequence to PIR file format: It is first necessary to convert the target sequence into a
format that is readable by MODELLER. MODELLER uses the PIR format to read and write sequences
and alignments. The first line of the PIR-formatted sequence consists of >P1; followed by the
identifier of the sequence. Here, the sequence is identified by the code dj34. The second line, consisting of
ten fields separated by colons, usually shows details about the structure. In the case of sequences
with no structural information, only two of these fields are used: the first field should be sequence
(indicating that the file contains a sequence without a known structure) and the second should
contain the model file name (dj34 in this case). The rest of the file contains the sequence of dj34,
with an asterisk (*) marking its end. The standard uppercase single-letter amino acid codes are used
to represent the sequence.
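The PIR layout described above can be sketched in a few lines of Python. This is a minimal illustration, not part of MODELLER: the helper name `to_pir` is hypothetical, the residues shown are placeholders rather than the real dj34 sequence, and leaving the eight unused fields empty is an assumption that should be checked against the MODELLER manual.

```python
import textwrap


def to_pir(code, sequence, width=75):
    """Return a PIR-formatted entry for a sequence with no known structure."""
    # Line 1: '>P1;' followed by the sequence identifier.
    header = ">P1;" + code
    # Line 2: ten colon-separated fields; only the first two are filled in.
    fields = ["sequence", code] + [""] * 8
    description = ":".join(fields)
    # Remaining lines: uppercase one-letter codes, ending with an asterisk.
    body = textwrap.wrap(sequence.upper() + "*", width)
    return "\n".join([header, description] + body) + "\n"


# Hypothetical placeholder residues, NOT the real dj34 sequence.
print(to_pir("dj34", "MKTAYIAKQR"))
```

Writing the returned string to dj34.ali would give a file of the shape the protocol expects.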

Searching for suitable template structures:

A search for potentially related sequences of known structure can be performed using the
profile.build() command of MODELLER (file build_profile.py). The command uses the local dynamic
programming algorithm to identify related sequences (Smith and Waterman, 1981). In the simplest
case, the command takes as input the target sequence and a database of sequences of known
structure (file pdb_95.pir) and returns a set of statistically significant alignments. The script,
build_profile.py, does the following:

1. Initializes the “environment” for this modeling run by creating a new environ object (called env
here).

2. Creates a new sequence_db object, calling it sdb, which is used to contain large databases of
protein sequences.

3. Reads a file, in text format, containing nonredundant PDB sequences, into the sdb database. The
sequences can be found in the file pdb_95.pir. This file is also in the PIR format. Each sequence in
this file is representative of a group of PDB sequences that share 95% or more sequence identity to
each other and have less than 30 residues or 30% sequence length difference.

4. Writes a binary machine-independent file containing all sequences read in the previous step.

5. Reads the binary format file back in for faster execution.

6. Creates a new “alignment” object (aln), reads the target sequence dj34 from the file dj34.ali, and
converts it to a profile object (prf). Profiles contain similar information to alignments, but are more
compact and better for sequence database searching.

7. prf.build() searches the sequence database (sdb) with the target profile (prf). Matches from the
sequence database are added to the profile.

8. prf.write() writes a new profile containing the target sequence and its homologs into the specified
output file. The equivalent information is also written out in standard alignment format.
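The eight steps above can be sketched as a single function. This is a hedged reconstruction in the style of the MODELLER 9.x tutorial scripts, not the exact build_profile.py used here: the numeric search parameters (matrix offset, gap penalties, E-value cutoff) and the substitution matrix are assumed values, and the function wrapper with its lazy import is an addition for illustration only.

```python
def build_profile(target_file="dj34.ali", pdb_db="pdb_95.pir",
                  out_prefix="build_profile"):
    """Search a database of PDB sequences for relatives of the target."""
    # Imported lazily so the module can be loaded where MODELLER is absent.
    from modeller import log, environ, sequence_db, alignment

    log.verbose()
    env = environ()                                        # step 1

    sdb = sequence_db(env)                                 # step 2
    sdb.read(seq_database_file=pdb_db, seq_database_format="PIR",
             chains_list="ALL", minmax_db_seq_len=(30, 4000),
             clean_sequences=True)                         # step 3
    binary_db = pdb_db.rsplit(".", 1)[0] + ".bin"
    sdb.write(seq_database_file=binary_db, seq_database_format="BINARY",
              chains_list="ALL")                           # step 4
    sdb.read(seq_database_file=binary_db, seq_database_format="BINARY",
             chains_list="ALL")                            # step 5

    aln = alignment(env)                                   # step 6
    aln.append(file=target_file, alignment_format="PIR", align_codes="ALL")
    prf = aln.to_profile()

    # Assumed search settings; adjust to the problem at hand.
    prf.build(sdb, matrix_offset=-450, rr_file="${LIB}/blosum62.sim.mat",
              gap_penalties_1d=(-500, -50), n_prof_iterations=1,
              check_profile=False, max_aln_evalue=0.01)    # step 7

    prf.write(file=out_prefix + ".prf", profile_format="TEXT")   # step 8
    aln = prf.to_alignment()
    aln.write(file=out_prefix + ".ali", alignment_format="PIR")
```

Calling `build_profile()` requires a licensed MODELLER installation and the files dj34.ali and pdb_95.pir in the working directory.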

Execute the script using the command:

python build_profile.py > build_profile.log (or, if Python is not installed on your machine, with
mod9.13 build_profile.py). At the end of the execution, a log file is created (build_profile.log).
MODELLER always produces a log file. Errors and warnings in log files can be found by searching for
the _E> and _W> strings, respectively.
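Searching the log for the two marker strings can be automated. The following is a small sketch; the function name and the sample text are hypothetical and do not reproduce real MODELLER output.

```python
def find_log_issues(log_text):
    """Split log lines into errors (_E>) and warnings (_W>)."""
    errors, warnings = [], []
    for line in log_text.splitlines():
        if "_E>" in line:
            errors.append(line.strip())
        elif "_W>" in line:
            warnings.append(line.strip())
    return errors, warnings


# Hypothetical excerpt, for illustration only.
sample = "plain line\nexample_W> made-up warning\nexample_E> made-up error\n"
errors, warnings = find_log_issues(sample)
print(len(errors), "error(s),", len(warnings), "warning(s)")
```

In practice the text would come from reading build_profile.log.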

Selecting a template—

In the file build_profile.prf, the second column reports the code of the PDB sequence that was
aligned to the target sequence. The eleventh column reports the percentage sequence identity
between the target sequence and the PDB sequence, normalized by the length of the alignment (indicated in the
tenth column). In general, a sequence identity value above ~25% indicates a potential template,
unless the alignment is too short. The PDB sequences listed below show very significant similarities to the query
sequence, with E-values equal to 0. As expected, the hits correspond to alkane monooxygenase
(1bdm:A, 5mdh:A, 1b8p:A, 1civ:A, 7mdh:A, and 1smk:A). To select the appropriate template for the
target sequence, the alignment.compare_structures() command will first be used to assess the
sequence and structure similarity between the three possible templates. In compare.py, the
alignment object aln is created and MODELLER is instructed to read into it the protein sequences
and information about their PDB files. The command malign() calculates their multiple sequence
alignment, which is subsequently used as a starting point for creating a multiple structure alignment
by malign3d(). Based on this structural alignment, the compare_structures() command calculates the
RMS and DRMS deviations between atomic positions and distances, differences between the main-
chain and side-chain dihedral angles, percentage sequence identities, and several other measures.
Finally, the id_table() command writes a file (family.mat) with pairwise sequence distances that can
be used as input to the dendrogram() command (or the clustering programs in the PHYLIP package;
Felsenstein, 1989). dendrogram() calculates a clustering tree from the input matrix of pairwise
distances, which helps to visualize differences among the template candidates.
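The first screening step, reading the second, tenth and eleventh columns of build_profile.prf and applying the ~25% identity rule, can be sketched as follows. The function name is hypothetical, the minimum alignment length of 100 is an arbitrary assumption standing in for "unless the alignment is too short", and the parser assumes whitespace-separated columns with comment lines starting with "#".

```python
def candidate_templates(prf_text, min_identity=25.0, min_length=100):
    """Return (code, identity, length) for hits passing both thresholds."""
    hits = []
    for line in prf_text.splitlines():
        cols = line.split()
        if line.lstrip().startswith("#") or len(cols) < 11:
            continue  # skip comments and short or malformed lines
        code = cols[1]                      # column 2: PDB code of the hit
        try:
            length = int(cols[9])           # column 10: alignment length
            identity = float(cols[10])      # column 11: % sequence identity
        except ValueError:
            continue
        if identity > min_identity and length >= min_length:
            hits.append((code, identity, length))
    # Highest identity first.
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

The surviving codes are only candidates; the structural comparison described above is still needed to choose among them.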

Aligning dj34 with the template—

A good way to align the dj34 sequence with the structure of 1bdm:A is to use the align2d() command in MODELLER
(Madhusudhan et al., 2006). Although align2d() is based on a dynamic programming algorithm
(Needleman and Wunsch, 1970), it is different from standard sequence-sequence alignment
methods because it takes into account structural information from the template when constructing
an alignment.

Model building—

Once a target-template alignment is constructed, MODELLER calculates a 3-D model of the target
completely automatically, using its automodel class. The script in Figure 5.6.9 will generate five
different models of the target based on the 1bdm:A template structure and the alignment in file
TvLDH-1bdmA.ali (file modelsingle.py).
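A model-building script of the kind described here might look like the sketch below, written against the MODELLER 9.x Python interface. The alignment file name and template code come from the text; the target code "dj34" and the choice of DOPE and GA341 as assessment methods are assumptions, and the align codes must match those actually used in the alignment file.

```python
def build_models(alnfile="TvLDH-1bdmA.ali", template="1bdmA",
                 target="dj34", n_models=5):
    """Build n_models comparative models and return MODELLER's output records."""
    # Imported lazily so the module can be loaded where MODELLER is absent.
    from modeller import environ
    from modeller.automodel import automodel, assess

    env = environ()
    a = automodel(env,
                  alnfile=alnfile,      # target-template alignment (PIR)
                  knowns=template,      # align code of the template
                  sequence=target,      # align code of the target (assumed)
                  assess_methods=(assess.DOPE, assess.GA341))
    a.starting_model = 1
    a.ending_model = n_models           # five models by default
    a.make()
    return a.outputs
```

The template PDB file must be available in the working directory when the function is called.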

Evaluating a model—

If several models are calculated for the same target, the best model can be selected by picking the
model with the lowest value of the MODELLER objective function or the DOPE (Shen and Sali, 2006)
or SOAP (Dong et al., 2013) assessment scores, which are reported at the end of the log file.
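Picking the lowest-scoring model can also be done programmatically. The sketch below assumes each model is described by a dictionary with a "name", a "failure" entry (None on success) and a "DOPE score", which is how MODELLER's automodel reports its outputs as far as we are aware; the records in the example are invented.

```python
def best_model(outputs, key="DOPE score"):
    """Return the successfully built model with the lowest score under `key`."""
    ok = [m for m in outputs if m.get("failure") is None and key in m]
    if not ok:
        raise ValueError("no successfully built model has a %r entry" % key)
    # Lower scores are better for both molpdf and DOPE.
    return min(ok, key=lambda m: m[key])


# Hypothetical scores, for illustration only.
models = [
    {"name": "model_1.pdb", "failure": None, "DOPE score": -100.0},
    {"name": "model_2.pdb", "failure": None, "DOPE score": -250.5},
]
print(best_model(models)["name"])
```

Passing `key="molpdf"` would rank by the MODELLER objective function instead.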

Validation of the modeled structure


The overall stereochemical quality of the modeled structures of dj34 was analyzed using
Ramachandran plot analysis and RMSD analysis. The RAMPAGE server
(http://mordred.bioc.cam.ac.uk/~rapper/rampage.php) was used to analyze the
Ramachandran plot [22, 23]. The root mean square deviation (RMSD) was calculated
between the structure of the template and the newly modeled enzyme to compare their
structures. The RMSD of the modelled structure with its template was obtained by
superimposing the modelled enzyme on its respective template structure using the PyMOL
viewer.
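For reference, the RMSD reported by a structure viewer after superposition reduces to the following calculation over matched atom pairs. This sketch assumes the two coordinate sets are already superimposed and listed in the same atom order; it does not perform the superposition itself, which in this work was done in PyMOL.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))
```

Typically only C-alpha atoms of aligned residues are passed in when comparing a model with its template.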
Ligands

Certain compounds were selected from the EPA. Their structures were downloaded from PubChem. Links
are given below.

1.

2.
The SDF structures were then converted to PDB format for docking.
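One way to script this conversion is to call the Open Babel command-line tool from Python. The sketch below is an assumption about how the step could be automated, not a record of the commands actually used: the file names are hypothetical, and `--gen3d` is included because the downloaded PubChem structures are two-dimensional.

```python
import subprocess


def obabel_command(sdf_path, pdb_path, gen3d=True):
    """Build the obabel argument list for an SDF to PDB conversion."""
    cmd = ["obabel", sdf_path, "-O", pdb_path]
    if gen3d:
        cmd.append("--gen3d")   # generate 3D coordinates from a 2D structure
    return cmd


def convert(sdf_path, pdb_path):
    """Run the conversion; requires the obabel executable on the PATH."""
    subprocess.run(obabel_command(sdf_path, pdb_path), check=True)


# Hypothetical file names.
print(" ".join(obabel_command("ligand.sdf", "ligand.pdb")))
```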

Docking protocol

Molecular docking using AutoDock Vina

AutoDock Vina (version 1.1.2) [34] is used in this project to conduct molecular docking.
Target protein structures are converted to the required PDBQT format using MGL Tools
(version 1.5.4) [31]. Open Babel (version 2.3.1) [49] is used to add polar hydrogens and
partial charges to ligand atoms as well as to convert these molecules to the PDBQT format.
The default box size is calculated following the protocol outlined by the authors of Vina [34].
Briefly, an initial docking box is calculated from the coordinates of a bound ligand in the
crystal structure, and the box dimensions in x, y and z are increased by 10 Å. Additionally,
one of the two directions in each dimension is randomly chosen and further increased by 5 Å.
Finally, if the box size in any dimension is smaller than 22.5 Å, it is extended to this value. In
this study, an experimental binding site is defined as the geometric center of a ligand bound
to the target protein, whereas the computationally predicted binding pocket center is obtained
from eFindSite [9]. Docking simulations using predicted pockets start with a random ligand
conformer generated by obconformer from Open Babel [49]; moreover, the ligand is
randomly spun around all axes in order to avoid providing the docking program with any
structural information on the native binding pose. All ligands are also translated so that their
geometric centers overlap with predicted pocket centers.
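The default box construction described above can be sketched as follows. The function name is hypothetical, and two details are assumptions: the 10 Å increase is split evenly between the two sides of each axis, and a box that is still below the 22.5 Å minimum is enlarged symmetrically.

```python
import random


def docking_box(ligand_coords, padding=10.0, extra=5.0, min_size=22.5, rng=None):
    """Return (center, size) of a docking box built around ligand coordinates."""
    rng = rng or random.Random()
    center, size = [], []
    for axis in range(3):
        values = [atom[axis] for atom in ligand_coords]
        # Start from the ligand extent and add the padding (half on each side).
        lo = min(values) - padding / 2.0
        hi = max(values) + padding / 2.0
        # Extend one randomly chosen direction along this axis.
        if rng.random() < 0.5:
            lo -= extra
        else:
            hi += extra
        # Enforce the minimum edge length.
        if hi - lo < min_size:
            grow = (min_size - (hi - lo)) / 2.0
            lo -= grow
            hi += grow
        center.append((lo + hi) / 2.0)
        size.append(hi - lo)
    return tuple(center), tuple(size)
```

The returned center and size map directly onto the center and size settings of a Vina search box.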

Protein data bank benchmark dataset


The benchmarking dataset, referred to as the PDB-bench, is used to optimize box sizes in
order to yield the highest docking accuracy. PDB-bench was compiled from the Protein Small
Molecule Database [50] and the Protein Data Bank (PDB) [51] by including only proteins
50–600 residues in length with the redundancy removed at 40 % pairwise sequence identity
using PISCES [52]. The length constraints are imposed due to the subsequent use of protein
threading; however, these do not exclude pharmacologically relevant molecules such as
G-protein coupled receptors (GPCRs) and protein kinases. Furthermore, we selected those
proteins for which at least three weakly homologous and structurally related ligand-bound
templates were detected by meta-threading using eThread [3]. We note that weak homology
is defined by the maximum sequence identity of 40 %, and the structural similarity of ≥0.4
TM-score [53] as reported by Fr-TM-align [54]. Furthermore, only non-covalently bound
small organic compounds with 6–100 heavy atoms were selected. As a result, a
representative and non-redundant PDB-bench comprises 3,659 experimental structures of
protein-ligand complexes; this dataset is available at
www.brylinski.org/content/docking-box-size.

Optimal box size and ligand radius of gyration


In order to optimize the search space, we perform a series of docking calculations for each
target using a cubic box whose edge lengths range from 2 to 36 Å with a small incremental
step size of 2 Å to ensure a fine-grained sampling. Next, we analyze docking accuracy as a
function of the size of a query compound by calculating the ratio of the radius of
gyration of a ligand (Rg) to the box size. Rg is defined as follows:

$$ R_g = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left( \vec{r}_k - \vec{r}_{center} \right)^2} \tag{1} $$

where N is the total number of ligand heavy atoms, the vector $\vec{r}_k$ corresponds to the
Cartesian coordinates of each heavy atom, and $\vec{r}_{center}$ represents the geometric
center of a ligand.
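The definition of Rg translates directly into code. The following sketch uses plain Python lists of heavy-atom coordinates; the function names are hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Rg of a set of heavy-atom coordinates about their geometric center."""
    n = len(coords)
    if n == 0:
        raise ValueError("at least one atom is required")
    # Geometric center: unweighted mean of the coordinates.
    center = [sum(atom[i] for atom in coords) / n for i in range(3)]
    # Sum of squared distances of each atom from the center.
    total = sum(
        sum((atom[i] - center[i]) ** 2 for i in range(3)) for atom in coords
    )
    return math.sqrt(total / n)


def rg_to_box_ratio(coords, box_edge):
    """Ratio of ligand Rg to the edge length of a cubic docking box."""
    return radius_of_gyration(coords) / box_edge
```

For two atoms placed 2 Å apart, each lies 1 Å from the center, so Rg is 1 Å.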

By default, we calculate Rg for a single low-energy conformer generated for each query
compound by obconformer from Open Babel [49]. For comparison, we also calculated the
average values of Rg ± standard deviation using sets of 100 random rotamers generated by
obrotamer (Open Babel [49]) for PDB-bench ligands.

Directory of useful decoys, enhanced dataset


DUD-E, an enhanced version of the DUD dataset [55], comprises a diverse set of 101
proteins including many pharmacologically important targets such as ion channels and
GPCRs [56]. DUD-E features 22,886 experimentally validated active compounds with an
average number of 224 ligands per protein target, and over 1,000,000 decoy molecules
at an approximate ratio of 50 per 1 active compound. These decoys have similar chemical
properties yet different topologies than the corresponding active compounds. Therefore, the
DUD-E dataset allows performing rigorous and unbiased tests of docking algorithms, scoring
functions and virtual screening tools [57, 58]. Similar to the PDB-bench dataset, we carried
out docking calculations using experimental pocket centers calculated from 101
representative complex structures included in DUD-E (the D101 set). Furthermore, we
evaluate the accuracy of virtual screening for a subset of 77 proteins whose binding sites
were successfully predicted by eFindSite (the D77 set). A binding site prediction is
considered successful when the distance between the predicted and experimental pocket
center is below 8 Å.
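
The success criterion for binding site prediction can be written as a simple distance check. This is a minimal sketch; the function name and the pocket-center coordinates are hypothetical.

```python
import math

def site_prediction_successful(predicted_center, experimental_center, cutoff=8.0):
    """True when the predicted pocket center lies within `cutoff` Å of the experimental one."""
    return math.dist(predicted_center, experimental_center) < cutoff

# Hypothetical pocket centers (in Å), for illustration only.
near = site_prediction_successful((10.0, 12.0, 5.0), (13.0, 16.0, 5.0))  # 5 Å apart
far = site_prediction_successful((10.0, 12.0, 5.0), (20.0, 12.0, 5.0))   # 10 Å apart
print(near, far)
```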

Evaluation metrics for molecular docking and virtual screening


Docking accuracy is assessed by the root-mean-square deviation (RMSD) from the crystal
structure calculated over ligand heavy atoms [59], and the fraction of recovered protein-
ligand contacts. Specific interatomic contacts between ligand and protein heavy atoms are
identified using the LPC program [60]. In addition, we use the fraction of non-specific
contacts between ligand heavy atoms and protein residues, where all atoms belonging to the
same residue are equivalent. More accurate docking predictions are characterized by lower
RMSD values as well as higher fractions of specific and non-specific contacts than
less accurate ones.
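
The two docking metrics can be sketched as follows. This is an illustration under stated assumptions: the docked and crystal coordinates are already in the same reference frame and list the same heavy atoms in the same order, and the contact lists are given as input (in this work they come from the LPC program, which is not called here). All names and values are hypothetical.

```python
import math

def heavy_atom_rmsd(docked, crystal):
    """RMSD between two equally ordered lists of heavy-atom coordinates (no superposition)."""
    if len(docked) != len(crystal) or not docked:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq_sum = sum(math.dist(a, b) ** 2 for a, b in zip(docked, crystal))
    return math.sqrt(sq_sum / len(docked))

def recovered_contact_fraction(predicted_contacts, native_contacts):
    """Fraction of the native (crystal) contacts that also occur in the docked pose."""
    native = set(native_contacts)
    if not native:
        raise ValueError("no native contacts given")
    return len(native & set(predicted_contacts)) / len(native)

# Hypothetical pose shifted by 1 Å along x relative to the crystal structure.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
docked = [(x + 1.0, y, z) for x, y, z in crystal]
rmsd = heavy_atom_rmsd(docked, crystal)

# Hypothetical specific contacts as (ligand atom, protein atom) pairs.
native = [("C1", "ALA10:CB"), ("C2", "LEU45:CD1")]
predicted = [("C1", "ALA10:CB"), ("C3", "GLY7:CA")]
fraction = recovered_contact_fraction(predicted, native)

print(f"RMSD = {rmsd:.2f} Å, recovered contacts = {fraction:.2f}")
```

For non-specific contacts, the same function applies if each pair is written as (ligand atom, residue) instead of (ligand atom, protein atom).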

Virtual screening results are assessed by several commonly used evaluation metrics.
Enrichment factors EF1% and EF10% count the fraction of actives in the top 1 and 10 % of the
ranked library, respectively. In order to address the “early recognition problem”, we use the
Boltzmann-Enhanced Discrimination of Receiver Operating Characteristics (BEDROC20)
score that calculates 80 % of the enrichment from the top 8 % of the ranked library [61]. In
addition, we evaluate the area under the enrichment curve (AUC) that determines the
discriminative capability by measuring the distribution of actives over the entire library.
Finally, we calculate ACT-50 %, which corresponds to the top fraction of the ranked library
that contains half of the active compounds.
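
Two of these metrics can be sketched for a ranked library represented as a list of labels (1 for an active, 0 for a decoy, best-scored compound first). The enrichment factor below uses the commonly used normalized form, i.e. the hit rate in the top fraction divided by the hit rate in the whole library; this normalization is an assumption on our part, as are the function names and the example ranking. BEDROC and AUC are not covered here.

```python
import math

def enrichment_factor(ranked_labels, fraction):
    """Hit rate in the top `fraction` of the library divided by the overall hit rate."""
    n = len(ranked_labels)
    total_actives = sum(ranked_labels)
    if n == 0 or total_actives == 0:
        raise ValueError("library must contain at least one active")
    n_top = max(1, round(fraction * n))
    hit_rate_top = sum(ranked_labels[:n_top]) / n_top
    return hit_rate_top / (total_actives / n)

def act50(ranked_labels):
    """Smallest top fraction of the ranked library that contains half of the actives."""
    total_actives = sum(ranked_labels)
    if total_actives == 0:
        raise ValueError("library must contain at least one active")
    needed = math.ceil(total_actives / 2)
    found = 0
    for rank, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            return rank / len(ranked_labels)

# Hypothetical ranking of 20 compounds with 4 actives.
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
ef10 = enrichment_factor(labels, 0.10)
half = act50(labels)
print(f"EF10% = {ef10:.1f}, ACT-50% = {half:.2f}")
```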

Docking protocol

The PDB files of both the ligand (dodecane) and the enzyme (dj34) were converted to PDBQT
format using MGLTools and Open Babel. PDBQT files store the atomic coordinates,
partial charges and AutoDock atom types for both the receptor and the ligand.
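
To make the content of a PDBQT file concrete, the sketch below reads the coordinates, partial charge and AutoDock atom type from a single ATOM record. It assumes the fixed-width PDB-style column layout in which the partial charge occupies columns 71–76 and the atom type columns 78–79; the function name and the sample record are hypothetical.

```python
def parse_pdbqt_atom(line):
    """Extract name, coordinates, partial charge and AutoDock atom type from one ATOM/HETATM record."""
    return {
        "name": line[12:16].strip(),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "charge": float(line[70:76]),
        "atom_type": line[77:79].strip(),
    }

# Sample record built with a format string so that the columns line up exactly.
# All values are made up for illustration.
record_format = ("{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   "
                 "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}    {:>6} {:<2}")
sample = record_format.format("ATOM", 1, "C1", "", "UNL", "", 1, "",
                              1.234, -0.567, 8.901, 0.0, 0.0, "0.011", "C")

atom = parse_pdbqt_atom(sample)
print(atom)
```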

For long chain alkanes

Modifying receptor

1. The PDB file of the enzyme was opened in AutoDockTools.
   File -> read molecule -> select your enzyme (here dj34)
2. All water molecules were removed.
   Edit -> delete water
3. Polar hydrogens were added.
   Edit -> hydrogen -> add -> merge non polar
4. Gasteiger charges were added.
   Edit -> charges -> compute gasteiger
5. The receptor, now ready for docking, was saved in PDBQT format.
   Grid -> macromolecule -> choose -> save as pdbqt.
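
Step 2 of the receptor preparation can also be reproduced outside the graphical interface. The following minimal sketch drops water records from the lines of a PDB file, assuming that waters are named HOH or WAT in the residue-name columns (18–20). It covers only water removal; hydrogens and charges are still handled in AutoDockTools as described above. The helper that builds the example records is hypothetical and writes only the first 20 columns of each record.

```python
def strip_waters(pdb_lines, water_names=("HOH", "WAT")):
    """Drop ATOM/HETATM records whose residue name (columns 18-20) is a water."""
    kept = []
    for line in pdb_lines:
        is_coordinate_record = line.startswith(("ATOM", "HETATM"))
        if is_coordinate_record and line[17:20] in water_names:
            continue
        kept.append(line)
    return kept

def make_record(record, serial, atom, resname):
    """Hypothetical helper: first 20 columns of a PDB coordinate record."""
    return "{:<6}{:>5} {:<4} {:>3}".format(record, serial, atom, resname)

# Made-up records: one protein atom, one water, one ligand atom.
pdb_lines = [
    make_record("ATOM", 1, "CA", "ALA"),
    make_record("HETATM", 2, "O", "HOH"),
    make_record("HETATM", 3, "C1", "LIG"),
]
cleaned = strip_waters(pdb_lines)
print(len(pdb_lines), "->", len(cleaned), "records")
```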

Modifying ligand

1. The PDB file of the ligand was opened in AutoDockTools.

A set of 6 compounds (Table 1), 3 aliphatic chains and 3 from the EPA’s (U.S. Environmental
Protection Agency) Chemical Releases and Transfers List available for various industries,
was selected
[URL http://www.epa.gov/compliance/resources/publications/assistance/sectors/notebooks/]. Five
industries, namely textile, pulp and paper, pharmaceutical, organic chemical and agricultural
pesticide, were selected from the EPA list using the following criteria. Only land disposals, water
discharges and underground injection chemicals were considered. Metal ions and gases were not
included. Only small molecules (substrates and pollutants) with 0 to 15 rotatable bonds were selected,
because a greater number of rotatable bonds may result in incorrect docking predictions [13].
Compounds with molecular weights ranging from 50 to 600 g/mol were chosen. Another set of 71
substrates (Supplementary Table 2) of laccase was taken from the BRENDA database [14]. Tyrosine
is a known non-substrate for the laccase enzyme. In the case of Bacillus subtilis, in addition to
tyrosine, a few other non-substrates have also been reported (Table 3) [22]. These non-substrates
were thus taken as negative controls. X-ray crystal structures for laccase enzymes with PDB IDs 1gyc
(Trametes versicolor) [15] and 1uvw (Bacillus subtilis) [16] (resolution 1.7 and 2.5 Å, respectively)
were taken from the Brookhaven Protein Data Bank [17]. Trametes versicolor and Bacillus subtilis
laccase structures were co-crystallized with isopropyl alcohol and ABTS, respectively. The catalytic
binding site of the laccase enzyme was determined with the help of Insight II (Accelrys Insight II San Diego, CA).
In these crystal structures, all amino acids with at least one atom lying within 10 Å of any
atom of either bound substrate (ABTS or isopropyl alcohol) were considered to be a part of the
active site pocket. Hetero atoms including cofactors and ligands were removed from the protein
complex except for the copper ion at the active site. Hydrogens were added at appropriate
geometries taking into account the protonation states. Atomic Gasteiger charges were used for the
small molecules, and amber charges were used for protein atoms. Water molecules within the active
site were considered. Protein-ligand docking was carried out using GOLD v3.0 [Genetic Optimisation
for Ligand Docking] [13]. GOLD calculations were performed as previously described [18]. Docking
procedure was performed using both scoring functions (Goldscore and Chemscore). Laccase enzyme
is a copper metalloprotein, and as there are no copper parameters incorporated in the docking
software, copper metal ion geometries were added to the ‘gold.parm’ file. Prior to docking, the
protein and the ligands were fully minimized using the Discover module of Insight II. Two-dimensional
structures of selected datasets were downloaded from the NCBI PubChem Database [URL:
http://pubchem.ncbi.nlm.nih.gov/]. Three-dimensional structures were generated using CORINA
[URL: http://www2.chemie.uni-erlangen.de/software/corina/index.html].
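
The compound selection criteria described above (0 to 15 rotatable bonds and a molecular weight of 50 to 600 g/mol) amount to a simple filter. The sketch below is illustrative only: the class, the function name and the three example compounds with their property values are hypothetical and do not come from the datasets used here.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    rotatable_bonds: int
    mol_weight: float  # g/mol

def passes_selection(compound, max_rotatable=15, mw_range=(50.0, 600.0)):
    """True when the compound meets both the rotatable-bond and molecular-weight criteria."""
    low, high = mw_range
    return (0 <= compound.rotatable_bonds <= max_rotatable
            and low <= compound.mol_weight <= high)

# Hypothetical candidates with made-up property values.
candidates = [
    Compound("compound_A", rotatable_bonds=9, mol_weight=170.0),
    Compound("compound_B", rotatable_bonds=18, mol_weight=420.0),  # too flexible
    Compound("compound_C", rotatable_bonds=2, mol_weight=35.0),    # too light
]
selected = [c.name for c in candidates if passes_selection(c)]
print(selected)
```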
