Professional Documents
Culture Documents
Research Repository
A practical guide to molecular docking and homology modelling for medicinal chemists
Published in:
Current Topics in Medicinal Chemistry
DOI:
10.2174/1568026617666170130110827
Licence:
Other
Recommended citation(APA):
Lohning, A. E., Levonis, S. M., Williams-Noonan, B., & Schweiker, S. S. (2017). A practical guide to molecular
docking and homology modelling for medicinal chemists. Current Topics in Medicinal Chemistry, 17(18), 2023-
2040. https://doi.org/10.2174/1568026617666170130110827
General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners
and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.
For more information, or if you believe that this document breaches copyright, please contact the Bond University research repository
coordinator.
Anna Elizabeth Lohning: Faculty of Health Sciences and Medicine, Bond University, Gold Coast, 4229, Queensland, Australia
Stephan M. Levonis: Faculty of Health Sciences and Medicine, Bond University, Gold Coast, 4229, Queensland, Australia
Billy Williams-Noonan: Faculty of Health Sciences and Medicine, Bond University, Gold Coast, 4229, Queensland, Australia
Stephanie S. Schweiker: Faculty of Health Sciences and Medicine, Bond University, Gold Coast, 4229, Queensland, Australia
Corresponding Author:
Running Title
Byline
Anna E. Lohning. PhD, Stephan M. Levonis, PhD, Billy Williams-Noonan, Stephanie S. Schweiker, PhD.
Elucidating details of the relationship between molecular structure and a particular biological end point is essential for successful
rationally-based drug discovery. Molecular docking is a widely accepted tool for lead identification however navigating the
intricacies of the software can be daunting. Our objective was therefore to provide a step-by-step guide for those interested in
incorporating contemporary basic molecular docking and homology modelling into their design strategy. Three molecular
docking programs, AutoDock4, SwissDock and Surflex-Dock, were compared in the context of a case study where a set of
steroidal and non-steroidal ligands were docked into the human androgen receptor (hAR) using both rigid and flexible target
atoms. Metrics for comparison included how well each program predicted X-ray structure orientation via root mean square
deviation (rmsd), predicting known actives via ligand ranking and comparison to biological data where available. Benchmarking
metrics were discussed in terms of identifying accurate and reliable results. For cases where no three dimensional structure exists
we provided a practical example for creating a homology model using Swiss-Model. Results showed an rmsd between X-ray
ligands from wild-type and mutant receptors and docked poses were 4.15Å and 0.83Å (SwissDock), 2.69Å and 8.80Å
(AutoDock4) and 0.39Å and 0.71Å (Surflex-Dock) respectively. Surflex-Dock performed consistently well in pose prediction
(less than 2Å) while AutoDock4 predicted known active non-steroidal antiandrogens most accurately. Introducing flexibility into
target atoms produced the largest degree of change in ligand ranking in Surflex-Dock. We produced a viable homology model of
the P2X1 purireceptor for subsequent docking analysis.
SwissProt initially searches the PDB for three dimensional The case study involved docking a set of androgen
structures that match your query sequence. The user selects a modulators into the wild type and mutant human androgen
template from the returned matches. In general, the most receptor sites. The database comprised known actives, decoys
suitable template will be that with the highest level of and a novel set of androgen receptor inhibitors.[50] Large
sequence identity. SwissProt operates in three modes, datasets of active compounds as well as decoys specifically
automated, alignment and DeepView Project mode. for the androgen receptor, designed for 3D virtual screening,
were sourced from the database of useful decoys (DUD).[51]
Accuracy of 3D template structure is reflected in the Figure 1: Basic workflow for molecular docking.
resolution (in Å) whereby the lower the better. This ranges
from excellent (~1Å) to poor (>3.5Å). Resolution represents A basic molecular docking workflow is presented in Figure
the average uncertainty for all atoms. Uncertainty increases
with disorder in the protein crystal during the crystallography
process. Note that temperature (or B) factors represent the
uncertainty for individual atoms. A high temperature factor
represents a low empirical electron density for the atom. A
range of acceptable values signifying reasonable position
confidence would be 30Å2-60Å2.
2 Upload ligand files via Prepare single or multiple ligand files in .mol2
the ‘Submit Docking’ format. Alternatively ZINC database identifiers
tab. can be used to access a large ligand library.
(see note below on Add hydrogens, check chirality, protonation
multiple ligand docking) state and topology. Prior energy minimization
of the ligand is not required.
2 Prepare >Applications>docking We recommend using We selected the 4DW0 for a template based on the highest
Target suite>dock ligands
>Select docking mode*.
GeomX for more
exhaustive &
global model quality estimation (GMQE) score of 0.68. This is a
>Define protein accurate docking. quality estimation which combines properties from the target-
1. Extract ligand from .pdb Trade off with time. template alignment. The resulting 3D structure, shown in
Figure 4 (and corresponding sequence alignment in 7 Inspect the alignment (See Notes – Dealing with Gaps in
Supplemental Figure S3) shows sections of secondary and check for gaps. Sequences)
To analyse gaps more Analyse superimposition for any
structure coloured blue for high quality areas and orange for closely:- unresolved residues in the template.
low quality areas. Residues in the alignment can be selected >View Project in Identify positions of any gaps in either
DeepView. (see notes template/query sequence that may impact
interactively which automatically identifies their 3D position on Deepview) on the overall reliability of the model.
in the model. Tip: Comparing secondary structure
alignment is useful. For example, gaps in
loops are usually permissible. Gaps in -
sheets may be more hazardous to the
overall 3D model structure.
4 Select Navigate to the SwissModeld site to start a Dealing with Gaps in the Sequences
>Start Modelling new modelling project.
Gaps in the template sequence are due to insertions (or
deletions) during the sequence alignment process whereas
5 Upload sequence Templates will provide a number of gaps in the query sequence means the two residues flanking
(FASTA format) then possible 3D structures and their sequence
select identities to the query sequence. the gap, while peptide bonded in the 3D model, could be far
>Search for apart in the sequence.
Templates
6 View Output. Select templates to compare When residues in the query sequence are untemplated, that
Compare by selecting superimpositions in the viewer. Note the is, there are no corresponding template residues, these are
the top few matches. %Coverage column. Many sequences and
targets contain only fragments of the entire given a high temperature factor in the 3D model and can be
sequence. It is important to match relevant easily selected for and visualized. Luckily long regions of
domains of your protein for your model
such as that containing the binding site.
untemplated residues can be visually easy to spot as they are
often represented by hairpin loops extending out from the
more compact protein structure.
The consensus between six algorithms of Surflex-Dock Figure 9: MD results from Surflex-Dock for mutant hAR site (PDB ID:
provides an additional layer of interpretation of the results. 2OZ7) in flexible mode. Partial least squares correlation of molecular weight
Total scores and ranking should only be assessed alongside a (r2=0.307) and rotatable bonds (r2=0.214) with total score and coloured by
visual inspection of the docked poses where predicted binding clogP.
interactions can be checked for validity, comparison to those
Interestingly, the decoy ligand, ZINC12848678, scored
that may be important for native ligands or existing inhibitors
second position after the X-ray ligand in the mutant hAR site.
and also if the volume of the grid is sufficient to encapsulate
This relatively small, charged ligand appears to prefer the
the binding pocket. The Cscores are set out in Supplemental
larger site created by the mutation and can take advantage of
hydrogen bonding groups around the site providing plenty of summarized from key features which may be of assistance in
possible polar interactions for ligand stabilization. the selection of a suitable MD program (Table 8). While the
results from our MD case study showed variability in the
Figure 10 shows the orientation of testosterone in the wild metrics used to compare the various programs, produce a clear
type hAR site overlaid with the docked position of winner in terms of reliability of results. Benchmarking
antiandrogen, bicalutamide, and also a triazole non-steroidal between the most commonly used algorithms, such as that
hAR inhibitor, compound 39. The lipophilic moieties are used by AD4 and Surflex-Dock, has shown a good level of
closely aligned in the steroid binding site. The more polar ends performance. From our results we found both AutoDock and
of the non-steroidal inhibitors orient towards a narrow Surflex-Dock produce high quality results. Our results
channel, the end of which is capped by a key helix with a showed AutoDock to be superior for active pose prediction
role in receptor activation. while Surflex-Dock was more consistently able to predict X-
ray ligand orientation. In overall usability however, especially
for multiple ligand docking, we found Surflex-Dock to be
most useful.
Where available, biological activity data for relevant Operating Server/cros Cross platform Cross platform.
System s platform.
ligands was collected and set out in supplementary Table S1.
This data, along with physical and chemical properties of the Applicability Single or Robust and Robust and accurate
ligands, can be further used in a quantitative structure-activity multi accurate single single target– multiple
relationship (QSAR) analysis – a useful technique in design target target/single ligand modelling.
ligand (option
optimization. It must be noted that a robust docking analysis for multiple
would often incorporate the use of a large training dataset of ligand)
actives and decoys which would then be further optimized
through ligand enrichment and visualization of cluster ForceField CHARMM CHARMM AMBER
(a range of options are
orientations and ranking prior to use with a test dataset.
provided).
A limitation of this study was the small ligand database. Advantages Intuitive Freeware, Good level of docking
This was specifically made small for illustrative purposes. online robust, gold accuracy, short run
Furthermore a thorough docking analysis of the hAR would server, standard. times, handles large
widely Benchmarking ligand libraries, GUI,
take advantage of the entire available dataset of active and utilised in shows a good high level of
decoys for the hAR from which incorporating a ligand literature. level of docking parameter control and
enrichment protocol would facilitate a higher likelihood of the accuracy. support.
algorithm’s capacity to select high affinity ligands.
Limitations Server- Long run times, Commercially
based. max. 32 torsion available. Less
Min. angles, limited commonly used in
parameter support via Industry for
control, forum. Separate comparative purposes.
support via GUI for virtual
help screening.
forum.
6. Acknowledgements
The authors wish to acknowledge the valuable advice of
Dr. Luke Henderson in the preparation of this manuscript.
4. Conclusion
7. References
1. DiMasi, J.A., R.W. Hansen, and H.G. Grabowski, The price of innovation: new estimates of drug development costs. J
Health Econ, 2003. 22(2): p. 151-85.
2. Blundell, T.L., Structure-based drug design. Nature, 1996. 384(6604 Suppl): p. 23-27.
3. Stjernschantz, E.O., Chris, Improved Ligand-Protein Binding Affinity Predictions Using Multiple Binding Modes.
Biophys J., 2010. 98(1): p. 2682–2691.
4. Khalili-Araghi, F., Gumbart, James, Wen, Po-Chao, Sotomayor, Marcos. Tajkhorshid, Emad, Schulten,Klaus Molecular
dynamics simulations of membrane channels and transporters. Curr Opin Struct Biol. , 2009. 19(2).
5. Maffeo, C., Bhattacharya, Swati, Yoo, Jejoong, Wells, David, Aksimentiev,Aleksei, Modeling and Simulation of Ion
Channels. Chem. Rev., 2012. 112(12): p. 6250–6284.
6. Wereszczynski J.1, McCammon J.A., Statistical mechanics and molecular dynamics in evaluating thermodynamic
properties of biomolecular recognition. Q Rev Biophys., 2012. 45(1): p. 1-25.
7. Ross, G.A., Morris, G.M., Biggin,P. C., One Size Does Not Fit All: The Limits of Structure-Based Models in Drug
Discovery. J. Chem. Theory Comput., 2013. 9: p. 4266–4274
8. Matteo Aldeghi, A.H., Michael J. Bodkin, Stefan Knapp, Philip C. Biggin, Accurate calculation of the absolute free
energy of binding for drug molecules. Chem. Sci., 2016. 7: p. 207-218.
9. Kleinjung, J. and F. Fraternali, Design and application of implicit solvent models in biomolecular simulations. Curr
Opin Struct Biol, 2014. 25: p. 126-34.
10. Zhang, H., T. Tan, and D. van der Spoel, Generalized Born and Explicit Solvent Models for Free Energy Calculations
in Organic Solvents: Cyclodextrin Dimerization. J Chem Theory Comput, 2015. 11(11): p. 5103-13.
11. Gleeson, M.P. and D. Gleeson, QM/MM Calculations in Drug Discovery: A Useful Method for Studying Binding
Phenomena? Journal of Chemical Information and Modeling, 2009. 49(3): p. 670-677.
12. Anuradha Roy, P.R.M., Sitta Sittampalam, Rathnam Chaguturu, Open Access High Throughput Drug Discovery in the
Public Domain: A Mount Everest in the Making. Curr Pharm Biotechnol., 2010. 11(7): p. 764–778.
13. Charifson, P.S., Corkery, J., Murcko, M.A., Walters, W.P., Consensus scoring: A method for obtaining improved hit
rates from docking databases of three-dimensional structures into proteins. J Med Chem., 1999. 42(25): p. 5100-9.
14. Halperin, I., Ma, Buyong, Wolfson, Haim, Nussinov,Ruth Principles of Docking: An Overviewof Search Algorithms and
a Guide to Scoring Functions. PROTEINS: Structure, Function, and Genetics, 2002. 47: p. 409–443.
15. Steinbrecher, T., Labahn, Andreas Towards Accurate Free Energy Calculations in Ligand Protein-Binding Studies.
Current Medicinal Chemistry, 2010. 17: p. 767-785.
16. Warren GL, A.C., Capelli AM, Clarke B, LaLonde J, Lambert MH, Lindvall M, Nevins N, Semus SF, Senger S, Tedesco
G, Wall ID, Woolven JM, Peishoff CE, Head MS., A Critical Assessment of Docking Programs and Scoring Functions.
J Med Chem., 2006. 49.
17. Philippe Ferrara, H.G., Daniel J. Price, Gerhard Klebe, Charles L. Brooks, Assessing scoring functions for protein-ligand
interactions. J. Med. Chem., 2004. 47.
18. Neves, M.A.C., Docking and scoring with ICM: the benchmarking results and strategies for improvement. J Comput
Aided Mol Des., 2006. 26(6): p. 675–686.
19. Morris, G.M., Huey, R., Lindstrom, W., Sanner, M. F., Belew, R. K., Goodsell, D. S. and Olson, A. J., Autodock4 and
AutoDockTools4: automated docking with selective receptor flexiblity. J. Computational Chemistry, 2009. 16: p. 2785-
91.
20. Grosdidier, A., V. Zoete, and O. Michielin, SwissDock, a protein-small molecule docking web service based on EADock
DSS. Nucleic Acids Res, 2011. 39(Web Server issue): p. W270-7.
21. Grosdidier A1, Z.V., Michielin O., Fast docking using the CHARMM force field with EADock DSS. J Comput Chem.,
2011. 32(10): p. 2149-59.
22. Russell Spitzer, A.N.J., Surflex-Dock: Docking Benchmarks and Real-World Application. J Comput Aided Mol Des.,
2012. 26(2): p. 687–699.
23. Martin Karplus, A.M., Nature Structural Biology 9, 646 - 652 (2002) 2002. 9: p. 646 - 652.
24. Ou-Yang, S.S., Lu, J., KONG, X., Liang, Z. Luo, C., Jiang, H., Review - Computational drug discovery. Acta
Pharmacologica Sinica, 2012. 33: p. 1131–1140.
25. Yuriev, E., Challenges and advances in structure-based virtual screening. Future Medicinal Chemistry, 2014. 6(1): p.
5-7.
26. Trott, O. and A.J. Olson, Software News and Update AutoDock Vina: Improving the Speed and Accuracy of Docking
with a New Scoring Function, Efficient Optimization, and Multithreading. Journal of Computational Chemistry, 2010.
31(2): p. 455-461.
27. Goodsell, D.S. and A.J. Olson, Automated Docking of Substrates to Proteins by Simulated Annealing. Proteins-Structure
Function and Genetics, 1990. 8(3): p. 195-202.
28. International, T., SYBYL-X 2.1.1. 2014. p. Molecular Modelling Suite.
29. Berman, H.M., et al., The Protein Data Bank. Nucleic Acids Research, 2000. 28(1): p. 235-242.
30. Salon, J.A., Lodowski, D.T., Palczewski, K., The Significance of G Protein-Coupled Receptor Crystallography for Drug
Discovery. Pharmacol Rev. , 2011. 63(4): p. 901-937.
31. Sen, S., et al., Small molecule annotation for the Protein Data Bank. Database (Oxford), 2014. 2014: p. bau116.
32. Lütteke, T., von der Lieth, C.W., pdb-care (PDB CArbohydrate REsidue check): a program to support annotation of
complex carbohydrate structures in PDB files. BMC Bioinformatics, 2004. 5(69).
33. Sanner, M.F., Python: a programming language for software integration and development. J Mol Graph Model, 1999.
17(1): p. 57-61.
34. Sanner, M.F., A.J. Olson, and J.C. Spehner, Reduced surface: an efficient way to compute molecular surfaces.
Biopolymers, 1996. 38(3): p. 305-20.
35. Dallakyan, S. and A.J. Olson, Small-molecule library screening by docking with PyRx. Methods Mol Biol, 2015. 1263:
p. 243-50.
36. Russell Spitzer, A.N.J., Surflex-Dock: Docking Benchmarks and Real-World Application. J Comput Aided Mol Des.,
2012. 26(6): p. 687–699.
37. Lua, R.C., Wilson, S.J., Konecki, D.M., Wilkins, A.D., Venner, E., Morgan, D.H., Lichtarge, O., UET: a database of
evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein
structures. Nucleic Acids Research, 2016. 44: p. D308–D312.
38. Schwede, T., Protein Modelling: What Happened to the “Protein Structure Gap”? Structure, 2013. 21(9).
39. Moult, J., Fidelis, K., Kryshtafovych, A., Schwede, T.,Tramontano, A., Critical assessment of methods of protein
structure prediction (CASP) — round x. Proteins:, 2014. 82(0 2): p. 1–6.
40. Frenkela, Z.M., Frenkela, Z.M., Trifonova, E.N. Snir, S., Structural relatedness via flow networks in protein sequence
space. Journal of Theoretical Biology, 2008. 260(3): p. 438–444.
41. di Luccio1 E. Koehl, P.K., A quality metric for homology modeling: the H-factor. BMC Bioinformatics 2011, 12:48,
2011. 12(48).
42. Xiang, X., Advances in Homology Protein Structure Modeling. Curr. Protein Pept. Sci., 2006. 7(4): p. 217–227.
43. Chothia, C., Lesk, A.M., The relation between the divergence of sequence and structure in proteins. EMBO J., 1986.
5(4): p. 823–826.
44. Venselaar, H., Joosten, R.P., Vroling, B., Baakman, C.A. B., Hekkelman, M.L., Krieger, E., Vriend, G., Homology
modelling and spectroscopy, a never-ending love story. Eur Biophys J., 2010. 39(4): p. 551–563.
45. Fong, J.H., Marchler-Bauer, A., Protein subfamily assignment using the Conserved Domain Database. BMC Research
Notes, 2008. 1(114).
46. Marco Biasini, S.B., Andrew Waterhouse, Konstantin Arnold, Gabriel Studer, Tobias Schmidt, Florian Kiefer, Tiziano
Gallo Cassarino, Martino Bertoni, Lorenza Bordoli, Torsten Schwede, SWISS-MODEL: modelling protein tertiary and
quaternary structure using evolutionary information. Nucleic Acids Research, 2014. 42 (Web server Issue): p. W252-
W258.
47. Arnold, K.B., L., Kopp, J.,Schwede, T., The SWISS-MODEL Workspace: A web-based environment for protein structure
homology modelling. Bioinformatics, 2006. 22: p. 195-201.
48. Bordoli, L., Kiefer, F., Arnold, K., Benkert, P., Battey, J. and Schwede, T., Protein structure homology modelling using
SWISS-MODEL Workspace. Nature Protocols, 2009. 4: p. 1-13.
49. Benkert, P., S.C. Tosatto, and D. Schomburg, QMEAN: A comprehensive scoring function for model quality assessment.
Proteins, 2008. 71(1): p. 261-77.
50. Altimari, J.M., Niranjan, Birunthi, Risbridger, Gail P., Schweiker,Stephanie S., Lohning,Anna E., Henderson, Luke C.,
Preliminary investigations into triazole derived androgen receptor antagonists. Bioorganic & Medicinal Chemistry,
2014. 22(9): p. 2692-2706.
51. Huang, N., B.K. Shoichet, and J.J. Irwin, Benchmarking sets for molecular docking. J Med Chem, 2006. 49(23): p. 6789-
801.
52. Herraez, A., Biomolecules in the Computer Jmol TO THE RESCUE. BIOCHEMISTRY AND MOLECULAR
BIOLOGY EDUCATION, 2006. 34(4): p. 255-261.
53. Pettersen EF1, G.T., Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE., UCSF Chimera--a visualization
system for exploratory research and analysis. J Comput Chem., 2004. 25(13): p. 1605-12.
54. Jain, A.N., Morphological similarity: a 3D molecular similarity method correlated with protein-ligand recognition. J
Comput Aided Mol Des, 2000. 14(2): p. 199-213.
55. Kellenberger, E., Rodrigo, Jordi, Muller, Pascal, Rognan, Didier, Comparative Evaluation of Eight Docking Tools for
Docking and Virtual Screening Accuracy. PROTEINS: Structure, Function, and Bioinformatics, 2004. 57: p. 225–242.
56. Spitzer, R., Jain, Ajay N., Surflex-Dock: Docking Benchmarks and Real-World Application. J Comput Aided Mol Des.,
2012. 26(6): p. 687–699.
57. National Centre for Biotechnology Informtation http://www.ncbi.nlm.nih.gov/ (Accessed Nov.21.
58. Sievers, F., Wilm, A., Dineen, D., Gibson, T.J. , Karplus, K., Li, W., Lopez, R., McWilliam, H., Remmert, M., Söding,
J., Thompson, J.D., Higgins, D.G., Fast, scalable generation of high-quality protein multiple sequence alignments using
Clustal Omega. . Molecular Systems Biology, 2011. 7: p. 1-6.
59. Notredame, C., Higgins, D.G., Heringa, J., T-Coffee: A novel method for fast and accurate multiple sequence alignment.
. Journal of Molecular Biology, 2000. 302(1): p. 205-217.
60. Swiss-Model Homology Modelling Server. [cited 2015 Dec.13 ]; Available from: http://swissmodel.expasy.org/.
61. Irwin, J.J. and B.K. Shoichet, ZINC - A free database of commercially available compounds for virtual screening.
Journal of Chemical Information and Modeling, 2005. 45(1): p. 177-182.
62. Eisermann, K., Wang, Dan, Jing, Yifeng, Pascal, Laura E.,Wang, Zhou, Androgen receptor gene mutation,
rearrangement, polymorphism. Transl Androl Urol., 2013. 2(3): p. 137–147.