
Drug Discovery Today • Volume 27, Number 5 • May 2022 REVIEWS

INFORMATICS (ORANGE)

Bridging informatics and medicinal inorganic chemistry: Toward a database of metallodrugs and metallodrug candidates
José L. Medina-Franco a,⇑, Edgar López-López a,b, Emma Andrade a, Lena Ruiz-Azuara c, Angelo Frei d, Davy Guan e, Johannes Zuegg e, Mark A.T. Blaskovich e

a DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
b Departamento de Química y Programa de Posgrado en Farmacología, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Mexico City 07000, Mexico
c Medicinal Inorganic Chemistry, School of Chemistry, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
d Department of Chemistry, Imperial College London, Molecular Sciences Research Hub, White City Campus, Wood Lane, London, W12 0BZ, UK
e Centre for Superbug Solutions, Institute for Molecular Bioscience, The University of Queensland, St. Lucia, Queensland 4072, Australia

Metallodrug discovery has evolved in recent years, yielding several compounds in the clinic for therapeutic and medical imaging diagnostic applications. As reviewed here, several well-established medicinal inorganic chemistry groups are consistently generating high-quality SAR data, which represent an ideal starting point for using computational methods to advance the development of new drugs. Although there are representative chemical structures of metallodrugs in public databases annotated with biological activity, there is currently no public compound database dedicated to metallodrugs. Here, we also discuss the significance, viability, applications and challenges of developing a public compound database of metallodrugs, with consistent representation of metallodrug structure being a crucial obstacle. A curated metallo-compound database would substantially benefit metallodrug discovery and development.

Keywords: Chemical space; Chemoinformatics; Compound databases; Metal-based compound; Metallodrugs; Medicinal
inorganic chemistry; Structure–activity relationships

Introduction
Metals play a significant part in the natural processes of living organisms. The use of metal-containing compounds to modulate biological processes in a therapeutic context began many years ago with the discovery (and approval for clinical use in 1978) of cisplatin, a potent anticancer drug. Since then, metal-based compounds have emerged as unprecedentedly powerful therapeutic agents for a range of diseases, in addition to their application as diagnostic imaging agents.1–3 Indeed, some conditions are only treatable with metal-based drugs. In addition to cancer, metal-based antibiotics are emerging as a promising alternative to organic compounds to combat antimicrobial drug resistance4,5 and expand therapeutic opportunities against neglected tropical diseases.6 Unfortunately, the initial applications of metal complexes to treat cancer based on their cytotoxicity has led to perceptions of general toxicity across the entire class, which, when coupled with the lack of organometallic chemists based in pharmaceutical companies, has hampered more widespread investigations. Fig. 1 shows representative metallodrugs used in the clinic, and there are several others used in diagnosis, as recently reviewed.7 In the current COVID-19 pandemic, the potential applications of metal-based compounds as antiviral drugs have been analyzed.8

In addition to metal-based compounds approved for clinical use, several compounds are under investigation in clinical trials,

⇑ Corresponding author: Medina-Franco, J.L. (medinajl@unam.mx)

1359-6446/© 2022 Elsevier Ltd. All rights reserved.
www.drugdiscoverytoday.com https://doi.org/10.1016/j.drudis.2022.02.021

FIGURE 1
Chemical structures of representative metallodrugs approved for clinical use. Extensive reviews of metallodrugs approved for clinical use and clinical
development are cited in the manuscript.

as summarized in Table 1. Of note, data in the table only consider the approved metal-based compounds for clinical use in DrugBank (approved by health agencies in the USA and Canada).9 However, there are other multinational organizations with similar but not identical criteria not included here. For instance, there are two metal-containing compounds (bismuth subsalicylate and bismuth potassium citrate) in clinical trials for the treatment of COVID-19. Table 1 suggests that metal-based compounds are a widely studied alternative in treating and diagnosing different types of diseases: in addition to various types of cancer, they are being used for chronic degenerative, infectious and neglected diseases. Furthermore, compounds for radiotherapy and radioimaging are in the early stages of the clinical trial process. Compounds containing Pt, Au, GdIII and 99mTc are the most widely studied in clinical trials so far.10 However, other metals (e.g., transition metals such as Hg, Fe and Ag and metalloids – elements that have properties that are intermediate between those of metals and nonmetals – such as Si and Sb) have also made their way into compounds with clinical potential.

Metal-based compounds continue to attract the attention of scientists from many fields, including inorganic, organic and medicinal chemistry as well as biology, medicine and biophysics. From a structural and mechanistic point of view, metals can participate in biological processes as structural carriers or in the core of chemical reactions.10–12 Beyond the reactivity of some metallodrugs or their imaging properties, metal-based pharmaceuticals have unique features not found in organic compounds. For instance, they can access complex 3D geometries13 not otherwise possible to reach with standard organic compounds, expanding the universe of viable pharmacophores available to modulate biological processes. In other terms, metal-based compounds can access a broader range of the 3D chemical space. This is notable because higher 3D character has repeatedly been correlated with better clinical success rates in organic drug candidates.14,15

Fig. 2a illustrates the growing number of peer-reviewed publications related to the widespread interest in metal-based compounds: these have emerged as novel chemical alternatives to solve complex problems in multidisciplinary sciences such as molecular biochemistry, nuclear inorganic chemistry, environ-


TABLE 1
Representative clinical trials with metal-based compounds. Adapted, with permission, from Ref. 10.

Metal/radioisotope | Agent | Indication | Current clinical phase (additional post-approval studies or current status) a | Approved for extensive clinical use b

Conventional therapy
Pt | Cisplatin | Different types of cancer | IV | Yes
 | Oxaliplatin | | III | Yes
 | Carboplatin | | IV | Yes
 | Lobaplatin | | IV | No
 | Nedaplatin | | IV | Yes
 | Picoplatin | | III | No
 | Iproplatin | | III | –
 | Satraplatin | | III | No
Au | Aurothiomalate | Rheumatoid arthritis | IV | Yes
 | Auranofin | Different types of cancer, lymphoma, pain and antimicrobial activity | I/II | No
 | Aurothiosulfate | Rheumatoid arthritis | IV | –
 | Aurothiopropanolsulfonate | | IV | –
 | Aurothioglucose | | – | Yes
Hg | Thiomersal | Skin antimicrobial | IV | Yes
Fe | Sodium nitroprusside | Vasodilator | IV | Yes
 | Ferroquine | Malaria | II | –
Sb | Sodium stibogluconate | Leishmaniasis | IV | Yes
 | Meglumine antimoniate | | IV | No
Mo | Tetrathiomolybdate | Wilson disease | III | No
Zr | Zirconium cyclosilicate | Hyperkalemia | III/IV | Yes
Bi | Bismuth subcitrate | Peptic ulcer and gastro-esophageal reflux disease | IV | Yes
 | Bismuth subsalicylate | Antidiarrheal and anti-inflammatory agent | IV | Yes
Ag | Silver sulfadiazine | Topical antibiotic used for burns | IV | Yes

Radiotherapy
131Cs | Cesium Blu | Radiotherapy drugs for different diseases | II | –
166Ho | QuiremSpheres | | II | –
177Lu | Lutathera | | IV | Yes
186Re | Rhenium-SCT | | III | No
90Y | SIR-Spheres TheraSpheres | | II | –
186Re | Rhenium-186 HEDP | Palliative cancer radiotherapy | III | No
223Ra | Radium-223 chloride | | III/IV | Yes
153Sm | Samarium-153 lexidronam | | II | Yes
225Ac | 225Ac-lintuzumab | Different types of tumors and cancer | I (Withdrawn) | No
211At | 211At-BC8-B10 | | I/II (Recruiting) | No
213Bi | 213Bi MOAB M195 | | II | No
67Cu | 67Cu SARTATE | | II | No
177Lu | 177Lu PSMA-617/177Lu-DOTA-girentuximab | | II/III | No
188Lu | 188Re-P2045 | | II | No
90Y | 90Y-daclizumab | | II | –

Photodynamic therapy
GdIII | Motexafin gadolinium | Brain metastases | III | No
LuIII | Motexafin lutetium | Breast and prostate cancer; peripheral atherosclerosis | I | No
PdII | Padeliporfin | Prostate cancer | III (Recruiting) | No
SnIV | Rostaporfin | Age-related macular degeneration; metastatic breast cancer | II | No
AlIII | Sulfonated aluminum phthalocyanine | Esophagus, oral, skin and stomach cancer | – | –
RuII | TLD1433 | Bladder cancer | I/II (Recruiting) | No

Imageology
GdIII | Gadofosveset trisodium | Blood vessels | IV (Discontinued) | Yes
 | Gadoterate dimeglumine | Central nervous system | IV | Yes
 | Gadoxetate disodium | Liver | IV | Yes
 | Gadobutrol | Central nervous system | IV | Yes
 | Gadopentetate | Central nervous system and blood vessels | IV (Discontinued) | Yes
 | Gadobenate dimeglumine | Central nervous system | IV | Yes
 | Gadodiamide | Central nervous system | IV | Yes
 | Gadoversetamide | Central nervous system and liver | IV (Discontinued) | Yes
 | Gadoteridol | Central nervous system and extracranial/extraspinal tissues | IV | Yes
 | Gadopiclenol | Central nervous system | III | –
MnII | Manganese chloride | Liver | I/IV (Recruiting) | Yes
 | Mangafodipir | Liver and myocardium | II | Yes
FeIII | Iron oxide nanoparticles | Liver | II/III (Recruiting) | –
 | Ferric ammonium citrate | Gastrointestinal system | – | Yes
 | Ferumoxsil | Gastrointestinal system | – | Yes

Radioimagenology
64Cu | 64Cu DOTATATE | Neuroendocrine tumors | III | –
51Cr | 51Cr EDTA | Glomerular filtration rate and red blood cell survival monitoring | – | –
67Ga | 67Ga citrate | Inflammation and tumors | I/III (Recruiting) | Yes
68Ga | 68Ga DOTATATE | Neuroendocrine tumors | IV | Yes
198Au | Metallic 198Au colloid | Hepatic tumor | – | –
111In | 111In-arcitumonab | Neuroendocrine tumors and prostate cancer | – | –
197Hg | 197Hg-chlormerodin | Brain and kidney | – | –
203Hg | 203Hg-chlormerodin | Brain and kidney | – | –
82Rb | 82RbCl | Myocardium | – | Yes
85Sr | 85SrCl2 | Bone | – | –
201Tl | 201TlCl | Myocardium | – | Yes
99mTc | 99mTc-arcitumomab | Colorectal cancer | – | No
 | 99mTc-bicisate | Stroke localization | – | –
 | 99mTc-depreotide | Lungs | – | –
 | 99mTc-disofenin | Liver | – | Yes
 | 99mTc-exametazime | Inflammatory bowel disease | I | Yes
 | 99mTc-tagged albumin | Pulmonary perfusion | III | –
 | 99mTc-hynic-octreotide | Neuroendocrine tumors | II | –
 | 99mTc-mebrofenin | Liver and bile gland | I | Yes
 | 99mTc-medronate | Bones | III | Yes
 | 99mTc-mertiatide | Kidneys | I | –
 | 99mTc-nofetumomab | Lungs | – | Yes
 | 99mTc-oxidronate | Bones | – | Yes
 | 99mTc-pentetate | Brain, kidneys and lungs | III | –
 | 99mTc-pyrophosphate | Bones and myocardium | – | Yes
 | 99mTc-tagged erythrocytes | Blood pool imaging | – | –
 | 99mTc-sestamibi | Myocardium | IV | Yes
 | Na99mTcO4 | Brain and thyroid | – | –
 | 99mTc-succimer | Kidneys | – | –
 | 99mTc-sulesomab | Bone inflammation | – | –
 | 99mTc-sulfur colloid | Liver, spleen and esophagus | IV | Yes
 | 99mTc-tetrofosmin | Myocardium | IV | Yes
 | 99mTc-tilmanocept | Lymphatic system | IV | Yes

a Searched on https://clinicaltrials.gov/ct2/home.
b Searched on https://go.drugbank.com/ (16 Jul 2021).

mental, materials and nanosciences (Fig. 2b). Metal-containing compounds are also playing an increasingly important part in toxicology, pharmacology and biomedical engineering studies. Fig. 2c shows a bibliometric map that displays the key concepts related to the use of metal-based compounds for treatment of diseases, highlighting ideas pertaining to their applications (e.g., cancer, chemotherapy, antimicrobial and antibacterial activity, photodynamic therapy, luminescence, biocompatibility, molecular modeling and drug design), biological effects (e.g., DNA damage, antioxidant activity, oxidative stress, genotoxicity and autophagy) and limitations for use (e.g., cytotoxicity, drug delivery and pharmacokinetics). Notably, ‘cytotoxicity’ is one of the


FIGURE 2
The growing number of papers related to disciplines that study metallodrugs from 1960 to the present. (a) Using the Web of Science portal, the keywords
‘metals in medicine’, ‘medicinal inorganic chemistry’ and ‘bioinorganic chemistry’ were searched. (b) Percentage of works related to different scientific
disciplines. (c) Bibliometric map based on PubMed data created by VOSviewer (https://www.vosviewer.com/) based on keyword co-occurrence during the
period 1960 to date (August 2021). The following keywords were used: ‘metallodrug’, ‘metal-based drug’, ‘metallopharmaceutical’, ‘bioorganometallic’,
‘organometallic pharmaceutical’ and ‘metallodrug candidate’. A total of 6277 publications were identified, grouped into ten clusters (represented with
different colors) that show the relationship between the 165 specified keywords. The relative size of the squared box is related to the number of publications
containing the keyword.
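
As an illustration of the keyword co-occurrence counts that underlie a bibliometric map such as panel (c), the following is a minimal Python sketch. It is not the procedure used by VOSviewer, whose counting, normalization and clustering are not reproduced here; the function name `keyword_cooccurrence`, the lower-casing step and the toy records are assumptions made only for this example.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(records):
    """Count how often each unordered pair of keywords appears in the same record.

    `records` is an iterable of keyword lists, one list per publication.
    Keywords are lower-cased and de-duplicated within a record (an assumption
    of this sketch), so each pair is counted at most once per publication.
    """
    pair_counts = Counter()
    for keywords in records:
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        pair_counts.update(combinations(unique, 2))
    return pair_counts


# Hypothetical toy records, not taken from the PubMed search described above.
records = [
    ["metallodrug", "cytotoxicity", "cancer"],
    ["Metallodrug", "cytotoxicity"],
    ["metal-based drug", "photodynamic therapy", "cancer"],
]
counts = keyword_cooccurrence(records)
```

In a map of this kind, the pair counts would typically set the strength of the links between keywords, while the number of records containing a keyword would set the size of its box.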

more prominent concepts, presumably reflecting the role of metal complexes in anticancer therapy but also concerns about their toxicity. Indeed, as commented above, medicinal inorganic chemistry is still not valued as it should be within medicinal chemistry because of the misleading association of metals as toxic agents.7 Fig. S1 (see supplementary material online) shows the frequency of disciplines and publications associated with the keyword ‘metal-based’. For example, it is frequent to find publications with the concurrent terms ‘metal-based’ and ‘therapy’, ‘therapeutics’, ‘diagnosis’ and others.

Generating basic knowledge of inorganic compounds with potential biological applications is a promising and exciting endeavor, especially for the design of therapeutic agents. Multidisciplinary research has led to the clinical use of metal-based


compounds. According to a review by Miranda, the mechanism of action of metallodrugs can be classified into three categories: conventional therapy; radiotherapy; and photodynamic therapy.7,16 In turn, conventional therapy includes the following mechanisms: direct bonding between the metallic center and the biological target; the release of a biologically active ligand; redox activity; and a unique or unknown mode of action. The potential for multiple mechanisms of action is another reason for the increased interest in metal complexes, along with their 3D structure.

As reviewed in this manuscript, although a considerable portion of the chemical space is being generated and explored by metal-based complexes,11 they are scarcely represented in common public databases such as DrugBank,9 DrugCentral,17 ChEMBL,18 PubChem19 and others (see below). Thus, we propose that a database dedicated to collecting and maintaining the rich SAR of metal-containing drugs would significantly help to advance the field. There has been a sparsely documented attempt to build such a database, but it is currently not accessible.20

Herein, we discuss the need, significance, viability, potential applications and challenges of developing and maintaining a public compound database of metallodrugs that eventually can be implemented as a user-friendly and web-searchable tool. The hypothesis of the database is that there is enough quality information published in peer-reviewed journals and public resources (including patents) to make it feasible to initiate the construction of a dedicated database of metal-containing compounds annotated with biological activity. The following section surveys public compound databases annotated with biological activity for the possibility of computational chemogenomics based on metal-based drugs. It emphasizes the quantification of the metal-based molecules present in those databases. This is followed by a proposal for a strategy to develop an annotated compound database dedicated to metallodrugs, highlighting


FIGURE 3
(a) Overview of the research areas a metallodrug database can impact. (b) Representative chemoinformatic- and computational-based applications of the
proposed metallodrug database. (c) Work plan to develop, update and maintain the metallodrug database.


potential applications and challenges. The last section presents concluding remarks.

Molecular databases and their importance in drug discovery
Chemical space is an essential concept in drug discovery and computer-aided drug design in general.21 Methods commonly used in computer-aided drug discovery can be roughly classified into two major approaches: structure-based and ligand-based. The former uses the 3D structure of the molecular targets (either an experimental structure or a computer-generated model). In the structure-based approaches, characterizing the target–ligand interactions at the molecular level is crucial.22–24 Ligand-based methods rely on the SAR data available for the ligands to guide the design. Computer-aided drug discovery has made significant contributions to developing drugs in clinical use but still faces major challenges, as recently discussed.25 In any scenario, it is imperative to have high-quality data that can be conveniently accessed in molecular databases.

Owing to the large size of the chemical space, compound databases (typically accessible in a digital form such as web servers or as relational databases) are fundamental to systematically explore what space has been studied and track its ‘expansion’ – its continued growth as the number of enumerated and virtual libraries rapidly increases, as witnessed by the availability of ultra-large chemical libraries.21 Indeed, the current number of compounds in physical and electronic databases reveals the vast number of compounds already readily available or which could theoretically be made (i.e., virtual databases).26 However, most of the attention paid to the medicinally relevant chemical space and the enumeration of new compounds to explore and populate the chemical space27 has been focused on traditional small organic molecules. The importance of compound databases for different aspects of the drug discovery process and research is reflected by the continued publication of new or updated databases in a large number of scientific and peer-reviewed journals as surveyed in Table S1 (see supplementary material online). As commented on above, a compound database dedicated to metallodrugs and metal-based compounds with potential therapeutic applications would benefit the systematic exploration of the chemical space covered by metallodrugs.

Chemical databases have been a cornerstone in chemoinformatics28 advancing many areas as highlighted in a review by López–López.29 A representative example is the Chemical Abstracts Service (CAS) Registry System (introduced in 1965), which, at the time of writing (August 2021), contains 185 million organic and inorganic substances (including alloys, coordination compounds, mixtures, minerals, polymers and salts published since the beginning of the 1800s), and about 70 million protein and nucleic acid sequences.30 Another example is the Reaxys database that puts together physical and chemical data (including reaction data) on chemical substances. Reaxys contains data in organic, inorganic and organometallic chemistry dating as far back as 1771. It contains information for >148 million substances and 55 million reactions.31 The CAS Registry System and Reaxys are commercial databases and, although comprehensive, are not dedicated to drug discovery. The Cambridge Structural Database (CSD) with nearly 1 million structures in total contains probably the largest number of metal complex structures (e.g., 48% of the CSD compounds contain a transition metal) albeit without any biological data.32

A notable public biomolecular database is the Protein Data Bank (PDB).33 The PDB dates back to 1971, when it began with seven structures. At the time of writing, PDB contains >187 844 entries. With time, as more structures and extensive functionalities were added, this public resource transformed into a primary public resource of biomolecules to perform molecular simulations, with journal submission often required to include deposition of structural data into the PDB.

These and other public compound databases typically used in drug discovery, including those small-molecule databases annotated with biological activity, have been extensively reviewed.34,35 Among the several uses for organization and data mining, large compound databases annotated with biological activity (e.g., ChEMBL) are used routinely to perform structure–property (activity) relationship studies. Notably, the need for databases and their application in machine learning has been recently discussed.36,37 This is illustrated in Fig. 3a and 3b, which show schematic applications of compound databases in drug discovery. A public database facilitates data reproducibility and helps data sharing, which is particularly relevant with the move toward increased ‘open access’ requirements for publicly funded projects. Chemical databases annotated with biological activity pave the way to develop computational chemogenomics.38 Thus far, the chemogenomics space of small organic molecules has been extensively explored but a systematic chemogenomics exploration of metallodrugs is missing.

Table S2 (see supplementary material online) shows large public compound databases used for drug discovery with the number of currently present metal-based compounds. For this survey, the following databases were searched: DrugBank (https://go.drugbank.com/), DrugCentral (https://drugcentral.org/), ChEMBL (https://www.ebi.ac.uk/chembl/), PubChem (https://pubchem.ncbi.nlm.nih.gov/) and the Developmental Therapeutics Program of the National Cancer Institute database (NCI-DTP; https://dtp.cancer.gov/dtpstandard/dwindex/index.jsp).39 Indeed, as discussed here, the NCI-DTP database represents one of the primary sources in the public domain with a significant amount of chemical and biological information. Fig. S2 (see supplementary material online) shows screenshots of the graphical user interfaces of representative molecular databases used in drug discovery. The survey of metal-based compounds in the public databases (as of August 2021) indicates that PubChem has the most significant number of metal-based compounds, followed by NCI-DTP, ChEMBL, DrugCentral and DrugBank. Of note, in PubChem, there are several biological activities reported for each molecule. In all public databases, the proportion of metal-containing compounds is <10%. The survey also showed that almost all metal-containing compounds in ChEMBL and NCI-DTP had associated biological activity data. Finally, the review of the public databases revealed that there are unique metal-containing molecules in NCI-DTP, not present in other public compound collections. Of note, a 2015 survey of the NCI-DTP


collection of anticancer metal or metalloid compounds revealed about 1100 molecules. The current analysis of the NCI-DTP discovered growth of 100% of this type of compound in the past 7 years.39

Overall, the analysis reveals that there is a small percentage of metal-containing compounds in these databases and, most importantly, they are mixed with organic compounds and biologics (e.g., antibodies and peptides in some databases). Yet, the number of publications in peer-reviewed journals (Figs. 2 and 3) suggests a substantial quantity of new metal-containing compounds annotated with biological activity under basic research that are not being captured by these major public databases.

Of note, the major compound databases now readily available all began with a limited number of entries (e.g., molecules or biomolecules) and grew significantly with time owing to the interest in, and importance of, the data. A notable example is PDB, which began with seven structures in 1971 and currently has >187 844 structures. Whereas the construction, curation and maintenance of compound databases is time-consuming, they often provide significant benefits over time. Hence, it is anticipated that a bespoke metal-based compound database annotated with biological activity, as suggested here, would evolve over the years to become a valuable primary source of data and information to support and advance medicinal inorganic chemistry efforts.

In 2007 a report was published with an attempt to build a metallodrug database (called Database on Metallopharmaceuticals).20 However, that database is not currently available. To the best of our knowledge, there is no public database dedicated to metallodrugs annotated with bioactivity data. Advances to build a proof-of-concept database were recently discussed (termed D-InoDB or DIFACQUIM-Inorganic database).40

An example of the potential power of a curated database and the difficulties in setting one up is provided by The Community for Open Antimicrobial Drug Discovery (CO-ADD, co-add.org). CO-ADD is a crowd-sourcing initiative that offers free antimicrobial testing of any submitted chemical entity. Since its inception in 2015, >300 000 compounds have been screened, with structures and associated antimicrobial activity eventually published in an open database. CO-ADD includes not only the results of active compounds but also the results of experimentally tested inactive compounds. Notably, a recent analysis identified nearly 1000 metal-containing compounds (now >1000) in the library which showed promising antimicrobial activity.4 However, throughout the analysis of these compounds, we encountered several problems in how these structures were digitally processed and stored. Often the correct structure of metal-containing compounds could not be displayed properly and their only identification was by their molecular formula. Because molecular formulae are not unique identifiers for compounds, many hours were required to manually draw the structures of relevant compounds. Furthermore, the representation of organometallic structures within chemical structure drawing programs is not standardized (particularly the representation of metal–ligand bonds in complexes), so is subject to individual preferences. For a future, large-scale metal-compound database to be feasible, the automatic entry and indexation of this class of compounds needs to be addressed. For such a database to be useful and, hence, used, it needs to be intuitively searchable by structure and substructure of compounds. As is also discussed in a later section, there is currently no suitable string-based representation for metal-based compounds because they often exceed standard valency rules and involve different binding modes than organic molecules.

A metallodrug database road map
Highlights of its construction and implementation (accessibility)
Like other public compound databases, developing a comprehensive database of metal-containing compounds annotated with biological activity is a major research program that will take several years just to establish. To be useful, continued maintenance and updates are essential as new information and SAR are generated constantly. However, one has to start somewhere, and a proof-of-concept first version of the database was recently proposed by the authors.40 The proposed process (milestones) to create a metallodrug database (e.g., metallodrug-DB) is illustrated in Fig. 3c and discussed below.

Because metallodrugs cover a large spectrum of therapeutic applications (conventional therapy, radiotherapy and photodynamic therapy), the first version of the database could focus on a specific and well-defined area, for instance on metallodrugs approved for clinical use and under clinical development (e.g., Fig. 1 and Table 1).

The second version of the database could then be populated with metal-based compounds tested in one or more biological assays. It is important to include data of experimentally tested active and inactive compounds. The lack of information on inactive compounds is a common issue found in many public chemical repositories that impacts the development and confidence of predictive models. This would be one of the most time-consuming endeavors owing to the need for extensive manual intervention. Important points to consider while assembling the compound database, typically considered for building compound databases, include but are not limited to:

• extracting the metal-based compounds annotated with biological activity and currently available in public databases (surveyed in Table S2, see supplementary material online);
• curating and standardizing the information (this step includes the elimination of duplicate compounds across databases) – data curation of compound datasets is one of the most crucial points as extensively discussed41;
• checking data quality and ensuring as much as possible the reliability of the data, including compound identity and purity (that there is sufficient information about compound characterization) – the reliability of the biological activity data should also be checked; one first approach is filtering information based on the quality of the data source: for example, making sure the information is linked to a peer-reviewed journal with an accessible Digital Object Identifier (DOI number) and that the publication reports the compound characterization – this approach is time-consuming but significantly helps to accu-


rately curate the information to be included in the database; such an approach has been used for building and maintaining public compound databases of natural products42,43;
• supplying updates: the first versions of the proposed database can focus on compounds used in therapy, and updated versions of the database can incorporate molecules used for diagnosis and dietary supplements; as with any public compound database, the metallodrugs database must be continuously updated, adding new data and published information; ideally, journal publishers would encourage or require authors submitting articles with relevant information to deposit it in the database; as part of updating the contents of the database, experimental results from omics approaches should be considered; future developments of the proposed database can include structural features of the adducts formed by the metal-containing compounds with their targets if the information is experimentally available.

To make the metallodrug database public and widely accessible, it would be ideal to develop a dedicated website. The database itself should comply with the FAIR Data Principles.44 The initiative can be a multidisciplinary effort involving research groups, experts in metallodrug discovery and development, chemoinformatics and machine learning.45

Characterization and molecular vectorization: Systematic representation
The database can be initially characterized using basic and general descriptions of its contents. For instance, each compound can be annotated with a unique identifier, chemical name, source (e.g., DOI to original publication), biological activity ‘signature’ or profile (i.e., the biological activity reported for each com-

Similar to the compound databases already in the public domain, the proposed collection of a dedicated database of metal-containing molecules would be a structured, homogenized and common starting point to perform bibliographic searches to provide the data for performing calculations. The compound database would open several investigative opportunities including, but not limited to, systematic virtual screening, qualitative and quantitative SAR analysis, activity landscape analysis, including the identification of activity cliffs, and the development of predictive and classification models.40 The database could also systematically explore the medicinally relevant metallodrug chemical space, replicating all the applications that the concept of chemical space has in drug discovery.21 In summary, a metallodrug database would contribute to the foundations of metallodrug informatics and contribute to applying computational chemogenomics approaches for metal-based drugs.

Challenges for the metallodrug database
As with any other compound database, particularly in the public domain, a major challenge lies in the ongoing curation and maintenance, keeping the information and the associated web server updated. As commented on above, the initiative can be group based or, preferably, community based with the participation of experts in metallodrug discovery and computational methods. It would be imperative to secure funding to sustain a public and long-lasting repository. Funding sources such as the USA National Institutes of Health (NIH) could support these endeavors (e.g., NIH funding calls PAR-20-089 ‘Biomedical Data Repository’ and PAR-20-097 ‘Biomedical Knowledgebase’).49

From the computational perspective, a significant challenge of the metallodrug database is developing a standardized representation of the compounds in the database. The Simplified
pound). If known, annotate the general mechanism of action Molecular-Input Line Entry System (SMILES) is a commonly used
of the metal-based compound. Importantly, indicate whether notation system for the representation of organic compounds
the compound acts as a prodrug. Certainly, metallodrugs are fre- that does not adequately describe inorganic compounds as
quently prodrugs that are activated by redox reactions or ligand implemented in existing software and databases. The current
substitution.10 Additional relevant information would include a SMILES standard is unable to represent coordinate bonds
set of chemical descriptors of general interest. Basic and general between metal centers and aromatic groups. For example, fer-
annotations of broad interest include the identity of the metal, rocene: [CH–]1C = CC = C1.[CH–]1C = CC = C1.[Fe + 2], can only
coordination number and symmetry. The general and basic be represented by the three individual fragments without defined
information will be helpful to distinguish, in the database coor- orientation or interaction between fragments. Although
dination, organometallic and inorganic compounds that are attempts have been proposed to represent such metal com-
known to have marked differences in chemical and physico- plexes,50 such representations are not supported by most soft-
chemical properties, geometries and bonding. As discussed in ware packages, owing to violation of the valance rule. In
the next subsection, arguably, this is one of the most significant addition, the definition of stereoisomeric centers is limited to
challenges because the commonly used descriptors such as tetrahedral centers such as C-atoms with four ligands, which is
molecular fingerprints have been developed for organic com- inadequate for the more complex and diverse orientations found
pounds. However, the compound database itself will serve as a in metal complexes (e.g., octahedral, square planar, etc.). Even
framework to compute, for instance, quantum mechanics more-complex notation systems such as HELM,51 which are
descriptors in an automated manner.46 To this end, the proposed designed for complex biomolecular structures, fail to fully repre-
database could be linked to existing databases such as sent metal complex structure.
metalPDB47 or the IoChem-BD repository.48 Currently, metal complexes can only be accurately repre-
sented using a 3D representation of the whole compounds, using
Opportunities and potential applications of the metallodrug formats such as Molfile (also known as SDF of CT file), which can
database represent the correct orientation of ligands around metal centers
As discussed above and outlined in Fig. 3a and 3b, the metallo- but still are unable to define coordinate bonds. For digital pro-
drugs database could be the foundation for a large number of cessing of such metal complexes a ‘metal intelligent’ software is
chemoinformatic-based and computational-based applications. required to identify any nonstandard bonds properly. Currently,


most databases only contain metal-containing compounds that can be represented using standard organic valence rules, while rejecting all metal complexes that fall outside.52 To accommodate metal complexes in such databases, current structure definition formats would need to be expanded, or different notation systems used.53 A grand challenge in this regard is addressing metal-containing compounds with no previous standard representation. As discussed for the CO-ADD database, automatic entry and indexing of metal compounds would be needed because current approaches require manual curation. Additional efforts to represent chemical structures, including metal-containing compounds, to derive regression models or data-driven analysis have been recently discussed by Musil et al. in a comprehensive review.54

A further limitation in the chemoinformatic analysis of metal-containing compounds is the lack of proper and efficient descriptors of the metal center of such compounds, because most descriptors such as molecular fingerprints have only been developed and validated for organic molecules. Indeed, most chemoinformatic analyses of metal complexes use computationally expensive quantum-mechanical semi-empirical methods to calculate molecular descriptors. To include metal complexes in machine learning analysis, current fingerprint representations would need to be expanded to account for the 3D structure and extended electronic properties of metal-containing drugs.

Because structure notation and representation are the cornerstone in chemoinformatics and serve as a reference for defining the chemical space of a specific dataset of molecules (depending on the particular application), implementing a systematic structure representation will be a pivotal point in implementing a widely useful database. However, the database of metallodrugs and metallodrug candidates itself as proposed here is the starting point to develop such a consistent representation. The most crucial challenges outlined here are summarized in Table S3 (see supplementary material online).

Concluding remarks
Although the pharmaceutical industry relies almost entirely on organic and biological compounds, metallodrugs offer unique opportunities to expand the medicinally relevant chemical space and discover completely new classes of therapeutics. Transition metals have become increasingly important in biology and medicine, as demonstrated by the increasing number of scientific publications over the past few years and the broad diversity of applications in relevant therapeutic areas. A better understanding of metal-based compounds as regulators of biological processes is paramount if these compounds are to be used as therapeutic agents. Such knowledge would be accelerated by the development of a dedicated and publicly accessible metal-based compound database, as would global efforts to address major therapeutic indications using metal-based medicines. The public database can be the starting point for developing predictive machine learning models and can become a reference framework for de novo generation of metal-based compounds, thus expanding the traditional chemical space.

Here, based on a survey of the literature and existing public databases commonly used in drug discovery, we highlight the novelty and potential of a dedicated compound database of metal-based compounds annotated with biological activity. The metallodrug database (e.g., metallodrug-DB) would accelerate metallodrug discovery and development for a range of therapeutic areas, such as cancer, infections and neglected diseases. Specifically, the compound database would speed up the development of medicinal inorganic chemistry by enabling the application of chemoinformatic and machine learning techniques. Curation and maintenance of the compound database would not be trivial but is feasible, as evidenced by the evolution of the PDB. In the future, virtual libraries of metal-based compounds with potential biological activity can be enumerated and enrich the metal-based medicinally relevant chemical space.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements
E.L.-L. and E.A. thank the Consejo Nacional de Ciencia y Tecnología (CONACyT), Mexico, for the scholarships No. 762342 (No. CVU: 894234) and 594606, respectively. J.Z. and M.A.T.B. thank the Wellcome Trust (Strategic Grant 104797/Z/14/Z) and the University of Queensland for funding the antimicrobial CO-ADD database initiative, and D.G., J.Z. and M.A.T.B. the NHMRC Ideas Grant 2004356 for ongoing database analysis and model development. Discussions with Karina Martinez and insights from Yesenia Cruz are acknowledged.

Appendix A. Supplementary material
Representative peer-reviewed journals that publish new developments in compound databases with applications in drug discovery (Table S1); metal-based compounds in public compound databases frequently used in drug discovery (Table S2); frequency of disciplines and publications associated (e.g., in combination) with the keyword ‘metal-based’ (Fig. S1); screenshots of the graphical user interfaces of representative molecular databases used in drug discovery (Fig. S2); crucial challenges discussed in the manuscript to develop the database of metallodrugs and metallodrug candidates (Table S3). Supplementary data to this article can be found online at https://doi.org/10.1016/j.drudis.2022.02.021.
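
The basic annotations proposed in the section on characterization (unique identifier, chemical name, source DOI, activity profile, mechanism of action, prodrug flag, metal identity, coordination number and symmetry) can be illustrated with a minimal sketch of one database record. The class name, field names, identifier and DOI placeholder below are hypothetical and are not part of any existing database; the only data taken from the text is the ferrocene SMILES string, which is used to show that the notation yields three disconnected fragments.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetallodrugRecord:
    """Hypothetical record layout; all field names are illustrative assumptions."""
    identifier: str                       # unique identifier within the database
    chemical_name: str
    source_doi: str                       # DOI of the original publication
    metal: str                            # identity of the metal center
    coordination_number: Optional[int] = None
    symmetry: Optional[str] = None
    mechanism_of_action: Optional[str] = None
    is_prodrug: Optional[bool] = None     # None means "not annotated"
    activities: List[str] = field(default_factory=list)  # activity profile
    smiles: Optional[str] = None

    def smiles_fragments(self) -> List[str]:
        """Split the SMILES string at '.', the SMILES disconnection symbol."""
        return self.smiles.split(".") if self.smiles else []


# Placeholder identifier and DOI; only the SMILES string comes from the text.
ferrocene = MetallodrugRecord(
    identifier="EXAMPLE-0001",
    chemical_name="ferrocene",
    source_doi="10.0000/placeholder",
    metal="Fe",
    smiles="[CH-]1C=CC=C1.[CH-]1C=CC=C1.[Fe+2]",
)

# The metal and the two rings appear as separate, unconnected fragments.
print(ferrocene.smiles_fragments())
```

Leaving unknown annotations as `None` rather than guessing a default keeps "not annotated" distinct from "annotated as false", which matters for fields such as the prodrug flag.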

References
1. K.D. Mjos, C. Orvig, Metallodrugs in medicinal inorganic chemistry, Chem Rev 114 (2014) 4540–4563.
2. A. Chylewska, M. Biedulska, P. Sumczynski, M. Makowski, Metallopharmaceuticals in therapy – a new horizon for scientific research, Curr Med Chem 25 (2018) 1729–1791.


3. J.C. García-Ramos, R. Galindo-Murillo, F. Cortés-Guzmán, L. Ruiz-Azuara, Metal-based drug-DNA interactions, J Mex Chem Soc 57 (2013) 245–259.
4. A. Frei, J. Zuegg, A.G. Elliott, et al., Metal complexes as a promising source for new antibiotics, Chem Sci 11 (2020) 2627–2639.
5. L.J. Stephens, M.V. Werrett, A.C. Sedgwick, S.D. Bull, P.C. Andrews, Antimicrobial innovation: a current update and perspective on the antibiotic drug development pipeline, Fut Med Chem 12 (2020) 2035–2065.
6. Y.C. Ong, S. Roy, P.C. Andrews, G. Gasser, Metal compounds against neglected tropical diseases, Chem Rev 119 (2019) 730–796.
7. V.M. Miranda, Medicinal inorganic chemistry: an updated review on the status of metallodrugs and prominent metallodrug candidates, Rev Inorg Chem (2021), 000010151520200030, https://doi.org/10.1515/revic-2020-0030.
8. D. Cirri, A. Pratesi, T. Marzo, L. Messori, Metallo therapeutics for COVID-19. Exploiting metal-based compounds for the discovery of new antiviral drugs, Expert Opin Drug Discov 16 (2021) 39–46.
9. D.S. Wishart, Y.D. Feunang, A.C. Guo, et al., DrugBank 5.0: a major update to the DrugBank database for 2018, Nucleic Acids Res 46 (2018) D1074–D1082.
10. E.J. Anthony, E.M. Bolitho, H.E. Bridgewater, et al., Metallodrugs are unique: opportunities and challenges of discovery and development, Chem Sci 11 (2020) 12888–12917.
11. E. Meggers, Exploring biologically relevant chemical space with metal complexes, Curr Opin Chem Biol 11 (2007) 287–292.
12. E. Meggers, Targeting proteins with metal complexes, Chem Commun (Camb) (2009) 1001–1010.
13. C.N. Morrison, K.E. Prosser, R.W. Stokes, A. Cordes, N. Metzler-Nolte, S.M. Cohen, Expanding medicinal chemistry into 3D space: metallofragments as 3D scaffolds for fragment-based drug discovery, Chem Sci 11 (2020) 1216–1225.
14. F. Lovering, J. Bikker, C. Humblet, Escape from flatland: Increasing saturation as an approach to improving clinical success, J Med Chem 52 (2009) 6752–6756.
15. F. Lovering, Escape from Flatland 2: complexity and promiscuity, MedChemComm 4 (2013) 515–519.
16. E. Boros, P.J. Dyson, G. Gasser, Classification of metal-based drugs according to their mechanisms of action, Chem 6 (2020) 41–60.
17. O. Ursu, J. Holmes, C.G. Bologa, et al., DrugCentral 2018: an update, Nucleic Acids Res 47 (2019) D963–D970.
18. D. Mendez, A. Gaulton, A.P. Bento, et al., ChEMBL: towards direct deposition of bioassay data, Nucleic Acids Res 47 (2019) D930–D940.
19. S. Kim, J. Chen, T. Cheng, et al., PubChem 2019 update: improved access to chemical data, Nucleic Acids Res 47 (2019) D1102–D1109.
20. CSIR – Unit for Research and Development of Information Products “Jopasana”. Database on metallopharmaceuticals; 2007. Available at http://14.143.190.243/dsir/sites/default/files/2019-2009/metallo.pdf.
21. J.L. Medina-Franco, N. Sánchez-Cruz, E. López-López, B.I. Díaz-Eufracio, Progress on open chemoinformatic tools for expanding and exploring the chemical space, J Comp Aided Mol Des (2021), https://doi.org/10.1007/s10822-021-00399-1.
22. G. Sciortino, J.-D. Maréchal, E. Garribba, Integrated experimental/computational approaches to characterize the systems formed by vanadium with proteins and enzymes, Inorg Chem Front 8 (2021) 1951–1974.
23. L. Riccardi, V. Genna, M. De Vivo, Metal–ligand interactions in drug design, Nat Rev Chem 2 (2018) 100–112.
24. P. Janos, A. Spinello, A. Magistrato, All-atom simulations to studying metallodrugs/target interactions, Curr Opin Chem Biol 61 (2021) 1–8.
25. J.L. Medina-Franco, Grand challenges of computer-aided drug design: the road ahead, Front Drug Discov 1 (2021) 728551.
26. T. Hoffmann, M. Gastreich, The next level in chemical space navigation: going far beyond enumerable compound libraries, Drug Discov Today 24 (2019) 1148–1156.
27. C.W. Coley, Defining and exploring chemical spaces, Trends Chem 3 (2021) 133–145.
28. J. Gasteiger, Chemistry in times of artificial intelligence, ChemPhysChem 21 (2020) 2233–2242.
29. E. López-López, J. Bajorath, J.L. Medina-Franco, Informatics for chemistry, biology, and biomedical sciences, J Chem Inf Model 61 (2021) 26–35.
30. CAS Registry https://www.cas.org/cas-data/cas-registry.
31. REAXYS https://www.elsevier.com/solutions/reaxys.
32. Cambridge Crystallographic Data Centre http://www.ccdc.cam.ac.uk/.
33. H.M. Berman, J. Westbrook, Z. Feng, et al., The Protein Data Bank, Nucleic Acids Res 28 (2000) 235–242.
34. A. Bender, Compound bioactivities go public, Nat Chem Biol 6 (2010) 309.
35. A.J. Moura Barbosa, A. Del Rio, Freely accessible databases of commercial compounds for high-throughput virtual screenings, Curr Top Med Chem 12 (2012) 866–877.
36. A. Tkatchenko, Machine learning for chemical discovery, Nat Commun 11 (2020) 4125.
37. F.I. Saldívar-González, V.D. Aldas-Bulos, J.L. Medina-Franco, F. Plisson, Natural product drug discovery in the artificial intelligence era, Chem Sci 13 (2022) 1526–1546.
38. J. Bajorath, A perspective on computational chemogenomics, Mol Inf 32 (2013) 1025–1028.
39. R. Huang, A. Wallqvist, D.G. Covell, Anticancer metal compounds in NCI's tumor-screening database: putative mode of action, Biochem Pharmacol 69 (2005) 1009–1039.
40. J. Medina-Franco, Y. Cruz-Lemus, Y. Percastre-Cruz, Chemoinformatic resources for organometallic drug discovery, Comp Mol Biosci 10 (2020) 1–11.
41. D. Fourches, E. Muratov, A. Tropsha, Trust, but verify II: a practical guide to chemogenomics data curation, J Chem Inf Model 56 (2016) 1243–1252.
42. A.C. Pilon, M. Valli, A.C. Dametto, et al., NuBBEDB: an updated database to uncover chemical and biological information from Brazilian biodiversity, Sci Rep 7 (2017) 7215.
43. B.A. Pilon-Jimenez, F.I. Saldivar-Gonzalez, B.I. Diaz-Eufracio, J.L. Medina-Franco, BIOFACQUIM: a Mexican compound database of natural products, Biomolecules 9 (2019) 31.
44. M.D. Wilkinson, M. Dumontier, I.J. Aalbersberg, et al., The FAIR guiding principles for scientific data management and stewardship, Sci Data 3 (2016) 160018.
45. M. Koohi-Moghadam, H. Wang, Y. Wang, et al., Predicting disease-associated mutation of metal-binding sites in proteins using a deep learning approach, Nat Mach Intell 1 (2019) 561–567.
46. B. Huang, O.A. von Lilienfeld, Ab initio machine learning in chemical compound space, Chem Rev 121 (2021) 10001–10036.
47. V. Putignano, A. Rosato, L. Banci, C. Andreini, MetalPDB in 2018: a database of metal sites in biological macromolecular structures, Nucleic Acids Res 46 (2018) D459–D464.
48. M. Álvarez-Moreno, C. de Graaf, N. López, F. Maseras, J.M. Poblet, C. Bo, Managing the computational chemistry big data problem: the ioChem-BD platform, J Chem Inf Model 55 (2015) 95–103.
49. National Institutes of Health, USA. https://datascience.nih.gov/biomedical-data-repositories-andknowledgebases.
50. M. Quirós, S. Grazulis, S. Girdzijauskaitė, A. Merkys, A. Vaitkus, Using SMILES strings for the description of chemical connectivity in the Crystallography Open Database, J Cheminf 10 (2018) 23.
51. T. Zhang, H. Li, H. Xi, R.V. Stanton, S.H. Rotstein, HELM: a hierarchical notation language for complex biomolecule structure representation, J Chem Inf Model 52 (2012) 2796–2806.
52. D. Fourches, E. Muratov, A. Tropsha, Trust, but verify: on the importance of chemical structure curation in cheminformatics and QSAR modeling research, J Chem Inf Model 50 (2010) 1189–1204.
53. B.J. Bucior, A.S. Rosen, M. Haranczyk, et al., Identification schemes for metal–organic frameworks to enable rapid search and cheminformatics analysis, Cryst Growth Des 19 (2019) 6682–6697.
54. F. Musil, A. Grisafi, A.P. Bartók, C. Ortner, G. Csányi, M. Ceriotti, Physics-inspired structural representations for molecules and materials, Chem Rev 121 (2021) 9759–9815.
