Chemosphere: Pietro Cozzini, Francesca Cavaliere, Giulia Spaggiari, Gianluca Morelli, Marco Riani

Chemosphere 292 (2022) 133422
Computational methods on food contact chemicals: Big data and in silico screening on nuclear receptors family
Pietro Cozzini a, *, Francesca Cavaliere a, Giulia Spaggiari a, Gianluca Morelli b, Marco Riani b
a Molecular Modelling Lab, Department of Food and Drug, University of Parma, Parco Area Delle Scienze 17/A, 43124, Parma, Italy
b Department of Economics and Management and Interdepartmental Center of Robust Statistics, University of Parma, Via J. F. Kennedy 6, 43100, Parma, Italy
HIGHLIGHTS
• Molecular docking and robust consensus scoring are useful to identify possibly dangerous molecules in food and water.
• Endocrine disruptor prediction using in silico methods to save time and cost.
• Database and big data approaches to accelerate hazard identification.
ARTICLE INFO
Handling Editor: A. Gies
Keywords: Computational chemistry; Consensus prediction; Database; Nuclear receptors; Toxicology

ABSTRACT
According to Eurostat, the EU production of chemicals hazardous to health reached 211 million tonnes in 2019. Thus, the possibility that some of these chemical compounds interact negatively with the human endocrine system has received, especially in the last decade, considerable attention from the scientific community. It is obvious that given the large number of chemical compounds it is impossible to use in vitro/in vivo tests for identifying all the possible toxic interactions of these chemicals and their metabolites. In addition, the poor availability of highly curated databases from which to retrieve and download the chemical, structure, and regulative information about all food contact chemicals has delayed the application of in silico methods. To overcome these problems, in this study we use robust computational approaches, based on a combination of highly curated databases and molecular docking, in order to screen all food contact chemicals against the nuclear receptor family in a cost and time-effective manner.
1. Introduction

A research project starts with a question. The main question of this project is: how can we evaluate all the possible food contact chemicals against a protein family to discover potential endocrine disrupting activity? It is obvious that, given the large number of chemical compounds and their metabolites existing and developed every year, it is impossible to use in vitro (or in vivo) tests for identifying all possible toxic interactions. The solution is to use computational approaches to reduce the number of wet tests, seeking only the most probable interactors.
* Corresponding author.
E-mail addresses: pietro.cozzini@unipr.it (P. Cozzini), francesca.cavaliere@unipr.it (F. Cavaliere), giulia.spaggiari@unipr.it (G. Spaggiari), gianluca.morelli@unipr.it (G. Morelli), marco.riani@unipr.it (M. Riani).
https://doi.org/10.1016/j.chemosphere.2021.133422
Received 25 October 2021; Received in revised form 20 December 2021; Accepted 22 December 2021
Available online 28 December 2021
0045-6535/© 2021 Elsevier Ltd. All rights reserved.
Endocrine disrupting chemicals (EDCs) are exogenous substances that can interfere with the synthesis, secretion, transport, binding, and elimination of natural hormones in the body that are responsible for the maintenance of homeostasis, reproduction, and behavior (Kavlock et al., 1996). Human exposure to EDCs occurs through oral consumption of food and water, contact with skin, inhalation, or the intravenous route (Kabir et al., 2015). These molecules are highly heterogeneous and include pesticides, plasticizers (e.g., phthalates, bisphenols), persistent organic pollutants (POPs) (e.g., dioxins, polychlorinated biphenyls), but also chemicals added to food to enhance some characteristics (e.g., flavourings, food additives), or naturally occurring compounds, such as mycotoxins. EDCs can act through different mechanisms: mimicking the action of a naturally produced hormone, blocking hormone receptors in cells, or interacting indirectly by influencing the biosynthesis or availability of normal hormones. Among these, the most privileged route is the interaction with nuclear receptors (NRs). Nuclear receptors are a superfamily of 48 ligand-activated transcription factors, including estrogen receptor (ER), androgen receptor (AR), mineralocorticoid receptor (MR), glucocorticoid receptor (GR), progesterone receptor (PR), and thyroid receptor (TR). NRs share a common structural organization composed of an N-terminal region (A/B domain), a conserved DNA-binding domain (DBD), and a ligand-binding domain (LBD) responsible for ligand recognition. The alteration of nuclear receptor pathways is correlated with many pathologies, such as breast cancer, prostate cancer, and testicular cancer, infertility, cardiovascular complications, disturbances in energy metabolism, immune responses, impairment of cognitive functions and the regulation of cell proliferation and differentiation, hypertension, obesity, and so on (Dall’Asta, 2016) (De Coster and Van Larebeke, 2012) (Desvergne et al., 2009) (Fucic et al., 2012) (Luccio-Camelo and Prins, 2011) (Odermatt and Gumy, 2008) (Petrakis et al., 2017) (Safe, 2004) (Schug et al., 2011) (Gore et al., 2015).

In order to prevent human diseases, different regulatory and policy approaches have been adopted in the past decades, even though the identification and safety assessment of potential EDCs is complicated both by the observed low-dose effects and by the often long-term exposure or exposure during a critical window early in development. One of these is the REACH (Registration, Evaluation, Authorisation, and Restriction of Chemicals) legislation, which is committed to protecting human health and the environment from hazardous chemicals. However, testing all the possible EDCs against all the potential targets is very important but also an expensive, long, and difficult task (e.g., the nuclear receptor family contains 48 members). In fact, these tests are still mainly based on biological and animal experimentation (toxicity tests), which is very time- and cost-intensive and causes the death of millions of animals every year. In this context, in silico methods, already well-established tools in drug discovery, can be good tools either for identifying new EDCs or for pointing in the right direction when seeking the mechanism of action of already known EDCs. Computational approaches produce predictive models that are more rapid and less costly than in vitro and in vivo tests, allowing a large amount of data concerning numerous chemical substances to be generated and analysed in a short time without the use of test animals (F. Cavaliere et al., 2020).

A key prerequisite for the successful application of computational modeling techniques is the quality of the input data. The availability of open access databases offers the capability to retrieve a huge amount of information from different data sources. The CAS Registry Number (RN) was chosen long ago as a unique and unambiguous numeric identifier for a specific chemical compound. It was developed by the American Chemical Society to help scientists retrieve and use information from different data sources. Since it is meant to be unique, validated, and internationally recognized, governmental agencies rely on CAS RNs for substance identification. However, CAS RNs are often used improperly by the scientific community and there is no check made by the American Chemical Society. Thus, it is really common to find errors, and this wrong information propagates easily across the Internet (Grulke et al., 2019). In fact, conflicts in the chemical identifier are not so rare in public resources, and they undermine the effort of in silico methods. So far, much attention has been paid to structure normalization to ensure the detection and correction of three-dimensional errors, and a variety of public and commercial toolkits exist to address this problem. However, less attention is often given to the consistency of the association between chemical identifiers (CAS RN and name) and chemical structures. For example, the compound classified as a flavouring with CAS RN 563187-91-7 and the common name “l-Menthone-1,2-glycerol ketal” in the EFSA list is a typical example of a wrong CAS:Name association. In fact, this CAS RN actually corresponds to “DNA (mouse strain C57BL/6 J clone 5430425J12 EST (expressed sequence tag))”, and the correct CAS RN of the compound “l-Menthone-1,2-glycerol ketal” is 67785-70-0. Moreover, although CAS RN is commonly used as an identifier in the majority of databases, in several databases molecules are classified using different identifiers and thus there is often a lack of standardisation (Hersey et al., 2015). Although data quality is undoubtedly important for every database, databases may have been developed with different aims and scope, and it is unreasonable to expect the same degree of curation. The increasing number of compounds released every year (500–1000 new molecules) that come into contact with food, along with the different sources of data, has made it difficult to check the reliability of data manually. In view of this, it is essential to design a data curation pipeline and implement it as an automated procedure.

A wide number of computational applications (tools) specifically for the analysis of EDCs are available in the literature to determine the relationship between a compound and its toxic effect. In particular, molecular docking is a well-established technique to study protein-ligand interaction, which means analysing whether the ligand has the suitable physical-chemical characteristics, shape, and volume to fit properly into the binding cavity of the receptor. Molecular docking is composed of two main parts: an algorithm that is used to predict different binding poses of a molecule in the protein binding site, and a scoring function used to evaluate the strength of the ligand-protein interaction, i.e., to predict its binding affinity. Different algorithms and scoring functions exist, but answering the question of which algorithm or scoring function is the best one is a complicated task (Morris and Lim-Wilby, 2008). In fact, each docking software (that is, the sum of algorithm and scoring function) has been trained with different proteins and ligands. Thus, before starting a molecular docking analysis, it is advisable to identify the most appropriate software based on the training protein-ligand complexes that best fit the proteins and ligands under investigation. However, in the present work, 31 different nuclear receptors with different binding pocket characteristics and a huge number of molecules that are heterogeneous from a chemical and structural point of view were considered. Thus, it is unthinkable to identify a single docking program that has the same performance for all nuclear receptors and for all food contact molecules. For that reason, we used a robust consensus scoring approach based on two different docking software and four different scoring functions. Combining several scoring functions reduces the number of false positives and gives more reliable results by compensating for the deficiencies of each scoring function, leading to an improvement in performance (Teramoto and Fukunishi, 2007) (Wang et al., 2003). As Bissantz and co-workers have highlighted, the use of three different scoring functions enhances the capability to reach hit rates from 10% up to 70% (Bissantz et al., 2000).

The goal of this work is to predict a possible endocrine disrupting activity of a huge set of molecules that can come into contact with food, as a basis for further in vitro/in vivo tests, using computational methods that do not consider the intake dose. The approach takes into consideration the interaction between a ligand (i.e., the endocrine disruptor compound) and the binding site of a receptor (i.e., the nuclear receptor), which is considered the molecular initiating event (MIE). This event is fundamental from a biological point of view because it is the first mechanism that, in most cases, initiates a biological effect based on the occurrence of conformational changes, signaling cascades, as well as interaction with other proteins. However, molecular docking does not predict the binding affinity of a ligand to a protein unless a correlation analysis has been made between known experimental binding affinities (Kd, Ki, Ka, etc.) and the docking score values. In fact, molecular docking and scoring functions refer to the binding interaction of a ligand with a protein, which means analysing whether the ligand has the suitable physical-chemical characteristics, shape, and volume to fit properly into the binding cavity of the receptor. Thus, they predict how strong the interaction is. Although it may be wrongly thought that a ligand interacting more tightly with a receptor should have a high binding affinity, this is not so obvious, since other mechanisms are involved in determining the binding affinity. A lower binding force does not mean a low intake dose; it means a lower ΔG° of binding.

2. Material and methods

2.1. Database resources

Different databases and web sources have been used to identify the molecules that come into contact with food: European Food Safety Authority (EFSA) (www.efsa.europa.eu), United States Environmental Protection Agency (EPA) (www.epa.gov), Food Packaging Forum (http://www.foodpackagingforum.org), and European Chemicals Agency (ECHA) (www.echa.europa.eu).

Table 1
SQL and NoSQL database structure definition.

Field               Data type (SQL)   Mapping (NoSQL)
CAS                 CHAR(16)          keyword
CID                 CHAR(20)          keyword
EC number           CHAR(50)          keyword
Common name         TEXT              text, keyword
IUPAC name          LONGTEXT          text, keyword
Molecular Formula   CHAR(100)         text
Canonical SMILES    TEXT              keyword
InChI               LONGTEXT          keyword
InChIKey            CHAR(254)         keyword
MW                  FLOAT             double
Volume              FLOAT             double
logP                FLOAT             double
Acceptor            INT(3)            byte
Donor               INT(3)            byte
Chiral              INT(3)            byte
Hydrophobe          INT(3)            byte
Atom Count          INT(3)            short
Bond Count          INT(3)            byte
Ring Count          INT(3)            byte
Positive Charge     INT(3)            byte
Negative Charge     INT(3)            byte
Total Charge        INT(3)            byte
EFSA link           CHAR(254)         keyword
ECHA link           CHAR(254)         keyword
.mol2               LONGTEXT          keyword
Classification      CHAR(100)         text
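
As a rough illustration of how the field layout of Table 1 could be expressed in code, the sketch below builds a SQL `CREATE TABLE` statement and an Elasticsearch-style mapping dictionary from a subset of the fields. It is not the authors' implementation. The table name `foodchem`, the snake_case column names, the choice of `cas` as primary key, and the reading of "text, keyword" as an Elasticsearch multi-field are all assumptions made for the example, and Python's built-in SQLite is used only as a stand-in for MariaDB to check that the statement parses.

```python
import sqlite3

# Subset of Table 1: (column name, SQL data type, NoSQL mapping types).
# Column names are snake_cased for convenience (hypothetical naming).
FIELDS = [
    ("cas", "CHAR(16)", ["keyword"]),
    ("cid", "CHAR(20)", ["keyword"]),
    ("common_name", "TEXT", ["text", "keyword"]),
    ("canonical_smiles", "TEXT", ["keyword"]),
    ("inchikey", "CHAR(254)", ["keyword"]),
    ("mw", "FLOAT", ["double"]),
    ("logp", "FLOAT", ["double"]),
    ("acceptor", "INT(3)", ["byte"]),
    ("atom_count", "INT(3)", ["short"]),
    ("classification", "CHAR(100)", ["text"]),
]


def build_sql_ddl(table: str = "foodchem") -> str:
    """Return a CREATE TABLE statement using the SQL types of Table 1."""
    columns = ",\n  ".join(f"{name} {sql_type}" for name, sql_type, _ in FIELDS)
    # Using CAS as primary key is an assumption for this sketch.
    return f"CREATE TABLE {table} (\n  {columns},\n  PRIMARY KEY (cas)\n)"


def build_es_mapping() -> dict:
    """Return an Elasticsearch-style mapping body for the same fields."""
    properties = {}
    for name, _, es_types in FIELDS:
        entry = {"type": es_types[0]}
        if len(es_types) > 1:
            # Additional types become sub-fields (multi-fields).
            entry["fields"] = {t: {"type": t} for t in es_types[1:]}
        properties[name] = entry
    return {"mappings": {"properties": properties}}


if __name__ == "__main__":
    ddl = build_sql_ddl()
    print(ddl)
    # SQLite stands in for MariaDB here, only to confirm the DDL is well formed.
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    print(build_es_mapping()["mappings"]["properties"]["common_name"])
```

Keeping a single field list and deriving both definitions from it is one way to keep the SQL and NoSQL versions of the database aligned when fields are added.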
2.2. Data quality

The entire procedure described below has been implemented as two different Python procedures, with a common part used to check CAS RN validity. In fact, most public databases use chemical names and CAS RNs as substance identifiers. CAS RN is widely used across scientific literature, Internet resources, and the chemical regulatory domain. Data are often stored using CAS RN as the primary key of the database and chemical names and synonyms as secondary identifiers. A CAS RN can be considered valid if it fulfils two rules: 1) it is composed of three numeric parts separated by hyphens (## … - ## - #); 2) it satisfies the “checkdig” validation formula developed by CAS (www.cas.org/support/documentation/chemical-substances/checkdig). CAS numbers are preliminarily checked for leading zeros, which are removed. After that, the checkdig formula is applied to the CAS numbers to verify their correctness.

2.2.1. First procedure

Using the CAS RN as input query, the procedure retrieves the InChIKey from three different servers and checks that the values are congruent: PubChem (www.pubchem.ncbi.nlm.nih.gov), ChemIDPlus (www.chem.nlm.nih.gov/chemidplus) from the National Institutes of Health (NIH), and CompTox Chemistry Dashboard (www.comptox.epa.gov/dashboard) from EPA. Since the manual curation of incongruent data and/or unfound CAS RNs took a great amount of time, a second procedure has been developed.

2.2.2. Second procedure

Starting from the CAS RN, fixed URLs have been built to automatically extract the correct InChIKey from the CAS database (www.commonchemistry.cas.org), which is the official repository of CAS RNs. In this step, the presence of salts and mixtures was also checked. At the end, the InChIKey was used as input query for extracting other information from PubChem ("CAS", "CID", "Common_name", "IUPAC_Name", "MolecularFormula", "MolecularWeight", "CanonicalSMILES", "InChI", "InChIKey").

2.3. Database descriptors

The foodchem DB stores 27 different fields that can be divided into six different subgroups:

a) Chemical names: CAS, CID, EC number, common name, IUPAC name;
b) 1D chemical information: molecular formula, canonical SMILES, InChI, InChIKey;
c) Chemical information: molecular weight, volume, logP value, number of acceptor atoms, number of donor atoms, number of chiral atoms, number of hydrophobic atoms, atom count, bond count, ring count, rotational bond count, positive charge atoms, negative charge atoms, total charge;
d) Regulative information: EFSA and ECHA link;
e) Three-dimensional structure in .mol2 format;
f) Classification: it classifies the molecule based on its use in the food industry: flavouring, pesticide, dioxin, etc.

Detailed information about where and how these data have been obtained is given below.

2.3.1. PubChem information

The PubChem database has been used to retrieve some food contact chemical data, as explained in the previous procedures: "CID", "Common_name", "IUPAC_Name", "MolecularFormula", "MolecularWeight", "CanonicalSMILES", "InChI", "InChIKey".

2.3.2. ECHA number and ECHA link

A Python script has been developed to convert CAS RNs into fixed URLs to automatically retrieve EC numbers and to provide the corresponding link to the ECHA website's Substance Infocard.

2.3.3. 3D structures

The three-dimensional structures (in .sdf format) of molecules that passed the previous steps have been retrieved from PubChem using a third Python script.

2.3.4. Calculated chemical information

To store additional chemical information, other data have been calculated using two software:

• Sybyl v.7.: Acceptor, Donor, Hydrophobe, AtomCount, BondCount, RingCount, RotBonds, Chiral, logP value, Volume (Å³);
• FLAP: number of Charge − and Charge + and the Total Charge.

Moreover, the FLAP (Fingerprint for Ligand and Protein) software was also used to convert the .sdf file into a .mol2 file.

2.4. SQL and NoSQL

The data have been organized into two different databases, MariaDB and Elasticsearch, implementing SQL and Big Data (NoSQL, Not only SQL) technology, respectively. We decided to implement two versions of the same database to answer two requirements: an SQL DB storing structural data of the selected molecules, more suitable for docking and molecular dynamics analysis, and a Big Data version able to store different kinds of information, not only structural information but also in vitro/in vivo tests, regulatory reports, etc. The specification of the structure/mapping used in the present work is given in Table 1.

Table 2
The total number of food contact chemicals falling in each subclass. Food contact chemicals are divided into 11 subclasses: dioxins, acrylamide, flavourings, food additives, furans, mycotoxins, pesticides, phthalates, bisphenols, polychlorinated biphenyls (PCBs), and food contact chemicals contained in the database of Food Packaging Forum (FCCDB).

Classification    Total number (8091)
Dioxins           75
Acrylamide        1
Flavourings       2091
Food Additives    110
Furans            133
Mycotoxins        327
Pesticides        465
Phthalates        361
Bisphenols        51
PCBs              209
FCCDB             4268
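
Returning to the CAS RN validity rules of Section 2.2, the two checks (three hyphen-separated numeric parts, then the check digit) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' script: the function names `normalize_cas` and `is_valid_cas` are hypothetical, and the check digit is computed with the publicly documented CAS rule (each digit before the check digit is multiplied by its position counted from the right, and the sum modulo 10 must equal the check digit).

```python
import re

# Three numeric parts separated by hyphens: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def normalize_cas(cas: str) -> str:
    """Strip surrounding whitespace and leading zeros (hypothetical helper)."""
    return cas.strip().lstrip("0")


def is_valid_cas(cas: str) -> bool:
    """Return True if the string has the CAS RN format and a correct check digit."""
    match = CAS_PATTERN.match(normalize_cas(cas))
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # The rightmost body digit has weight 1, the next one weight 2, and so on.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check_digit


if __name__ == "__main__":
    for candidate in ["67785-70-0", "563187-91-7", "67785-70-1", "0067785-70-0"]:
        print(candidate, is_valid_cas(candidate))
```

Both CAS RNs quoted in the Introduction (563187-91-7 and 67785-70-0) pass this check, which illustrates why it can only catch malformed numbers: a well-formed CAS RN attached to the wrong name still needs the InChIKey cross-checks of the two procedures.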
2.5. Protein preparation

The crystallographic structures of 31 nuclear receptors of Homo sapiens were downloaded from the Protein Data Bank (PDB) (www.rcsb.org). Among them, only 26 structures with high reliability and quality are available. For this reason, the nuclear receptors (3) with fragmented portions, such as constitutive androstane receptor (CAR), nuclear receptor-related 1 protein (NURR1), and estrogen-related receptor alpha (ERRα), were built and minimized for 1 ns with the NAMD 2.13 software package. In addition, the mutated amino acids present in the glucocorticoid receptor (GR) (F602S) and steroidogenic factor 1 (SF-1) (C247S and C412S) crystallographic structures were replaced. The receptor structures were processed using Sybyl software v8.1 (www.tripos.com). Water molecules and ligands were removed, and hydrogen atoms were added. Energy was minimized using the Powell algorithm with a convergence gradient of ≤0.5 kcal (mol Å)⁻¹ and a maximum of 1500 cycles. For the molecular docking with AutoDock (see below), the receptors were further processed: using AutoDockTools, polar hydrogens were added to the proteins and Gasteiger charges were calculated to assign an AD4 type to each atom.

2.6. Ligand preparation

Structural coordinates of the endogenous and putative ligands were retrieved from the NCBI PubChem compound database. The FLAP software was used to assign the correct protonation state to each ligand (pH = 7.4).

2.7. Molecular docking with GOLD software

The GOLD software v5.8.1 (CCDC; Cambridge, UK; www.ccd.cam.ac.uk) was applied in order to dock ligands into the binding site of the 31 nuclear receptors. For each compound and receptor, 30 binding poses were generated. The binding site centroid of each receptor was defined using the coordinates of the crystallographic complexes. Side chain flexibility was allowed for each receptor amino acid. For the genetic algorithm run, a maximum of 100000 operations were performed on a population of 100 individuals with a selection pressure of 1.1. The number of islands and the niche size were set to 5 and 2, respectively. The default GoldScore fitness function was applied for performing the energetic evaluations. The distance for hydrogen bonding and the cut-off value for the van der Waals calculation were set to 2.5 Å and 4.0 Å, respectively. Flip pyramidal N, flip amide bonds, and flip ring corners were allowed as ligand flexibility options. After that, all the poses generated by GOLD were rescored using the scoring functions ChemScore and HintScore (HINT, Hydropathic INTeraction).

2.8. Molecular docking with Autodock Vina software

Molecular docking experiments were performed with Autodock Vina 1.1.2 using default settings (Trott and Olson, 2009). The search space was included in a box of 24 × 24 × 24 Å, centred on the binding site of the ligands as mentioned before. Side chain flexibility was allowed for the same residues defined in the GOLD docking. The ligand amide and backbone flexibility were allowed.

3. Results and discussion

The foodchem DB has also been designed to accelerate computational applications, since it stores not only regulative information but also chemical-physical properties and three-dimensional structures. Particular attention has been paid to ensuring that each 3D structure corresponds to its CAS RN. Thus, it has been conceived for a different purpose compared to the FPF database, which does not contain all the chemical-physical information used in the foodchem DB and does not store the three-dimensional structure. Moreover, our database has been written in SQL and NoSQL with the purpose of making it available to the scientific community through a website interface where the user can search and extract information. Using our database, the three-dimensional structures of 8091 substances, belonging to different sub-classes (Table 2), have been extracted and all these molecules have been screened using a molecular docking approach in order to identify the compounds capable of binding the thirty-one nuclear receptors. This method allows screening for the substances with the most probable physical-chemical characteristics to act as endocrine disruptors.

Two different docking software and four different scoring functions have been used, as in our previous papers (Francesca Cavaliere et al., 2020) (Spaggiari et al., 2021). Thus, for each receptor and for each food contact chemical, four values have been obtained. In humans, there are 48 nuclear receptors, but many of these remain “orphans” as their endogenous ligands are yet to be determined. For this reason, if the endogenous ligand is known, the relative binding affinity (RBA) of each molecule was calculated using it as a reference compound. On the other hand, all the endogenous and non-endogenous co-crystallized ligands were docked against the respective nuclear receptors to obtain a reference value. A cut-off value was selected for each of the four docking values: i) a cut-off of 50 for GoldScore; ii) a cut-off of 30 for ChemScore; iii) a cut-off of −7 for Autodock (affinity); and iv) a cut-off of 500 for HintScore.

To reach a consensus scoring prediction, a robust statistical method has been used, as explained in more detail below.

As training dataset, the crystallographic structures available from
PDB of all ligand-NR complexes were considered. All ligands bound to the corresponding receptor were extracted and docked into the ligand-binding pocket to obtain the corresponding four scoring values. As for the food contact chemical data, every single value was used to calculate the relative binding affinity considering the natural ligand as a reference compound:

\[ \mathrm{RBA}_n = \frac{\text{food contact chemical score}}{\text{reference compound score}} \]

where the subscript n runs over the scoring functions.

However, since the data distribution is non-normal owing to the potential presence of outliers, a robust multivariate method was used to detect atypical values. In fact, it is well known that the presence of atypical values can affect the results of any statistical analysis, especially when the number of observations is large. Using a simultaneous confidence level of 1%, we removed only values that were very far from the general bulk of the data. After the outlier removal, the values were rescaled to the domain [0, 1], setting a score equal to 1 when it was larger than the value of the natural ligand. The degree of dispersion of the four rescaled values ($X_1, \ldots, X_4$) has been taken into account by standardizing them to obtain four new variables ($Z_1, \ldots, Z_4$) with mean 0 and variance 1. After that, a principal component analysis was applied to the four new variables to identify a weight coefficient for each scoring function ($w_1, w_2, w_3, w_4$) in such a way that the explained variance of the original variables is as large as possible ($w_j \ge 0$ and $\sum_{j=1}^{4} w_j^2 = 1$). We obtained weights of 0.12, 0.94, 0.14, and 0.29 for GoldScore, HintScore, ChemScore, and Autodock (affinity), respectively.

As for the training dataset, the relative binding affinity of each molecule and scoring function has been rescaled to the [0, 1] domain after the outlier removal. To consider the different degrees of dispersion of the new rescaled variables, we standardized them to obtain four new variables. Since the purpose of the analysis was to combine the four scores into a single consensus score prediction, the final score for the i-th food contact chemical has been obtained as:

\[ \frac{\sum_{j=1}^{4} x_{ij} w_j}{\sum_{j=1}^{4} w_j}, \qquad i = 1, 2, \ldots, n \]

where n is the total number of food contact chemicals.

At the end, the results have been divided into three cases based on their score: i) molecules with a score between 0.0 and 0.3 are considered weak ligands since they interact with the corresponding nuclear receptor with a binding affinity that is 70% (or more) lower than that of the natural ligand (Fig. 1A); ii) molecules with a score between 0.3 and 0.8 are considered medium interactors compared to the natural ligand (Fig. 1B); iii) molecules with a score between 0.8 and 1.0 are judged as high interactors since they are able to bind the corresponding nuclear receptor with a binding affinity that is more than 80% of that of the natural ligand (Fig. 1C). This latter case also includes the molecules that can interact with the nuclear receptor with a binding affinity greater than the natural ligand. Thus, all food contact chemicals falling in this class may be considered substances of very high concern and should be the first compounds to analyse with further experimental methods in order to re-evaluate their use in the food industry.

Fig. 1. Results obtained from the robust multivariate statistical procedure. The 31 NRs are on the x-axis, while the number of the molecules (%) is on the ordinate. The molecules with a score smaller than 0.3 are highlighted in green (A), the molecules with a score between 0.3 and 0.8 are highlighted in yellow (B), the molecules with a score greater than 0.8 are highlighted in red (C), while the outliers are highlighted in grey (D). (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

Fig. 1 shows the percentage of molecules that can interfere with the endocrine system receptors, highlighting their abundance for each specific nuclear receptor. Focusing on single nuclear receptors, more than 50% of food contact chemicals are good interactors of liver X receptor β (LXRβ), pregnane X receptor (PXR), progesterone receptor (PR), farnesoid X receptor (FXR), retinoic acid-related orphan receptor γ (RORγ), and peroxisome proliferator-activated receptor α (PPARα). In fact, LXRβ is the nuclear receptor with the highest number of food contact chemicals that fall in the high interactor group, and thus, it is likely the receptor most affected by the presence of these compounds in our body.

Considering Fig. 1D, we found almost the same number of outlier molecules for each nuclear receptor except for the estrogen-related receptor α (ERRα). This is not surprising since outlier molecules were generally substances having a high volume compared to the ligand-binding pocket of nuclear receptors. In fact, due to atom-atom clashes,
the molecular docking scores were far away from the normal trend. Thus, considering that the volume of the ligand-binding pocket of ERRα is only about 80 Å³ (against the ~300 Å³ of most nuclear receptors, excluding the PPAR family), it is plausible to find a higher number of outliers.

As the second step of our analysis, we turned our attention to which class of food contact chemicals has the greatest number of molecules able to interfere with the endocrine system. Thus, we counted the number of molecules belonging to each class that can interact with more than 50 percent of nuclear receptors with high, medium, and low binding affinity. As we can see in Fig. 2, almost all of the dioxins, furans, and PCBs can interact with more than 15 nuclear receptors with high binding affinity, followed by the pesticides and phthalates sub-classes.

Fig. 2. The percentage of molecules able to bind more than 15 nuclear receptors with high (≥0.8), medium (0.3–0.8), and low binding affinity (<0.3) considering each class of food contact chemicals.

This finding highlights the potential capability of these molecules to cause a very broad endocrine effect on the human body. Considering the medium interactors, a great number of flavourings, bisphenols, and FCCDB chemicals fall in this group. The single compound in the acrylamide class is also able to interact with more than fifteen nuclear receptors with medium binding affinity. On the other hand, food additives and mycotoxins are more selective in their interaction with nuclear receptors, and only a few molecules can interact with high affinity with more than 50 percent of NRs.

4. Conclusion

One of the reasons that undermine in silico approaches is the poor availability of highly curated databases from which to retrieve and download the three-dimensional structure. This is most relevant in the food context due to the presence of salt and mixture components. In fact, it is frequent on the web to find mixture or salt substances associated with the CAS RN of the main compound. In the present work, we created a database with a high level of data curation from which to retrieve chemical, structure, and regulative information about all food contact chemicals.

[…] flavouring compound, and it is also included in the Food Contact Chemical DB (FCCDB), has two different predicted activities for its capability to act as an agonist for the estrogen receptor α. In fact, in the Tox21 project (Richard et al., 2021), the quantitative high-throughput screening assay (qHTS) identifies 4′-Methoxyacetophenone both as active and inactive for its agonist activity on ERα. In light of this, we think that no approach can be judged better than another; all are equally valid and should be considered together. Thus, the present work should not be seen as a method opposed to classical in vitro and in vivo tests, but as a useful preliminary method to screen a huge number of molecules in a cost- and time-effective manner. In fact, using our robust computational method, we screened a large number of molecules against the nuclear receptor family in a relatively short time compared to the time needed for in vitro and in vivo experiments.

Author contribution statement

Pietro Cozzini: Conceptualization, Methodology, Project administration, Resources, Supervision, Writing - reviewing/editing. Giulia Spaggiari: Data curation, Formal analysis, Investigation, Methodology, Validation, Writing - original draft. Francesca Cavaliere: Data curation, Formal analysis, Investigation, Methodology, Software Development, Validation, Writing - original draft. Marco Riani & Gianluca Morelli: Statistical methods and software development.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References
Using our foodchem database, we screened 8091 food contact
chemicals against 31 nuclear receptors with the aim to identify the Bissantz, C., Folkers, G., Rognan, D., 2000. Protein-based virtual screening of chemical
molecules that require major attention about their safety for the human databases . 1 . Evaluation of different docking/scoring combinations. J. Med. Chem.
43, 4759–4767. https://doi.org/10.1021/jm001044l.
body. In the food context, wet experiments are the most used and Cavaliere, F., Lorenzetti, S., Cozzini, P., 2020. Molecular modelling methods in food
accepted methods and, thus, there is often a mistrust about the reli safety: bisphenols as case study. Food Chem. Toxicol. 137 https://doi.org/10.1016/j.
ability of computational techniques. However, dry experiments also fct.2020.111116.
Cavaliere, Francesca, Spaggiari, G., Cozzini, P., 2020. Molecular Docking for Computer-
have their drawbacks. For example, the compound 4′ -Methox Aided Drug Design: Fundamentals, Techniques, Resources and Applications.
yacetophenone (CAS RN: 100-06-1), which is used as an additive and Dall’Asta, C., 2016. Mycotoxins and nuclear receptors: a still underexplored issue. Nucl.
Recept. Res. 3 https://doi.org/10.11131/2016/101204.
De Coster, S., Van Larebeke, N., 2012. Endocrine-disrupting chemicals: associated disorders and mechanisms of action. J. Environ. Public Health 2012. https://doi.org/10.1155/2012/713696.
Desvergne, B., Feige, J.N., Casals-Casas, C., 2009. PPAR-mediated activity of phthalates: a link to the obesity epidemic? Mol. Cell. Endocrinol. 304, 43–48. https://doi.org/10.1016/j.mce.2009.02.017.
Fucic, A., Gamulin, M., Ferencic, Z., Katic, J., Krayer Von Krauss, M., Bartonova, A., Merlo, D.F., 2012. Environmental exposure to xenoestrogens and oestrogen related cancers: reproductive system, breast, lung, kidney, pancreas, and brain. Environ. Heal. A Glob. Access Sci. Source 11, 1–9. https://doi.org/10.1186/1476-069X-11-S1-S8.
Gore, A.C., Chappell, V.A., Fenton, S.E., Flaws, J.A., Nadal, A., Prins, G.S., Toppari, J., Zoeller, R.T., 2015. EDC-2: the endocrine society’s second scientific statement on endocrine-disrupting chemicals. Endocr. Rev. 36, E1–E150. https://doi.org/10.1210/er.2015-1010.
Grulke, C.M., Williams, A.J., Thillanadarajah, I., Richard, A.M., 2019. EPA’s DSSTox database: history of development of a curated chemistry resource supporting computational toxicology research. Comput. Toxicol. 12, 100096. https://doi.org/10.1016/j.comtox.2019.100096.
Hersey, A., Chambers, J., Bellis, L., Patrícia Bento, A., Gaulton, A., Overington, J.P., 2015. Chemical databases: curation or integration by user-defined equivalence? Drug Discov. Today Technol. 14, 17–24. https://doi.org/10.1016/j.ddtec.2015.01.005.
Kabir, E.R., Rahman, M.S., Rahman, I., 2015. A review on endocrine disruptors and their possible impacts on human health. Environ. Toxicol. Pharmacol. 40, 241–258. https://doi.org/10.1016/j.etap.2015.06.009.
Kavlock, R.J., Daston, G.P., DeRosa, C., Fenner-Crisp, P., Gray, L.E., Kaattari, S., Lucier, G., Luster, M., Mac, M.J., Maczka, C., Miller, R., Moore, J., Rolland, R., Scott, G., Sheehan, D.M., Sinks, T., Tilson, H.A., 1996. Research needs for the risk assessment of health and environmental effects of endocrine disrupters: a report of the U.S. EPA-sponsored workshop. Environ. Health Perspect. 104, 715–740. https://doi.org/10.1289/ehp.96104s4715.
Luccio-Camelo, D.C., Prins, G.S., 2011. Disruption of androgen receptor signaling in males by environmental chemicals. J. Steroid Biochem. Mol. Biol. 127, 74–82. https://doi.org/10.1016/J.JSBMB.2011.04.004.
Morris, G.M., Lim-Wilby, M., 2008. Molecular docking. Methods Mol. Biol. 443, 365–382. https://doi.org/10.1007/978-1-59745-177-2_19.
Odermatt, A., Gumy, C., 2008. Glucocorticoid and mineralocorticoid action: why should we consider influences by environmental chemicals? Biochem. Pharmacol. 76, 1184–1193. https://doi.org/10.1016/j.bcp.2008.07.019.
Petrakis, D., Vassilopoulou, L., Mamoulakis, C., Psycharakis, C., Anifantaki, A., Sifakis, S., Docea, A.O., Tsiaoussis, J., Makrigiannakis, A., Tsatsakis, A.M., 2017. Endocrine disruptors leading to obesity and related diseases. Int. J. Environ. Res. Public Health 14, 1–18. https://doi.org/10.3390/ijerph14101282.
Richard, A.M., Huang, R., Waidyanatha, S., Shinn, P., Collins, B.J., Thillainadarajah, I., Grulke, C.M., Williams, A.J., Lougee, R.R., Judson, R.S., Houck, K.A., Shobair, M., Yang, C., Rathman, J.F., Yasgar, A., Fitzpatrick, S.C., Simeonov, A., Thomas, R.S., Crofton, K.M., Paules, R.S., Bucher, J.R., Austin, C.P., Kavlock, R.J., Tice, R.R., 2021. The Tox21 10K compound library: collaborative chemistry advancing toxicology. Chem. Res. Toxicol. 34, 189–216. https://doi.org/10.1021/acs.chemrestox.0c00264.
Safe, S., 2004. Endocrine disruptors and human health: is there a problem. Toxicology 205, 3–10. https://doi.org/10.1016/j.tox.2004.06.032.
Schug, T.T., Janesick, A., Blumberg, B., Heindel, J.J., 2011. Endocrine disrupting chemicals and disease susceptibility. J. Steroid Biochem. Mol. Biol. 127, 204–215. https://doi.org/10.1016/j.jsbmb.2011.08.007.
Spaggiari, G., Iovine, N., Cozzini, P., 2021. In silico prediction of the mechanism of action of pyriproxyfen and 4′-OH-pyriproxyfen against A. mellifera and H. sapiens receptors. Int. J. Mol. Sci. 22. https://doi.org/10.3390/ijms22147751.
Teramoto, R., Fukunishi, H., 2007. Supervised consensus scoring for docking and virtual screening. J. Chem. Inf. Model. 47, 526–534. https://doi.org/10.1021/ci6004993.
Trott, O., Olson, A.J., 2009. Software news and update AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455–461.
Wang, R., Lu, Y., Wang, S., 2003. Comparative evaluation of 11 scoring functions for molecular docking. J. Med. Chem. 46, 2287–2303. https://doi.org/10.1021/jm0203783.
