DFinder A Novel End-To-End Graph Embedding-Based M

Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.
1093/bioinformatics/btac837/6965015 by Uniwersytet Warminsko-Mazurski w Olsztynie user on 29 December 2022

Subject Section
DFinder: A novel end-to-end graph

embedding-based method to identify drug-food
interactions
Tao Wang 1,2 , Jinjin Yang 1,2 , Yifu Xiao 1,2 , Jingru Wang 1,2 , Yuxian Wang 1,2 , Xi
Zeng 1,2 , Yongtian Wang 1,2 , and Jiajie Peng 1,2∗
1
School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072, China and
2
Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information
Technology, Xi’an, 710072, China
∗ To whom correspondence should be addressed.
Associate Editor: XXXXXXX
Received on XXXXX; revised on XXXXX; accepted on XXXXX
Abstract
Motivation: Drug-food interactions (DFIs) occur when some constituents of food affect the bioaccessibility
or efficacy of the drug by involving in drug pharmacodynamic (PD) and/or pharmacokinetic (PK) processes.
Many computational methods have achieved remarkable results in link prediction tasks between biological
entities which show the potential of computational methods in discovering novel DFIs. However, there are
few computational approaches that pay attention to DFI identification. This is mainly due to the lack of
drug-food interaction data. In addition, food is generally made up of a variety of chemical substances. The
complexity of food makes it difficult to generate accurate feature representations for food. Therefore, it
is urgent to develop effective computational approaches for learning the food feature representation and
predicting drug-food interactions.
Results: In this paper, we first collect drug-food interaction data from DrugBank and PubMed respectively
to construct two datasets, named DrugBank-DFI and PubMed-DFI. Based on these two datasets, two DFI
networks are constructed. Then, we propose a novel end-to-end graph embedding-based method named
DFinder to identify DFIs. DFinder combines node attribute features and topological structure features
to learn the representations of drugs and food constituents. In topology space, we adopt a simplified
graph convolution network-based method to learn the topological structure features. In feature space, we
use a deep neural network to extract attribute features from the original node attributes. The evaluation
results indicate that DFinder performs better than other baseline methods. The source code is available
at https://github.com/23AIBox/23AIBox-DFinder
Contact: jiajiepeng@nwpu.edu.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
1 Introduction drugs or even cause adverse drug events (ADEs), which can be considered
The efficacy of drugs will be affected by many factors, such as drug dosage, as drug-food interactions (DFIs).
drug delivery route and concomitant drugs. Food, which is often taken with In DFIs, food can affect the efficacy of drugs in three ways (Telessy,
drugs, may also have a significant impact on the efficacy of drugs (Jensen 2018): incompatibilities, pharmacokinetics (PK), and pharmacodynamics
et al., 2015). Several studies (Bushra et al., 2011;Schmidt and Dalhoff, (PD). Incompatibility denotes the circumstance when the drug molecule
2002) have demonstrated that food can increase or decrease the activity of and component of food bind together and produce an insoluble compound
which may hinder the bioaccessibility of the drug. The classical example
is that tetracyclines can not be absorbed because of the calcium content
© The Author(s) 2022. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium,
provided the original work is properly cited.
2 Sample et al.
Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac837/6965015 by Uniwersytet Warminsko-Mazurski w Olsztynie user on 29 December 2022

of dairy products (Neuvonen, 1976). The PK interactions mean that food which put much effort into feature engineering to develop biological
components can enhance or hinder the movement of drug molecules in features, graph embedding methods can learn better node representations
the body during the Absorption, Transport, Metabolism, and Excretion automatically. Yue et al. (Yue et al., 2020) group graph embedding methods
(ADME) phases. For example, grapefruit juice can inhibit the metabolism into three categories: matrix factorization-based, random walk-based, and
of cyclosporine in a short time after administration because of the inhibition neural network-based methods. MF-based methods aim to factorize a data
of the cytochrome P450 enzyme (Hollander et al., 1995). PD interactions matrix into lower-dimensional matrices while still keeping the manifold
are caused by specific pharmacological interactions between drugs and structure and topological properties hidden in the original data matrix
specific food components (Koziolek et al., 2019). One of the examples is (Belkin and Niyogi, 2003;Ahmed et al., 2013;Cao et al., 2015). Random
warfarin antagonizes the action of vitamin-K (originated from food such walk-based methods try to learn node representations by generating node
as spinach) (Lurie et al., 2010). Therefore, the understanding and study sequences through random walks in graphs which generally have two
of DFIs can reduce adverse drug events and improve the absorption of steps. First, random walk is applied to generate node sequences in a
nutrients and drugs. Unfortunately, the known DFIs are just the tip of the graph. Then, the word2vec model (Mikolov et al., 2013) is adopted to
iceberg. There are still many DFIs to be identified. learn embeddings based on the generated sequences of nodes. In this
Our current knowledge about drug-food interactions mainly comes way, the structural proximity of the graph can be preserved. Many neural
from clinical research and biomedical literature (Otles and Senturk, network-based methods have been proposed for graph embedding, such as
2014;Bushra et al., 2011;Chen et al., 2018). However, these studies only multilayer perceptron (MLP) (Tang et al., 2015), auto-encoder (Kipf and
involve a small number of drugs (anti-diabetic drugs, anti-tubercular drugs, Welling, 2016;Wang et al., 2016), generative adversarial network(GAN)
anti-hypertensive drugs, etc.) and limited food (grapefruit, mango, etc.). (Wang et al., 2017) and graph convolutional network (GCN) (KipF and
And most of them focus on revealing how food affects the efficacy of drugs Welling, 2016). Different methods adopt different neural architectures and
from a biomedical perspective which is time-consuming and expensive. use different kinds of graph information as input. Inspired by the success of
Hence, computational approaches are needed to identify potential DFIs, the graph embedding methods in the biomedical link prediction tasks (Peng
rather than just obtaining DFIs from the literature. In drug-drug interactions et al., 2021;Sun et al., 2019;Zhao et al., 2020), we propose a method called
(DDIs) and drug-target interactions (DTIs) identification tasks, many DFinder to identify the interactions between drugs and food. In addition,
computational methods (Ryu et al., 2018;Feng et al., 2020;You et al., DFinder is designed in an end-to-end fashion for better optimizing all the
2019;Peng et al., 2021) have been proposed and successfully used. These parameters. To summarize, our contributions are 3-fold:
methods have made remarkable advances in solving these problems.
• To solve the lack of DFI resources, we construct two DFI datasets
However, to the best of our knowledge, there are not many computational
named DrugBank-DFI and PubMed-DFI. For DrugBank-DFI, we
methods designed for discovering drug-food interactions. There are two
main reasons. The first reason is the lack of systematic collection of drug- collect DFI-related information directly from the DrugBank database
food interaction data. It is challenging to collect a large amount of DFIs and then parse this information to generate the DrugBank-DFI dataset.
data. The lack of gold standard data is a big obstacle for data-driven For PubMed-DFI, we adopt the co-occurrence-based text mining
computational methods. Second, it is difficult to obtain a good feature method to obtain the PubMed-DFI dataset from PubMed.
• We propose a novel graph representation learning method named
representation of food, food is generally made up of a variety of chemical
substances. Most descriptions of food so far are general and vague. The DFinder to learn better representations for drugs and food. DFinder
complexity of food makes it difficult to generate accurate food features. combines node features and topological structures to learn the final
By now, only a few efforts have been made to identify DFIs. Ryu et al. representations of drugs and food constituents based on the DFI
(Ryu et al., 2018) propose DeepDDI, which first uses principal component networks.
• The evaluation results show that DFinder outperforms the existing
analysis (PCA) to reduce the dimension of the drug features and then
feeds the learned features into a deep neural network (DNN) classifier. As approaches for DFI prediction task.
an extended application, DeepDDI is applied to identify the interactions
between drugs and food constituents. Although DeepDDI can be used
to identify DFIs, it is mainly designed for identifying different types of
DDIs. Ni et al. (Ni et al., 2017) identify drug-food interactions based on the 2 Methods
overlap of proteins associated with drugs and food compounds. However,
this method ignores several types of information, such as the attributes of We propose a novel end-to-end graph embedding-based method named
drugs and food and the known interactions between drugs and food. DFinder to identify DFIs based on the DFI network. Our work
As a data structure, graphs (a.k.a. networks) contain a set of entities and mainly contains two parts: DFI network construction and computational
relationships between these entities, which makes them very suitable for framework for drug-food interaction prediction. In the first part, we
modeling biomedical systems (Yi et al., 2022). In recent years, predicting generate the DrugBank-DFI dataset from the DrugBank database and
interactions between biomedical entities is usually modeled as a link use the co-occurrence-based text mining method to obtain the PubMed-
prediction problem on a graph (Wen et al., 2018;Zitnik et al., 2018). DFI dataset from PubMed. Then we construct two drug-food interaction
Similarly, the identification of DFIs can also be modeled as a link prediction networks based on these two datasets respectively. In the second part,
task. In detail, drug-food interactions can be represented by a network we propose a graph embedding method to learn the low-dimensional
where nodes represent drugs or food and edges represent the relationships representations of drugs and food constituents. We calculate interaction
between food and drugs. For graph analysis, many graph embedding (a.k.a. scores between drugs and food components using the inner product method
graph representation learning) methods have been proposed. The goal of based on low-dimensional representations and optimize our model by
these methods is to learn a low-dimensional representation of each node Bayesian Personalized Ranking (BPR) loss.
while maximally preserving the structural information of graphs. The
learned representation can be used as features for different downstream
graph analysis tasks, such as link prediction(Peng et al., 2019), node 2.1 The Construction of DFI networks
classification(Wang et al., 2021), community detection(Wang et al., 2020), In this section, we introduce how to collect drug-food interaction data and
and clustering (Yi et al., 2022). Compared with traditional methods construct DFI networks.
Drug-food interaction identification 3

2.1.1 DFI network based on DrugBank
We first collect drug-food interaction data from DrugBank (v 5.1.7)
database which contains specific information for each drug (i.e., drug
interactions, pharmacology, chemical structures, targets, metabolism, “small molecule” “small molecule”
etc) (Wishart et al., 2006;Wishart et al., 2007). We download the XML type type
formatted DrugBank dataset which stores sporadic information on drug- Food Constituents Drug List
food interactions. By parsing the XML document, we obtain about 3000 List
sentences describing information on drug-food interaction. We manually
extract DFIs from these sentences. Among the 3000 sentences, some
describe the influence of the time of taking medicine. For example, "Take Drug-Food Constituent
pairs
on an empty stomach." and "Take at least 1 hour before eating a meal". In
addition, there are some sentences that do not show clearly the effects co-occurrence-based text mining
of specific foods on drugs, such as "Take with food. This increases
~3,300,000 abstracts
the bioavailability of darolutamide. ". Therefore, we manually extract
DFIs from sentences by screening out the mutex relationship, no-effect
relationship, and other unclear relationships between food constituent and
drug. MEDLINE
The extracted food involved in DFIs can be divided into two types: life science journals
food (e.g. avocado, figs and cheese) and food constituents (e.g. caffeine, online books
vitamin c and zinc). It is hard to have a unified representation of these
two types of food data. Therefore, we use the top 20 food constituents
Three filtering rules:
with high content in the food to represent the food. This is also consistent
• limit the number of co-occurrences(≥ 20)
with the fact that the interaction between drugs and food is actually the • Delete overlaps in food constituents and drugs
interaction between drugs and food components (Ryu et al., 2018;Ni et al., • Delete the pairs with high similarity
2017). We use the FooDB database (v 1.0) to obtain constituents of food.
FooDB contains numerous records on foods, compounds, contents of food
components and the mapping of food and its components. In addition, in
the process of data acquisition, we only retain "small molecule" types of Drug
…
…
drugs and food components. The final DrugBank-DFI dataset contains Food
Constituent
1,784 interactions, representing 143 drugs, and 213 food components.
The density of the DrugBank-DFI bipartite network is 0.059, which is
comparable with other DFI networks (Supplementary Table 1). Drug-Food Constituent Interactions
Fig. 1. The flowchart of co-occurrence-based text mining method. Firstly, we obtain a

candidate drug list from DrugBank and a candidate food constituent list from FooDB
2.1.2 DFI network based on PubMed respectively. Secondly, we combine all the drugs and food constituents in pairs. Thirdly, we
Although some DFIs have been recorded in the DrugBank, more valuable input each drug-food constituent pair into PubMed to get the number of co-occurrence in
DFIs are still buried in the biomedical literature. Therefore, we try to abstracts of publications. Next, in order to improve the quality of data, we set up three rules
extract DFIs from the literature. Co-occurrence of biological entities to further filter the found drug-food constituent pairs. Finally, we use the filtered drug-food
in the literature is a simple, comprehensive, and popular technique for constituent pairs to construct a DFI network.
identifying entity associations. The technique is based on the assumption

that if a biological entity appears in the same document as another
biological entity, the two entities should have a high probability of being pairs with potential interactions based on their co-occurrence in abstracts
biologically related (Shanfeng et al., 2005; Fleuren and Alkema, 2015). of publications in PubMed. Next, we set up three rules to further filter
This hypothesis was already validated in several existing studies (Junge the identified drug-food constituent pairs for improving the quality of the
and Jensen, 2020;Szklarczyk et al., 2019;Pletscher-Frankild et al., 2015). dataset. The three rules are as follows:
Therefore, we adopt a co-occurrence-based text mining method to obtain • Rule 1: The number of co-occurrences should be larger than a given
DFIs from the literature. The workflow of text mining is shown in Figure threshold. We only keep drug-food constituent pairs that co-occurred
1. more than twenty times in PubMed.
To aid this purpose, we use the standard PubMed Advanced search • Rule 2: Since constituents of food and drugs may be the same, we
API (ncbi.nlm.nih.gov/books/NBK3827/\#pubmedhelp.
removed the pairs including the same constituents.
Advanced_Search) for text mining. In detail, we used the standard • Rule 3: In some cases, the constituents of food and drugs are
"ESearch" utility (eutils.ncbi.nlm.nih.gov/entrez/eutils/ mentioned in the same article because they are similar. To avoid this
esearch.fcgi) to query the occurrence of drug or food constituent, and problem, we calculated the similarity of the names of each pair and
we could output all of the paper IDs whose abstracts contain both the two deleted the pairs with high similarity.
entities using in-house scripts. As of Oct. 2021, we searched about 3.3
million abstracts from PubMed. Similar to DrugBank-DFI, we only focus After applying these three rules, we obtain the PubMed-DFI dataset,
on "small molecule" types of drugs and food components when mining which contains 15,890 drug-food constituent interactions, representing
DFIs from PubMed. Firstly, we obtain all small molecule drugs from the 779 drugs and 818 food constituents. The density of the PubMed-DFI
DrugBank dataset to form a list of candidate drugs. Then we generate all bipartite graph is 0.025. We compare the detailed graph statistics of the
food components in FooDB and screen out the small molecule components two DFI datasets with other DFI and DDI networks and find that the sizes
to form a candidate food component list. Secondly, we combined all the and densities of our DFI networks are reasonable (Supplementary Table
drugs and food components in pairs and then identify drug-food constituent 1).
4 Sample et al.

p
To be noted, the PubMed-DFI is a co-occurrence-based DFI network, 1/ |NF ||ND | can avoid the scale of embeddings increasing with graph
which might be affected by the not real interactions. But it is still valuable convolution operations. After the K-th graph convolution operation, we
in developing and evaluating the robustness of our DFinder method since get the final representations of drugs (and food constituents) by combining
the co-occurrence-based FDIs largely expand the size of the drug-food the embeddings obtained at each layer:
bipartite network. Furthermore, the PubMed-DFI dataset may benefit
further DFI validations by domain experts in the future. More about the K
1 X (k)
co-occurrence-based DFI construction is discussed in the Supplementary TD = T (4)
K + 1 k=0 D
Information.
2.2 Computational Framework for DFI prediction K

1 X (k)
The DFI network we obtain in section 2.1 is a bipartite graph containing TF = T (5)
K + 1 k=0 F
two types of nodes (drugs and food constituents) and one type of edges
(interactions between drugs and food constituents). The overall framework In this way, we can learn the node embeddings that capture the topology
of the DFinder is shown in Figure 2. The key idea of this model is information TD for drug D and TF for food constituent F in the topology
to combine the node feature and topological structure to obtain a better space.
representation of a drug (or food constituent). In topology space, we adopt
a simplified GCN-based method to learn the topological structure of a
2.2.2 Attribute Feature Extraction
network and obtain embedding TD of nodes. In feature space, a deep
In the link prediction task, the original attribute information of the entity to
neural network is used to reduce the dimension of original node attribute
be predicted may have a great impact on the performance of the prediction.
information to learn embedding AD of the node. There are 4 layers in
To improve the effect of prediction, a lot of work studies how to use
the DNN. The input layer has 2159 neurons. The first and second hidden
attribute information for entities. For example, DeepDDI (Ryu et al.,
layers have 1024 and 512 neurons, respectively. The output layer has 64
2018) adopted a simplified molecular-input-line-entry system (SMILE)
neurons. We use the ReLu function as the activation function in each layer.
description to establish a structural similarity profile (SSP). The SSP is a
Then we obtain the final embedding ZD of the node by concatenating TD
feature vector that can capture structure features of individual drugs and
and AD .
food compounds. The SSP contains pairwise structural similarity scores
obtained from the comparison with the 2159 approved drugs of DrugBank.
2.2.1 Structure Feature Extraction
In addition, DeepDDI compared the performance of DeepDDI using four
In this section, we describe how to extract structure features from the
different feature vectors: SSP, molecular descriptor profile (MDP), and
network. As a neural network designed to learn graph data, GCN has
feature vectors generated by two state-of-the-art vectorization methods
shown great popularity in solving graph analysis problems such as node
Molecular Autoencoder (Gómez-Bombarelli et al., 2018) and Mol2vec
classification (Abu-El-Haija et al., 2019;Abu-El-Haija et al., 2020), graph
(Jaeger et al., 2018). SSP slightly outperforms the other three feature
classification (Jiang et al., 2020;Xie et al., 2020), link prediction (Li et al.,
vectors when collaborating with the DeepDDI framework, which proves
2019) and recommendation (Fan et al., 2019). A typical GCN (KipF and
that SSP can effectively capture a unique structural feature of a given drug
Welling, 2016) layer consists of three operations: feature transformation,
or food component. The specific calculation process of SSP is shown in
neighbor aggregation, and non-linear activation. The propagation rule can
Supplementary Figure 1. In the proposed model, we adopt SSP as the
be defined as:
attribute information (feature vector) for drug and food constituent. First,
1 1
we generate an SSP for each node as original attribute information. Then,
Z (l+1) = σ D̃− 2 ÃD̃− 2 Z (l) W l (1)
we use a three-layer deep neural network as the feature extractor to learn
the attribute feature from the original SSP. After inputting SSP into DNN,
where Ã = A + IN is the adjacency matrix with added self-connections. we finally get the low-dimensional embedding as a feature vector for each
P
IN is the identity matrix. D̃ is the degree matrix, where D̃ii = j Ãij ; node in the feature space.
l
W is a layer-specific trainable weight matrix; σ (·) denotes an activation
function.
However, He et al. (He et al., 2020) find that feature transformation
2.3 Feature fusion of different spaces
and non-linear activation in standard GCN have no positive effect on We concatenate structure features and attribute features to generate the
recommendation performance. They propose LightGCN only containing final representation of each node:
the neighborhood aggregation operation which is easier to be trained and
more effective. The goal of the recommendation system is to mine the ZD = (TD ||AD ) ; ZF = (TF ||AF ) (6)
relationship between users and items which is similar to the link prediction
problem of biological networks. Inspired by their work, the proposed
where ZD and ZF denote the final embedding of drug and food constituent
model only employs the simple weighted sum aggregator to learn the
respectively, AD and AF represent the attribute feature of the drug entity
embeddings of drugs and food constituents. The propagation rule on the
and food consistent entity, respectively, and || denotes concatenation
DFI network is defined as:
operation.
(l+1)
X 1 (l)
TD = p p TF (2)
|ND | |NF |
F ∈ND 2.4 Model Training
1 We introduce the prediction of DFIs and the optimization of our model
(l+1) (l)
X
TF = p p TD (3) in this section. The DFI prediction is defined as the inner product (Long
D∈NF
|NF | |ND |
et al., 2020) of the drug and food constituent’s final representations. In
where ND and NF respectively denote the neighbor set of drug D and detail, given a drug node Di and a food constituent node Fj , ZDi and
food constituent F in the DFI network. The symmetric normalization term ZFj are their final representations respectively. The probability that there

A C Link Prediction
… Concatenate ܼ஽భ ٖ ‫ݕ‬ො஽భிయ

…
ܵܵܲ஽భ ‫ܣ‬஽భ ܼ஽భ

DNN
ܼ஽்భ ܼிయ
ܼிయ
B Optimization
Layer 3 ܶ஽భ
Layer 2
Layer 1 (଴)
ܶ஽భ ْ Layer ‫ݕ‬ො஽ி೔ ‫ܨ‬௜ ‫ܰ א‬஽ Bayesian
(௟ିଵ)
‫ܨ‬ଵ Combination
ܶிభ Personalized
(ଷ) ௄
ܶ஽భ 1 (௞) Ranking Loss
(௟ିଵ)
ܶிమ ‫ܨ‬ଶ Aggregation ܶ஽భ = ෍ ܶ஽భ ‫ݕ‬ො஽ிೕ
(ଶ)
ܶ஽భ ‫ܭ‬+1 ‫ܨ‬௝ ‫ܰ ב‬஽
௞ୀ଴
(௟) 1 (௟ିଵ)
(௟ିଵ)
ܶிయ ‫ܨ‬ଷ ܶ஽భ = ෍ ܶி೔ (ଵ) ௄
ܶ஽భ
௜‫א‬ேವభ ܰ஽భ ܰி೔ ଶ
neighbors of ۲૚
‫ܮ‬஻௉ோ = െ ෍ ෍ ෍ ln ‫ݕ‬ො஽ி೔ െ ‫ݕ‬ො஽ிೕ + ߣ ȣ
஽ୀଵ ி೔ ‫א‬ேವ ிೕ‫ב‬ேವ
Fig. 2. The framework of DFinder. We identify the interaction between D1 and F3 as an example to describe the process of our model. DFinder contains three main parts. (A)Attribute
feature extraction. We use a three-layer deep neural network to learn the embedding of node AD1 in feature space. (B)Structure feature extraction. We use a simplified GCN-based method
which only contains the neighborhood aggregation operation to learn the embedding of node in topology space. For instance, in the first layer, drug D1 aggregates the feature of its neighbors
(F1 , F2 and F3 ) by simple weighted sum aggregator. Then we combine the embeddings obtained at each layer to form the final topological feature TD1 . (C) Finally, we concatenate the
topological feature vector TD1 and attribute feature vector AD1 as the final representation ZD1 . The attribute and structure features for food constituents are extracted in the same way
as drug features. And the final representation of food constituent F3 is concatenated from its corresponding attribute and structure features. Finally, we use the inner product to get the score
between D1 and F3 and use the Bayesian Personalized Ranking loss to optimize the model.
is an interaction between Di and Fj can be calculated as: 3.1 Experimental settings

3.1.1 Data generation
T The DFIs in DrugBank-DFI and PubMed-DFI are considered positive
ŷDi Fj = ZD i
· ZFj (7)
samples. We take all the unobserved DFIs in the two datasets as negative
samples. For positive samples, we randomly select 20% of the whole
where ŷDi Fj is the interaction score between drug node Di and food observed DFIs as the testing set and use the remaining 80% as the training
constituent node Fj . set. For negative samples, the training and testing sets are divided in the
We used the Bayesian personalized ranking (BPR) loss as the objective same way as positive samples.
function for model optimization (Rendle et al., 2012). The BPR loss
is a pairwise personalized ranking loss, and has been widely used in
recommendation systems. The training data of BPR consists of positive
3.1.2 Performance evaluation
pairs (observed DFIs) and negative pairs (unobserved DFIs), and it
In this work, the identification of DFIs can be considered a link prediction
encourages the prediction of observed interactions higher than unobserved
task. Thus, we adopt Area Under the Receiver Operating Characteristic
interactions.
Curve (AUROC) and Area Under the Precision-Recall Curve (AUPR)
to evaluate our model. As far as we know, there are only a few
M
computational methods designed for DFI identification, so we mainly
X X X
LBP R = − lnσ ỹDFi − ỹDFj + λ ∥Θ∥2
D=1 Fi ∈ND Fj ∈N
compare our method with the graph embedding methods which are
/ D
(8) developed for the link prediction task on the graph. In the evaluation test,
wherehσ (·) is the sigmoid function. Θiis the embedding of the 0-th layer. we compare DFinder with other twelve algorithms which contain eleven
(0) (0) (0) (0)
Θ = TD1 , ..., TD , TF1 , ..., TF . λ is the parameter to control the graph representation learning methods and preferential attachment (PA)
M N
L2 regularization strength. We use the mini-batch Adam (Kingma and Ba, method (Liben-Nowell and Kleinberg, 2003), where graph representation
2014) as the optimizer, which can calculate the learning rate adaptively learning methods can be categorized into three groups: MF-based methods
for each parameter. (Laplacian (Belkin and Niyogi, 2003), GF (Ahmed et al., 2013), SVD,
GraRep (Cao et al., 2015) and HOPE (Ou et al., 2016)), random walk-
based methods (DeepWalk (Perozzi et al., 2014), node2vec (Grover and
Leskovec, 2016), struc2vec (Ribeiro et al., 2017)) and neural network-
3 Result based methods (LINE (Tang et al., 2015), SDNE (Wang et al., 2016) and
To evaluate the performance of DFinder, we test our model on DrugBank- GAE (Kipf and Welling, 2016)). The PA method estimates the probability
DFI and PubMed-DFI respectively. DrugBank-DFI is generated from of the cooccurrence of an edge (A, B) as a function of the product of
DrugBank containing 1784 interactions. PubMed-DFI is generated by a degrees of end nodes A and B. To fair the comparison, in the performance
co-occurrence-based text mining method from PubMed containing 13580 evaluation of PA, we keep the node degree distribution similar in the
interactions. positive edge set and negative edge set. During the performance evaluation,
6 Sample et al.

Table 1. The detailed AUROC and AUPR performance of DrugBank-DFI Table 2. The detailed AUROC and AUPR performance of PubMed-DFI dataset.
dataset. Bolded numbers are the best performance. Bolded numbers are the best performance.
Method AUROC AUPR F1-score Recall Precision Method AUROC AUPR F1-score Recall Precision
Laplacian 0.8882 0.5377 0.3212 0.2211 0.5874 Laplacian 0.8810 0.2779 0.5572 0.0984 0.5591
GF 0.8875 0.5419 0.377 0.2763 0.5932 GF 0.8790 0.2797 0.1336 0.0759 0.6
MF-based SVD 0.8810 0.5376 0.3811 0.2763 0.614 MF-based SVD 0.8901 0.2921 0.1652 0.0981 0.5474
GraRep 0.8906 0.5430 0.3943 0.2895 0.618 GraRep 0.8959 0.3051 0.1768 0.1049 0.5619
HOPE 0.8881 0.5384 0.3922 0.2921 0.5967 HOPE 0.8835 0.2811 0.1571 0.09 0.5993
DeepWalk 0.8875 0.5283 0.3365 0.2368 0.5807 DeepWalk 0.8845 0.2896 0.1632 0.0961 0.5572
Random Random
node2vec 0.8905 0.5420 0.3308 0.2263 0.6143 node2vec 0.8681 0.2575 0.1427 0.0815 0.5729
walk-based walk-based
struc2vec 0.8884 0.5285 0.3949 0.2868 0.6337 struc2vec 0.8977 0.2991 0.1755 0.1049 0.5356
LINE 0.8888 0.5381 0.3709 0.2684 0.6 LINE 0.8868 0.2877 0.1606 0.0934 0.5738
Neural Neural
SDNE 0.8483 0.4460 0.0001 0.0001 0.0001 SDNE 0.8799 0.2634 0.0035 0.0018 0.375
Net-based Net-based
GAE 0.8709 0.4896 0.313 0.229 0.4943 GAE 0.8322 0.1570 0.0024 0.0012 0.4444
PA 0.5741 0.0788 0.1588 0.1592 0.1584 PA 0.6119 0.0567 0.1288 0.1293 0.1283
DFinder 0.9823 0.9326 0.8722 0.8733 0.8711 DFinder 0.9190 0.4734 0.5353 0.5354 0.5353
Table 3. Performance comparison between DFinder and seven DDI-prediction

each dataset was randomly separated five times. Detailed information on methods on DrugBank-DFI and PubMed-DFI datasets. Bolded numbers are the
these methods can be found in the Supplementary Document. best performance.
Dataset Method AUROC AUPR F1-score Recall Precision
3.2 Performance evaluation on DrugBank-DFI CASTER 0.5853 0.0733 0.0743 0.0741 0.0744
We first adopt the DrugBank-DFI dataset to validate the performance of MR-GNN 0.6262 0.1076 0.1673 0.167 0.1675
DFinder. The error bar chart of AUROC and AUPR on the DrugBank-DFI GCN-BMP 0.7301 0.1977 0.2288 0.2285 0.2291
dataset is shown in Figure 3. The detailed performance of AUROC and DrugBank EPGCN-DS 0.7283 0.176 0.0032 0.0016 0.1
AUPR on the DrugBank-DFI dataset can be seen in Table 1. We compare -DFI SSI-DDI 0.922 0.5041 0.4058 0.2992 0.6491
DFinder with the twelve existing algorithms we mentioned in section DeepDrug 0.9013 0.3733 0.393 0.3923 0.3936
3.1.2. The experiment results demonstrate that DFinder has significantly DeepDDI 0.9348 0.6217 0.4162 0.2879 0.754
improved the performance of DFI identification. In detail, the AUROC DFinder 0.9823 0.9326 0.8722 0.8733 0.8711
of DFinder is 9.17 % higher than other methods (from 0.57 to 0.89), CASTER 0.6437 0.0551 0.1003 0.1002 0.1003
and AUPR is 38.96 % higher than other methods (from 0.08 to 0.54). MR-GNN 0.6672 0.086 0.1425 0.1425 0.1425
The reason for the performance improvement is that DFinder can not GCN-BMP 0.7566 0.1065 0.1691 0.1691 0.1691
only aggregate the topological neighbors’ feature information in the DFI PubMed EPGCN-DS 0.7769 0.1262 0.0229 0.0117 0.6234
network but also introduce abundant attribute information for nodes. By -DFI SSI-DDI 0.856 0.2156 0.0891 0.0485 0.5575
fusing node features and topological structures, DFinder can learn better DeepDrug 0.8919 0.2299 0.2975 0.2975 0.2976
representations for drugs and food constituents. In addition, our model DeepDDI 0.9119 0.4639 0.4787 0.4786 0.4788
learns the representations of nodes in an end-to-end way. In summary, DFinder 0.919 0.4734 0.5353 0.5354 0.5353
DFinder outperforms the other 12 methods which we use as baseline
models in the identification of DFIs.
3.3 Performance evaluation on PubMed-DFI
DFinder with seven state-of-the-art DDI prediction methods, including
To further explore the performance of our model, we also apply DFinder CASTER (Huang et al., 2020), MR-GNN (Xu et al., 2019), GCN-BMP
to PubMed-DFI to evaluate its performance. Compared with all baseline (Chen et al., 2020), EPGCN-DS (Sun et al., 2020), SSI-DDI (Nyamabo
models, the proposed DFinder generally achieves the best performance et al., 2021), DeepDrug (Yin et al., 2022), and DeepDDI (Ryu et al., 2018).
according to AUROC and AUPR (see Table 2). The error bar chart of Detailed information on these methods can be found in the Supplementary
AUROC and AUPR on PubMed-DFI dataset is shown in Figure 4. In detail, Document.
the AUROC of the DFinder is 2.13 % higher than other methods (from To fair the comparison, all these experiments are performed in the
0.61 to 0.90), and AUPR is 16.83 % higher than other methods (from 0.06 same way as DFinder. The compared methods use the DFI network and
to 0.31). DFinder consistently outperforms all the graph representation SMILE-based attribute features on nodes as inputs and are trained in the
learning methods, indicating the effectiveness of combining structure default settings. The detailed performance metrics, including AUROC,
information of the network and attribute information of node as the AUPR, F1-score, Recall, and Precision, of 8 methods on DrugBank-DFI
representation of nodes. In addition, graph representation learning methods and PubMed-DFI are shown in Table 3. The performance of AUROC and
always follow a two steps pipeline: low-dimensional representations are AUPR metrics on DrugBank-DFI is shown in Figure 5. DFinder performs
first learned for nodes and then used as features for different downstream the best in all of the metrics. For AUROC, DFinder reaches 0.98, which is
tasks. The experimental results also show that the end-to-end paradigm 4.75% higher than other methods (ranging from 0.59 to 0.93). For AUPR,
can learn a better representation for downstream tasks. In summary, this DFinder reaches 0.93, which is 31.1% higher than other methods (ranging
experiment demonstrates the effectiveness of the DFinder. from 0.07 to 0.62). And we illustrate the AUROC and AUPR metrics on
3.4 Performance comparison with DDI prediction methods PubMed-DFI in Figure 6. In detail, the AUROC of DFinder is the best
Since the drug and food constituents have been represented as molecules, (0.92), followed by DeepDDI (0.91) and DeepDrug (0.89). The AUPR of
the methods designed for predicting drug-drug interactions are able to DFinder is also the best (0.47), followed by DeepDDI (0.46) and DeepDrug
predict the DFI by treating the DFI network as a homogeneous DDI (0.23). From the above results, we can see our DFinder performs better
network. To further evaluate the performance of our method, we compare than those DDI-prediction methods. These experiments indicate that those

Fig. 3. DFinder improves DFI prediction performance on DrugBank-DFI dataset. We compared the DFI prediction performance of DFinder to other approaches. In the figure, the left and
right ones represent the AUROC and AUPR results of different methods respectively.
Fig. 4. DFinder improves DFI prediction performance on PubMed-DFI dataset. We used the same baseline models as what we used on DrugBank-DFI. In the figure, the left one represents
the AUROC results and the right one represents the AUPR results of different methods.
Fig. 5. Performance comparisons between DFinder and seven DDI prediction methods on the DrugBank-DFI dataset. In the figure, the left and right ones represent the AUROC and AUPR
results of different methods respectively.
Fig. 6. Performance comparisons between DFinder and seven DDI prediction methods on the PubMed-DFI dataset. We used the same baseline models as what we used on DrugBank-DFI.
In the figure, the left one represents the AUROC results and the right one represents the AUPR results of different methods.
8 Sample et al.

DDI prediction methods treating the DFI network as a homogeneous graph as in patients with rheumatoid arthritis. Klawitter et al. have proven the
may lose the specific heterogeneous topology features underlying DFI genetic mechanism that a low-salt diet would induce higher sensitivity to
networks, and highlight the necessity of designing specific DFI prediction cyclosporine (Klawitter et al., 2012). The DFI of "fexofenadine and d-
frameworks, like DFinder. glucose" is also predicted by DFinder. Fexofenadine is a medicine used
in the treatment of allergy symptoms, such as hay fever and urticaria. The
clinical has indicated that consuming large amounts of certain fruit juices
3.5 Evaluation of DFinder discoveries may decrease the levels of fexofenadine in the body, while the d-glucose
To evaluate the prediction results of DFinder, we collect two other DFI is enriched in the fruit juices. These evidence demonstrate the accuracy
resources (FIDEO (Bordea et al., 2020) and POMELO (Hamon et al., and practical value of our method.
2017)) and one DDI resource (DDI2013 (Herrero-Zazo et al., 2013)). The
Food Interactions with Drugs Evidence Ontology (FIDEO) is an ontology A. Evaluation of DFinder predictions on DrugBank-DFI
used for annotation and retrieval of scientific articles about food-drug Prediction FIDEO Prediction POMELO Prediction DDI2013
interactions (Bordea et al., 2020). We extract 1,751 drug-food interactions
from the FIDEO after preprocessing. And after decomposing the food
36 243
into food constituents, we obtained 10,938 DFIs. The POMELO is a 3018 427 1853 322 428 6 52
medline corpus with manually annotated food-drug interactions (Hamon

et al., 2017). We extract 2,180 drug-food-constituent interactions from it
after manually removing duplicated records, and records representing "no B. Evaluation of DFinder predictions on PubMed-DFI
effect" or containing unclear food or drug entities. The DDI corpus (named Prediction FIDEO Prediction POMELO Prediction DDI2013
DDI2013 in this work) is an annotated corpus for drug-drug interactions.
After removing the duplication, we obtained 4,060 DDIs for downstream
1853 204 703 711 46 109 1131 11 95
analysis. Note that the POMELO-DFI and FIDEO-DFI are bipartite
graphs, while DDI2013-DDI is a homogeneous graph in spite of some
drug compounds being also food constituents. The detailed description
Fig. 7. Evaluation of DFinder predictions on DrugBank-DFI and PubMed-DFI using
of data sources and preprocessing can be found in the Supplementary
FIDEO-DFI, POMELO-DFI, and DDI2013 evaluation datasets. Comparisons were limited
document. to the common node entities that are in the DFinder dataset and evaluation dataset. The
We use the FIDEO-DFI, POMELO-DFI, and DDI2013-DDI to predicted edges were prioritized by the top 20% of prediction probabilities. (A) Venn
evaluate the results of DFinder. The top 20% of DFI predictions, based diagrams of DFinder predictions on DrugBank-DFI with three evaluation datasets. (B)
on prediction probabilities, on DrugBank-DFI and PubMed-DFI are Venn diagrams of DFinder predictions on PubMed-DFI with three evaluation datasets.
prioritized for evaluation. The comparisons are limited to the common node
entities that are in our data and the evaluation data. Venn diagrams are used
to illustrate the overlap between predicted DFIs and each evaluation data,
as shown in Figure 7. In detail, in the FIDEO evaluation, we can see that 427
DFIs in FIDEO can be predicted by DFinder based on the DrugBank-DFI
4 Conclusion
dataset, accounting for 18.7% of full FIDEO-DFIs that can be predicted. In this paper, we propose DFinder as a framework to identify interactions
204 (22.5%) DFIs in FIDEO can be predicted by DFinder based on the between drugs and food constituents by using a graph embedding-based
PubMed-DFI dataset. In the POMELO evaluation, 36 (12.9%) DFIs and method on DFI networks. Due to the lack of drug-food interaction data, we
46 (29.7%) DFIs in the POMELO can be predicted by DFinder based construct two DFI datasets named DrugBank-DFI and PubMed-DFI for
on the DrugBank-DFI and PubMed-DFI, respectively. In the DDI2013 link prediction tasks. We obtain the DrugBank-DFI dataset from DrugBank
evaluation, we find only 6 and 11 DFIs in the DDI2013 predicted by the and get the PubMed-DFI dataset from PubMed by using the co-occurrence-
DFinder. This is mainly due to the small number of common node entities based text mining method. We construct two DFI networks respectively
between DDI2013 (only drug entities) and our datasets (drug and food based on the two datasets. DFinder combines node attribute features
entities). We assume that there might be a large distinction between drug and topological structure features to learn the low-dimensional feature
entities and food constituents. So we compare the full list of drugs from representations of drugs and food constituents. Then, we optimize the
DrugBank and the full list of food constituents from FooDB, and we find model in an end-to-end fashion. To evaluate the performance of our model,
there are only a small number of common entities between drugs and we compare DFinder with 11 graph representation learning methods and
food constituents (Supplementary Figure 2). This also demonstrates the 7 DDI prediction methods on these two networks. The experiment results
importance of specifically building DFIs from multiple sources, which will show that the performance of DFinder is better than other methods on the
compensate for each other. DFI identification task.
We also test the performance of DFinder on the combined dataset based
on the POMELO-DFI and the FIDEO-DFI. The performance evaluation
Funding
process is the same as we have described in section 3.1.2. The DFinder
achieves an AUROC equal to 0.9802, AUPR equals to 0.8781, and F1- This work has been supported by National Natural Science Foundation of
score equal to 0.8307, indicating stable performance on the combined China (No.62072376 and No.62102319).
dataset. To aid further validation, the DFinder-predicted DFIs, which can
be replicated in at least one other resource, including the FIDEO-DFI,
POMELO-DFI, and DDI2013, have also been deposited on GitHub. There
are 450 and 223 predicted DFIs in DrugBank and PubMed, respectively, References
that can be replicated with at least one of the other resources. For example,
Abu-El-Haija, S., Perozzi, B., Kapoor, A., Alipourfard, N., Lerman, K.,
the DFI of "cyclosporine and sodium" is not recorded in the DrugBank,
Harutyunyan, H., Ver Steeg, G., and Galstyan, A. (2019). Mixhop: Higher-
but is predicted by DFinder. And this DFI is replicated in POMELO and order graph convolutional architectures via sparsified neighborhood mixing. In
FIDEO. Cyclosporine is used as an immunosuppressant medication such international conference on machine learning, pages 21–29. PMLR.

Abu-El-Haija, S., Kapoor, A., Perozzi, B., and Lee, J. (2020). N-gcn: Multi-scale diet and cyclosporine nephrotoxicity: changes in kidney cell metabolism. Journal
graph convolution for semi-supervised node classification. In R. P. Adams and of proteome research, 11(11), 5135–5144.
V. Gogate, editors, Proceedings of The 35th Uncertainty in Artificial Intelligence Koziolek, M., Alcaro, S., Augustijns, P., Basit, A. W., Grimm, M., Hens, B., Hoad,
Conference, volume 115 of Proceedings of Machine Learning Research, pages C. L., Jedamzik, P., Madla, C. M., Maliepaard, M., et al. (2019). The mechanisms
841–851. PMLR. of pharmacokinetic food-drug interactions–a perspective from the ungap group.
Ahmed, A., Shervashidze, N., Narayanamurthy, S., Josifovski, V., and Smola, A. J. European Journal of Pharmaceutical Sciences, 134, 31–59.
(2013). Distributed large-scale natural graph factorization. In Proceedings of the Li, Z., Liu, Z., Huang, J., Tang, G., Duan, Y., Zhang, Z., and Yang, Y. (2019). Mv-
22nd international conference on World Wide Web. gcn: multi-view graph convolutional networks for link prediction. IEEE Access,
Belkin, M. and Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction 7, 176317–176328.
and data representation. MIT Press. Liben-Nowell, D. and Kleinberg, J. (2003). The link prediction problem for social
Bordea, G., Nikiema, J., Griffier, R., Hamon, T., and Mougin, F. (2020). Fideo: Food networks. In Proceedings of the twelfth international conference on Information
interactions with drugs evidence ontology. In 11th International Conference on and knowledge management, pages 556–559.
Biomedical Ontologies. Long, Q., Jin, Y., Song, G., Li, Y., and Lin, W. (2020). Graph structural-topic neural
Bushra, R., Aslam, N., and Khan, A. Y. (2011). Food-drug interactions. Oman network. In Proceedings of the 26th ACM SIGKDD International Conference on
medical journal, 26(2), 77. Knowledge Discovery & Data Mining, pages 1065–1073.
Cao, S., Lu, W., and Xu, Q. (2015). Grarep: Learning graph representations with Lurie, Y., Loebstein, R., Kurnik, D., Almog, S., and Halkin, H. (2010). Warfarin
global structural information. In Proceedings of the 24th ACM international on and vitamin k intake in the era of pharmacogenetics. British journal of clinical
conference on information and knowledge management, pages 891–900. pharmacology, 70(2), 164–170.
Chen, M., Zhou, S.-y., Fabriaga, E., Zhang, P.-h., and Zhou, Q. (2018). Food- Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word
drug interactions precipitated by fruit juices other than grapefruit juice: An update representations in vector space. Computer Science.
review. Journal of food and drug analysis, 26(2), S61–S71. Neuvonen, P. (1976). Interactions with the absorption of tetracyclines. Drugs, 11(1),
Chen, X., Liu, X., and Wu, J. (2020). Gcn-bmp: Investigating graph representation 45–54.
learning for ddi prediction task. Methods, 179, 47–54. Ni, Y., Jensen, K., Kouskoumvekaki, I., and Panagiotou, G. (2017). Nutrichem
Fan, W., Ma, Y., Li, Q., He, Y., Zhao, E., Tang, J., and Yin, D. (2019). Graph neural 2.0: exploring the effect of plant-based foods on human health and drug efficacy.
networks for social recommendation. In The world wide web conference, pages Database The Journal of Biological Databases and Curation, 2017.
417–426. Nyamabo, A. K., Yu, H., and Shi, J.-Y. (2021). Ssi–ddi: substructure–substructure
Feng, Y.-H., Zhang, S.-W., and Shi, J.-Y. (2020). Dpddi: a deep predictor for drug- interactions for drug–drug interaction prediction. Briefings in Bioinformatics,
drug interactions. BMC bioinformatics, 21(1), 1–15. 22(6), bbab133.
Fleuren, W. W. and Alkema, W. (2015). Application of text mining in the biomedical Otles, S. and Senturk, A. (2014). Food and drug interactions: a general review. Acta
domain. Methods, 74, 97–106. Scientiarum Polonorum Technologia Alimentaria, 13(1), 89–102.
Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Ou, M., Cui, P., Pei, J., Zhang, Z., and Zhu, W. (2016). Asymmetric
Sánchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., transitivity preserving graph embedding. In Proceedings of the 22nd ACM
Adams, R. P., and Aspuru-Guzik, A. (2018). Automatic chemical design using SIGKDD international conference on Knowledge discovery and data mining, pages
a data-driven continuous representation of molecules. ACS central science, 4(2), 1105–1114.
268–276. Peng, J., Hui, W., Li, Q., Chen, B., Hao, J., Jiang, Q., Shang, X., and Wei, Z.
Grover, A. and Leskovec, J. (2016). node2vec: Scalable feature learning for networks. (2019). A learning-based framework for mirna-disease association identification
In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge using neural networks. Bioinformatics, 35(21), 4364–4371.
discovery and data mining, pages 855–864. Peng, J., Wang, Y., Guan, J., Li, J., Han, R., Hao, J., Wei, Z., and Shang, X. (2021).
Hamon, T., Tabanou, V., Mougin, F., Grabar, N., and Thiessard, F. (2017). Pomelo: An end-to-end heterogeneous graph representation learning-based framework for
Medline corpus with manually annotated food-drug interactions. In Proceedings drug–target interaction prediction. Briefings in Bioinformatics, 22(5), bbaa430.
of the Biomedical NLP Workshop associated with RANLP 2017, pages 73–80. Perozzi, B., Al-Rfou, R., and Skiena, S. (2014). Deepwalk: Online learning of
He, X., Deng, K., Wang, X., Li, Y., Zhang, Y., and Wang, M. (2020). Lightgcn: social representations. In Proceedings of the 20th ACM SIGKDD international
Simplifying and powering graph convolution network for recommendation. In conference on Knowledge discovery and data mining, pages 701–710.
Proceedings of the 43rd International ACM SIGIR conference on research and Pletscher-Frankild, S., Pallejà, A., Tsafou, K., Binder, J. X., and Jensen, L. J. (2015).
development in Information Retrieval, pages 639–648. Diseases: Text mining and data integration of disease–gene associations. Methods,
Herrero-Zazo, M., Segura-Bedmar, I., Martínez, P., and Declerck, T. (2013). The 74, 83–89.
ddi corpus: An annotated corpus with pharmacological substances and drug–drug Rendle, S., Freudenthaler, C., Gantner, Z., and Schmidt-Thieme, L. (2012). Bpr:
interactions. Journal of biomedical informatics, 46(5), 914–920. Bayesian personalized ranking from implicit feedback. AUAI Press.
Hollander, A. A., van Rooij, J., Lentjes, E. G., Arbouw, F., van Bree, J. B., Ribeiro, L. F., Saverese, P. H., and Figueiredo, D. R. (2017). struc2vec: Learning node
Schoemaker, R. C., van Es, L. A., van der Woude, F. J., and Cohen, A. F. (1995). The representations from structural identity. In Proceedings of the 23rd ACM SIGKDD
effect of grapefruit juice on cyclosporine and prednisone metabolism in transplant international conference on knowledge discovery and data mining, pages 385–394.
patients. Clinical Pharmacology & Therapeutics, 57(3), 318–324. Ryu, J. Y., Kim, H. U., and Lee, S. Y. (2018). Deep learning improves prediction of
Huang, K., Xiao, C., Hoang, T., Glass, L., and Sun, J. (2020). Caster: Predicting drug–drug and drug–food interactions. Proceedings of the National Academy of
drug interactions with chemical substructure representation. In Proceedings of the Sciences, 115(18), E4304–E4311.
AAAI Conference on Artificial Intelligence, volume 34, pages 702–709. Schmidt, L. E. and Dalhoff, K. (2002). Food-drug interactions. Drugs, 62(10),
Jaeger, S., Fulle, S., and Turk, S. (2018). Mol2vec: unsupervised machine learning 1481–1502.
approach with chemical intuition. Journal of chemical information and modeling, Shanfeng, Z., Yasushi, O., Gozoh, T., and Hiroshi, M. (2005). A probabilistic
58(1), 27–35. model for mining implicit ’chemical compound–gene’ relations from literature.
Jensen, K., Ni, Y., Panagiotou, G., and Kouskoumvekaki, I. (2015). Developing Bioinformatics.
a molecular roadmap of drug-food interactions. PLOS Computational Biology, Sun, M., Zhao, S., Coryandar, G., Olivier, E., Zhou, J., and Wang, F. (2019).
11(2), e1004048. Graph convolutional networks for computational drug development and discovery.
Jiang, H., Cao, P., Xu, M., Yang, J., and Zaiane, O. (2020). Hi-gcn: A hierarchical Briefings in Bioinformatics.
graph convolution network for graph embedding learning of brain network and Sun, M., Wang, F., Elemento, O., and Zhou, J. (2020). Structure-based drug-drug
brain disorders prediction. Computers in Biology and Medicine, 127, 104096. interaction detection via expressive graph convolutional networks and deep sets
Junge, A. and Jensen, L. J. (2020). Cocoscore: context-aware co-occurrence scoring (student abstract). In Proceedings of the AAAI Conference on Artificial Intelligence,
for text mining applications using distant supervision. Bioinformatics, 36(1), 264– volume 34, pages 13927–13928.
271. Szklarczyk, D., Gable, A. L., Lyon, D., Junge, A., Wyder, S., Huerta-Cepas, J.,
Kingma, D. and Ba, J. (2014). Adam: A method for stochastic optimization. Simonovic, M., Doncheva, N. T., Morris, J. H., Bork, P., et al. (2019). String
Computer Science. v11: protein–protein association networks with increased coverage, supporting
KipF, T. N. and Welling, M. (2016). Semi-supervised classification with graph functional discovery in genome-wide experimental datasets. Nucleic acids
convolutional networks. research, 47(D1), D607–D613.
Kipf, T. N. and Welling, M. (2016). Variational graph auto-encoders. arXiv preprint Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., and Mei, Q. (2015). Line: Large-
arXiv:1611.07308. scale information network embedding. In Proceedings of the 24th international
Klawitter, J., Klawitter, J., Schmitz, V., Brunner, N., Crunk, A., Corby, K., Bendrick- conference on world wide web, pages 1067–1077.
Peart, J., Leibfritz, D., Edelstein, C. L., Thurman, J. M., et al. (2012). Low-salt
10 Sample et al.

Telessy, I. (2018). Let’s keep an eye on food-drug interaction. Food Nutr J: FDNJ- and drug targets. Nucleic Acids Research, 36(suppl1 ), D901 − −D906.
187. DOI, 10, 2575–7091. Xie, Y., Yao, C., Gong, M., Chen, C., and Qin, A. K. (2020). Graph convolutional
Wang, D., Cui, P., and Zhu, W. (2016). Structural deep network embedding. In networks with multi-level coarsening for graph classification. Knowledge-Based
Proceedings of the 22nd ACM SIGKDD international conference on Knowledge Systems, 194, 105578.
discovery and data mining, pages 1225–1234. Xu, N., Wang, P., Chen, L., Tao, J., and Zhao, J. (2019). Mr-gnn: Multi-resolution
Wang, H., Jia, W., Wang, J., Miao, Z., and Guo, M. (2017). Graphgan: Graph and dual graph neural network for predicting structured entity interactions. arXiv
representation learning with generative adversarial nets. IEEE Transactions on preprint arXiv:1905.09558.
Knowledge and Data Engineering, PP(99). Yi, H.-C., You, Z.-H., Huang, D.-S., and Kwoh, C. K. (2022). Graph representation
Wang, T., Peng, Q., Liu, B., Liu, Y., and Wang, Y. (2020). Disease learning in bioinformatics: trends, methods and applications. Briefings in
Module Identification Based on Representation Learning of Complex Networks Bioinformatics, 23(1), bbab340.
Integrated From GWAS, eQTL Summaries, and Human Interactome. Frontiers in Yin, Q., Cao, X., Fan, R., Liu, Q., Jiang, R., and Zeng, W. (2022). Deepdrug: A
Bioengineering and Biotechnology, 8, 418. general graph-based deep learning framework for drug-drug interactions and drug-
Wang, T., Shao, Z., Xiao, Y., Zhang, X., Chen, Y., Shi, B., Chen, S., Wang, Y., target interactions prediction. biorxiv, pages 2020–11.
Peng, J., and Shang, X. (2021). Predicting hepatoma-related genes based on You, J., McLeod, R. D., and Hu, P. (2019). Predicting drug-target interaction network
representation learning of ppi network and gene ontology annotations. In 2021 using deep learning model. Computational Biology and Chemistry, 80, 90–101.
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages Yue, X., Wang, Z., Huang, J., Parthasarathy, S., Moosavinasab, S., Huang, Y., Lin,
1892–1898. IEEE. S. M., Zhang, W., Zhang, P., and Sun, H. (2020). Graph embedding on biomedical
Wen, Zhang, Xiang, Yue, Guifeng, Tang, Wenjian, Wu, Feng, and Huang (2018). networks: methods, applications and evaluations. Bioinformatics, 36(4), 1241–1251.
Sfpel-lpi: Sequence-based feature projection ensemble learning for predicting Zhao, T., Hu, Y., Valsdottir, L. R., Zang, T., and Peng, J. (2020). Identifying drug–target
lncrna-protein interactions. Plos Computational Biology. interactions based on graph convolutional network and deep neural network. Briefings
Wishart, D. S., Knox, C., Guo, A. C., Shrivastava, S., Hassanali, M., in Bioinformatics.
Stothard, P., Chang, Z., and Woolsey, J. (2006). DrugBank: a comprehensive Zitnik, M., Agrawal, M., and Leskovec, J. (2018). Modeling polypharmacy side effects
resource for in silico drug discovery and exploration. Nucleic Acids Research, with graph convolutional networks. Bioinformatics, 34(13), i457–i466.
34(suppl1 ), D668 − −D672.
Wishart, D. S., Knox, C., Guo, A. C., Cheng, D., Shrivastava, S., Tzur, D., Gautam,
B., and Hassanali, M. (2007). DrugBank: a knowledgebase for drugs, drug actions

DFinder A Novel End-To-End Graph Embedding-Based M

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

DFinder A Novel End-To-End Graph Embedding-Based M

Uploaded by

Copyright:

Available Formats

Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.

1093/bioinformatics/btac837/6965015 by Uniwersytet Warminsko-Mazurski w Olsztynie user on 29 December 2022

DFinder: A novel end-to-end graph

© The Author(s) 2022. Published by Oxford University Press.

Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac837/6965015 by Uniwersytet Warminsko-Mazurski w Olsztynie user on 29 December 2022

Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac837/6965015 by Uniwersytet Warminsko-Mazurski w Olsztynie user on 29 December 2022

Fig. 1. The flowchart of co-occurrence-based text mining method. Firstly, we obtain a

identifying entity associations. The technique is based on the assumption

Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac837/6965015 by Uniwersytet Warminsko-Mazurski w Olsztynie user on 29 December 2022

2.2 Computational Framework for DFI prediction K

Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac837/6965015 by Uniwersytet Warminsko-Mazurski w Olsztynie user on 29 December 2022

… Concatenate ܼ஽భ ٖ ‫ݕ‬ො஽భிయ

ܵܵܲ஽భ ‫ܣ‬஽భ ܼ஽భ

is an interaction between Di and Fj can be calculated as: 3.1 Experimental settings

Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac837/6965015 by Uniwersytet Warminsko-Mazurski w Olsztynie user on 29 December 2022

Table 3. Performance comparison between DFinder and seven DDI-prediction

Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac837/6965015 by Uniwersytet Warminsko-Mazurski w Olsztynie user on 29 December 2022

Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac837/6965015 by Uniwersytet Warminsko-Mazurski w Olsztynie user on 29 December 2022

medline corpus with manually annotated food-drug interactions (Hamon

Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac837/6965015 by Uniwersytet Warminsko-Mazurski w Olsztynie user on 29 December 2022

Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac837/6965015 by Uniwersytet Warminsko-Mazurski w Olsztynie user on 29 December 2022

You might also like