
A Multimodal Deep Neural Network for Human Breast Cancer Prognosis Prediction by Integrating Multi-Dimensional Data

Dongdong Sun, Minghui Wang, and Ao Li

D. Sun is with the School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China. E-mail: sddchina@mail.ustc.edu.cn.
M. Wang and A. Li are with the School of Information Science and Technology, and Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, Anhui 230027, China. E-mail: {mhwang, aoli}@ustc.edu.cn.
Manuscript received 28 May 2017; revised 11 Feb. 2018; accepted 12 Feb. 2018. Date of publication 15 Feb. 2018; date of current version 31 May 2019. (Corresponding author: Minghui Wang.) Digital Object Identifier no. 10.1109/TCBB.2018.2806438

Abstract—Breast cancer is a highly aggressive type of cancer with very low median survival. Accurate prognosis prediction of breast
cancer can spare a significant number of patients from receiving unnecessary adjuvant systemic treatment and its related expensive
medical costs. Previous work relies mostly on selected gene expression data to create a predictive model. The emergence of deep
learning methods and multi-dimensional data offers opportunities for more comprehensive analysis of the molecular characteristics of
breast cancer and therefore can improve diagnosis, treatment, and prevention. In this study, we propose a Multimodal Deep Neural
Network by integrating Multi-dimensional Data (MDNNMD) for the prognosis prediction of breast cancer. The novelty of the method lies
in the design of our method’s architecture and the fusion of multi-dimensional data. The comprehensive performance evaluation results
show that the proposed method achieves a better performance than the prediction methods with single-dimensional data and other
existing approaches. The source code, implemented with the TensorFlow 1.0 deep learning library, can be downloaded from GitHub:
https://github.com/USTC-HIlab/MDNNMD.

Index Terms—Breast cancer prognosis prediction, multimodal deep neural network, multi-dimensional data

1 INTRODUCTION

Breast cancer is a highly aggressive cancer, a major health problem in females, and a leading cause of cancer-related deaths worldwide [1], [2]. According to the estimates of the American Cancer Society, more than 250,000 new cases of invasive breast cancer will be diagnosed among females and approximately 40,000 cancer deaths are expected in 2017 [3]. This heterogeneous disease is characterized by varied molecular features, clinical behavior, morphological appearance and disparate response to therapy [4], [5]. Also, the complexity of invasive breast cancer and its significantly varied clinical outcomes make it extremely difficult to predict and treat [6]. Therefore, the ability to predict cancer prognosis more accurately could not only help breast cancer patients know about their life expectancy, but also help clinicians make informed decisions and further guide appropriate therapy. Meanwhile, prognostication plays an important role in clinical work for all clinicians, particularly those working with short term survivors. When a reasonably accurate estimation of prognosis is available, clinicians often utilize prognosis prediction knowledge to assist with clinical decision making [11], [12], establish patients' eligibility for care programmes [13], and design and analyze clinical trials. In addition, when patients are predicted to be short term survivors, clinicians can provide patients with the opportunity to consider whether they want to be cared for and allow them time to take practical steps to prepare for their own deaths [7].

During the past several decades, with the help of the rapid development of high throughput technologies for microarrays and gene expression analysis, there have been a number of efforts contributing to the understanding of the molecular signatures of breast cancer based on gene expression patterns. One of the primary studies to effectively predict breast cancer prognosis via gene expression profiles is conducted by van de Vijver and colleagues [8]. They identify 70 gene prognostic signatures from 98 primary breast cancer patients by clustering the gene expression profile data and correlating them with prognostic values. They also utilize additional independent data of 19 young breast cancer patients to validate the prognosis classifier, and the results on the validation set confirm the power of the 70 optimal genes. Later, 76 gene prognostic signatures in a training set of 115 tumors are further identified, and are successfully used as broadly applicable biomarkers to accurately predict distant metastases of lymph node-negative primary breast cancer [9]. These signatures obtain 48 percent specificity and 93 percent sensitivity in a subsequent independent dataset of 171 lymph-node-negative patients. Since breast cancer is a genetic disease, the integration of gene expression profile data and clinical data can potentially improve the accuracy of prediction models for prognosis and diagnosis [10]. On the basis of the 70 gene prognostic signatures, Sun et al. further identify a hybrid signature composed of both genetic signatures and clinical markers for the prediction of breast cancer prognosis [11]. The results indicate that the identified hybrid signatures could significantly improve prognostic specificity compared with the existing gene signatures and clinical systems, respectively. Gevaert et al. propose a probabilistic model based on a Bayesian Network (BN) for prognosis of lymph node negative breast cancer by integrating clinical and genomic data [12]. Given the wide range of applications of gene signature datasets, we can see that gene expression profiles can be used to accurately predict prognosis in breast cancer. However, these aforementioned studies are designed to work with a handful of gene expression profile data and under the assumption of independence between different genes in hypothesis testing.

In fact, microarray data are high-dimensional, consisting of approximately 25,000 genes per patient, and different genes may have potential relations with each other, which may improve the accuracy of breast cancer prognosis prediction. Xu et al. propose an efficient feature selection method based on the support vector machine (SVM) for breast cancer prognosis prediction and achieve a superior prediction performance compared with that of the widely used 70 gene signatures [13]. Nguyen et al. introduce a method for breast cancer prognosis prediction based on random forest (RF) combined with feature selection and achieve higher classification accuracy than previous methods [14]. Khademi et al. propose a probabilistic graphical model (PGM) by integrating two independent models of microarray and clinical data for prognosis and diagnosis of breast cancer [10]. They first apply Principal Component Analysis (PCA) to reduce the dimensionality of the microarray data and construct a deep belief network to extract feature representations of the data. Meanwhile, they also apply a structure learning algorithm to the clinical data.

In addition to the success achieved by the aforementioned approaches, some novel methods have been proposed that integrate multi-dimensional data in the area of human cancer related prediction. Hayes et al. identify relevant microRNA and mRNA signatures for predicting high-risk and low-risk patients in glioblastoma (GBM) [15]. Zhang et al. propose a multiple kernel machine learning method that fuses different types of data for GBM prognosis prediction [16]. It is not surprising that good performance is achieved when considering multiple types of data, which provide different features that are widely used in cancer prognosis prediction. However, most of these methods directly combine different types of data into model generation, and ignore that the features from different modalities (e.g., gene signature and clinical) may have different representations. With the recent advances of next generation sequencing technologies, breast cancer diagnosis and prognosis are now rapidly and extensively being informed by multiple omics including gene expression profiles, clinical information and DNA copy number alteration [17], [18]. Accordingly, based on the accelerated development of multiple omics data, there is an urgent need to develop efficient computational methods for accurately predicting the prognosis of human breast cancer.

Recently, deep learning has emerged as a methodology with superior performance in diverse fields such as computer vision [19], [20], speech recognition [21] and bioinformatics [22], [23]. Besides, considering the fact that methods based on a single source of information usually suffer from limitations such as non-universality, lack of uniqueness, and noisy data, multimodal learning has been proposed to solve these problems by combining related information from multiple sources to obtain a final decision [24], [25]. Multimodal deep learning, as a kind of multimodal learning, successfully employs deep learning methods and has been widely used in many applications such as visual recognition [26], multimedia analysis [25], and speech recognition [27]. For example, Srivastava et al. use multimodal learning with a Deep Boltzmann Machine (DBM) to model the joint space of image and text and demonstrate that the model significantly outperforms SVM and LDA [28] methods on discriminative tasks [29]. In addition, Kahou et al. propose a method called EmoNets for emotion recognition in video [30], which is the winning submission of the 2013 EmotiW challenge.

Inspired by the successful application of deep learning methods and the large contribution of employing multi-dimensional data for cancer prognosis prediction, in this study we propose a novel Multimodal Deep Neural Network by integrating Multi-dimensional Data (MDNNMD) for human breast cancer prognosis prediction. MDNNMD is an efficient method to integrate multi-dimensional data including gene expression profile, copy number alteration (CNA) profile and clinical data with a score level fusion of the final prediction results. This method considers the heterogeneity among different data types and makes full use of abstract high-level representations from each data source. Therefore, similar to the definition of "multimodal deep belief network" in previous work on a cancer data analysis problem [31], we name our method "multimodal deep neural network". The results of a ten-fold cross validation experiment show that MDNNMD achieves an overall better performance than prediction methods with single-dimensional data and existing research approaches: support vector machine (SVM), random forest (RF) and logistic regression (LR). We also demonstrate the feasibility of the multimodal deep neural network and the usefulness of the multi-dimensional data in breast cancer prognosis prediction.

2 METHODS AND MATERIALS

2.1 Materials
We use the METABRIC dataset which is available at http://www.cbioportal.org/study?id=brca_metabric#summary. The dataset is extracted from 1,980 valid breast cancer patients' data of the METABRIC trial [32], and contains multi-dimensional breast cancer data such as gene expression profile, CNA profile and clinical information. The median age at diagnosis is 61 and the average survival time of all patients is 125.1 months. Among them, 491 patients are regarded as short term survivors (less than 5 year survival) and 1,489 patients are regarded as long term survivors (more than 5 year survival). For the classification labels in our work, the short term patients are labeled as 0 and the long term patients are labeled as 1. The overall properties of our dataset are listed in Table 1. Each patient has 27 clinical features, including age at diagnosis, size, lymph nodes positive, grade, etc.

TABLE 1
The Overall Information of the METABRIC Breast Cancer Dataset

Cut-off (years)               5
Total population              1980
Long time survivors           1489
Short time survivors          491
Median age at diagnosis       61
Average survival (months)     125.1
Here, 25 clinical features are selected as our final clinical feature data. For gene expression profile data and CNA profile data, missing values are estimated using a weighted nearest neighbors algorithm [33]. According to Gevaert et al. [7], the gene expression features are normalized and further discretized into three categories: under-expression (-1), over-expression (1) and baseline (0). For clinical data, all features are normalized into the range [0, 1] by min-max normalization [8]. As for CNA features, we directly utilize the original data with five discrete values (-2, -1, 0, 1, 2).
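To make the preprocessing steps above concrete, the following is a minimal Python sketch of the three transformations (weighted nearest-neighbor imputation, three-level discretization of expression values, and min-max scaling of clinical features). The use of scikit-learn's KNNImputer and the one-standard-deviation discretization cut-offs are illustrative assumptions, not the exact protocol of the cited work.

import numpy as np
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler

def preprocess(expr, cna, clinical, k=10):
    """Sketch of the Section 2.1 preprocessing; inputs are (patients x features)
    NumPy arrays with NaN marking missing values."""
    # Weighted nearest-neighbour imputation of missing expression/CNA values [33].
    imputer = KNNImputer(n_neighbors=k, weights="distance")
    expr = imputer.fit_transform(expr)
    cna = imputer.fit_transform(cna)

    # Discretize each gene into under-expression (-1), baseline (0), over-expression (1).
    # The +/- one-standard-deviation cut-offs are an assumption for illustration.
    mu, sd = expr.mean(axis=0), expr.std(axis=0)
    expr_disc = np.where(expr < mu - sd, -1, np.where(expr > mu + sd, 1, 0))

    # Min-max normalization of clinical features into [0, 1].
    clinical = MinMaxScaler().fit_transform(clinical)

    # CNA values are already discrete in {-2, -1, 0, 1, 2} and are used as-is.
    return expr_disc, cna, clinical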
2.2 Feature Selection
A common problem in using high throughput sequencing datasets is the so-called "curse of dimensionality" [34] for human cancer prognosis prediction. In our work, the gene expression profile data include approximately 24,000 genes and the CNA profile data contain approximately 26,000 genes in the METABRIC dataset. The high dimensionality and small sample size of the dataset may lead to poor results for deep learning methods [35]. It is well known that feature selection plays a key role in the success of a learning algorithm in problems involving a large number of features. mRMR [36] is one of the most common dimensionality reduction algorithms and has a wide range of applications [16], [37], [38]. Therefore, we apply the mRMR feature selection method to select features from the original dataset and reduce the dimensionality of the dataset without a significant loss of information. Then, we use the area under the curve (AUC) value (see Experimental Design) as the criterion to evaluate the performance of the features, as in previous work [3]. In detail, we roughly search for the best N features from 100 to 500 with a step size of 100. Finally, 400 genes from the gene expression profile data and 200 genes from the CNA profile data are selected as features for our MDNNMD model. The detailed properties of the multi-dimensional data used in this study are listed in Table 2.

TABLE 2
The Properties of the Dataset

Data Category     Number   Feature Number
Clinical          27       25
Gene Expression   24368    400
Copy Number       26298    200
2.3 A Deep Neural Network Prediction Model for a Single Dataset
In this study, a deep neural network (DNN) is used for predicting the prognosis of human breast cancer. The DNN architecture builds a hierarchy through its hidden layers: higher level features are extracted implicitly by combining lower level features from each layer. Here, a DNN model is composed of an input layer, multiple hidden layers and an output layer. Units between layers are all fully connected. The input layer holds an input vector x from one type of single-dimensional data. The output h^k of layer k, which contains j units, is calculated from the weighted sum of the outputs of the previous layer h^{k-1} (in particular, h^0 = x):

$g^k = W^k h^{k-1} + b^k, \quad 1 \le k \le N$    (1)

$h^k = f(g^k),$    (2)

where $W^k$ is the kth weight matrix between the (k-1)th layer and the kth layer, $b^k$ is the bias vector of the kth layer, and N is the number of layers (here N = 5, including the output layer). The hyperbolic tangent (TANH) activation $f(\cdot)$ is used for the hidden units, which naturally captures the nonlinear relations within the data. Meanwhile, the softmax function is used as the activation function of the output layer (the Nth layer) of the DNN structure and is defined as:

$o^N = \frac{\exp(h^N)}{\sum_j \exp(h_j^N)}.$    (3)

Afterwards, we initialize the weights between each layer using the normalized initialization proposed by Glorot and Bengio [39], and the biases are initialized with small numbers (such as 0.1). The weights between layers are initialized from a truncated normal distribution over the range:

$W \sim \left[-\sqrt{\tfrac{2}{n_i+n_o}},\ \sqrt{\tfrac{2}{n_i+n_o}}\right],$    (4)

where $n_i$ and $n_o$ denote the number of input and output units, respectively. Given the fact that the breast cancer prognosis prediction task in our study is a binary classification task (long-term survival and short-term survival), we define the cross entropy loss as the objective function of the DNN model at the final output layer. In addition, to further prevent overfitting of our deep learning model, L2 regularization is also added to the loss function, which is widely used in deep learning studies [40], [41]. Finally, our proposed DNN method aims at minimizing the loss function, which is defined as:

$L(y_t, \hat{y}_t) = \frac{1}{N}\sum_{i=0}^{N}\left[-y_t(i)\log \hat{y}_t(i) - (1 - y_t(i))\log\left(1 - \hat{y}_t(i)\right)\right] + \frac{\lambda}{2}\sum_{k=1}^{K}\sum_{j=1}^{n_k}\sum_{i=1}^{m_k}\left(w_{ij}^{k}\right)^2,$    (5)

where L measures the errors between the predictive scores and the actual labels, $y_t(i)$ is the actual label of the ith sample, $\hat{y}_t(i)$ is the predictive score obtained from the output layer of our method, N is the batch size, $W^k = \{w_{ij}^k\}_{m_k \times n_k}$ is the kth weight matrix and K is the number of weight matrices in the DNN model (here K = 5).

A common issue in training a DNN model is "internal covariate shift", i.e., the input distribution of each layer changes during training due to the updates of the parameters in previous layers. In 2015, batch normalization [42] was proposed by Google to solve this problem, which allows us to use higher learning rates and be less careful about weight initialization. As expected, batch normalization is very important for optimizing our DNN model and yields a good result. Finally, the DNN model employed in our work comprises one input layer, four hidden layers and an output layer. Batch normalization is added to each hidden layer and dropout [43] is added before the output layer. In our study, we use the grid search strategy provided by Chen et al. [22] to find the optimal parameters. In detail, we search the number of hidden layers from 1 to 5 in increments of 1, where each hidden layer contains 100, 500, 1,000 or 3,000 units. As to the mini-batch size, we search for the optimal value ranging from 32 to 128 with a step size of 32. The initial learning rate is selected from 10^-1 to 10^-5 using a magnification of 10^-1. The optimal parameters are chosen as the parameter combination leading to the best performance (AUC value) [16], [44]. Finally, we obtain the best performance with the optimal parameter combination of 4 hidden layers with 1000, 500, 500 and 100 units, with the mini-batch size and initial learning rate set to 64 and 10^-3, respectively. The detailed parameters used in our DNN model are described in Table 3.

TABLE 3
Detailed Parameter Configurations of the DNN Model

# of hidden layers                          4
# of hidden units in hidden layers 1,2,3,4  [1000, 500, 500, 100]
Initial learning rate                       10^-3
Mini-batch size                             64
Training epochs                             10-100
Weights initial range                       [-sqrt(2/(n_i+n_o)), sqrt(2/(n_i+n_o))]
Batch normalization epsilon                 10^-3
Activation function                         Hyperbolic tangent (TANH)
Loss function                               L(y_t, y^_t) as defined in Eq. (5)
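As a concrete illustration of the single-modality architecture summarized in Table 3, the following is a minimal sketch using the tf.keras API. The released code targets TensorFlow 1.0, so this is an approximation rather than the authors' implementation; the optimizer choice (Adam), the dropout rate, the L2 coefficient and the one-hot label encoding are assumptions made for illustration.

import tensorflow as tf

def build_single_modality_dnn(input_dim, l2_coef=1e-4, dropout_rate=0.5):
    """Sketch of one sub-DNN: four tanh hidden layers (1000-500-500-100) with batch
    normalization, dropout before the output, Glorot-initialized weights, and a
    2-unit softmax output trained with cross entropy plus L2 regularization."""
    reg = tf.keras.regularizers.l2(l2_coef)        # L2 term of Eq. (5); coefficient assumed
    init = tf.keras.initializers.GlorotUniform()   # normalized initialization of Eq. (4)

    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(1000, activation="tanh", input_shape=(input_dim,),
                                    kernel_initializer=init, kernel_regularizer=reg))
    model.add(tf.keras.layers.BatchNormalization(epsilon=1e-3))
    for units in [500, 500, 100]:
        model.add(tf.keras.layers.Dense(units, activation="tanh",
                                        kernel_initializer=init, kernel_regularizer=reg))
        model.add(tf.keras.layers.BatchNormalization(epsilon=1e-3))
    model.add(tf.keras.layers.Dropout(dropout_rate))   # dropout rate is an assumption
    model.add(tf.keras.layers.Dense(2, activation="softmax", kernel_initializer=init))

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="categorical_crossentropy",      # cross entropy of Eq. (5)
                  metrics=[tf.keras.metrics.AUC()])
    return model

For example, build_single_modality_dnn(400) would give the gene expression sub-network and build_single_modality_dnn(25) the clinical one, matching the feature counts of Table 2.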


2.4 MDNNMD Prediction Model for Multi-Dimensional Data
An important issue in our study is integrating multi-dimensional data including the gene expression profile, the CNA profile and clinical data. One of the most straightforward approaches for discriminative tasks is to train only one DNN model on all multi-dimensional data. However, different data may have different feature representations, and directly combining the three sources of data as the input of one DNN model may not be efficient [10]. We address this problem by proposing a multimodal DNN model which efficiently integrates multi-dimensional data. Fig. 1 illustrates the structure of the MDNNMD method. First, we preprocess the multi-dimensional breast cancer data, which include three sub-datasets: gene expression, CNA and clinical data. Second, we use the feature selection method to reduce the number of variables for the gene expression and CNA data. Third, a triple-modal DNN is proposed to effectively extract information from each type of data; accordingly, we greedily train one DNN model for each sub-dataset. Finally, our proposed method conducts a score level fusion of the outputs of the independent models.

Fig. 1. The overall process of our MDNNMD model for breast cancer prognosis prediction. The prediction model consists of three independent models corresponding to each data type and finally joins the predictive scores from each independent model.

The combined output of MDNNMD, based on a weighted linear aggregation [45], [46], is calculated as:

$o_{\mathrm{DNNMD}} = \alpha\, o_{\mathrm{DNN\text{-}Expr}} + \beta\, o_{\mathrm{DNN\text{-}CNA}} + \gamma\, o_{\mathrm{DNN\text{-}Clinical}},$    (6)

$\mathrm{s.t.}\ \ \alpha + \beta + \gamma = 1,\ \ \alpha \ge 0,\ \beta \ge 0,\ \gamma \ge 0,$    (7)

where the parameters α, β and γ are three weight coefficients used to balance the contribution of each DNN model. In this study, MDNNMD chooses the optimal values of α, β and γ for the different sub-DNN models according to the best prediction performance on the validation set (see Experimental Design). We screen different combinations of α, β and γ with a step of 0.1 and finally select α = 0.3, β = 0.1 and γ = 0.6 for the METABRIC dataset. MDNNMD is implemented with the TensorFlow 1.0 deep learning library [47], an open-source software library for machine intelligence. Training is deployed on two Nvidia GTX TITAN Z graphics cards. The source code of MDNNMD is publicly available at https://github.com/USTC-HIlab/MDNNMD.
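The score-level fusion of Eqs. (6)-(7) and the 0.1-step screening of the weight coefficients can be sketched as follows; the function and variable names are illustrative, and the validation AUC is used as the selection criterion as described in the text.

import itertools
import numpy as np
from sklearn.metrics import roc_auc_score

def fuse_scores(score_expr, score_cna, score_clinical, alpha, beta, gamma):
    """Weighted linear aggregation of the three sub-DNN scores (Eq. 6)."""
    return alpha * score_expr + beta * score_cna + gamma * score_clinical

def search_fusion_weights(val_scores, y_val, step=0.1):
    """Screen all (alpha, beta, gamma) on the simplex alpha + beta + gamma = 1
    (Eq. 7) with the given step, keeping the combination with the best AUC."""
    grid = np.round(np.arange(0.0, 1.0 + step, step), 10)
    best = (None, -np.inf)
    for alpha, beta in itertools.product(grid, repeat=2):
        gamma = 1.0 - alpha - beta
        if gamma < -1e-9:          # outside the simplex
            continue
        fused = fuse_scores(*val_scores, alpha, beta, max(gamma, 0.0))
        auc = roc_auc_score(y_val, fused)
        if auc > best[1]:
            best = ((alpha, beta, max(gamma, 0.0)), auc)
    return best   # e.g., ((0.3, 0.1, 0.6), auc) on METABRIC according to the text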
2.5 Experimental Design
To comprehensively evaluate our proposed method, we use a ten-fold cross validation experiment, consistent with previous studies of cancer prognosis prediction [16], [48]. Specifically, the patients in our experiment are randomized into ten subsets. In every round, nine of those ten subsets are further divided into training (80 percent) and validation (20 percent) sets [1], while the remaining subset is utilized as the testing set. In this way, we obtain the prediction scores of each testing subset after ten rounds and then merge them into an overall set of prediction scores. Besides, in our study, MDNNMD does not optimize the model configurations and weight coefficients simultaneously. First, we search different configurations and use the single-modality training set to train each sub-DNN (weights and biases), avoiding overfitting by using the validation set. Second, we choose the optimal configuration parameters by using the AUC value as the criterion. Third, after the sub-DNNs are trained, we screen different combinations of the coefficients (α, β and γ) until the classification performance (AUC value) on the validation set reaches its maximum.
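A minimal sketch of this evaluation protocol is given below, assuming scikit-learn's StratifiedKFold and a generic train_model/predict_scores interface (both hypothetical placeholders); test-fold scores are concatenated across the ten rounds before computing the overall metrics, as described above.

import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

def ten_fold_evaluation(X, y, train_model, predict_scores, seed=0):
    """Ten-fold CV: in each round the nine training folds are split 80/20 into
    training and validation sets, and the held-out fold is used only for testing."""
    kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    all_scores = np.zeros(len(y), dtype=float)
    for train_idx, test_idx in kf.split(X, y):
        X_tr, X_val, y_tr, y_val = train_test_split(
            X[train_idx], y[train_idx], test_size=0.2,
            stratify=y[train_idx], random_state=seed)
        model = train_model(X_tr, y_tr, X_val, y_val)      # hypothetical training routine
        all_scores[test_idx] = predict_scores(model, X[test_idx])
    return all_scores   # merged prediction scores for every patient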
For performance evaluation, we plot the receiver operating characteristic (ROC) curve, which shows the interplay between sensitivity and 1-specificity when varying the decision threshold, and compute the AUC. The evaluation metrics Sensitivity (Sn), Specificity (Sp), Accuracy (Acc), Precision (Pre) and Matthews correlation coefficient (Mcc) are also used for performance evaluation and are defined by the following equations:

$Sn = \frac{TP}{TP + FN}$    (8)

$Sp = \frac{TN}{TN + FP}$    (9)

$Acc = \frac{TP + TN}{TP + TN + FN + FP}$    (10)

$Pre = \frac{TP}{TP + FP}$    (11)

$Mcc = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FN)(TP+FP)(TN+FN)(TN+FP)}},$    (12)

where TP, FP, TN and FN stand for true positives, false positives, true negatives and false negatives, respectively.
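For completeness, the five threshold-based metrics of Eqs. (8)-(12) can be computed directly from the confusion matrix; the sketch below is a straightforward NumPy implementation, with the positive class taken to be the long term survivors labeled 1 in Section 2.1.

import numpy as np

def threshold_metrics(y_true, scores, threshold):
    """Compute Sn, Sp, Acc, Pre and Mcc (Eqs. 8-12) at a fixed decision threshold."""
    pred = (np.asarray(scores) >= threshold).astype(int)
    y = np.asarray(y_true).astype(int)
    tp = np.sum((pred == 1) & (y == 1))
    tn = np.sum((pred == 0) & (y == 0))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp)
    mcc = (tp * tn - fp * fn) / np.sqrt(float((tp + fn) * (tp + fp) * (tn + fn) * (tn + fp)))
    return {"Sn": sn, "Sp": sp, "Acc": acc, "Pre": pre, "Mcc": mcc}

# Example: metrics at the high-stringency operating point reported later in the paper
# (Sp = 99.0 percent, threshold 0.591):
# threshold_metrics(y_true, fused_scores, threshold=0.591)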

2.6 Other Prediction Methods for Comparison
To verify the benefit of the multimodal DNN integrating multi-dimensional data, DNN based methods with single-dimensional data are also examined for the prognosis prediction of breast cancer in this study. The main difference between these methods and MDNNMD is that they do not integrate multi-dimensional data and only use one data type. For simplicity, the DNN based methods that use the single-dimensional gene expression profile data, clinical data, and CNA profile data are hereafter termed DNN-Expr, DNN-Clinical and DNN-CNA, respectively.

In order to show the effectiveness of the multimodal deep learning method in prognosis prediction of breast cancer, we also employ three widely used classifiers for human breast cancer prognosis prediction, namely support vector machines (SVM) [13], random forest (RF) [14] and logistic regression (LR) [49], for comparison. In those algorithms, the multi-dimensional data are regarded as one feature vector to train the model. Their performance is also evaluated by the ten-fold cross validation process.
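The three baselines come from standard packages (scikit-learn for RF and LR, LIBSVM for SVM, as noted in Section 3.2); a minimal sketch using scikit-learn's estimators on the concatenated feature vector is shown below. Using sklearn.svm.SVC in place of the LIBSVM binary is a simplification (SVC wraps LIBSVM internally), and the hyperparameters are left at library defaults as an assumption.

import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def baseline_scores(X_train, y_train, X_test):
    """Train SVM, RF and LR on the concatenated multi-dimensional feature vector
    and return their predicted probabilities for the positive class."""
    models = {
        "SVM": SVC(probability=True),                       # scikit-learn wraps LIBSVM
        "RF": RandomForestClassifier(n_estimators=500, random_state=0),
        "LR": LogisticRegression(max_iter=1000),
    }
    scores = {}
    for name, clf in models.items():
        clf.fit(X_train, y_train)
        scores[name] = clf.predict_proba(X_test)[:, 1]
    return scores

# Example: X = np.hstack([expr_selected, cna_selected, clinical]) gives the
# concatenated feature vector used by the baselines, and
# aucs = {k: roc_auc_score(y_test, s) for k, s in baseline_scores(Xtr, ytr, Xte).items()}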

3 RESULTS

3.1 Comparison of DNN Based Methods with Multiple and Single Dimensional Data
In order to confirm the effectiveness of multi-dimensional data, we first apply the deep learning method to each single data type to predict breast cancer prognosis, and compare the performance of MDNNMD with DNN-Expr, DNN-Clinical and DNN-CNA. The ROC curves of the four methods are plotted at each specificity level and displayed in Fig. 2a. As shown in Fig. 2a, MDNNMD achieves better overall performance than the single-dimensional data based methods. Besides the ROC curve, the corresponding AUC value for each method is also calculated and displayed in Fig. 2b, which indicates that MDNNMD is consistently better than DNN-Expr, DNN-Clinical and DNN-CNA: the AUC value of MDNNMD (0.845, shown in Fig. 2b) is 8.4 percent, 3.8 percent and 23.1 percent higher than those of DNN-Expr, DNN-Clinical and DNN-CNA, respectively. Finally, we plot both the training loss and the validation loss in Supplementary Figure S1 using TensorBoard, a visualization tool in the TensorFlow library.

Fig. 2. Performance comparison between MDNNMD, DNN-Expr, DNN-Clinical, and DNN-CNA in different metrics. (a) ROC curve. (b) AUC value. (c) and (d) Acc, Pre, Sn, and Mcc values at stringency levels of Sp = 99.0 percent with a corresponding threshold of 0.591, and Sp = 95.0 percent with a corresponding threshold of 0.443.

At the same time, following the study of Fan et al. [50], two stringency levels of medium (Sp = 95.0 percent with a corresponding threshold of 0.443) and high (Sp = 99.0 percent with a corresponding threshold of 0.591) specificity are applied to each method for measuring the predictive performance. The corresponding Sn, Acc, Pre and Mcc values are computed and shown in Figs. 2c and 2d, respectively, suggesting that MDNNMD achieves better predictive performance than the other single-dimensional data based methods in all cases. For example, when Sp equals 99.0 percent, the proposed method obtains the largest Pre value of 0.875, while the Pre values of DNN-Clinical, DNN-Expr, and DNN-CNA are 0.818, 0.622 and 0.462, respectively. Meanwhile, the Sn value achieved by MDNNMD is 0.200 at Sp = 99.0 percent, which is 7.2 percent, 15.3 percent, and 17.6 percent higher than DNN-Clinical, DNN-Expr and DNN-CNA. When Sp equals 95.0 percent, the Pre value of the proposed method is 0.749, which is 6.8 percent, 20.6 percent and 34.1 percent higher than those of DNN-Clinical, DNN-Expr and DNN-CNA, respectively, and the corresponding Sn values of MDNNMD, DNN-Clinical, DNN-Expr, and DNN-CNA are 0.450, 0.322, 0.179 and 0.104, respectively. All of the above comparison results indicate that MDNNMD achieves an overall better performance than the single-dimensional data based methods, confirming the benefits of integrating multi-dimensional data and multimodal fusion in the prognosis prediction of breast cancer.

To further demonstrate the value of the multi-dimensional data in assessing the risk of developing distant metastases in breast cancer patients, survival analysis of the proposed method is also performed following previous studies [16], [51], [52]: the Kaplan-Meier curve for the aforementioned dataset is plotted and shown in Fig. 3. It suggests that there is a significant difference between the patients predicted as short term survivors and those predicted as long term survivors by our method (p-value < 10e-10).

Fig. 3. Kaplan-Meier curve of breast cancer prognosis prediction. The long time survivor and short time survivor classes are predicted by our proposed method.
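The survival comparison between the two predicted groups can be reproduced with a standard Kaplan-Meier estimator and a log-rank test; the sketch below uses the lifelines package, which is an assumption, since the paper does not state which survival-analysis software was used.

import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

def plot_km_by_prediction(surv_months, event_observed, predicted_long_term):
    """Kaplan-Meier curves for patients predicted as long vs. short term survivors,
    with a log-rank test between the two groups."""
    kmf = KaplanMeierFitter()
    ax = None
    for label, mask in [("Predicted long term", predicted_long_term == 1),
                        ("Predicted short term", predicted_long_term == 0)]:
        kmf.fit(surv_months[mask], event_observed[mask], label=label)
        ax = kmf.plot_survival_function(ax=ax)
    result = logrank_test(surv_months[predicted_long_term == 1],
                          surv_months[predicted_long_term == 0],
                          event_observed_A=event_observed[predicted_long_term == 1],
                          event_observed_B=event_observed[predicted_long_term == 0])
    ax.set_xlabel("Survival time (months)")
    ax.set_ylabel("Survival probability")
    plt.title("log-rank p-value = {:.2e}".format(result.p_value))
    plt.show()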

3.2 Comparison with Other Prediction Methods
We compare the performance of MDNNMD with three widely used methods for prognosis prediction of breast cancer: SVM [13], RF [14] and LR [49]. The ten-fold cross validation experiment for prognosis prediction of breast cancer is conducted with the four different methods. In this study, we use the RF and LR packages obtained from scikit-learn, available at http://scikit-learn.org/stable/supervised_learning.html#supervised-learning. As to the SVM method, we use the SVM package LIBSVM [53], obtained from https://www.csie.ntu.edu.tw/~cjlin/libsvm/. The detailed ROC curves of the four methods are plotted in Fig. 4. As expected, among the four methods, MDNNMD achieves competitive or better performance than SVM, RF and LR. In addition, we also compute the AUC values of the four methods.
The AUC value of MDNNMD is 0.845, while the corresponding AUC values of SVM, RF and LR are 0.810, 0.801 and 0.663, respectively (Fig. 4).

Fig. 4. The ROC curves of MDNNMD, RF, SVM, and LR. The x axis represents 1 - Sp and the y axis represents Sn for the METABRIC dataset.

Additionally, the comparison of Sn, Acc, Pre, and Mcc between the four methods at the two stringency levels is listed in Table 4.

TABLE 4
Comparison of Acc, Pre, Sn, and Mcc between MDNNMD, SVM, RF, and LR

Sp = 99.0% (threshold: 0.591)
Method      Acc      Pre      Sn       Mcc
MDNNMD      0.794    0.875    0.200    0.356
SVM         0.775    0.811    0.122    0.257
RF          0.770    0.787    0.098    0.223
LR          0.754    0.563    0.037    0.093

Sp = 95.0% (threshold: 0.443)
Method      Acc      Pre      Sn       Mcc
MDNNMD      0.826    0.749    0.450    0.486
SVM         0.805    0.708    0.365    0.407
RF          0.791    0.766    0.226    0.337
LR          0.760    0.549    0.183    0.209

From the AUC, Pre, Acc, Sn and Mcc values in Table 4, it can be seen that the SVM and RF methods produce comparable performance for breast cancer prognosis prediction, while LR produces a relatively inferior result. It is also observed that at these two Sp levels (Sp = 99.0 percent with a corresponding threshold of 0.591, and Sp = 95.0 percent with a corresponding threshold of 0.443), MDNNMD achieves better performance than the other prediction methods including SVM, RF and LR for the prediction of breast cancer prognosis. For example, when Sp is 95.0 percent, the corresponding Sn values obtained by MDNNMD, SVM, RF and LR are 0.450, 0.365, 0.226 and 0.183, respectively, and the precision value of MDNNMD is 0.749 at Sp = 95.0%, which is 4.1 and 20.0 percent higher than SVM and LR, respectively. When Sp is enlarged to 99.0 percent, the Acc, Pre, Sn and Mcc values are increased by 1.9, 6.4, 7.8 and 9.9 percent compared with SVM, are improved by 2.4, 8.8, 10.2 and 13.3 percent compared with RF, and show an improvement of 4.0, 31.2, 16.3 and 26.3 percent compared with LR, respectively. Altogether, the aforementioned analysis suggests that MDNNMD achieves a better performance than the other prediction methods, further demonstrating the feasibility of the multimodal deep neural network and the usefulness of the multi-dimensional data in breast cancer prognosis prediction.

3.3 Validation
We also use another independent breast cancer dataset to further verify our method's performance. We download the data of 1,054 valid breast cancer patients from The Cancer Genome Atlas (TCGA) project [17]. The new dataset, referred to as TCGA-BRCA, includes gene expression profile data, CNA profile data and clinical information. Preprocessing of the new dataset is the same as for the METABRIC dataset. The TCGA-BRCA dataset is also divided into three parts in the cross validation process: training, validation and testing sets. The training set is put into our MDNNMD method to train a model, the validation set is then used to tune the hyperparameters, and at last the trained model is used to perform classification on the testing set for breast cancer prognosis prediction. The ROC curves comparing MDNNMD with the single-modality methods and with the other methods are plotted and shown in Fig. 5. On the TCGA-BRCA dataset, MDNNMD is also consistently better than DNN-Clinical, DNN-Expr and DNN-CNA: the AUC value of MDNNMD is 18.7, 38.3 and 3.1 percent higher than those of DNN-Expr, DNN-CNA and DNN-Clinical, respectively. In addition, we compare the performance of MDNNMD on the TCGA-BRCA dataset with the three widely used methods for prognosis prediction of breast cancer: SVM, RF and LR. In Fig. 5b, we find that MDNNMD is also consistently better than SVM, RF and LR; the AUC value of MDNNMD is 4.8, 3.0 and 5.0 percent higher than those of SVM, RF and LR, respectively.

Fig. 5. The x axis represents 1 - Sp and the y axis represents Sn for the TCGA-BRCA dataset. (a) The ROC curves of MDNNMD, DNN-Clinical, DNN-Expr, and DNN-CNA. (b) The ROC curves of MDNNMD, SVM, RF, and LR.

4 DISCUSSION AND CONCLUSION
Breast cancer is one of the most common cancers and is usually associated with poor prognosis. Thus there is an urgent need to develop effective and fast computational methods for breast cancer prognosis prediction. In this work, we present a novel multimodal deep neural network integrating multi-dimensional data, named MDNNMD, to predict the survival time of human breast cancer patients. To efficiently incorporate multi-dimensional data including the gene expression profile, CNA and clinical data in breast cancer, three independent DNN models are constructed and combined into a final multimodal DNN model, considering the heterogeneity of the different types of data. Then, a decision level multimodal fusion [54] (score fusion) is used to integrate both clinical information and breast cancer-specific relationships between genes. Generally, due to the successful application of the multimodal deep learning method in our work, MDNNMD achieves a better performance than methods with single-dimensional data and existing prediction methods, indicating that combining different data types is an efficient way to improve the performance of human breast cancer prognosis prediction. We anticipate that our research is worth extending to other similar diseases and can easily employ other omics data.

Despite the successful application of MDNNMD, there remain some avenues for further investigation in predicting the survival time of breast cancer. First, while MDNNMD uses
multi-dimensional data to efficiently identify the survival time of breast cancer patients, it is unusable for research where multiple omics data are unavailable or incomplete. At the same time, it is difficult and expensive to obtain a large amount of complete clinical data. But we reasonably believe that more complete omics data and clinical data will become available, given that many cancer studies are now underway. Second, there are only 1,980 available valid samples in METABRIC and 1,054 available valid samples in TCGA-BRCA, which are relatively small numbers and may limit further analysis. It is expected that the performance of the proposed method would be enhanced when more samples become available in the future. We also think that it would be more meaningful for cancer researchers if MDNNMD were built for each subtype, and its performance might be further improved. Unfortunately, there are few available data for each subtype of breast cancer patients, especially for training a deep neural network, which requires a large amount of data [13], [14]. Therefore, analysis of each subtype of breast cancer will be a promising expansion of our study when more samples become available in the future. Third, we propose a multimodal DNN model which simply employs three DNN models; a promising expansion of MDNNMD in future work would be to employ different deep learning models such as the Deep Belief Network (DBN) and the Deep Boltzmann Machine (DBM). Finally, an interesting future research direction is to integrate more omics data such as gene methylation and miRNA expression. We also consider employing features from pathology images of cancer patients in our future work. Another research direction is to construct a multi-task learning system aimed at different cancer research problems, including prediction of cancer susceptibility, cancer recurrence, and cancer treatment.

ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China through Grants No. 61471331, No. 61571414, and No. 31100955. Dongdong Sun and Minghui Wang contributed equally to this paper.

REFERENCES
[1] J. Ferlay, C. Hery, P. Autier, and R. Sankaranarayanan, "Global burden of breast cancer," in Breast Cancer Epidemiology, Berlin, Germany: Springer, 2010, pp. 1-19.
[2] E. B. C. T. C. Group, Treatment of Early Breast Cancer. 1. Worldwide Evidence 1985-1990, New York, NY, USA: Oxford Univ. Press, 1990.
[3] R. A. Smith, K. S. Andrews, D. Brooks, S. A. Fedewa, D. Manassaram-Baptiste, D. Saslow, O. W. Brawley, and R. C. Wender, "Cancer screening in the United States: A review of current American Cancer Society guidelines and current issues in cancer screening," CA: A Cancer J. Clinicians, vol. 67, no. 2, pp. 100-121, 2017.
[4] E. A. Rakha, J. S. Reis-Filho, F. Baehner, D. J. Dabbs, T. Decker, V. Eusebi, S. B. Fox, S. Ichihara, J. Jacquemier, and S. R. Lakhani, "Breast cancer prognostic classification in the molecular era: The role of histological grade," Breast Cancer Res., vol. 12, no. 4, 2010, Art. no. 207.
[5] A. G. Rivenbark, S. M. O'Connor, and W. B. Coleman, "Molecular and cellular heterogeneity in breast cancer: Challenges for personalized medicine," The Amer. J. Pathology, vol. 183, no. 4, pp. 1113-1124, 2013.
[6] L. R. Martin, S. L. Williams, K. B. Haskard, and M. R. DiMatteo, "The challenge of patient adherence," Therapeutics Clinical Risk Manage., vol. 1, no. 3, pp. 189-199, 2005.
[7] P. Stone and S. Lund, "Predicting prognosis in patients with advanced cancer," Annals Oncology, vol. 18, no. 6, pp. 971-976, 2006.
[8] M. J. Van De Vijver, Y. D. He, L. J. Van't Veer, H. Dai, A. A. Hart, D. W. Voskuil, G. J. Schreiber, J. L. Peterse, C. Roberts, and M. J. Marton, "A gene-expression signature as a predictor of survival in breast cancer," New England J. Med., vol. 347, no. 25, pp. 1999-2009, 2002.
[9] Y. Wang, J. G. Klijn, Y. Zhang, A. M. Sieuwerts, M. P. Look, F. Yang, D. Talantov, M. Timmermans, M. E. Meijer-van Gelder, and J. Yu, "Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer," Lancet, vol. 365, no. 9460, pp. 671-679, 2005.
[10] M. Khademi and N. S. Nedialkov, "Probabilistic graphical models and deep belief networks for prognosis of breast cancer," in Proc. IEEE 14th Int. Conf. Mach. Learn. Appl., 2015, pp. 727-732.
[11] Y. Sun, S. Goodison, J. Li, L. Liu, and W. Farmerie, "Improved breast cancer prognosis through the combination of clinical and genetic markers," Bioinf., vol. 23, no. 1, pp. 30-37, 2007.
[12] O. Gevaert, F. De Smet, D. Timmerman, Y. Moreau, and B. De Moor, "Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks," Bioinf., vol. 22, no. 14, pp. e184-e190, 2006.
[13] X. Xu, Y. Zhang, L. Zou, M. Wang, and A. Li, "A gene signature for breast cancer prognosis using support vector machine," in Proc. 5th Int. Conf. Biomed. Eng. Inform., 2012, pp. 928-931.
[14] C. Nguyen, Y. Wang, and H. N. Nguyen, "Random forest classifier combined with feature selection for breast cancer diagnosis and prognostic," J. Biomed. Sci. Eng., vol. 6, pp. 551-560, 2013.


[15] J. Hayes, H. Thygesen, C. Tumilson, A. Droop, M. Boissinot, T. A. Hughes, D. Westhead, J. E. Alder, L. Shaw, and S. C. Short, "Prediction of clinical outcome in glioblastoma using a biologically relevant nine-microRNA signature," Mol. Oncology, vol. 9, no. 3, pp. 704-714, 2015.
[16] Y. Zhang, A. Li, C. Peng, and M. Wang, "Improve glioblastoma multiforme prognosis prediction by using feature selection and multiple kernel learning," IEEE/ACM Trans. Comput. Biology Bioinf., vol. 13, no. 5, pp. 825-835, Sep./Oct. 2016.
[17] K. Tomczak, P. Czerwinska, and M. Wiznerowicz, "The Cancer Genome Atlas (TCGA): An immeasurable source of knowledge," Contemp. Oncol. (Pozn), vol. 19, no. 1A, pp. A68-A77, 2015.
[18] J. Gao, B. A. Aksoy, U. Dogrusoz, G. Dresdner, B. Gross, S. O. Sumer, Y. Sun, A. Jacobsen, R. Sinha, and E. Larsson, "Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal," Sci. Signaling, vol. 6, no. 269, 2013, Art. no. pl1.
[19] D. Ciregan, U. Meier, and J. Schmidhuber, "Multi-column deep neural networks for image classification," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2012, pp. 3642-3649.
[20] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770-778.
[21] E. Cakir, T. Heittola, H. Huttunen, and T. Virtanen, "Polyphonic sound event detection using multi label deep neural networks," in Proc. Int. Joint Conf. Neural Netw., 2015, pp. 1-7.
[22] Y. Chen, Y. Li, R. Narayan, A. Subramanian, and X. Xie, "Gene expression inference with deep learning," Bioinf., vol. 32, no. 12, pp. 1832-1839, 2016.
[23] D. Quang, Y. Chen, and X. Xie, "DANN: A deep learning approach for annotating the pathogenicity of genetic variants," Bioinf., vol. 31, no. 5, pp. 761-763, 2014.
[24] F. Wang and J. Han, "Multimodal biometric authentication based on score level fusion using support vector machine," Opto-Electron. Rev., vol. 17, no. 1, pp. 59-64, 2009.
[25] A. K. Jain and A. Ross, "Multibiometric systems," Commun. ACM, vol. 47, no. 1, pp. 34-40, 2004.
[26] K. Sohn, W. Shang, and H. Lee, "Improved multimodal deep learning with variation of information," Adv. Neural Inf. Process. Syst., pp. 2141-2149, 2014.
[27] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, "Multimodal deep learning," in Proc. 28th Int. Conf. Mach. Learn. (ICML-11), 2011, pp. 689-696.
[28] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent dirichlet allocation," J. Mach. Learn. Res., vol. 3, no. Jan, pp. 993-1022, 2003.
[29] N. Srivastava and R. R. Salakhutdinov, "Multimodal learning with deep boltzmann machines," Adv. Neural Inf. Process. Syst., pp. 2222-2230, 2012.
[30] S. E. Kahou, X. Bouthillier, P. Lamblin, C. Gulcehre, V. Michalski, K. Konda, S. Jean, P. Froumenty, Y. Dauphin, and N. Boulanger-Lewandowski, "Emonets: Multimodal deep learning approaches for emotion recognition in video," J. Multimodal User Interfaces, vol. 10, no. 2, pp. 99-111, 2016.
[31] M. Liang, Z. Li, T. Chen, and J. Zeng, "Integrative data analysis of multi-platform cancer data with a multimodal deep learning approach," IEEE/ACM Trans. Comput. Biology Bioinf., vol. 12, no. 4, pp. 928-937, Jul./Aug. 2015.
[32] C. Curtis, S. P. Shah, S.-F. Chin, G. Turashvili, O. M. Rueda, M. J. Dunning, D. Speed, A. G. Lynch, S. Samarajiwa, and Y. Yuan, "The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups," Nature, vol. 486, no. 7403, pp. 346-352, 2012.
[33] O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, and R. B. Altman, "Missing value estimation methods for DNA microarrays," Bioinf., vol. 17, no. 6, pp. 520-525, 2001.
[34] A. Aliper, S. Plis, A. Artemov, A. Ulloa, P. Mamoshina, and A. Zhavoronkov, "Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data," Molecular Pharmaceutics, vol. 13, no. 7, p. 2524, 2016.
[35] J. Tan, J. H. Hammond, D. A. Hogan, and C. S. Greene, "ADAGE analysis of publicly available gene expression data collections illuminates Pseudomonas aeruginosa-host interactions," bioRxiv, Art. no. 030650, 2015.
[36] H. Peng, F. Long, and C. Ding, "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1226-1238, Jun. 2005.
[37] C. Ding and H. Peng, "Minimum redundancy feature selection from microarray gene expression data," J. Bioinf. Comput. Biology, vol. 3, no. 02, pp. 185-205, 2005.
[38] Y. Cai, T. Huang, L. Hu, X. Shi, L. Xie, and Y. Li, "Prediction of lysine ubiquitination with mRMR feature selection and analysis," Amino Acids, vol. 42, no. 4, pp. 1387-1395, 2012.
[39] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc. 13th Int. Conf. Artif. Intell. Statis. (PMLR), 2010, vol. 9, pp. 249-256.
[40] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.-M. Jodoin, and H. Larochelle, "Brain tumor segmentation with deep neural networks," Medical Image Anal., vol. 35, pp. 18-31, 2017.
[41] V. A. Kumar, S. Gupta, S. S. Chandra, S. Raman, and S. S. Channappayya, "No-reference quality assessment of tone mapped high dynamic range (HDR) images using transfer learning," in Proc. 9th Int. Conf. Quality Multimedia Experience, 2017, pp. 1-3.
[42] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[43] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929-1958, 2014.
[44] P. Baldi, P. Sadowski, and D. Whiteson, "Searching for exotic particles in high-energy physics with deep learning," Nature Commun., vol. 5, 2014, Art. no. 4308.
[45] T. Zhou, Z. Kuscsik, J.-G. Liu, M. Medo, J. R. Wakeling, and Y.-C. Zhang, "Solving the apparent diversity-accuracy dilemma of recommender systems," Proc. Nat. Academy Sci. USA, vol. 107, no. 10, pp. 4511-4515, 2010.
[46] R. Burke, "Hybrid recommender systems: Survey and experiments," User Modeling User-Adapted Interaction, vol. 12, no. 4, pp. 331-370, 2002.
[47] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, and M. Isard, "TensorFlow: A system for large-scale machine learning," OSDI, vol. 16, pp. 265-283, 2016.
[48] K.-H. Yu, C. Zhang, G. J. Berry, R. B. Altman, C. Re, D. L. Rubin, and M. Snyder, "Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features," Nature Commun., vol. 7, 2016, Art. no. 12474.
[49] M. F. Jefferson, N. Pendleton, S. B. Lucas, and M. A. Horan, "Comparison of a genetic algorithm neural network with logistic regression for predicting outcome after surgery for patients with nonsmall cell lung carcinoma," Cancer, vol. 79, no. 7, pp. 1338-1342, 1997.
[50] W. Fan, X. Xu, Y. Shen, H. Feng, A. Li, and M. Wang, "Prediction of protein kinase-specific phosphorylation sites in hierarchical structure using functional information and random forest," Amino Acids, vol. 46, no. 4, pp. 1069-1078, 2014.
[51] J. Ranstam and J. Cook, "Kaplan-Meier curve," British J. Surg., vol. 104, no. 4, pp. 442-442, 2017.
[52] X. Zhu, J. Yao, X. Luo, G. Xiao, Y. Xie, A. Gazdar, and J. Huang, "Lung cancer survival prediction from pathological images and genetic data - An integration study," in Proc. IEEE 13th Int. Symp. Biomed. Imaging, 2016, pp. 1173-1176.
[53] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, 2011, Art. no. 27.
[54] P. K. Atrey, M. A. Hossain, A. El Saddik, and M. S. Kankanhalli, "Multimodal fusion for multimedia analysis: A survey," Multimedia Syst., vol. 16, no. 6, pp. 345-379, 2010.

Dongdong Sun received the BS degree in electronic information engineering from Anhui University, China, in 2013. He is currently working toward the PhD degree at the University of Science and Technology of China (USTC). He is a member of the Centers for Biomedical Engineering, USTC. His research interests include deep learning, bioinformatics, and biostatistics.


Minghui Wang received the BS degree from the School of Gifted Youth, University of Science and Technology of China (USTC), and the PhD degree in biomedical engineering from the School of Information Science and Technology, USTC, in 2006. She is an associate professor in the School of Information Science and Technology and Centers for Biomedical Engineering, USTC. Her research interests include bioinformatics, biostatistics, and machine learning.

Ao Li received the BS degree in biophysics from the School of Life Science, University of Science and Technology of China (USTC), in 2000 and the PhD degree in biomedical engineering from the School of Information Science and Technology, USTC, in 2005. Currently, he is an associate professor in the School of Information Science and Technology and Centers for Biomedical Engineering, USTC. His research contributions include computational cancer genomics, and bioinformatics with a focus on issues concerning systematic identification and evaluation of genome-wide variants in cancer. He is a member of the IEEE.

