Abstract—Breast cancer is a highly aggressive type of cancer with very low median survival. Accurate prognosis prediction of breast
cancer can spare a significant number of patients from receiving unnecessary adjuvant systemic treatment and its related expensive
medical costs. Previous work relies mostly on selected gene expression data to create a predictive model. The emergence of deep
learning methods and multi-dimensional data offers opportunities for more comprehensive analysis of the molecular characteristics of
breast cancer and therefore can improve diagnosis, treatment, and prevention. In this study, we propose a Multimodal Deep Neural
Network by integrating Multi-dimensional Data (MDNNMD) for the prognosis prediction of breast cancer. The novelty of the method lies
in the design of our method’s architecture and the fusion of multi-dimensional data. The comprehensive performance evaluation results
show that the proposed method achieves a better performance than the prediction methods with single-dimensional data and other
existing approaches. The source code, implemented with the TensorFlow 1.0 deep learning library, can be downloaded from GitHub:
https://github.com/USTC-HIlab/MDNNMD.
Index Terms—Breast cancer prognosis prediction, multimodal deep neural network, multi-dimensional data
1 INTRODUCTION
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR. Downloaded on May 19,2022 at 04:08:08 UTC from IEEE Xplore. Restrictions apply.
842 IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 16, NO. 3, MAY/JUNE 2019
gene prognostic signatures, Sun et al. further identify a hybrid signature through the composition of both genetic signatures and clinical markers for the prediction of breast cancer prognosis [11]. The results indicate that the identified hybrid signatures could significantly improve prognostic specificity compared with the existing gene signatures and clinical systems, respectively. Gevaert et al. propose a probabilistic model based on Bayesian Network (BN) for prognosis of lymph node negative breast cancer by integrating clinical and genomic data [12]. Given the wide range of applications of gene signature datasets, we find that gene expression profiles can accurately predict prognosis in breast cancer. However, these aforementioned studies are designed to work with a handful of gene expression profile data and under the assumption of independence between different genes in hypothesis testing.

In fact, microarray data is high-dimensional, consisting of approximately 25,000 genes per patient, and different genes may be related to one another, which may help improve the accuracy of breast cancer prognosis prediction. Xu et al. propose an efficient feature selection method based on support vector machine (SVM) for breast cancer prognosis prediction and achieve better prediction performance than the widely used 70-gene signature [13]. Nguyen et al. introduce a method for breast cancer prognosis prediction based on random forest (RF) combined with feature selection and achieve higher classification accuracy than previous methods [14]. Khademi et al. propose a probabilistic graphical model (PGM) by integrating two independent models of microarray and clinical data for prognosis and diagnosis of breast cancer [10]. They first apply Principal Component Analysis (PCA) to reduce the dimensionality of microarray data and construct a deep belief network to extract feature representation of the data. Meanwhile, they also apply a structure learning algorithm to the clinical data.

In addition to the success achieved by the aforementioned approaches, some novel methods have been proposed by integrating multi-dimensional data in the area of human cancer related prediction. Hayes et al. identify relevant microRNA and mRNA signatures for predicting high-risk and low-risk patients in glioblastoma (GBM) [15]. Zhang et al. propose a multiple kernel machine learning method by fusing different types of data for the GBM prognosis prediction [16]. It is not surprising that good performance is achieved when considering multiple data, which are different features that are widely used in cancer prognosis prediction. However, most of these methods directly combine different types of data into model generation, and ignore that the features from different modalities (e.g., gene signature and clinical) may have different representations. With the recent advances of next generation sequencing technologies, multiple omics data, including gene expression profiles, clinical information and DNA copy number alteration, are now rapidly and extensively informing breast cancer diagnosis and prognosis [17], [18]. Accordingly, based on the accelerated development of multiple omics data, there is an urgent need to develop efficient computational methods for accurately predicting prognosis of human breast cancer.

Recently, deep learning has been an emerging methodology and has shown superior performance in diverse fields such as computer vision [19], [20], speech recognition [21] and bioinformatics [22], [23]. Besides, considering that methods based on a single source of information usually suffer from limitations such as non-universality, lack of uniqueness, and noisy data, multimodal learning is proposed to solve these problems by combining related information from multiple sources to obtain a final decision [24], [25]. Multimodal deep learning, a kind of multimodal learning that employs deep learning methods, has been widely used in applications such as computer visual recognition [26], multimedia analysis [25], and speech recognition [27]. For example, Srivastava et al. use multimodal learning with a Deep Boltzmann Machine (DBM) to learn a joint space of image and text and demonstrate that the model significantly outperforms SVM and LDA methods [28] on discriminative tasks [29]. In addition, Kahou et al. propose a method called EmoNets for emotion recognition in video [30], which is the winning submission in the 2013 EmotiW challenge.

Inspired by the successful application of deep learning and the large contribution made by multi-dimensional data to cancer prognosis prediction, in this study we propose a novel Multimodal Deep Neural Network by integrating Multi-dimensional Data (MDNNMD) for human breast cancer prognosis prediction. MDNNMD is an efficient method to integrate multi-dimensional data including gene expression profile, copy number alteration (CNA) profile and clinical data with a score level fusion at the final prediction results. This method considers the heterogeneity among different data types and makes full use of abstract high-level representation from each data source. Therefore, similar to the definition of “multimodal deep belief network” in previous work on cancer data analysis [31], we name our method “multimodal deep neural network”. The results of ten-fold cross validation experiments show that MDNNMD achieves an overall better performance than the prediction methods with single-dimensional data and existing research approaches: support vector machine (SVM), random forest (RF) and logistic regression (LR). We also demonstrate the feasibility of the multimodal deep neural network and the usefulness of the multi-dimensional data in breast cancer prognosis prediction.

2 METHODS AND MATERIALS

2.1 Materials
We use the METABRIC dataset, which is available at http://www.cbioportal.org/study?id=brca_metabric#summary. The dataset is extracted from 1,980 valid breast cancer patients’ data of the METABRIC trial [32], and contains multi-dimensional data such as gene expression profile, CNA profile and clinical information. The median age at diagnosis is 61 and the average survival time of all patients is 125.1 months. Among them, 491 patients are regarded as short term survivors (less than 5 year survival) and 1,489 patients are regarded as long term survivors (more than 5 year survival). For classification labels in our work, the short term patients are labeled as 0 and long term patients are labeled as 1. The properties of our dataset are listed in Table 1. Each patient has 27 clinical features, including age at diagnosis, size, lymph nodes positive, grade etc.
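The labeling rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `survival_label` is hypothetical, and the handling of a patient at exactly 60 months is an assumption, since the text only distinguishes "less than" and "more than" 5 years.

```python
from typing import List

CUTOFF_MONTHS = 5 * 12  # 5-year cut-off, expressed in months


def survival_label(survival_months: float, cutoff: float = CUTOFF_MONTHS) -> int:
    """Return 1 for a long term survivor and 0 for a short term survivor.

    Assumption: a patient at exactly the cut-off counts as long term.
    """
    return 1 if survival_months >= cutoff else 0


# Toy survival times in months (made up for illustration only).
labels: List[int] = [survival_label(m) for m in (12.0, 59.9, 60.0, 125.1)]
print(labels)
```

Keeping the cut-off as a named constant makes it easy to repeat the analysis with a different survival horizon.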
SUN ET AL.: A MULTIMODAL DEEP NEURAL NETWORK FOR HUMAN BREAST CANCER PROGNOSIS PREDICTION BY INTEGRATING... 843
TABLE 1
The Overall Information of METABRIC Breast Cancer Dataset

Cut-off (years)              5
Total population             1980
Long time survivors          1489
Short time survivors         491
Median age in diagnosis      61
Average survival (months)    125.1

Here, 25 clinical features are selected as our final clinical feature data. For gene expression profile data and CNA profile data, missing values are estimated using a weighted nearest neighbors algorithm [33]. According to Gevaert et al. [7], the gene expression features are normalized and further discretized into three categories: under-expression (-1), over-expression (1) and baseline (0). For clinical data, all features are normalized into the range [0, 1] by min-max normalization [8]. As for CNA features, we directly utilize the original data with five discrete values (-2, -1, 0, 1, 2).

2.2 Feature Selection
A common problem in using high throughput sequencing datasets for human cancer prognosis prediction is the so-called “curse of dimensionality” [34]. In our work, the gene expression profile data includes approximately 24,000 genes and CNA profile data contains approximately 26,000 genes in the METABRIC dataset. The high dimensionality and small sample size of the dataset may lead to bad results for deep learning methods [35]. It is well known that feature selection plays a key role in the success of a learning algorithm in problems involving a large number of features. mRMR [36] is one of the most common dimensionality reduction algorithms in a wide range of applications [16], [37], [38]. Therefore, we apply the mRMR feature selection method to select features from the original dataset and reduce the dimensionality of the dataset without a significant loss of information. Then, we use the area under curve (AUC) value (see Experimental Design) as the criterion to evaluate the performance of the features, as in previous work [3]. In detail, we coarsely search for the best N features from 100 to 500 with a step size of 100. Finally, 400 genes from gene expression profile data and 200 genes from CNA profile data are selected as features for our MDNNMD model. The detailed properties of multi-dimensional data used in this study are listed in Table 2.

TABLE 2
The Properties of the Dataset

Data Category      Number    Feature Number
Clinical           27        25
Gene Expression    24368     400
Copy Number        26298     200

2.3 A Deep Neural Network Prediction Model for a Single Dataset
In this study, a deep neural network (DNN) is used for predicting the prognosis of human breast cancer. The DNN architectures build a hierarchy from the hidden layers. Higher level features are extracted implicitly by the combination of lower level features from each layer. Here, a DNN model is composed of an input layer, multiple hidden layers and an output layer. Units between layers are all fully connected. The input layer with an input vector $x$ consists of one or multi-dimensional data. The output $h^k_j$ for layer $k$ including $j$ units is calculated from the weighted sum of the outputs for the previous layer $h^{k-1}$ (specially $h^0 = x$).

$$g^k = W^k h^{k-1} + b^k, \quad 1 \le k \le N \tag{1}$$

$$h^k = f(g^k), \tag{2}$$

where $W^k$ is the $k$th weight matrix between the $(k-1)$th layer and the $k$th layer, $b^k$ is the bias vector for the $k$th layer, $N$ is the number of layers (here $N = 5$, including the output layer), and the hyperbolic tangent (TANH) activation $f(\cdot)$ is applied to the hidden units, which naturally captures the nonlinear relations within the data. The softmax function is used as the activation function for the output layer ($N$th layer) of the DNN and is defined as:

$$o^N = \frac{\exp(h^N)}{\sum_j \exp(h^N_j)}. \tag{3}$$

Afterwards, we initialize the weights between each layer using normalized initialization proposed by Glorot and Bengio [39] and the biases are initialized with small numbers (such as 0.1). The weights between layers are initialized from a truncated normal distribution defined by:

$$W \sim T\left[-\sqrt{\frac{2}{n_i + n_o}},\ \sqrt{\frac{2}{n_i + n_o}}\right], \tag{4}$$

where $n_i$, $n_o$ denote the number of input and output units, respectively. Given that the breast cancer prognosis prediction task in our study is a binary classification task (long-term survival and short-term survival), we define cross entropy loss as the objective function for the DNN model in the final output layer. In addition, to further prevent overfitting of our deep learning model, L2 regularization, which is widely used in deep learning studies [40], [41], is also added to our loss function. Finally, our proposed DNN method aims at minimizing the loss function, defined as:

$$L(y_t, \hat{y}_t) = \frac{1}{N} \sum_{i=0}^{N} \left[ -y_t(i) \log \hat{y}_t(i) - (1 - y_t(i)) \log (1 - \hat{y}_t(i)) \right] + \frac{1}{2} \sum_{k=1}^{K} \sum_{j=1}^{n_k} \sum_{i=1}^{m_k} \left( w^k_{ij} \right)^2, \tag{5}$$

where $L$ measures errors between predictive scores and the actual labels. $y_t(i)$ is the actual label for the $i$th class, $\hat{y}_t(i)$ is the predictive score obtained from the output layer of our method. $N$ is the batch size. $W^k = \{w^k_{ij}\}_{m_k \times n_k}$ is the $k$th weight matrix and $K$ is the number of weight matrices in the DNN model (here $K = 5$).

A common issue in training a DNN model is “internal covariate shift”: the input distribution of each layer changes during training due to updates of the parameters of previous layers. In 2015, batch normalization [42] was proposed by Google to solve the aforementioned problem, which allows us to use
Fig. 1. The overall process of our MDNNMD model for breast cancer prognosis prediction. The prediction model consists of three independent models, one for each data type, and finally fuses the predictive scores from each independent model.
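As a rough illustration of one of the independent sub-models in Fig. 1, the following pure-Python sketch runs the forward pass of Section 2.3 (tanh hidden layers, softmax output) and evaluates a cross entropy loss with an L2 penalty. It is a sketch under stated assumptions, not the released TensorFlow implementation: the layer sizes and the uniform weight initialization are toy choices made here, the softmax is applied to the affine output of the last layer, and all function names are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row of weights per output unit


def affine(W: Matrix, h: Vector, b: Vector) -> Vector:
    """Weighted sum of the previous layer's outputs plus bias, as in Eq. (1)."""
    return [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]


def softmax(g: Vector) -> Vector:
    """Softmax over the output units, shifted by max(g) for numerical stability."""
    m = max(g)
    e = [math.exp(v - m) for v in g]
    total = sum(e)
    return [v / total for v in e]


def forward(x: Vector, weights: List[Matrix], biases: List[Vector]) -> Vector:
    """tanh on every hidden layer, softmax on the final layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = [math.tanh(v) for v in affine(W, h, b)]
    return softmax(affine(weights[-1], h, biases[-1]))


def loss(y_true: List[int], p_long: List[float], weights: List[Matrix]) -> float:
    """Mean binary cross entropy plus half the sum of squared weights."""
    eps = 1e-12  # guards against log(0)
    ce = -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for y, p in zip(y_true, p_long)
    ) / len(y_true)
    l2 = 0.5 * sum(w * w for W in weights for row in W for w in row)
    return ce + l2


# Toy network: 4 inputs, 3 hidden units, 2 output classes (not the paper's sizes).
rng = random.Random(0)
sizes = [4, 3, 2]
weights = [
    [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]
biases = [[0.1] * n_out for n_out in sizes[1:]]  # small constant bias

probs = forward([0.2, -1.0, 0.0, 1.0], weights, biases)
sample_loss = loss([1], [probs[1]], weights)
print(probs, sample_loss)
```

Here `probs[1]` is read as the predicted probability of the long term class (label 1); that index convention is also an assumption of this sketch.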
$$\text{s.t.}\quad \alpha + \beta + \gamma = 1,\quad \alpha \ge 0,\quad \beta \ge 0,\quad \gamma \ge 0, \tag{7}$$

where the parameters $\alpha$, $\beta$, $\gamma$ are three weight coefficients used to balance the contribution of each DNN model. In this study, MDNNMD chooses the weights $\alpha$, $\beta$ and $\gamma$ of the different sub-DNN models according to the best prediction performance on the validation set (see Experimental Design). We screen different combinations of $\alpha$, $\beta$, $\gamma$ with a step of 0.1 and finally select $\alpha = 0.3$, $\beta = 0.1$ and $\gamma = 0.6$ for the METABRIC dataset. MDNNMD is implemented based on the TensorFlow 1.0 deep learning library [47], which is an open-source software library for Machine Intelligence. Training is deployed with two Nvidia GTX TITAN Z graphics cards. The source code of MDNNMD is publicly available at https://github.com/USTC-HIlab/MDNNMD.

2.5 Experimental Design
To comprehensively evaluate our proposed method, we use ten-fold cross validation, consistent with previous studies of cancer prognosis prediction [16], [48]. Specifically, the patients in our experiment are randomized into ten subsets. For every round, nine of those ten subsets are further divided into training (80 percent) and validation (20 percent) sets [1], while the remaining subset is used as the testing set. In this way, we obtain the prediction scores of each testing subset after ten rounds and then merge them as the overall prediction scores. Besides, in our study, MDNNMD does not optimize the model configurations and weight coefficients simultaneously. First, we search different configurations and use the single domain training set to train each sub-DNN (weights and biases), and avoid overfitting by using the validation set. Second, we choose the optimal configuration parameters by using the AUC value as the criterion. Third, after the sub-DNNs are trained, we screen different combinations of the coefficients ($\alpha$, $\beta$, and $\gamma$) until the classification performance (AUC value) on the validation set reaches its maximum.

For performance evaluation, we plot the receiver operating characteristic (ROC) curve, which shows the interplay between sensitivity and 1 - specificity by varying a decision threshold, and compute the AUC. The evaluation metrics Sensitivity (Sn), Specificity (Sp), Accuracy (Acc), Precision (Pre) and Matthews correlation coefficient (Mcc) are also used for performance evaluation and are defined in the following equations:

$$Sn = \frac{TP}{TP + FN} \tag{8}$$

$$Sp = \frac{TN}{TN + FP} \tag{9}$$

$$Acc = \frac{TP + TN}{TP + TN + FN + FP} \tag{10}$$

where TP, FP, TN and FN stand for true positive, false positive, true negative and false negative, respectively.

2.6 Other Prediction Methods for Comparison
To verify the benefit of multimodal DNN by integrating multi-dimensional data, DNN based methods with single-dimensional data are examined for the prognosis prediction of breast cancer in this study. The main difference between these methods and MDNNMD is that they do not integrate multi-dimensional data and use only one data type. For simplicity, the DNN based methods that use single-dimensional data of gene expression profile data, clinical data, and CNA profile data are hereafter termed DNN-Expr, DNN-Clinical and DNN-CNA, respectively.

In order to show the effectiveness of the multimodal deep learning method in prognosis prediction of breast cancer, we employ three widely used methods as classifiers in human breast cancer prognosis prediction, including support vector machines (SVM) [13], random forest (RF) [14] and logistic regression (LR) [49] for comparison. In those algorithms, multi-dimensional data are regarded as one feature vector to train the model. The performance is also evaluated by ten-fold cross validation.

3 RESULTS

3.1 Comparison of DNN Based Methods with Multiple and Single Dimensional Data
In order to confirm the effectiveness of multi-dimensional data, we first apply the deep learning method to each single data type to predict breast cancer prognosis. We compare the performance of MDNNMD with DNN-Expr, DNN-Clinical and DNN-CNA. The ROC curves are plotted for the four methods at each specificity level and displayed in Fig. 2a. As shown in Fig. 2a, MDNNMD achieves better overall performance than the single-dimensional data based methods. Besides the ROC curve, the corresponding AUC value for each method is also calculated and displayed in Fig. 2b. It is indicated that MDNNMD is consistently better than DNN-Expr, DNN-Clinical and DNN-CNA. The AUC value (shown in Fig. 2b) of MDNNMD (0.845) is 8.4 percent, 3.8 percent and 23.1 percent higher than those of DNN-Expr, DNN-Clinical and DNN-CNA, respectively. Finally, we plot both training loss and validation loss in Supplementary Figure S1 by using TensorBoard, which is a visualization tool in the TensorFlow library.

At the same time, by following the study of Fan et al. [50], two stringency levels of medium (Sp = 95.0 percent with corresponding threshold of 0.443) and high (Sp = 99.0 percent with corresponding threshold of 0.591) specificity are applied to each method for measuring the predictive performance. The corresponding Sn, Acc, Pre and Mcc values are computed and shown in Figs. 2c and 2d, respectively, suggesting that MDNNMD achieves better predictive performance than the single-dimensional data based methods in all cases. For example, when Sp equals 99.0 percent, the proposed method obtains the largest Pre value, 0.875, while the Pre values of DNN-Clinical, DNN-Expr, and DNN-CNA are 0.818, 0.622 and 0.462, respectively. Meanwhile, the Sn value achieved by MDNNMD is 0.200 at Sp = 99.0 percent, which is 7.2 percent, 15.3 percent, and 17.6 percent higher than DNN-Clinical, DNN-Expr and DNN-CNA. When Sp equals 95.0 percent, the Pre value of the proposed method is 0.749, which is 6.8 percent, 20.6 percent and 34.1 percent higher than those of DNN-Clinical, DNN-Expr and DNN-CNA, respectively. When Sp = 95.0 percent, the corresponding Sn values of MDNNMD, DNN-Clinical, DNN-Expr, and DNN-CNA are 0.450, 0.322, 0.179 and 0.104, respectively. All the above comparison results indicate that MDNNMD achieves an overall better performance than the single-dimensional data based methods, confirming the tremendous benefits from integrating multi-dimensional data and multimodal fusion in the prognosis prediction of breast cancer.

Fig. 2. Performance comparison between MDNNMD, DNN-Expr, DNN-Clinical, and DNN-CNA in different metrics. (a) ROC curve. (b) AUC value. (c) and (d) Acc, Pre, Sn, and Mcc values at stringent levels of Sp = 99.0 percent with a corresponding threshold of 0.591, and Sp = 95.0 percent with a corresponding threshold of 0.443.

To further demonstrate the predictive results of the multi-dimensional data in assessing the risk of developing distant metastases in breast cancer patients, survival analysis of the proposed method is also performed according to previous studies [16], [51], [52]. The Kaplan-Meier curve is plotted and shown in Fig. 3 for the aforementioned datasets. It suggests that there is a significant difference between the patients with short term survival time and the patients with long term survival time predicted by our predictive results (p-value < 10e-10).
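The score-level fusion compared in this section combines three sub-model scores with non-negative weights that sum to 1, chosen by screening a grid with step 0.1 for the best validation AUC. A minimal sketch of that search is given below, using only the Python standard library. The validation labels and scores are made-up toy values, the function names are hypothetical, and the sketch does not assume which coefficient belongs to which data type, because the defining equation of the fused score is not available in this text.

```python
from itertools import product
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fuse(weights: Sequence[float], score_lists: Sequence[Sequence[float]]) -> List[float]:
    """Weighted sum of the per-model scores for every patient."""
    return [sum(w * s for w, s in zip(weights, row)) for row in zip(*score_lists)]


def search_weights(
    labels: Sequence[int], score_lists: Sequence[Sequence[float]], steps: int = 10
) -> Tuple[Tuple[float, float, float], float]:
    """Screen all (a, b, c) on a 1/steps grid with a + b + c = 1 and a, b, c >= 0."""
    best_w, best_auc = (1.0, 0.0, 0.0), -1.0
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue  # the third weight would be negative
        w = (i / steps, j / steps, (steps - i - j) / steps)
        value = auc(labels, fuse(w, score_lists))
        if value > best_auc:
            best_w, best_auc = w, value
    return best_w, best_auc


# Toy validation data: six patients, three sub-model score lists.
val_labels = [1, 1, 0, 0, 1, 0]
score_lists = [
    [0.9, 0.4, 0.5, 0.2, 0.7, 0.3],
    [0.6, 0.8, 0.7, 0.1, 0.4, 0.5],
    [0.8, 0.6, 0.3, 0.4, 0.5, 0.6],
]
best_w, best_auc = search_weights(val_labels, score_lists)
print(best_w, best_auc)
```

Because the grid contains the three corner points where a single model receives all the weight, the fused validation AUC found this way can never be lower than that of the best single sub-model on the same validation set.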
TABLE 4
Comparison of Acc, Pre, Sn, and Mcc between MDNNMD,
SVM, RF, and LR
Fig. 5. The x axis represents 1 - Sp and the y axis represents Sn for the TCGA-BRCA dataset. (a) The ROC curves of MDNNMD, DNN-Clinical, DNN-Expr, and DNN-CNA. (b) The ROC curves of MDNNMD, SVM, RF, and LR.
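ROC curves such as those in this figure are obtained by sweeping the decision threshold over the prediction scores and recording (1 - Sp, Sn) at each step. The sketch below shows one way to do this and to integrate the curve with the trapezoidal rule. The labels and scores are toy values, label 1 is assumed to be the positive class, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (1 - Sp, Sn)


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """Sweep the threshold from high to low and collect (1 - Sp, Sn) pairs."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Patients with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points: Sequence[Point]) -> float:
    """Area under the piecewise-linear ROC curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy example with three positives and three negatives.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(toy_labels, toy_scores)
area = trapezoid_auc(curve)
print(curve, area)
```

In this toy example eight of the nine positive-negative pairs are ranked correctly, so the area equals 8/9, which matches the pairwise interpretation of the AUC.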
[15] J. Hayes, H. Thygesen, C. Tumilson, A. Droop, M. Boissinot, T. A. Hughes, D. Westhead, J. E. Alder, L. Shaw, and S. C. Short, “Prediction of clinical outcome in glioblastoma using a biologically relevant nine-microRNA signature,” Mol. Oncology, vol. 9, no. 3, pp. 704–714, 2015.
[16] Y. Zhang, A. Li, C. Peng, and M. Wang, “Improve glioblastoma multiforme prognosis prediction by using feature selection and multiple kernel learning,” IEEE/ACM Trans. Comput. Biology Bioinf., vol. 13, no. 5, pp. 825–835, Sep./Oct. 2016.
[17] K. Tomczak, P. Czerwinska, and M. Wiznerowicz, “The Cancer Genome Atlas (TCGA): An immeasurable source of knowledge,” Contemp Oncol (Pozn), vol. 19, no. 1A, pp. A68–A77, 2015.
[18] J. Gao, B. A. Aksoy, U. Dogrusoz, G. Dresdner, B. Gross, S. O. Sumer, Y. Sun, A. Jacobsen, R. Sinha, and E. Larsson, “Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal,” Sci. Signaling, vol. 6, no. 269, 2013, Art. no. pl1.
[19] D. Ciregan, U. Meier, and J. Schmidhuber, “Multi-column deep neural networks for image classification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2012, pp. 3642–3649.
[20] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[21] E. Cakir, T. Heittola, H. Huttunen, and T. Virtanen, “Polyphonic sound event detection using multi label deep neural networks,” in Proc. Int. Joint Conf. Neural Netw., pp. 1–7, 2015.
[22] Y. Chen, Y. Li, R. Narayan, A. Subramanian, and X. Xie, “Gene expression inference with deep learning,” Bioinf., vol. 32, no. 12, pp. 1832–1839, 2016.
[23] D. Quang, Y. Chen, and X. Xie, “DANN: A deep learning approach for annotating the pathogenicity of genetic variants,” Bioinf., vol. 31, no. 5, pp. 761–763, 2014.
[24] F. Wang and J. Han, “Multimodal biometric authentication based on score level fusion using support vector machine,” Opto-Electron. Rev., vol. 17, no. 1, pp. 59–64, 2009.
[25] A. K. Jain and A. Ross, “Multibiometric systems,” Commun. ACM, vol. 47, no. 1, pp. 34–40, 2004.
[26] K. Sohn, W. Shang, and H. Lee, “Improved multimodal deep learning with variation of information,” Adv. Neural Inf. Process. Syst., pp. 2141–2149, 2014.
[27] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, “Multimodal deep learning,” in Proc. 28th Int. Conf. Mach. Learn. (ICML-11), 2011, pp. 689–696.
[28] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” J. Mach. Learn. Res., vol. 3, no. Jan, pp. 993–1022, 2003.
[29] N. Srivastava and R. R. Salakhutdinov, “Multimodal learning with deep boltzmann machines,” Adv. Neural Inf. Process. Syst., pp. 2222–2230, 2012.
[30] S. E. Kahou, X. Bouthillier, P. Lamblin, C. Gulcehre, V. Michalski, K. Konda, S. Jean, P. Froumenty, Y. Dauphin, and N. Boulanger-Lewandowski, “Emonets: Multimodal deep learning approaches for emotion recognition in video,” J. Multimodal User Interfaces, vol. 10, no. 2, pp. 99–111, 2016.
[31] M. Liang, Z. Li, T. Chen, and J. Zeng, “Integrative data analysis of multi-platform cancer data with a multimodal deep learning approach,” IEEE/ACM Trans. Comput. Biology Bioinf., vol. 12, no. 4, pp. 928–937, Jul./Aug. 2015.
[32] C. Curtis, S. P. Shah, S.-F. Chin, G. Turashvili, O. M. Rueda, M. J. Dunning, D. Speed, A. G. Lynch, S. Samarajiwa, and Y. Yuan, “The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups,” Nature, vol. 486, no. 7403, pp. 346–352, 2012.
[33] O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, and R. B. Altman, “Missing value estimation methods for DNA microarrays,” Bioinf., vol. 17, no. 6, pp. 520–525, 2001.
[34] A. Aliper, S. Plis, A. Artemov, A. Ulloa, P. Mamoshina, and A. Zhavoronkov, “Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data,” Molecular Pharmaceutics, vol. 13, no. 7, pp. 2524, 2016.
[35] J. Tan, J. H. Hammond, D. A. Hogan, and C. S. Greene, “ADAGE analysis of publicly available gene expression data collections illuminates Pseudomonas aeruginosa-host interactions,” bioRxiv, Art. no. 030650, 2015.
[36] H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1226–1238, Jun. 2005.
[37] C. Ding and H. Peng, “Minimum redundancy feature selection from microarray gene expression data,” J. Bioinf. Comput. Biology, vol. 3, no. 02, pp. 185–205, 2005.
[38] Y. Cai, T. Huang, L. Hu, X. Shi, L. Xie, and Y. Li, “Prediction of lysine ubiquitination with mRMR feature selection and analysis,” Amino Acids, vol. 42, no. 4, pp. 1387–1395, 2012.
[39] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proc. 13th Int. Conf. Artif. Intell. Statis. PMLR, 2010, vol. 9, pp. 249–256.
[40] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.-M. Jodoin, and H. Larochelle, “Brain tumor segmentation with deep neural networks,” Medical Image Anal., vol. 35, pp. 18–31, 2017.
[41] V. A. Kumar, S. Gupta, S. S. Chandra, S. Raman, and S. S. Channappayya, “No-reference quality assessment of tone mapped high dynamic range (HDR) images using transfer learning,” in Proc. 9th Int. Conf. Quality Multimedia Experience, 2017, pp. 1–3.
[42] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.
[43] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[44] P. Baldi, P. Sadowski, and D. Whiteson, “Searching for exotic particles in high-energy physics with deep learning,” Nature Commun., vol. 5, 2014, Art. no. 4308.
[45] T. Zhou, Z. Kuscsik, J.-G. Liu, M. Medo, J. R. Wakeling, and Y.-C. Zhang, “Solving the apparent diversity-accuracy dilemma of recommender systems,” Proc. Nat. Academy Sci. USA, vol. 107, no. 10, pp. 4511–4515, 2010.
[46] R. Burke, “Hybrid recommender systems: Survey and experiments,” User Modeling User-Adapted Interaction, vol. 12, no. 4, pp. 331–370, 2002.
[47] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, and M. Isard, “TensorFlow: A system for large-scale machine learning,” OSDI, vol. 16, pp. 265–283, 2016.
[48] K.-H. Yu, C. Zhang, G. J. Berry, R. B. Altman, C. Re, D. L. Rubin, and M. Snyder, “Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features,” Nature Commun., vol. 7, 2016, Art. no. 12474.
[49] M. F. Jefferson, N. Pendleton, S. B. Lucas, and M. A. Horan, “Comparison of a genetic algorithm neural network with logistic regression for predicting outcome after surgery for patients with nonsmall cell lung carcinoma,” Cancer, vol. 79, no. 7, pp. 1338–1342, 1997.
[50] W. Fan, X. Xu, Y. Shen, H. Feng, A. Li, and M. Wang, “Prediction of protein kinase-specific phosphorylation sites in hierarchical structure using functional information and random forest,” Amino Acids, vol. 46, no. 4, pp. 1069–1078, 2014.
[51] J. Ranstam and J. Cook, “Kaplan–Meier curve,” British J. Surg., vol. 104, no. 4, pp. 442–442, 2017.
[52] X. Zhu, J. Yao, X. Luo, G. Xiao, Y. Xie, A. Gazdar, and J. Huang, “Lung cancer survival prediction from pathological images and genetic data—An integration study,” in Proc. IEEE 13th Int. Symp. Biomed. Imaging, 2016, pp. 1173–1176.
[53] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, 2011, Art. no. 27.
[54] P. K. Atrey, M. A. Hossain, A. El Saddik, and M. S. Kankanhalli, “Multimodal fusion for multimedia analysis: A survey,” Multimedia Syst., vol. 16, no. 6, pp. 345–379, 2010.

Dongdong Sun received the BS degree in electronic information engineering from Anhui University, China, in 2013. He is currently working toward the PhD degree at the University of Science and Technology of China (USTC). He is a member of the Centers for Biomedical Engineering, USTC. His research interests include deep learning, bioinformatics, and biostatistics.
Minghui Wang received the BS degree from the School of Gifted Youth, University of Science and Technology of China (USTC), and the PhD degree in biomedical engineering from the School of Information Science and Technology, USTC, in 2006. She is an associate professor in the School of Information Science and Technology and Centers for Biomedical Engineering, USTC. Her research interests include bioinformatics, biostatistics, and machine learning.

Ao Li received the BS degree in biophysics from the School of Life Science, University of Science and Technology of China (USTC), in 2000 and the PhD degree in biomedical engineering from the School of Information Science and Technology, USTC, in 2005. Currently, he is an associate professor in the School of Information Science and Technology and Centers for Biomedical Engineering, USTC. His research contributions include computational cancer genomics, and bioinformatics with a focus on issues concerning systematic identification and evaluation of genome-wide variants in cancer. He is a member of the IEEE.