Multimedia Tools and Applications

https://doi.org/10.1007/s11042-020-09384-6

Deep learning neural network for texture feature extraction in oral cancer: enhanced loss function

Bishal Bhandari 1 & Abeer Alsadoon 1 & P. W. C. Prasad 1 & Salma Abdullah 2 & Sami Haddad 3,4

Received: 27 October 2019 / Revised: 28 June 2020 / Accepted: 16 July 2020

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
The use of a binary classifier such as the sigmoid function, and the associated loss functions, reduces the
accuracy of deep learning algorithms. This research aims to increase the accuracy of
detecting and classifying oral tumours within a reduced processing time. The proposed
system consists of a convolutional neural network with a modified loss function to
minimise the error in predicting and classifying oral tumours by reducing the overfitting
of the data and supporting multi-class classification. The proposed solution was tested on
data samples from multiple datasets with four kinds of oral tumours. The averages of the
different accuracy values and processing times were calculated to derive the overall
accuracy. Based on the obtained results, the proposed solution achieved an overall
accuracy of 96.5%, which was 2.0 percentage points higher than the state-of-the-art solution with
94.5% accuracy. Similarly, the processing time was reduced by 30–40 milliseconds
compared with the state-of-the-art solution. The proposed system is focused on detecting oral
tumours in a given magnetic resonance imaging (MRI) scan and classifying whether the
tumours are benign or malignant. This study addresses the issue of overfitting during
the training of neural networks and provides a method for multi-class classification.

Keywords Deep learning · Convolutional neural network (CNN) · Oral tumor · Loss function · Region of interest (ROI)

* Abeer Alsadoon
aalsadoon@studygroup.com

1 School of Computing and Mathematics, Charles Sturt University, Sydney Campus, Sydney, Australia
2 Department of Computer Engineering, University of Technology, Baghdad, Iraq
3 Department of Oral and Maxillofacial Services, Greater Western Sydney Area Health Services, Mount Druitt, Australia
4 Department of Oral and Maxillofacial Services, Central Coast Area Health, Gosford, Australia

1 Introduction

Oral cancer is a complex, widespread, and highly severe cancer. Advanced
technology and deep learning algorithms make its early detection and classification possible.
Medical imaging techniques and computer-aided diagnosis and detection can change
cancer treatment: tumours can now be detected in the early stages by analysing computed
tomography (CT) scans, magnetic resonance imaging (MRI) scans, and X-ray images [16].
This makes the anatomical analysis of the oral cavity easier and enables the accurate separation
of the normal region from the tumour-prone regions. During the tumour extraction stage,
various segmentation approaches need to be applied to the image to separate the normal
regions from the cancer-prone regions. When these approaches fail to segment the image properly,
the processing time increases and the tumour may ultimately go undetected. The use of deep learning technology
and proper segmentation of CT images will positively affect the current solutions for the
accurate detection and classification of oral tumours.
With the rapid advancement of computer-aided techniques in recent years, the application
of deep learning is playing a vital role in the medical field [29]. From the quick analysis of a
medical condition to localising, detecting, and diagnosing cancer in various body parts, deep
learning is having a widespread effect in the field of medical science. The convolutional
neural network (CNN) is one of the most successful methods of deep learning in the field of
medical image analysis. The CNN is first trained with the training dataset to learn the features
in the dataset, which are later used for extracting and classifying similar features. It has been
used in oral tumour diagnoses as well. It makes the tumour diagnosis process easier and more
accurate. However, the CNN is highly dependent on various factors, such as the quality of the
training datasets, the number of layers used, the activation function, the classification ability,
and so on. The accuracy might be subject to different regression and classification losses.
Current studies use various algorithms and techniques for better feature extraction and
image classification to identify and classify tumours in the oral cavity. Some solutions have
better accuracy, whereas others have a reduced processing time. Whichever algorithm is
used, most of these solutions follow similar steps: image pre-processing, image segmentation,
feature extraction, feature selection, and classification. However, the existing CNN
techniques have limitations that result in low accuracy and high processing time [2].
The state-of-the-art solution achieved a classification accuracy of 94.5% [16].
The accuracy is largely affected by factors such as the size of the network,
the network layers, and the activation function. Therefore, further improvements
can be made to the current best solution.
This paper aims to further improve the accuracy and reduce the processing time of a system
for detecting and classifying oral tumours by combining a modified loss function with the
rectified linear unit (ReLU) activation function in the implementation of the CNN. The
inclusion of the cross-entropy loss function helps to overcome the problem of overfitting to the
training datasets. The proposed modified loss function is a combination of the mean square
error (MSE) rate and the cross-entropy loss function. This combination of regression loss and
classification loss improves the processing time and performance of the neural network and
reduces the errors in image classification.
The remaining sections of the paper are structured as follows. Section 2, “Literature review”,
describes the previous literature related to the problem. Section 3, “Proposed system”,
describes the details of the proposed model, including the block diagram and pseudocode of
the proposed formula. Section 4, “Results and Discussion”, discusses the various testing
techniques used for this algorithm on different samples. Finally, Section 5, “Conclusions and
Future Work”, compares the current and proposed system results, provides the conclusion,
and gives recommendations for future work.

2 Literature review

The main purpose of this section is to study and provide an overview of the existing solutions
in a similar domain. It provides a brief description of the various methodologies, tools, and
techniques used in different solutions in similar areas of study. By integrating both clinical
biomarkers and features extracted from diffusion-weighted MRI collected multiple times,
[6] improved the current predictive model by considering the tumour depth as a measuring
factor. The authors applied recursive feature elimination to determine the most important
features and optimise the classifier performance, which improved on [10]. They developed
classification algorithms to predict pathological lymph node metastasis,
which maximised the area under the receiver operating characteristic curve (AUC) to 0.840
using the decision forest algorithm. While the classification performance improved
to 0.840 compared to other predictive models, the quality of the input dataset was not
considered in the experiment. The algorithm needs to consider the quality of the data, in which the
actual depth of invasion (DOI) is measured according to the accepted standard.
A further study enhanced the CNN-based framework for predicting head and neck cancer
outcomes. The performance of the framework was enhanced using approaches such as
handcrafted features, associating the deep layer of the CNN with the fully connected
layer, and the transfer learning method for fine-tuning the calculated weights. This
solution achieved an area under the curve (AUC) of 0.92 and an accuracy,
specificity, and sensitivity of 88%, 0.89, and 0.86, respectively, which are quite high
compared to the existing solutions [11]. The high accuracy of this solution results
from the use of filters within the CNN to explicitly recognise the radiomic features.
Moreover, [5] successfully applied an improvised version of image recognition based on
the CNN in the confocal laser endomicroscopy images of oral squamous-cell carcinoma. A
combination of the grey-level co-occurrence matrix and the local binary pattern was used to
extract the statistical features along with the textural features from the image. While classifying
the image, patch-based Convolutional net processing was used to classify the patches as
proposed by [23] whereas the whole image was fed into transfer learning with a CNN to
reduce the order of the pattern recognition and classification errors. This method achieved
accuracy, sensitivity, and specificity of 88.3%, 86.6%, and 90%, respectively, and the AUC
was 0.96. To predict the presence or absence of extranodal extension and nodal metastasis in
the CT images of the heads and necks of cancer patients, a 3D deep learning neural
network was constructed. The proposed model uses predefined radiomic features derived from raw pixel
information and the random forest classifier to extract and analyse the quantitative imaging features
with better accuracy than the existing system. It achieved an accuracy, sensitivity, and
specificity of 85.7%, 0.88, and 0.85, respectively.
To drastically improve the accuracy in predicting oral cancer, [1] introduced the gravitational
search-optimised echo-state neural network. The use of the adaptive Wiener filter helps
to eliminate the noise in the X-ray images. The study used the enhanced
Markov stimulated annealing technique to segment the affected region and analysed the derived
features with the help of the proposed search-optimised classifier. This solution provides an
accuracy of 99.2% in oral cancer recognition, which is quite high compared to similar
methods, such as the support vector machine (SVM), neural networks, and the multi-layer
perceptron, with accuracies of 89.2%, 94.1%, and 95.2%, respectively [25]. For predicting
cancer, [30] combined a semi-supervised deep learning strategy proposed by [26] with stacked
auto-encoder-based classification. A greedy layer-wise pre-training was combined with a
sparsity penalty term to capture and extract information from high-dimensional data to classify
the datasets. The proposed solution achieved a prediction accuracy of
97.54% and a precision of 99.68%.
Moreover, [18] applied a modified version of the faster region-CNN (R-CNN) that includes
features like layer concatenation and spatial constrained layers to improve the predictive ability
of the faster R-CNN. This method improves the functioning of the detection even in the case of
fewer training samples or blurry cancer regions in the collected ultrasound images. The
performance measured in terms of the true positive rate and true negative rate was 0.935
and 0.185, respectively, which is higher compared to the state-of-the-art solution chosen by the
author [17]. In addition, [7] enhanced the machine-learning algorithm for the prediction of
occult nodal metastasis in oral-cavity cell carcinoma. A classification algorithm was developed
using logistic regression, decision forest, kernel SVM, and gradient boosting machine [22] to
predict the pathological nodal metastasis. Fivefold cross validation was performed during
training of the algorithm to ensure the stability of the model and to reduce bias.
In terms of classification, [24] improved the existing CNN proposed by [27] and used
transfer learning from the large sample dataset into the available radiographic image dataset
with known biopsy results for the secondary training. This makes the classification of the two
kinds of jaw tumours more accurate. It provides an accuracy, sensitivity, specificity, and
diagnostic time of about 83%, 81.8%, 83.3%, and 38 s, respectively. This research provides very
valuable insight in developing enhanced CNN models for the classification of tumours based
on the various features extracted from the radiographic image. The transformation process
used in this study to further refine and improve the image quality can be very useful. Further
studies should be conducted to verify and improve the CNN models so that the accuracy and
specificity in the detection of jaw tumours can be increased. If the accuracy in the detection is
highly increased, it can be widely used for screening and diagnostic applications. Moreover,
[16] enhanced the deep learning algorithm for automated, computer-aided diagnosis and
classification of oral cancer by examining the patient’s hyperspectral images. This solution
demonstrated a new structure of partitioned deep CNN with two partitioned layers for
classifying the labelling region of interest in multidimensional hyperspectral images. It
provided an accuracy, sensitivity, and specificity of 94.5%, 0.98, and 0.94, respectively. This
solution provided invaluable insight on the structure of the neural network for classifying
medical images. The accuracy obtained with this solution was higher than any other base
classifier, such as the SVM and deep belief network (DBN) [15]. The partitioned CNN
proposed in this solution rectified various limitations in the traditional CNN and provided
for efficient image classification with the required features.
[9] implemented automatic identification of cancer in oral tissue from pathological images
using a deep convolutional neural network. This solution (a texture-based random forest
classifier) provides 96.88% detection accuracy for keratin pearls, which aids clinicians in
evaluating histological images during diagnosis [8]. Using Gabor texture
features, the classification of the pearl and keratin areas is implemented with a random forest
classifier. The proposed computer-aided automatic tool segments the epithelial and subepithelial
layers and detects keratin pearls. These can be used for oral precancerous
screening and Oral Squamous Cell Carcinoma (OSCC) grading, respectively. This
would assist clinicians in bias-free and fast diagnosis. The algorithm developed
by the researchers is of interest to the proposed work, as it has been shown to reduce processing time
in this and other research.
In addition, [29] assembled various deep learning approaches incorporating the K-nearest
neighbour (KNN), SVMs, decision trees (DTs), random forests, and gradient boosting decision
trees as classification models. This ensemble classification approach was then implemented on
the datasets of three sets of cancer. By combining various classification methods, this solution
was successful in compensating for the shortcomings of each of these methods if used
individually, as discussed by [3]. A classification accuracy of 95.60% was achieved, which
is greater than each of the classifiers used individually. Moreover, [31] introduced a
regularised ensemble framework focused on solving the problem of imbalanced training data
and the multi-class learning problem in the existing solutions [28]. Unpredictable
regularisation was used to address the issues with misclassification in the previous learning
phase and to adjust the loss function in the classifier weight calculation. The iterative boosting
method was used to adjust the decision boundary, which reduced the weighted errors, enabling
the correct classification of difficult examples. Since this is an ensemble classifier, the
performance accuracy differs with different base learners. The average accuracy improvement
was from 1.9% to 3.94%.
[13] adapted a CNN-based tissue classification method using hyperspectral imaging (HSI)
from optical biopsy datasets. They used the CNN to distinguish oral squamous cells from
aerodigestive tissues. The method achieved an accuracy of 81% at 81% sensitivity and 80%
specificity. A decision tree (DT) algorithm was used to divide the normal tissues into
two categories, epithelium and glandular mucosa. An accuracy of 90% with 93% sensitivity and
89% specificity was obtained using the DT. Thyroid carcinoma and normal thyroid were differentiated
by the CNN with 92% accuracy, 92% specificity, and 92% sensitivity. The study shows that an
HSI-based optical biopsy technique using a CNN can provide multi-category detection
information for head and neck carcinomas. However, more data are required to establish the
reliability and generalizability of the study.
Moreover, [12] sampled the input by converting the large image into spectral patches,
which are used in the convolutional neural network for image classification. They evaluated the
performance by counting the steps and using the cross-validation method, which is also the
benchmark for the proposed solution. This solution improved on the state-of-the-art solution [20].
By combining two CNNs to accurately detect and segment the organs at risk in CT images,
[19] improved the organ-at-risk detection and segmentation network proposed by [21]. While
the first CNN assigns organ-bounding boxes with their scores, the second CNN predicts the
segmentation masks for each organ by applying the proposed bounding boxes. The sensitivity
and specificity of the detection result were 0.997 and 0.983, respectively.
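The studies reviewed above are compared mainly through accuracy, sensitivity, and specificity. As a reminder of how these three figures relate, the following minimal sketch computes them from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total        # share of all cases classified correctly
    sensitivity = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)        # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration only.
acc, sens, spec = classification_metrics(tp=45, fp=5, tn=40, fn=10)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

A high accuracy can coexist with a low sensitivity or specificity when the classes are imbalanced, which is why the reviewed papers usually report all three.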

2.1 State of the art

This part presents the features of the current system (highlighted inside the blue dashed line)
and its limitations (highlighted inside the red dashed line), as shown in Fig. 1. To improve
the accuracy in the diagnosis of oral cancer, [16] proposed a regression-based partitioned deep
learning algorithm. The bagging and boosting method helps prepare the cancerous
hyperspectral dataset for classification. The study used the bounding box and
region of interest (ROI) techniques to segment the affected region and analysed the
derived features with the help of the proposed classifier. This solution provides an
accuracy of 94.5% in oral cancer classification, which is quite high compared to
similar methods, such as SVM and DBN, with accuracies of 84.2% and 86.7%,
respectively. This model consists of four major stages: pre-processing, segmentation
and feature extraction, feature selection, and classification.

Fig. 1 Block diagram of the state-of-the-art system [16] (the blue borders show the good features of the
state-of-the-art solution, and the red border marks its limitations)

Pre-processing stage The pre-processing stage starts by obtaining hyperspectral images of
possible cancer-prone regions. The bagging and boosting methods are used to formulate a
cancerous image dataset from the images to feed the classifier. The weight of the training data
is combined based on the votes for each dataset, which is then used to train the DTs. It
combines the output from DTs based on the votes of the weight. However, this stage lacks a
filter to remove the unwanted noise in the image.

Segmentation and feature extraction stage In this stage, the standard datasets of the
BioGPS UCI repository are used to extract a normal dataset, which is represented by the
vector set. Each image patch is defined by a boundary with a class label. The ROI is detected
based on the boundary and is marked for a single image patch. It extracts the region as either
cancerous or benign, and the regions with no boundaries are considered normal tissue.

Feature selection stage In this stage, the extracted features are selected for classification.
While selecting the features, the intensity value of the image segments along with the textural
information about the spatial and spectral evidence is considered. The volumetric pixel (voxel)
in the image is considered in order to identify the multidimensional frequency of different
bandwidths in the image segment.

Classification stage In this stage, the cancer-image patch-refinement deep learning algorithm
is used to classify the oral tumour from the selected feature vectors. The deep CNN consists of
an input layer, two functional layers, and one fully connected layer. The feature vectors of the
image patches are presented in the input layer. The Convolutional layer consists of multiple 5 ×
5 kernels and uses the sigmoid function as the activation function. The pooling layer uses the
max filter for max pooling. The fully connected layer has a 1 × 1 kernel of softmax prediction to
classify the image region as either a normal or benign tumour or a cancerous malignant tumour.
In the pre-processing stage of the system, no noise-filtering technique is used to reduce the
noise of the image. The unwanted noise in the image can degrade the quality of the image, thus
reducing the classification accuracy. Moreover, the pre-processing method has also not
considered the edge-area enhancement. There might be a small tumour towards the edge of
the image, which could be unnoticed in the current solution. The sigmoid function that is used
as the activation function in the state-of-the-art system is only capable of binary classification.
Its values are within the range (0, 1). Any small or large values passing through the sigmoid
function will only be closer to 0 or 1, causing its gradient to be close to zero, thus slowing the
learning process. In addition, while calculating the activation function, only the empirical loss
is considered. We can include the regression loss, such as the MSE, and the classification loss,
such as the cross-entropy loss, to further improve the classification accuracy and reduce the
processing time. This state-of-the-art model presented an oral tumour classification accuracy,
sensitivity, and specificity of 94.5%, 0.98, and 0.94, respectively. The regression-based deep
learning algorithm is implemented to improve the accuracy of the classification, as shown in
Eq. (1). However, considering the cross-entropy loss and the MSE rate, and replacing the
sigmoid activation function, can still increase the accuracy.
The state-of-the-art algorithm is given in Eq. (1).

$$E = \sum_{k=1}^{N} \theta(y_i, x_i) \tag{1}$$

Where,

N is the number of iterations.
i is the corresponding matrix of the image patch.
k is the index of the training patch.
θ is the matching function.
$y_i$ is the label point corresponding to the image patch on the y axis.
$x_i$ is the label point corresponding to the image patch on the x axis.
The rectified linear unit (ReLU) function is expressed as:

$$f(x) = \max(0, x)$$

Where,

x is the set of input values.

As the function has a single input, it returns 0 if the input is negative and returns the input itself if the
input is positive. See Tables 1, 2 and 3.
The flowchart of the image patch refinement algorithm is shown in Fig. 2.

3 Proposed system

Many classification and detection techniques for predicting oral tumours were analysed in this
study. On the basis of the analysis, the accuracy, sensitivity, specificity, and processing time
are the main issues to be considered. After careful analysis of each of the papers, [16] was
selected as the best solution. The main reason for selecting this solution is that it applies
a partitioned regression-based deep learning algorithm for classifying the oral features using
the effective connectivity of the hidden layers. It uses a patch-based feature map to update the
weight value of the hidden layer that works according to the force interaction with the feature
training. Thus, the features are updated continuously, which makes the network training
quicker and more efficient. The use of the quadratic function during the testing process further
reduces the error function.
However, this solution has some limitations in the pre-processing stage. The image is
repeatedly analysed using the bagging and boosting method, which ultimately increases the
overall processing time. In addition, it assumes the process dynamics are linear and can
only handle unimodal noise. While pre-processing the image, the solution only focuses on
minimising the MSE rate and does not consider the edge-area enhancement, where a small
tumour might be present. This limitation can be addressed with a median filter [4] for noise
reduction and edge smoothing of the image. Similarly, the use of the sigmoid function as an
activation function causes a high risk of data overfitting and the vanishing gradient problem.
Moreover, it can only handle binary classification and has no support for multi-class
classification. However, introducing ReLU as an activation function with a modified loss
function [1] can increase the detection accuracy by reducing the risk of data overfitting and
supporting multi-class classification. The proposed system consists of four major stages, as
shown in Fig. 3: pre-processing, segmentation and feature extraction, feature selection, and
classification.

Table 1 Cancer image patch refinement deep learning algorithm.



Table 2 Proposed modified loss function with ReLU

Pre-processing stage The MRI images collected from various freely available databases
were pre-processed using the median filter before being used as input for the proposed
deep learning model. A single median filter with a window size of 3 × 3 runs
through the elements of the image and replaces each pixel with the median of the neighbouring
pixels in a square region around the evaluated pixel [4]. This enhances the image quality and
smooths the edge area for better detection of the tumorous region. This stage also builds a
strong foundation for the feature extraction phase (Fig. 4).
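The 3 × 3 median filtering step can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function name `median_filter_3x3`, the toy image, and the border handling (border pixels use only the neighbours that fall inside the image) are assumptions, since the paper does not specify them.

```python
from statistics import median


def median_filter_3x3(image):
    """Apply a 3 x 3 median filter to a 2-D list of grey-level values.

    Assumption: at the borders, only neighbours inside the image are used.
    """
    rows, cols = len(image), len(image[0])
    filtered = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the pixel and its neighbours inside the 3 x 3 window.
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            filtered[r][c] = median(window)
    return filtered


# Toy image with one impulse-noise pixel (hypothetical values).
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
smoothed = median_filter_3x3(noisy)
```

In this toy case the isolated bright pixel is replaced by the value of its surroundings, which is the noise-suppression behaviour the pre-processing stage relies on.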

Segmentation and feature extraction stage In this stage, the ROI is detected based on the
boundary and is marked for a single image patch. A grey-scale intensity value is defined and
used to segment the region into interesting and uninteresting areas based on the pixel intensity.
The pixels in the image can be reassigned with the intensity values of either 0 (for
uninteresting) or 1 (for interesting). It classifies the region as either cancerous or benign, and the
regions with no boundaries are considered normal tissue.
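The intensity-based reassignment of pixels to 0 or 1 can be illustrated with a short sketch. The threshold value of 128, the convention that brighter pixels are the interesting ones, and both function names are assumptions made for this example; the paper does not state the grey-scale value it uses.

```python
def segment_by_intensity(image, threshold):
    """Return a binary mask: 1 where a pixel is at or above the threshold, else 0."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the pixels marked 1, or None if there are none."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v == 1]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


# Hypothetical 4 x 4 grey-level patch with a bright region in the centre.
patch = [
    [12, 15, 14, 11],
    [13, 200, 210, 12],
    [14, 205, 198, 13],
    [11, 12, 13, 10],
]
mask = segment_by_intensity(patch, threshold=128)
roi = bounding_box(mask)
```

A patch whose mask contains no 1-valued pixels yields no bounding box, mirroring the rule that regions with no boundaries are treated as normal tissue.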

Feature selection stage In this stage, the extracted features are selected for classification.
While selecting the features, the intensity values of the image segments along with the textural
information about spatial and spectral evidence are considered. The volumetric pixel (voxel) in
the image is considered to identify the multidimensional frequency of different bandwidths in
the image segment.

Table 3 Accuracy and processing time for region of interest segmentation of oral tumour

Fig. 2 Flowchart of cancer image patch refinement deep learning algorithm

Fig. 3 Block diagram of proposed system (the blue borders mark the good features, and the green border
marks the new parts in the proposed system)

Classification stage In this stage, the proposed modified loss function algorithm is used to
classify the oral tumour from the selected feature vectors. The deep CNN consists of the input
layer, two functional layers, and one fully connected layer. The feature vectors of image
patches are present in the input layer. The Convolutional layer consists of multiple 5 × 5
kernels and uses the ReLU activation function with the modified loss function, which is
derived by combining the MSE rate with the cross-entropy loss function [18]. The pooling
layer uses the max filter for max pooling. The fully connected layer has a 1 × 1 kernel of
softmax prediction to classify the image region as either a normal or benign tumour or a
cancerous malignant tumour.
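The last steps of this stage (ReLU activation, max pooling, and softmax prediction) can be sketched on a toy feature map. This is an illustrative sketch only: the 2 × 2 non-overlapping pooling window, the feature-map values, and the three class scores are assumptions, and the convolution with 5 × 5 kernels is omitted for brevity.

```python
import math


def relu(values):
    """Element-wise ReLU on a 2-D feature map."""
    return [[max(0.0, v) for v in row] for row in values]


def max_pool_2x2(feature_map):
    """Non-overlapping 2 x 2 max pooling (the window size is an assumption)."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        pooled.append([
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ])
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 4 x 4 convolution output.
feature_map = [
    [-1.0, 2.0, 0.5, -3.0],
    [0.0, 1.0, -2.0, 4.0],
    [3.0, -1.0, 0.0, 0.5],
    [-2.0, 0.5, 1.5, -0.5],
]
pooled = max_pool_2x2(relu(feature_map))

# Hypothetical scores for three classes; softmax turns them into probabilities.
probs = softmax([2.0, 1.0, 0.1])
```

Because softmax returns one probability per class and the probabilities sum to 1, the same output layer extends from two classes to several without structural change.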

3.1 Proposed equation

In the proposed solution, the cross-entropy loss function along with the MSE rate has been
considered to improve the accuracy of the prediction while minimising the processing time.
The fully connected 1 × 1 kernel output layer of the softmax prediction is combined with the
cross-entropy function. The cross-entropy loss function helps to reduce the overfitting effect of
the dataset, whereas the MSE rate shows the difference between the predicted value and the
actual observations [18]. The modified model function, which combines the cross-entropy loss
function with the MSE rate, is shown in Eq. (7).

Fig. 4 Flowchart of proposed modified loss function algorithm

We modified Eq. (1) to Eq. (2) by removing the current activation function from
the equation. A modified activation function with loss value calculation will be applied later.

$$ME = \sum_{k=1}^{N} \theta \tag{2}$$

Where,

N is the number of iterations.
k is the index of the training patch.
θ is the matching function.
[18] described the cross-entropy loss function shown in Eq. (3). This equation gives the
cross-entropy loss during the classification stage.

$$L(y, \hat{y}) = -\sum_{j} \left[ y_i \log(\hat{y}_i) - (1 - y_i)\left(1 - \log(\hat{y}_i)\right) \right] \tag{3}$$

Where,

$y$ is the spatial constrained layer.
$\hat{y}$ is the predicted output of the spatial constrained layer.
$\hat{y}_i$ is the ith element of the predicted output.
$y_i$ is the ith element of the spatial constrained layer.
We modified Eq. (3) to Eq. (4) by removing the summation, as we only want the log
terms in the equation for our proposed model.

$$ML = \left[ y_i \log(\hat{y}_i) - (1 - y_i)\left(1 - \log(\hat{y}_i)\right) \right] \tag{4}$$

Where,

$y$ is the spatial constrained layer.
$\hat{y}$ is the predicted output of the spatial constrained layer.
$\hat{y}_i$ is the ith element of the predicted output.
$y_i$ is the ith element of the spatial constrained layer.
[1] mentioned Mean Square Error (MSE) as shown in Eq. 5. This function gives the
regression loss in the Convolutional layer.

$$MSE = \frac{1}{N} \sum_{i,j=1}^{N} \left( \hat{x}(i,j) - x(i,j) \right)^2 \tag{5}$$

Where,

x(i, j) is the original image.
N is the number of elements present in x(i, j).
The modified loss is combined with the mean square error rate to minimise both the
regression and classification loss in our model function, as shown in Eq. (6). We derived this
modified loss function by combining Eq. (4) and Eq. (5). It gives the
overall loss in the model.

$$ML' = ML + MSE \tag{6}$$

Where,

$y$ is the spatial constrained layer.
$\hat{y}$ is the predicted output of the spatial constrained layer.
$\hat{y}_i$ is the ith element of the predicted output.
$y_i$ is the ith element of the spatial constrained layer.
x(i, j) is the original image.
N is the number of elements present in x(i, j).
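The idea of adding a classification term and a regression term, as in Eq. (6), can be sketched numerically. The sketch below rests on stated assumptions and is not the authors' implementation: the classification term uses the standard binary cross-entropy form, y log(ŷ) + (1 − y) log(1 − ŷ), rather than the second term exactly as printed in Eq. (3); both terms are evaluated on the same label and prediction vectors; and all function names and values are hypothetical.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Standard binary cross-entropy summed over elements (assumed form)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log() stays finite
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total


def mean_square_error(x_true, x_pred):
    """Mean of squared differences, following the structure of Eq. (5)."""
    return sum((p - x) ** 2 for x, p in zip(x_true, x_pred)) / len(x_true)


def modified_loss(y_true, y_pred):
    """Sum of the classification and regression terms, as in Eq. (6)."""
    return cross_entropy(y_true, y_pred) + mean_square_error(y_true, y_pred)


# Hypothetical labels and two sets of predictions.
labels = [1.0, 0.0, 1.0, 0.0]
good = [0.9, 0.1, 0.8, 0.2]
poor = [0.4, 0.6, 0.3, 0.7]
print(modified_loss(labels, good), modified_loss(labels, poor))
```

Both terms shrink as the predictions approach the labels, so the combined loss is lower for the better set of predictions.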
Our final proposed equation is Eq. (7), given below.

$$ME' = \sum_{k=1}^{N} \theta \cdot ML' \tag{7}$$

Where,

θ is the matching function.
ML' is the modified loss function.

3.2 Area of improvement

The state-of-the-art solution used the sigmoid function as an activation function in the
convolutional layer of the regression-based partitioned deep convolutional network, followed by
the max-pooling layer [16]. This paper proposes a modified version of the loss
function, which is a combination of the cross-entropy loss function and the MSE rate, used with
the rectified linear unit (ReLU) activation function in the implementation of the CNN. This
modified loss is integrated into the model function proposed by the state-of-the-art system, as
given in Eq. (2). This will enable the multi-class classification of the image and reduce the
overfitting of the training sample. It also reduces the progressive size of the second pooling
layer by reducing the nonlinear down sampling, which makes the neural network faster. It will
also solve the vanishing gradient problem caused by the sigmoid function, making the learning
process faster. Moreover, the use of the median filter during the pre-processing stage will
enhance and smooth the collected MRI images and remove unwanted noise that can hamper
the overall performance of the proposed algorithm.
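
The median filtering step can be sketched as follows. The 3 x 3 window, the border handling, and the function name are assumptions made for illustration; the text does not state the kernel size used.

```python
from statistics import median


def median_filter_3x3(image):
    """Apply a 3 x 3 median filter; border pixels use the neighbours that exist."""
    height, width = len(image), len(image[0])
    filtered = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Collect the pixel and its in-bounds neighbours.
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            filtered[r][c] = median(window)
    return filtered


# Hypothetical 3 x 3 patch with one "salt" noise pixel (255) in the centre.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
print(median_filter_3x3(noisy)[1][1])  # 10
```

The isolated outlier is replaced by the median of its neighbourhood, which is why this filter suppresses impulse noise while leaving uniform regions unchanged.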

3.3 Why ReLU with the modified loss function?

The proposed deep learning model consists of Convolutional layers, max-pooling layers, and a
fully connected output layer with softmax prediction, with various parameters set in a single
hidden layer. Using the ReLU activation function with the modified loss in the Convolutional
layer significantly boosts the performance of the network [30]. As noted in Section 3.2, it
enables multi-class classification of the image, reduces overfitting on the training sample,
reduces the progressive size of the second pooling layer by reducing the nonlinear
down-sampling, and avoids the vanishing gradient problem caused by the sigmoid function, all
of which make the learning process faster. The sigmoid function nonlinearly compresses its
input into the range between 0 and 1, while ReLU is linear for positive values. ReLU is also
cheaper to compute than the sigmoid function because it only evaluates max(0, x) instead of
an expensive exponential. The modified loss function, which combines the MSE with the
cross-entropy loss, captures the regression loss in the Convolutional layer and the
classification loss in the output layer, which ultimately improves the accuracy of the overall
model. The ReLU activation function is used to minimize the likelihood of vanishing gradients
and to prevent overfitting of the data while training the neural network.
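
The difference in gradient behaviour between the two activation functions can be shown with a small numerical sketch. It is a simplification that ignores the layer weights and multiplies only the activation derivatives along a chain of layers; the depth of 10 and the input value 2.0 are arbitrary illustrative choices.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never exceeds 0.25


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


# Gradient factor passed back through a chain of 10 identical activations at x = 2.0.
depth, x = 10, 2.0
sigmoid_chain = sigmoid_grad(x) ** depth
relu_chain = relu_grad(x) ** depth
print(sigmoid_chain)
print(relu_chain)  # 1.0
```

Because the sigmoid derivative is at most 0.25, the product shrinks rapidly with depth, whereas the ReLU derivative stays at 1 for positive inputs.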
As discussed in the literature review section in this paper, most of the solutions applied the
deep learning model for detecting and classifying oral tumours with certain limitations. Most
of these solutions have failed to address the problem of the vanishing gradient and the
overfitting of the data while training the neural network. Our proposed solution has solved
these issues to increase the accuracy in detection by applying a modified loss with the ReLU as
the activation function.

4 Results and discussion

Python 3.7 with the Keras and matplotlib libraries was used to implement the proposed
deep learning algorithm. This solution uses 3045 T1-weighted contrast-enhanced images of oral
tumours from 230 patients, taken from four different datasets. In addition, 20% of the total
dataset was reserved for testing purposes. The images in the dataset varied in size and colour
intensity. The obtained images were originally at a resolution of 350 × 350 and were later
rotated and resized to 175 × 175 before being fed into the neural network.
The K-fold cross-validation technique was used for validation, with k = 7. The sample dataset
was shuffled randomly and split into k groups of equal size. In each round, one group was
chosen as the test dataset and the remaining groups served as the training dataset. The sample
dataset was freely available and was downloaded from skymind.ai [14].
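
The splitting procedure can be sketched in plain Python. The values 3045 and k = 7 come from the text; the function name, the fixed seed, and applying the split to all 3045 images are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and yield (train, test) index lists for each fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute samples as evenly as possible across the k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 3045 images and k = 7 are taken from the text; the seed is arbitrary.
for fold_number, (train, test) in enumerate(k_fold_indices(3045, 7), start=1):
    print(fold_number, len(train), len(test))
```

With these numbers every fold holds 435 images for testing and 2610 for training, since 3045 divides evenly by 7.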
To construct and implement the proposed CNN, the open-source neural network library
Keras was used in the TensorFlow environment. The experiment was conducted on a
computer with a 2.0 GHz fifth-generation Intel Core i5 processor and 8.0 GB of RAM. The
experiment uses images of four different kinds of oral tumours: carcinoma, lymphoma,
osteosarcoma, and fibro sarcoma. The results for each category are tabulated in
Tables 4, 5, 6, and 7.
The mean and standard deviation of the sample data for the four types of oral tumour
were calculated using the AVERAGE() and STDEVA() functions of Microsoft Excel. For the
state-of-the-art solution, the mean and standard deviation of the accuracy are 94.5% and
0.0096 for carcinoma, 93.8% and 0.0105 for lymphoma, 94.2% and 0.0115 for osteosarcoma,
and 93.2% and 0.0045 for fibro sarcoma. Similarly, for the proposed solution, the mean and
standard deviation of the accuracy are 96.9% and 0.0058 for carcinoma, 95.9% and 0.0054
for lymphoma, 96.8% and 0.0048 for osteosarcoma, and 95.6% and 0.0038 for fibro sarcoma.
The standard deviation was calculated using Eq. (8).
Table 4 Accuracy and processing time for classification of oral squamous carcinoma

Table 5 Accuracy and processing time for classification of oral lymphoma


Table 6 Accuracy and processing time for classification of oral osteosarcoma

Table 7 Accuracy and processing time for oral fibro sarcoma


$$ \sigma = \sqrt{\frac{\sum |x - X|^{2}}{n}} \qquad (8) $$

Where,

σ is the standard deviation.
x is a sample.
X is the mean of the sample.
n is the total number of samples.
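
As a hedged illustration of Eq. (8), the sketch below computes the mean and the standard deviation with n in the denominator, next to the sample form that divides by n - 1, which is what spreadsheet sample standard deviation functions typically return. The accuracy values are hypothetical and are not taken from Tables 4 to 7.

```python
import math
import statistics

# Hypothetical per-sample accuracy values (not taken from Tables 4 to 7).
accuracies = [0.96, 0.97, 0.98, 0.97]

mean = statistics.mean(accuracies)

# Eq. (8): divide the summed squared deviations by n (population form).
population_std = math.sqrt(sum((x - mean) ** 2 for x in accuracies) / len(accuracies))

# The sample standard deviation divides by n - 1 instead.
sample_std = statistics.stdev(accuracies)

print(round(mean, 4), round(population_std, 4), round(sample_std, 4))
```

For small sample counts the two forms differ noticeably, so it is worth stating which one a reported value uses.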
During the feature extraction stage, the underlying features in the images are automatically
extracted using the Region of Interest technique, which helps in differentiating the cancerous
region from the normal region in the image. (See Figs. 5 and 6).
The predicted true positive region is fed into the input layer of the Convolutional neural
network for classification. The Convolutional neural network produces a 25 × 25 feature map of
the supplied image, which serves as the input of the max-pooling layer. See Figs. 7, 8 and 9.
In Fig. 7, L2 is the second layer (max pooling) of the model, which produces a 13 × 13 feature
map. The ReLU activation function with the modified loss serves as the activation function for
the max-pooling layer. The selected features from the max-pooling layer are then fed into the
fully connected output layer, which has a 1 × 1 kernel of softmax prediction. This layer
produces a three-way classification of the image into normal region, benign tumour, and
cancerous malignant tumour. The classifications of sample images from the training and testing
samples are presented in Tables 4, 5, 6, and 7.
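
The step from a 25 × 25 feature map to a 13 × 13 map, followed by a three-way softmax, can be sketched as below. The 2 × 2 window with stride 2 and truncated edge windows is an assumption that happens to reproduce the stated sizes; the text does not give the pooling parameters, and the scores are hypothetical.

```python
import math


def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2; edge windows are truncated ("same"-style)."""
    size = len(feature_map)
    out_size = math.ceil(size / 2)
    pooled = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            window = [
                feature_map[rr][cc]
                for rr in range(2 * r, min(2 * r + 2, size))
                for cc in range(2 * c, min(2 * c + 2, size))
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # improves numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 25 x 25 feature map filled with a simple ramp of values.
feature_map = [[r * 25 + c for c in range(25)] for r in range(25)]
pooled = max_pool_2x2(feature_map)
print(len(pooled), len(pooled[0]))  # 13 13

# Hypothetical scores for the three classes named in the text.
classes = ["normal region", "benign tumour", "malignant tumour"]
probabilities = softmax([0.2, 1.5, 0.3])
print(classes[probabilities.index(max(probabilities))])  # benign tumour
```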
To compare the state-of-the-art solution with the proposed solution, tables and graphs were
used. The classification results of the sample images from the training and testing samples
were evaluated and are shown in Tables 4, 5, 6, and 7. The final results of the classification
are presented in terms of accuracy and processing time. The classification accuracy was
measured from the probability score of each labelled image after classification. Similarly,
the processing time denotes the time taken to classify the image after it has been input into
the deep learning model. Each input image, which contains one of various kinds of oral
tumours, was classified as a normal region, benign tumour, or cancerous malignant tumour. The
overall classification accuracy and processing time were obtained by averaging the accuracy
and processing time over all samples in the dataset.

Fig. 5 MRI image of oral squamous carcinoma (OSC)

Fig. 6 Enlarged view of the image region segmented by the region of interest (ROI)

The results obtained after the classification stage of the proposed solution were compared
with those of the state-of-the-art solution. The proposed solution enhanced the overall
classification accuracy while decreasing the processing time by employing the combination of
the MSE and cross-entropy loss function together with the ReLU activation function. This
avoids the vanishing gradient problem created by the sigmoid function and decreases the
overfitting of the data, resulting in a reduction of processing time.
The processing time and accuracy were calculated using the now() method (as provided by
Python's datetime module) and the evaluate() method of the Keras package, respectively. The
evaluate() method first predicts the output for the given input and then computes the
specified metrics by comparing the predictions with the true values, returning the accuracy as
the output. Similarly, the start time and end time are recorded using the now() method, and
the execution time is obtained by subtracting the start time from the end time.
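
The measurement procedure can be sketched without a trained network. Here `classify` is a hypothetical stand-in for the Keras model, and the labelled samples are invented for illustration; only the use of now() and the end-minus-start subtraction follow the description above.

```python
from datetime import datetime


def classify(image):
    """Stand-in for the trained model: predicts class 1 when mean intensity > 0.5."""
    flat = [p for row in image for p in row]
    return 1 if sum(flat) / len(flat) > 0.5 else 0


# Hypothetical labelled samples (2 x 2 "images" with binary labels).
samples = [
    ([[0.9, 0.8], [0.7, 0.9]], 1),
    ([[0.1, 0.2], [0.0, 0.1]], 0),
    ([[0.6, 0.7], [0.4, 0.6]], 1),
    ([[0.6, 0.5], [0.6, 0.5]], 0),
]

start_time = datetime.now()
predictions = [classify(image) for image, _ in samples]
end_time = datetime.now()

# Execution time is the end time minus the start time.
elapsed_seconds = (end_time - start_time).total_seconds()

correct = sum(pred == label for pred, (_, label) in zip(predictions, samples))
accuracy = correct / len(samples)
print(accuracy)  # 0.75
print(elapsed_seconds)
```

For millisecond-scale comparisons such as the one reported here, averaging over many runs gives a steadier figure than a single measurement.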
The results obtained after implementing the proposed algorithm illustrate the improvement
in the detection and classification accuracy and the processing time in comparison to the
state-of-the-art solution based on feature extraction and image classification. The proposed
algorithm improved the classification accuracy by almost 2%, and the processing time was
reduced by 30–40 milliseconds. The probability score of the labelled data in each class of the dataset
was used to calculate the classification accuracy, whereas the actual execution time was used
to measure the processing time. The degree of improvement in the classification accuracy and
processing time was quantified by running the state-of-the-art and proposed algorithms.

Fig. 7 Feature extraction in L2 layer of the max-pooling layer of CNN (panels: L2-Map1,
L2-Map2, L2-Map3)

Fig. 8 Average accuracy (in percentage) of the state-of-the-art and proposed solutions for each
oral tumour type (x-axis: oral tumour type; y-axis: accuracy in percentage, 91 to 98). The red
colour denotes the accuracy of the proposed solution and the blue colour denotes the accuracy
of the state-of-the-art solution. a) The first pair of bars indicates the average accuracy for
squamous carcinoma. b) The second pair of bars denotes the average accuracy for lymphoma.
c) The third pair of bars denotes the average accuracy for osteosarcoma. d) The fourth pair of
bars denotes the average accuracy for fibro sarcoma

Using the ReLU function instead of the sigmoid function as the activation function allows
better training of the neural network because it is linear for positive values and has a
reduced likelihood of a vanishing gradient. We can stack many hidden layers in the neural
network without causing overfitting of the data. This increases the efficiency of training the
model with different features, ultimately increasing the accuracy of image classification.
Moreover, the modified loss function, which is a combination of a regression-based and a
classification loss function, keeps track of errors encountered during any phase. Only the
outputs from the neurons with a low error rate are considered for classification to reduce the
processing time. In conclusion, combining the ReLU function with the modified loss function
improved oral tumour detection and classification, while reducing the processing time by
30–40 milliseconds and increasing the accuracy by 2%.

Fig. 9 Average processing time (in seconds) of the state-of-the-art and proposed solutions for
each oral tumour type (x-axis: oral tumour type; y-axis: processing time in seconds, 0.33 to
0.43). The red colour denotes the processing time of the proposed solution and the blue colour
denotes the processing time of the state-of-the-art solution. a) The first pair of bars
indicates the average processing time for squamous carcinoma. b) The second pair of bars
denotes the average processing time for lymphoma. c) The third pair of bars denotes the average
processing time for osteosarcoma. d) The fourth pair of bars denotes the average processing
time for fibro sarcoma

Table 8 Comparison table between state of art and proposed solutions

| | Proposed solution | State-of-the-art technique |
| --- | --- | --- |
| Name of the solution | Enhanced Modified Loss Function for Early Detection of Oral Tumour | Oral Cancer Image Patch Refinement Deep Learning Algorithm |
| Accuracy | 96.5% | 94.5% |
| Processing time | 0.384 s | 0.428 s |
| Proposed equation | The modified model function is: $ME' = \sum_{k=1}^{N} \theta * ML'$ | [16] gave the model function as: $E = \sum_{k=1}^{N} \theta(y_i, x_i)$ |
| Contribution 1 | A median filter is used during the pre-processing stage to enhance the image, remove unwanted noise and smooth the edge area for better tumour detection. | Does not consider using any filtering technique to remove the noise in the image. |
| Contribution 2 | The ReLU activation function is used to minimize the likelihood of vanishing gradients and to prevent overfitting of the data while training the neural network. | Used the sigmoid function, which causes vanishing gradients and performs expensive exponential operations that can slow down system performance. |
| Contribution 3 | A modified loss function is calculated which combines both the regression-based loss and the classification loss in the network to minimize the error in prediction. | Considers calculating the empirical loss only. |

Various image pre-processing techniques along with feature extraction, feature selection,
and classification have been implemented to detect and classify different types of oral tumours.
The neural networks have been continuously refined and reengineered to improve the accuracy
and processing time. The limitations of the state-of-the-art solution have been addressed in
the proposed solution to obtain an improved accuracy of 96.5% against 94.5% and a reduced
processing time of 0.384 s against 0.428 s. The proportions of true positive results were
considered in order to calculate the accuracy. True positives are the positives that were
correctly identified, whereas true negatives are the negatives that were correctly identified.
The use of the ReLU function with a modified loss function that minimises the risk of
overfitting played a crucial role in acquiring a highly accurate result. In comparison to the
state-of-the-art solution, the proposed system achieved improved accuracy while reducing the
processing time when it was applied to different image groups with varying features.

5 Conclusions and future work

It is crucial to detect and classify oral tumours in the early stages to give the specialists ample
time to build an appropriate treatment plan for the patient. Several deep learning-based
prediction approaches have been proposed and implemented to accurately predict and classify
oral tumours. Each of these solutions has certain limitations that affect the accuracy and
processing time. The purpose of this research is to enhance the accuracy of oral tumour
prediction and classification while reducing the overall processing time. The modified loss
function has been derived by combining the MSE rate with the cross-entropy loss function.
The ReLU activation function with the modified loss function in the Convolutional layer
supports better training of the model and reduces the risk of data overfitting. Only the
neurons with minimal loss are selected for classification, which decreases the computational
time and increases the overall accuracy of the model. As a result, the accuracy improved by
almost 2% and the processing time was reduced by 30–40 milliseconds (see Table 8). In the
future, larger datasets covering different types of oral tumours with varying features can be
used for training and implementing the proposed solution. The images can be enhanced using the
latest enhancement techniques before they are input into the neural network. Newer, improved
versions of the ROI algorithm can be used for feature extraction and feature selection to
further improve the performance of the proposed solution.

References

1. Al-Ma’aitah M, Ali AlZubi A (2018) Enhanced Computational Model for Gravitational Search Optimized
Echo State Neural Networks Based Oral Cancer Detection. J Med Syst 42(11):205
2. Alsmadi MK (2018) A hybrid fuzzy C-means and Neutrosophic for jaw lesions segmentation. Ain Shams
Eng J 9(4):697–706
3. Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11(10):
106
4. Anter AM, Ella HA (2018) CT liver tumor segmentation hybrid approach using neutrosophic sets, fast
fuzzy c-means and adaptive watershed algorithm. Artif Intell Med 15(1):157
5. Aubreville M, Knipfer C, Oetter N, Jaremenko C, Rodner E, Denzler J, Bohr C, Neumann H, Stelzle F,
Maier A (2017) Automatic classification of cancerous tissue in Laserendomicroscopy images of the Oral
cavity using deep learning. Sci Rep 7(1):11979–11979
6. Bur AM et al Machine learning to predict occult nodal metastasis in early oral squamous cell carcinoma.
Oral Oncol 92:20–25
7. Bura AM et al (2019) Machine learning to predict occult nodal metastasis in early oral squamous cell
carcinoma. Oral Oncol 92(1):20–25
8. Das DK, Chakraborty C, Sawaimoon S, Maiti AK, Chatterjee S (2015) Automated identification of
keratinization and keratin pearl area from in situ oral histological images. Tissue Cell 47(4):349–358
9. Das DK, Bose S, Maiti AK, Mitra B, Mukherjee G, Dutta PK (2018) Automatic identification of clinically
relevant regions from oral tissue histological images for oral squamous cell carcinoma diagnosis. Tissue
Cell 53:111–119
10. De Silva RK, Siriwardena BSMS, Samaranayaka A, Abeyasinghe WAMUL, Tilakaratne WM (2018) A
model to predict nodal metastasis in patients with oral squamous cell carcinoma. PLoS One 13(8):
e0201755
11. de Souza Tolentino E, Centurion BS, Ferreira LHC, de Souza AP, Damante JH, Rubira-Bullen IRF (2011)
Oral adverse effects of head and neck radiotherapy: literature review and suggestion of a clinical oral care
guideline for irradiated patients. J Appl Oral Sci 19(5):448–454
12. Halicek M, Lu G, Little JV, Wang X, Patel M, Griffith CC, el-Deiry MW, Chen AY, Fei B (2017) Deep
convolutional neural networks for classifying head and neck cancer using hyperspectral imaging. J Biomed
Opt 22(6):60503–60503
13. Halicek M, Little JV, Wang X, Chen AY, Fei B (2019) Optical biopsy of head and neck cancer using
hyperspectral imaging and convolutional neural networks. J Biomed Opt 24(03):1
14. Health and Biology Data (2018) SkyMind
15. Jain DK, Dubey SB, Choubey RK, Sinhal A, Arjaria SK, Jain A, Wang H (2018) An approach for
hyperspectral image classification by optimizing SVM using self organizing map. J Comput Sci 25(1):
252–259
16. Jeyaraj P, Nadar ERS (2019) Computer-assisted medical image classification for early diagnosis of oral
cancer employing deep learning algorithm. J Cancer Res Clin Oncol 145(4):1–9
17. Li H, Huang Y, Zhang Z (2017) An improved faster R-CNN for same object retrieval. IEEE 5:13665–
13676
18. Li H et al (2018) An improved deep learning approach for detection of thyroid papillary cancer in ultrasound
images. Sci Rep 8(1):6600
19. Liang S, Tang F, Huang X, Yang K, Zhong T, Hu R, Liu S, Yuan X, Zhang Y (2019) Deep-learning-based
detection and segmentation of organs at risk in nasopharyngeal carcinoma computed tomographic images
for radiotherapy planning. Eur Radiol 29(4):1961–1967
20. Lu G, Fei B (2014) Medical hyperspectral imaging: a review. J Biomed Opt 19(1):10901–10901
21. Mohammed MA, Ghani MKA, Hamed RI, Ibrahim DA (2017) Review on nasopharyngeal carcinoma:
concepts, methods of analysis, segmentation, classification, prediction and impact: a review of the research
literature. J Comput Sci 21(1):283–298
22. Obermeyer Z, Emanuel EJ (2016) Predicting the future — big data, machine learning, and clinical medicine.
N Engl J Med 375:1216–1219
23. Oetter N et al (2016) Development and validation of a classification and scoring system for the diagnosis of
oral squamous cell carcinomas through confocal laser endomicroscopy. J Transl Med 14(1):159
24. Poedjiastoeti W, Suebnukarn S (2018) Application of convolutional neural network in the diagnosis of jaw
tumors. HealthCare Informat Res 24(3):236–241
25. Sharma N, Om H (2014) Extracting significant patterns for oral cancer detection using apriori algorithm.
Intell Inf Manag 6(2):30–37
26. Shi M, Zhang B (2011) Semi-supervised learning improves gene expression-based prediction of cancer
recurrence. Bioinformatics 27(21):3017–3023
27. Shin H-C, Roth HR, Gao M, Lu L, Xu Z, Nogues I, Yao J, Mollura D, Summers RM (2016) Deep
convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and
transfer learning. IEEE Trans Med Imaging 35(5):1285–1298
28. Tong L-I, Chang Y-C, Lin S-H (2011) Determining the optimal re-sampling strategy for a classification
model with imbalanced data using design of experiments and response surface methodologies. Expert Syst
Appl 38(4):4222–4227
29. Xiao Y, Wu J, Lin Z, Zhao X (2018) A deep learning-based multi-model ensemble method for cancer
prediction. Comput Methods Prog Biomed 153(C):1–9
30. Xiao Y, Wu J, Lin Z, Zhao X (2018) A semi-supervised deep learning method based on stacked sparse auto-
encoder for cancer prediction using RNA-seq data. Comput Methods Prog Biomed 166(1):99–105
31. Yuan X, Xie L, Abouelenien M (2018) A regularized ensemble framework of deep learning for cancer
detection from multi-class, imbalanced training data. Pattern Recogn 77(1):160–172

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
