
healthcare

Article
A Deep Learning Approach for Brain Tumor Classification
and Segmentation Using a Multiscale Convolutional
Neural Network
Francisco Javier Díaz-Pernas, Mario Martínez-Zarzuela , Míriam Antón-Rodríguez
and David González-Ortega *

Department of Signal Theory, Communications and Telematics Engineering, Telecommunications Engineering


School, University of Valladolid, 47011 Valladolid, Spain; pacper@tel.uva.es (F.J.D.-P.);
marmar@tel.uva.es (M.M.-Z.); mirant@tel.uva.es (M.A.-R.)
* Correspondence: davgon@tel.uva.es; Tel.: +34-983-423-000 (ext. 5552)

Abstract: In this paper, we present a fully automatic brain tumor segmentation and classification
model using a Deep Convolutional Neural Network that includes a multiscale approach. One of
the differences of our proposal with respect to previous works is that input images are processed
in three spatial scales along different processing pathways. This mechanism is inspired by the
inherent operation of the Human Visual System. The proposed neural model can analyze MRI images
containing three types of tumors (meningioma, glioma, and pituitary tumor) over sagittal, coronal,
and axial views, and does not need preprocessing of input images to remove skull or vertebral
column parts in advance. The performance of our method on a publicly available MRI image
dataset of 3064 slices from 233 patients is compared with previously published classical machine
learning and deep learning methods. In the comparison, our method obtained a remarkable tumor
classification accuracy of 0.973, higher than the other approaches using the same database.

Keywords: brain tumor classification; deep learning; convolutional neural network; multiscale
processing; data augmentation; MRI

1. Introduction
Automatic segmentation and classification of medical images play an important
role in diagnostics, growth prediction, and treatment of brain tumors. An early brain
tumor diagnosis implies a faster response in treatment, which helps to improve patients'
survival rate. Locating and classifying brain tumors in large medical image databases,
as done manually in routine clinical tasks, has a high cost both in effort
and time. An automatic detection, location, and classification procedure is therefore desirable and
worthwhile [1].

There are several medical imaging techniques used to acquire information about
tumors (tumor type, shape, size, location, etc.), which are needed for their diagnosis [2].
The most important techniques are Computed Tomography (CT), Single-Photon Emission
Computed Tomography (SPECT), Positron Emission Tomography (PET), Magnetic Resonance
Spectroscopy (MRS), and Magnetic Resonance Imaging (MRI). These techniques
can be combined to obtain more detailed information about tumors. Nevertheless, MRI is the
most widely used technique due to its advantageous characteristics: the scan
provides hundreds of 2D image slices with high soft tissue contrast and uses no ionizing
radiation [2]. There are four MRI modalities used in diagnosis: T1-weighted MRI (T1),
T2-weighted MRI (T2), T1-weighted contrast-enhanced MRI (T1-CE), and Fluid-Attenuated
Inversion Recovery (FLAIR). Each MRI modality produces images with different
tissue contrasts; thus, some are better suited than others to search for a specific kind of tissue.
The T1 modality is typically used to work with healthy tissues, whereas T2 images are more appropriate
to detect borders of edema regions. T1-CE images highlight tumor borders, and FLAIR
images favor the detection of edema regions in Cerebrospinal Fluid [3]. Provided that the
goal of MRI image processing is to locate and classify brain tumors, the T1-CE modality is
adequate and, as shown in this paper, sufficient.
In recent decades, Brain Tumor Imaging (BTI) has grown at an exponential rate.
More specifically, the number of works about brain tumor quantification based on MRI
images has increased significantly [2]. Brain Tumor Segmentation (BTS) consists of differentiating
tumor-infected tissues from healthy ones. In many BTS applications, brain tumor
image segmentation is achieved by classifying pixels; thus, the segmentation problem turns
into a classification problem [4].
The aim of the work presented in this paper is to develop and test a Deep Learning
approach for brain tumor classification and segmentation using a Multiscale Convolutional
Neural Network. To train and test the proposed neural model, a T1-CE MRI image dataset
from 233 patients, including meningiomas, gliomas, and pituitary tumors in the common
views (sagittal, coronal, and axial), has been used [5]. Figure 1 shows examples of these
three types of tumors. Additional information on the dataset is included in Section 2.2.
Our model is able to segment and predict the pathological type of the three kinds of brain
tumors, outperforming previous studies using the same dataset.

Figure 1. Examples of MRI images of the T1-CE MRI image dataset. Left: coronal view of a meningioma tumor. Center:
axial view of a glioma tumor. Right: sagittal view of a pituitary tumor. Tumor borders have been highlighted in red.

In the BTS field, two main tumor segmentation approaches can be found: generative
and discriminative. Generative approaches use explicit anatomical models to obtain the
segmentation, while discriminative methods learn image features and their relations using
gold standard expert segmentations [2]. Published studies following the discriminative
approach have evolved from using classical Machine Learning [6–9] to more recent Deep
Learning techniques [10–14].
In works using classical Machine Learning techniques, the segmentation pipeline in-
cludes a preprocessing stage designed for feature extraction. For example, in Sachdeva et al. [6],
a semi-automatic system obtained the tumor contour, and 71 features were then computed
using the intensity profile, the co-occurrence matrix, and Gabor functions. In addition,
skull stripping is a common preprocessing stage in classical discriminative approaches,
although it presents drawbacks such as the selection of parameters, the need for prior
information about the images, and a high computational time [15]. The extracted features
are used as the input for a classification or a segmentation stage. In the aforementioned
work [6], two classifiers, SVM (Support Vector Machine) and ANN (Artificial Neural
Network), were compared. The obtained accuracy ranged from 79.3% to 91.7% for SVM
and from 75.6% to 94.9% for ANN. Analogously, Sharma and Chhabra [7] proposed an
automatic brain tumor classification system using 10 features and a Back Propagation
Network as the classifier, and they reached an accuracy of 95.3%. Iftekharuddin et al. [8]
explored using fractal wavelet features as input to a SOM (Self Organizing Map) classifier
and achieved an average precision of 90%. Other examples of tumor classification pipelines
use instance-based learning: Havaei et al. [9] developed a semi-automatic system based
on a kNN classifier. They used the well-known BRATS 2013 dataset and obtained the
second-best score over the complete and core tests, reporting Dice similarities of 0.85 and
0.75 for complete and core tumor regions, respectively.
In recent years, the next step in the evolution of machine learning techniques has been
to allow computers to discover the features that optimally represent the data. This concept
is the foundation of Deep Neural Networks [4], which turn the problems to be solved
from feature-driven into data-driven ones. Within deep neural networks, Convolutional Neural
Networks (CNN) [16] and Fully Convolutional Networks (FCN) [17,18] have been applied
to a broad range of applications. In particular, they are widely used today for generic image
processing [19] and specifically in medical image analysis [1,4,20].
The advent of Deep Learning has had a tremendous impact on the design of medical
imaging systems, and newer techniques have been widely adopted by researchers over
the last few years [21]. The advances in the field have opened a very exciting time for
collaboration between machine learning scientists and radiologists [22]. Havaei et al. [10] explored
the use of a fully convolutional system for the BRATS 2013 dataset, shortly after their kNN
classifier proposal [9] discussed above. Most of the recent brain tumor classification
studies using Deep Learning are based on CNNs or FCNs. Moeskops et al. [11] used a
multiscale CNN to segment brain tissues and white matter hyperintensities. They used
images from the MRBrainS13 challenge [12] to evaluate their CNN architecture and reached
a precision of about 85%. One of the most significant advances in the field was achieved by
Ronneberger et al. [13], who proposed an interesting CNN named U-net. They used this
CNN architecture to segment neural structures in electron microscopy images from the
ISBI challenge [14]. U-net is composed of 5 convolutional stages and 5 deconvolutional
stages arranged in a U shape, from which it takes its name. They obtained the best results in the ISBI
challenge [14].
However, all the previously cited studies have in common that they aimed to segment
areas belonging to brain tumors without the need to also classify these areas as belonging to
different brain tumor types. As explained above, in our study the objective was to develop
a new Deep Learning model for both tumor segmentation and classification. A relevant
previous related work is the study of Mohsen et al. [23], who proposed a deep learning
classifier combined with a discrete wavelet transform (DWT) and principal components
analysis (PCA) to classify a dataset with three different brain tumors. Four other deep
learning relevant studies following the same purpose have in common the use of the
same dataset that we adopted in this work, which is important to contrast and endorse
the performance results of the proposed model presented here. Abiwinanda et al. [24]
proposed a two-layer CNN architecture. Pashaei et al. [25] proposed two different methods:
in the first method, a CNN model was used for classification, and in the second method,
the features extracted by the CNN were the inputs of a KELM (Kernel Extreme Learning Machine)
classifier, a learning algorithm based on a single layer of hidden nodes. Sultan et al. [26] proposed
a CNN network including 16 convolution layers. Finally, Anaraki et al. [27] presented a
hybrid method combining the use of CNNs and genetic algorithm (GA) criteria to improve
the network architecture.
In this paper, we present a fully automatic CNN-based segmentation and classification
method of MRI images for three types of tumor: meningioma, glioma, and pituitary
tumors. Unlike previous CNN-based works [10,23–27], our neural network includes three
processing pathways to deal with the information at three spatial scales. Previous research
conducted by our research group and inspired by the inherent multiscale operation of the
Human Visual System (HVS) [28] led us to propose this multiscale processing strategy
for tumor classification, on the assumption that a multiscale approach would be effective
in extracting discriminant texture features of the different kinds of tumors. The proposed model is
highly consistent with HVS processing. In visual stimuli processing, the HVS works in two
main modes: pre-attentive and attentive vision. Pre-attentive mode is instantaneous and
parallel, covering wide areas of the vision field, while attentive processing acts over limited
regions of the visual field, establishing a serial search through focused attention [29]. In this
processing, simple cells in the V1 area of the visual cortex in the brain filter the stimuli
coming from the LGN (Lateral Geniculate Nucleus), through frequency and bandwidth
filtering. After that filtering, the processed features are concatenated in the inferotemporal
area during the cognitive process [30]. In addition, our method uses a sliding-window
technique, differing from U-net [13] and other methods in which the full image is used as
the input.
The rest of the paper is organized as follows: Section 2 introduces the proposed CNN
model, describes the adopted dataset and how it was used for training and measuring the
performance of the neural network. Later, Section 3 presents the experimental results and
discusses these results including a performance comparison with other methods. Finally,
Section 4 draws the main conclusions about the presented work.

2. Materials and Methods


2.1. Proposed Convolutional Neural Network and Implementation Details
In this paper, we propose a multi-pathway CNN architecture (see Figure 2) for tumor
segmentation. The CNN architecture processes an MRI image (slice) pixel by pixel covering
the entire image and classifying each pixel using one of four possible output labels: 0—
healthy region, 1—meningioma tumor, 2—glioma tumor, and 3—pituitary tumor.

Figure 2. The proposed Convolutional Neural Networks (CNN) architecture. Input: 1 × 65 × 65 slid-
ing windows. Model: Three pathways (large, medium, and small feature scales) with 2 convolutional
layers and max-pooling, a convolutional layer with concatenation of the three pathways, and a fully
connected stage that leads to a classification in one out of the four possible output labels: 0—healthy
region, 1—meningioma tumor, 2—glioma tumor, and 3—pituitary tumor. A dropout mechanism
between the concatenation and fully connected stages is included.

In our approach, we use a sliding window; thus, each pixel is classified according to an
N × N neighborhood or window, which is the input to our CNN architecture (see Figure 2).
Every window is processed through three convolutional pathways with kernels at three scales (large,
medium, and small) that extract the features. In our implementation, we chose a
window of 65 × 65 pixels and kernels of size 11 × 11, 7 × 7, and 3 × 3 pixels, respectively.
The window size was chosen after preliminary configuration tests in
which 33 × 33 px and 75 × 75 px window sizes were also evaluated.
Each pathway is composed of two convolutional stages with ReLU rectification and a
3 × 3 max-pooling kernel with a stride value of 2. The number of feature maps in the large,
medium, and small pathways is 128, 96, and 64, respectively. We propose the use of a larger
number of maps for larger scales under the assumption that the window features extracted
using different filtering scales help to define the three kinds of tumor to be classified.
Scale features from the three pathways are concatenated in a convolutional layer
with a 3 × 3 kernel, a ReLU activation function, and a 2 × 2 max-pooling kernel with
a stride value of 2. The output of this stage feeds a fully connected stage in which the 8192
concatenated scale features are used to classify the window into one of the four prediction
labels. In order to prevent overfitting, the model includes a dropout layer before the
fully connected layer. The last layer uses a softmax activation function.
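To make the description above concrete, the following PyTorch sketch assembles a three-pathway network of this kind. It follows the elements stated in the text (11 × 11, 7 × 7, and 3 × 3 kernels; 128, 96, and 64 feature maps; two convolution/ReLU/max-pooling stages per pathway; a 3 × 3 concatenation convolution with 2 × 2 max-pooling; dropout before the fully connected output). The paddings, the helper names, and the resulting flattened feature size are illustrative assumptions, not the exact published implementation.

```python
# Sketch of a three-pathway multiscale CNN following the description above.
# Paddings and the inferred flattened feature size are assumptions.
import torch
import torch.nn as nn


def pathway(kernel_size: int, channels: int) -> nn.Sequential:
    """Two convolutional stages with ReLU and 3x3 max-pooling (stride 2)."""
    pad = kernel_size // 2
    return nn.Sequential(
        nn.Conv2d(1, channels, kernel_size, padding=pad), nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(channels, channels, kernel_size, padding=pad), nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=3, stride=2),
    )


class MultiscaleCNN(nn.Module):
    def __init__(self, num_classes: int = 4, window: int = 65):
        super().__init__()
        self.large = pathway(11, 128)    # large-scale pathway
        self.medium = pathway(7, 96)     # medium-scale pathway
        self.small = pathway(3, 64)      # small-scale pathway
        self.merge = nn.Sequential(      # concatenation stage
            nn.Conv2d(128 + 96 + 64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # Infer the flattened feature size by pushing a dummy window through.
        with torch.no_grad():
            n_feat = self._features(torch.zeros(1, 1, window, window)).shape[1]
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(n_feat, num_classes),  # softmax is applied outside the model
        )

    def _features(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.cat([self.large(x), self.medium(x), self.small(x)], dim=1)
        return self.merge(x).flatten(start_dim=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self._features(x))


if __name__ == "__main__":
    model = MultiscaleCNN()
    logits = model(torch.randn(8, 1, 65, 65))   # a batch of 65 x 65 windows
    probs = torch.softmax(logits, dim=1)        # probabilities over the 4 labels
    print(logits.shape, probs.sum(dim=1))
```

In this sketch, the softmax of the last stage is applied outside the module so that the raw logits can be fed directly to a cross-entropy loss during training.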
The proposed CNN has been implemented using PyTorch™. The number of trainable
parameters in the neural network is nearly three million (2,856,932). All the tests have been
performed in a Linux environment with an Intel Core i7 CPU and an Nvidia GTX 1080
Ti (11 GB) GPU. The training process took 5 days, and the average prediction time per image
was 57.5 s.

2.2. Dataset and Data Preprocessing


In clinical settings, usually only a certain number of slices of brain CE-MRI with a large
slice gap, not a 3D volume, are acquired and available. It is difficult to build a 3D model
with such sparse data. Hence, the proposed method is based on 2D slices collected from
233 patients, acquired from Nanfang Hospital, Guangzhou, China, and General Hospital,
Tianjing Medical University, China, from 2005 to 2010 [5].
This dataset contains 3064 slices and includes meningiomas (708 slices), gliomas
(1426 slices), and pituitary tumors (930 slices) in the common views (sagittal, coronal,
and axial). Figure 1 shows examples of these three types of tumors. This dataset also
provides 5-fold cross-validation indices. Using this information, 80% of the images (2452)
are employed for training and 20% (612 images) are used for obtaining performance
measurements. The process is repeated 5 times.
The images have an in-plane resolution of 512 × 512 pixels with a pixel size of 0.49 × 0.49 mm².
The slice thickness is 6 mm, and the slice gap is 1 mm. The tumor border was manually
delineated by three experienced radiologists. Every slice in the dataset has an information
structure attached containing the patient's pid; the tumor type label ($l_{gt}$): 1 for meningioma,
2 for glioma, and 3 for pituitary tumor; the coordinates vector (x, y) of the points belonging
to the tumor border; and the tumor mask ($T_{ij}$): a binary image where the tumor positions
contain a value of 1 and the healthy ones a value of 0. The pair $(l_{gt}, T_{ij})$ is the
ground truth in the training process.
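For illustration, the following sketch reads these per-slice fields. It assumes the public figshare release stores each slice as a MATLAB v7.3 (.mat) file containing a cjdata structure with image, label, tumorMask, tumorBorder, and PID entries; the field names and the example file name are assumptions to verify against the dataset documentation.

```python
# Sketch for loading one dataset slice; the 'cjdata' field names are assumptions
# based on the public figshare release and should be checked before use.
import h5py
import numpy as np


def load_slice(path: str):
    """Return the T1-CE slice, the ground-truth label l_gt, the mask T_ij, and the border."""
    with h5py.File(path, "r") as f:
        cj = f["cjdata"]
        image = np.asarray(cj["image"], dtype=np.float32)     # 512 x 512 slice
        l_gt = int(np.asarray(cj["label"]).squeeze())         # 1, 2, or 3
        t_mask = np.asarray(cj["tumorMask"], dtype=np.uint8)  # binary tumor mask T_ij
        border = np.asarray(cj["tumorBorder"]).ravel()        # border point coordinates
    return image, l_gt, t_mask, border


if __name__ == "__main__":
    image, l_gt, t_mask, border = load_slice("1.mat")   # hypothetical file name
    print(image.shape, l_gt, int(t_mask.sum()))
```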
During training, data augmentation using an elastic transform [31] has been used to
prevent overfitting of the neural network. Figure 3 shows an example of this transformation.
The data augmentation procedure doubled the number of training images available in each
fold iteration, up to 4904 images. A thorough process was conducted to extract 65 × 65 px
training examples from every image in the training dataset: 150 true positive window
examples and 325 true negative window examples per tumor. The pixels in these windows
were scaled using pixel standardization (zero mean and unit variance) across the entire
training dataset.
Figure 3. Example of the elastic transformation used in the data augmentation. Left: original slice. Right:
transformed image. In both images, the edges of the tumor have been highlighted in red.
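A minimal sketch of the elastic transformation [31] is shown below. The deformation strength (alpha) and smoothing (sigma) values are illustrative assumptions, not the settings used in our experiments; the key point is that the same random displacement field is applied to the slice and to its tumor mask so that both stay aligned.

```python
# Elastic deformation in the style of Simard et al. [31]; alpha and sigma are
# illustrative values, not the settings used for the reported results.
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates


def elastic_transform(image, mask, alpha=30.0, sigma=4.0, rng=None):
    """Apply the same random elastic deformation to a slice and its tumor mask."""
    rng = rng if rng is not None else np.random.default_rng()
    shape = image.shape
    # Smooth random displacement fields, scaled by alpha.
    dx = gaussian_filter(rng.uniform(-1, 1, shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, shape), sigma) * alpha
    y, x = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), indexing="ij")
    coords = np.array([y + dy, x + dx])
    warped_image = map_coordinates(image, coords, order=1, mode="reflect")
    warped_mask = map_coordinates(mask, coords, order=0, mode="reflect")  # keep mask binary
    return warped_image, warped_mask
```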

2.3. Neural Network Training and Performance Measurements


The proposed CNN was trained, and its performance measurements were obtained, using a 5-fold
cross-validation technique with the slice train/test subgroups specified in the dataset,
as explained before.
On each fold iteration, the proposed model was trained for a total of 80 epochs, using
a Stochastic Gradient Descent (SGD) optimizer with a starting learning rate of 0.005 and a
momentum coefficient of 0.9. In addition, we used a learning rate exponential decay every
20 epochs. The dropout parameter was set to 0.5.
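The following sketch mirrors this training configuration in PyTorch. The SGD settings, the number of epochs, and the 20-epoch decay schedule come from the text; the decay factor, the batch size, the cross-entropy loss, and the synthetic data are assumptions, and the model class reuses the sketch from Section 2.1.

```python
# Training-loop sketch: SGD (lr = 0.005, momentum = 0.9), 80 epochs, learning-rate
# decay every 20 epochs. The decay factor, batch size, loss, and data are assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the extracted 65 x 65 training windows and their labels.
windows = torch.randn(64, 1, 65, 65)
labels = torch.randint(0, 4, (64,))
train_loader = DataLoader(TensorDataset(windows, labels), batch_size=16, shuffle=True)

model = MultiscaleCNN()                  # the model sketched in Section 2.1
criterion = nn.CrossEntropyLoss()        # assumed 4-class training loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

for epoch in range(80):                  # 80 epochs per fold
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()                     # multiplicative decay every 20 epochs
```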
Within each fold, the performance of the model was tested over 612 images using a
sliding-window approach: every testing window of size 65 × 65 px enters the trained
CNN so that a predicted tumor type label $l_p$ is obtained. Segmentation of the tumor
is achieved by considering the region containing all the pixels identified as a given tumor
type. The previously stored global mean and standard deviation of pixels in the training
windows, computed for data preprocessing, are used to normalize the testing windows
before they enter the CNN.
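A sketch of this labeling step is given below: every 65 × 65 window, normalized with the stored training mean and standard deviation, is passed through the trained CNN and the predicted label is assigned to the window's central pixel. Padding the slice by reflection and processing one image row per batch are simplifications assumed here, not details taken from the paper.

```python
# Sliding-window labeling sketch: classify every pixel of a slice from its 65x65
# neighborhood. Reflection padding and row-wise batching are assumptions.
import numpy as np
import torch
import torch.nn.functional as F


def label_slice(model, image, mean, std, window=65, device="cpu"):
    """Return the per-pixel label map P_ij (0 = healthy, 1-3 = tumor type)."""
    half = window // 2
    x = torch.from_numpy((image - mean) / std).float()[None, None]
    x = F.pad(x, (half, half, half, half), mode="reflect")   # a window for every pixel
    h, w = image.shape
    out = np.empty((h, w), dtype=np.int64)
    model.eval()
    model.to(device)
    with torch.no_grad():
        for i in range(h):                                   # one image row per batch
            rows = x[:, :, i:i + window, :]                  # 1 x 1 x 65 x (w + 64)
            wins = rows.unfold(3, window, 1)                 # 1 x 1 x 65 x w x 65
            wins = wins.permute(0, 3, 1, 2, 4).reshape(w, 1, window, window)
            out[i] = model(wins.to(device)).argmax(dim=1).cpu().numpy()
    return out
```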
Once all pixels (i, j) from the input slice are labeled as $P_{ij}$ (see Equation (1)), the classification
function can be calculated using Equation (2). The predicted label $l_p$, which indicates
the tumor type in a slice, is determined from the vector $\{f_l, l = 1, 2, 3\}$ of the classification
function. The classification function calculates the relation between the size of each predicted
label $l$, $|P_{ij} = l|$ (the number of pixels with the predicted label $l$), and the complete prediction,
$|P_{ij} > 0|$ (the number of pixels with a predicted label belonging to any tumor). The predicted
label $l_p$ corresponds to the label with the greatest relation in size that is above the minimum
relation defined by the confidence threshold, $\tau_c \in [0, 1]$.


$$P_{ij} = \begin{cases} 0, & \text{if } (i,j) \text{ is a healthy position} \\ 1, & \text{if } (i,j) \text{ is a meningioma tumor position} \\ 2, & \text{if } (i,j) \text{ is a glioma tumor position} \\ 3, & \text{if } (i,j) \text{ is a pituitary tumor position} \end{cases} \quad (1)$$

$$f_l = \begin{cases} \dfrac{|P_{ij} = l|}{|P_{ij} > 0|}, & \text{if } \dfrac{|P_{ij} = l|}{|P_{ij} > 0|} > \tau_c \\ 0, & \text{otherwise} \end{cases} \quad (2)$$

$$l_p = \begin{cases} \arg\max_l \{f_l\}, & \text{if } \{f_l > 0\} \neq \emptyset \\ -1, & \text{non-classified} \end{cases} \quad (3)$$

where $|\cdot|$ is the size or pixel number (the number of pixels in the slice that meet the conditions inside).
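A compact NumPy sketch of Equations (2) and (3) is given below: from the per-pixel label map $P_{ij}$ it computes the per-class ratios $f_l$ and the slice-level prediction $l_p$. The function name is ours; the threshold default follows the value used later in the experiments.

```python
# NumPy sketch of Equations (2)-(3): slice-level label from the per-pixel map P_ij.
import numpy as np


def classify_slice(p, tau_c=0.75):
    """Return l_p in {1, 2, 3}, or -1 (non-classified) if no ratio exceeds tau_c."""
    n_tumor = np.count_nonzero(p > 0)              # |P_ij > 0|
    if n_tumor == 0:
        return -1                                  # no tumor pixels predicted at all
    ratios = np.array([np.count_nonzero(p == l) / n_tumor for l in (1, 2, 3)])
    f = np.where(ratios > tau_c, ratios, 0.0)      # Equation (2)
    return int(np.argmax(f)) + 1 if np.any(f > 0) else -1   # Equation (3)


if __name__ == "__main__":
    p = np.zeros((512, 512), dtype=int)
    p[100:150, 100:150] = 2                        # toy prediction: a glioma region
    print(classify_slice(p))                       # -> 2
```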
For performance measurements, we compute the Confusion Matrix, together with the
Dice and Sensitivity scores [3]. In addition, with the aim of evaluating the precision of the
predicted tumor type, we defined the predicted tumor type ratio score, pttas. This index
calculates the relation in size between the correctly labeled tumor regions, $P_{ij} = l_{gt}$, and all
the tumor regions determined by our methodology, $P_{ij} > 0$. In the pttas index, the ratio
is measured over the total size of the predicted tumor, while in the Sensitivity index the
ratio is calculated over the size of the ground truth tumor.
The evaluation metrics are calculated using the following equations:

$$\mathrm{Dice}(P, T) = \frac{|P_1 \wedge T_1|}{(|P_1| + |T_1|)/2} = \frac{2\,TP}{2\,TP + FP + FN} \quad (4)$$

$$\mathrm{Sensitivity}(P, T) = \frac{|P_1 \wedge T_1|}{|T_1|} = \frac{TP}{TP + FN} \quad (5)$$

$$pttas = \frac{|P_1|}{|P > 0|} \quad (6)$$

where $TP$ is the number of true positives, $FP$ is the number of false positives, $FN$ is the
number of false negatives, $\wedge$ is the logical AND operation, $|\cdot|$ is the size or pixel number
(the number of pixels in the slice that meet the conditions inside), $P_1$ represents the sum of
$TP$ and $FP$ (the predicted positives, $P_{ij} = l_{gt}$), and $T_1$ represents the sum of $TP$ and
$FN$ ($T_{ij} = 1$).
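The following NumPy sketch evaluates Equations (4)-(6) for one slice from the predicted label map $P_{ij}$, the ground-truth mask $T_{ij}$, and the ground-truth label $l_{gt}$; it assumes the slice contains at least one predicted and one ground-truth tumor pixel.

```python
# NumPy sketch of Equations (4)-(6) for a single slice.
import numpy as np


def segmentation_metrics(p, t, l_gt):
    """Return (Dice, Sensitivity, pttas) for one slice."""
    p1 = (p == l_gt)                      # predicted pixels with the correct label
    t1 = (t == 1)                         # ground-truth tumor pixels
    tp = np.count_nonzero(p1 & t1)        # true positives
    dice = 2 * tp / (p1.sum() + t1.sum())              # Equation (4)
    sensitivity = tp / t1.sum()                        # Equation (5)
    pttas = p1.sum() / np.count_nonzero(p > 0)         # Equation (6)
    return dice, sensitivity, pttas
```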

3. Results and Discussion


As explained in the previous section, the neural network was tested using the 5-fold
cross validation train/test subgroups specified in the dataset. The quantitative results
obtained are displayed in the figures and tables below in this section.
Figure 4 shows the performance of our method for three slices. Figure 4 (left)
corresponds to a slice with a meningioma tumor, Figure 4 (center) shows a glioma tumor,
and Figure 4 (right) a pituitary tumor. The images show the tumor segmentation obtained,
$P_{ij} > 0$. To mark the regions with different colors, color images were generated where the
predicted tumor region is filled in red, the ground truth tumor region, $T_{ij}$, is colored
green, and the intersection of those two regions appears in yellow.

Figure 4. Examples of results of the proposed method for three slices corresponding to meningioma, glioma, and pituitary
tumors, respectively. The images show the tumor segmentation: The region detected is shown in red while the ground truth
region is shown in green. As a result, the intersection region is shown in yellow.

The segmentation metrics obtained are shown in Table 1. The Dice index, which is inversely
related to the number of false positives and false negatives extracted (see Equation (4)),
is the largest for images with meningioma, then with pituitary tumor, and the lowest with
glioma. Therefore, the proportion of false positives and false negatives is the lowest in
the prediction of meningiomas and is the largest in the prediction of gliomas. The Sen-
sitivity index, which computes the ratio between true positives and the ground truth
(see Equation (5)), led to the same ranking (the largest value for images with meningioma,
then with pituitary tumor, and the lowest value with glioma), although the value of this
index is closer for images with meningioma and pituitary tumor. The prediction of gliomas
had the lowest relation of true positives with respect to the ground truth. On the contrary,
the pttas index is the largest for images with glioma, then with pituitary tumor, and the
lowest with meningioma. The values of this index, which measures the relation in size
between the correctly labeled tumor regions and all the predicted tumor regions in the
images, indicates that for gliomas our model misclassified a lower proportion of pixels as
belonging to the two other brain tumors than for meningiomas or pituitary tumors. It can
be observed that the different features of the three analyzed brain tumors and the terms
included in the segmentation metrics led to a different ranking in the segmentation of the
three tumors. The values obtained for all the segmentation metrics are remarkable with
average values of Dice = 0.828, Sensitivity = 0.940, and pttas = 0.967.

Table 1. Segmentation metrics of the 3064 slices processed with 5-fold cross validation.

Metric\Tumor Meningioma Glioma Pituitary Tumor Average


DICE 0.894 0.779 0.813 0.828
SENSITIVITY 0.961 0.907 0.954 0.940
PTTAS 0.938 0.986 0.979 0.967

The histograms for these metrics of the segmentation per processed slice are included
in Figure 5. Remarkably, Sensitivity and pttas histograms have their values concentrated on
a very narrow interval around the highest zone. This means that almost all the processed
images have a high value in those evaluation metrics.

Figure 5. Histograms of metrics obtained after processing the dataset. Left: Dice histogram. Center: Sensitivity histogram.
Right: histogram of the predicted tumor type ratio (pttas).

Regarding the pttas index histogram (see Figure 5, right), a noteworthy behavior
can be observed. Around value 0, there are just a few samples; in these samples the predicted
tumor type is wrong, as pttas is lower than 1/3 (pttas measures the ratio of pixels with the right label).
If pttas is larger than 1/3, the tumor type prediction is correct. The rest of
the samples are found at values above 0.4, which means that they have all been correctly
predicted. The high values of the Sensitivity index (see Figure 5, center) are due to the fact
that almost all the tumor regions have been correctly extracted in the processed images (regions
in yellow in Figure 4). A greater spread in the Dice histogram (see Figure 5, left)
reflects the existence of false positives and false negatives. However, the Dice index results
are relatively high, so the number of false positives and false negatives is quite low.
Figure 6 shows some segmentations with misclassified areas. In the example of the
upper row, the greater part of the detected tumor is correctly labeled (red area in the
right column), resulting in a largely correct segmentation (yellow area in the left column).
However, there is a region detected in the non-cerebral area wrongly labeled as a glioma
tumor (green area in the right column). This example shows the added complexity inherent
to this dataset due to the fact that it includes non-cerebral areas that can generate false
positives. This complexity also manifests itself in the example of the middle row. Similarly,
the segmentation is relatively correct (yellow region in the left column), but there is a
misclassified area labeled as a pituitary tumor (blue region in the right column) located
in the sphenoidal sinuses area, which is where pituitary tumors appear. The physical
structure of the sphenoidal sinuses confused our model. The third example (lower
row) shows a confusion between a real glioma (green region in the right column) and a
wrongly predicted meningioma region (red area in the left column).

Figure 6. Confusion in segmentation. Color-generated images for proper visualization of regions.
Upper row corresponds to a meningioma tumor, middle row to a meningioma tumor, and lower row
to a glioma tumor. Right column shows the predicted tumor tags (red: meningioma, green: glioma,
blue: pituitary tumor). Left column shows segmentations (red: predicted, green: ground truth and,
as a result, the intersection is shown in yellow).

In our method, tumor classification is a direct outcome of the resulting segmentation.
The classification function for the tumor in the image counts the tumor type prediction for
every pixel (pttas, see Equation (6)) and considers the highest value to be the predicted
tumor type, provided that it is greater than the confidence threshold $\tau_c$ (see Equation (2)).
Tables 2 and 3 show the values of the confusion matrix and the tumor classification accuracy
for the Cheng et al. [5] method and our method, respectively. It can be observed that the
confusion matrix for our method has very high values on its diagonal, which results in a
high tumor classification accuracy of 0.973.
Table 2. Confusion matrix of the predicted tumor for Cheng et al. [5] method. The tumor type
precision (tumor classification accuracy) achieved (∑ diag/ ∑ total) is 0.912.

True\Predicted Meningioma Glioma Pituitary Tumor Sensitivity


Meningioma 571 57 80 0.86
Glioma 44 1326 56 0.96
Pituitary tumor 75 54 801 0.87
Tumor classification accuracy 0.912

Table 3. Confusion matrix of the predicted tumor for our method. The number of non-classified images is 61 (42/4/15),
taking a confidence threshold τc = 0.75. The tumor type precision (tumor classification accuracy) achieved (∑ diag/ ∑ total)
is 0.973.

True\Predicted Meningioma Glioma Pituitary Tumor Sensitivity Non-Classified


Meningioma 659 4 3 0.93 42
Glioma 7 1414 1 0.99 4
Pituitary tumor 1 3 911 0.98 15
Tumor classification accuracy 0.973

The classification function, Equation (2), depends on the confidence threshold parameter
$\tau_c$. Figure 7 shows the relationship between the confidence threshold and the resulting
precision. A slow descent can be observed up to a confidence threshold of 0.88, followed by
a sharp decline. There is no strong restriction on the selection of the confidence threshold
and, in order to obtain a classification with a precision level higher than 0.9, it could be
possible to raise our confidence threshold to a very high value (around 0.92).

Figure 7. Graph of the relation between the confidence threshold and the precision of the predicted
tumor type.

This behavior could be predicted by observing the pttas index histogram (see Figure 5).
Peak values in the histogram can be found once a pttas value of 0.88 is reached. The number
of images with a pttas index value between 0.4 and 0.88 is very low, so when the confidence
threshold is raised from 0.4 to 0.88, the precision will only be marginally lower, as can be
observed in the graph in Figure 7.
Due to the fact that all the processed images with a pttas index value greater than 0.4
are correctly labeled, if we were to choose a confidence threshold equal to or lower than 0.4,
the precision would be close to 100% (all samples would be correctly labeled, except those
with a value close to zero). This is why the confidence threshold diagram begins with an
accuracy of ≈1 (0.994).

3.1. Comparison with Other Methods


Segmentation performance metrics of our method (see Table 1) are in the range of the
winning proposals in the well-known brain tumor image segmentation challenge BRATS [3].
This benchmark uses glioma tumor images and supplies a dataset with four modalities of
MRI (T1, T2, T1-CE, and FLAIR). The 2013 top-ten ranking of the participating methods
obtained glioma Dice index values between 0.69 and 0.82 for the segmentation of the
complete tumor. The Dice index value obtained by our method for glioma segmentation is
0.779, which is close to the highest achieved value, even with the inconveniences associated
with working with a single MRI modality and classifying among three different kinds
of tumors.
A more appropriate comparison of our results is with those results obtained by previ-
ous works using the same T1-CE MRI image dataset. Table 4 shows the tumor classification
accuracy of our approach, together with seven published methods: two feature-driven
approaches [5,32] and five deep learning approaches [24–27]. Our method outperforms the
other proposals with a tumor classification accuracy of 0.973.

Table 4. Comparison of the proposed approach with other approaches over the same T1-CE MRI
image dataset.

Authors Classification Method Tumor Classification Accuracy


Cheng et al. [5] SVM 0.912
Cheng et al. [32] Fisher kernel 0.947
Abiwinanda et al. [24] CNN 0.841
Pashaei et al. [25] CNN 0.810
Pashaei et al. [25] CNN and KELM 0.936
Sultan et al. [26] CNN 0.961
Anaraki et al. [27] CNN and GA 0.942
Our approach Multiscale CNN 0.973

The two studies conducted by Cheng et al. [5,32] are examples of classical machine
learning approaches using pre-defined feature extraction. In a first paper, Cheng et al. [5]
proposed a classification method based on the technique of splitting the augmented tumor
region into ring-shaped sub-regions and extracting local features using three different
methods: intensity histogram, gray level co-occurrence matrix (GLCM), and bag-of-words
(BoW). Those features were employed to create a dictionary with the K-means technique.
Then, a trained classifier was used to classify the ROIs as a tumor type. Three classification
methodologies were compared: SVM, sparse representation-based classification (SRC),
and k-Nearest Neighbors (kNN). The best results were obtained using BoW and SVM as the
classifier. In a second paper, Cheng et al. [32] proposed a content-based image retrieval method for
MRI images using the Fisher kernel framework. The methodologies of Cheng et al. [5,32] start with
segmentations performed by experienced radiologists, and thus they are semi-automatic
methods. They need interactive human intervention during classification, whereas our
proposed methodology is fully automatic. Nonetheless, using a confidence threshold of
0.75, our method obtains a tumor classification accuracy of 0.973 versus 0.912 in [5] and
0.947 in [32]. Moreover, Figure 7 shows that it is possible to raise the confidence threshold
up to 0.9 in order to obtain an even higher classification accuracy. With a threshold of 0.75,
61 slices remained unclassified (2% of the total processed MRI images): 42 meningiomas,
4 gliomas, and 15 pituitary tumors. These figures show to what extent the detection of
meningioma tumors is much more demanding. As in the Cheng et al. [5] method (see Table 2),
in our work the lowest Sensitivity value is for meningioma (see Table 3), due to the lower
intensity contrast between tumor areas and healthy areas. Sensitivity for glioma is the
highest. Pituitary tumors are somewhere in between meningiomas and gliomas.
The five Deep Learning methods in Table 4 also obtained a lower accuracy than our
proposal. Abiwinanda et al. [24] used two convolution layers and 64 fully connected
neurons. They did not use the full dataset but selected 700 images from each brain
tumor type, with the purpose of balancing the data. A subset of 500 tumor images of
each class was used for training and a subset of 200 images for testing. They did not
apply data augmentation. Their classification accuracy was 0.841, considerably lower than that of our
approach (0.973). Pashaei et al. [25] proposed two Deep Learning methods. In their first
method, a CNN model is used for classification with 4 convolution and normalization
layers, 3 max-pooling layers, and a last fully connected layer. In their second method,
the features extracted by a CNN are the inputs of a KELM classifier. They used 70%
of the dataset for training and 30% for testing, using a 10-fold cross validation and no data
augmentation. The classification accuracy was 0.810 with the first method, and it was
significantly improved with the second method (0.936). Sultan et al. [26] used a CNN
containing 16 convolution layers, including pooling and normalization, and a dropout
layer before the last fully connected layer. They used 68% of the images for training and
the remaining 32% for validation and test. Significantly, training data was increased by
a factor of 5 with geometric augmentation and grayscale distortion. Their classification
accuracy is the second highest in Table 4 (0.961). Finally, Anaraki et al. [27] also proposed a
CNN but included a genetic algorithm to select the best configuration of the CNN: number
of convolutional, max-pooling and fully connected layers, activation function, dropout
parameter, optimization method, and learning rate. Data augmentation was carried out by
applying rotation, translation, scaling, and mirroring. They obtained an accuracy of 0.942.
There are some other interesting deep learning approaches that used a different
brain tumor MRI dataset in their experiments. Havaei et al. [10] trained several CNN
neural networks with the BRATS-2013 glioma tumor dataset. They included a three-step
preprocessing: removal of the highest and lowest intensities, intensity inhomogeneity
correction, and normalization. The segmentation results were close to those obtained
with our method, with a Dice index between 0.77 and 0.88 for the complete tumor and
a Sensitivity index between 0.75 and 0.96. Our method obtains a comparable Dice
index value of 0.779 and a Sensitivity index of 0.907 for glioma segmentation (see Table 1).
Mohsen et al. [23] proposed a Deep Learning approach together with DWT and PCA and
applied it to classify a dataset of 66 MRIs into 4 classes: normal, glioblastoma, sarcoma,
and metastatic bronchogenic carcinoma tumors. They used Fuzzy C-means clustering to
separate brain tissues from skull before the deep learning processing. Although the types
of brain tumors in the images are different from the ones in the T1-CE MRI image dataset
we used, the classifier had to obtain an output label among four different values, similarly
to our approach with the adopted T1-CE MRI image dataset. They obtained a classification
accuracy of 0.969, lower than that of our approach.

4. Conclusions
In this paper, we present a fully automatic brain tumor segmentation and classification
method, based on a CNN (Convolutional Neural Network) architecture designed for mul-
tiscale processing. We evaluated its performance using a publicly available T1-weighted
contrast-enhanced MRI images dataset. Data augmentation through elastic transformation
was adopted to increase the training dataset and prevent overfitting. The measures of
performance obtained are in the range of the top ten methods from the BRATS 2013 bench-
mark. We compared our results with other seven brain tumor classification approaches that
used the same dataset. Our method obtained the highest tumor classification accuracy with
a value of 0.973. Our multiscale CNN approach, which uses three processing pathways,
is able to successfully segment and classify the three kinds of brain tumors in the dataset:
meningioma, glioma, and pituitary tumor. Despite the fact that skull and vertebral
column parts are not removed, and despite the variability of the three tumor types, which caused
false positives in some images, our approach achieved outstanding segmentation perfor-
mance metrics, with an average Dice index of 0.828, an average Sensitivity of 0.940, and an
average pttas value of 0.967. Our method can be used to assist medical doctors in the
diagnostics of brain tumors and the proposed segmentation and classification method can
be applied to other medical imaging problems. As future work, we plan to develop an
FCN architecture for the classification of the same MRI images dataset and compare its
performance with the proposed model. Moreover, we plan to study the applicability of
the proposed multiscale convolutional neural network for segmentation in other research
fields such as satellite imagery.

Author Contributions: Conceptualization, F.J.D.-P.; methodology, F.J.D.-P.; software, M.M.-Z.,


M.A.-R., and D.G.-O.; experimental results, M.M.-Z., M.A.-R., and D.G.-O.; supervision, F.J.D.-P.;
writing—original draft and review, F.J.D.-P., M.M.-Z., M.A.-R., and D.G.-O. All authors have read
and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The dataset used in the experiments is publicly available from Figshare
(http://dx.doi.org/10.6084/m9.figshare.1512427 (accessed on 31 January 2021)).
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Mohan, G.; Subashini, M.M. MRI based medical image analysis: Survey on brain tumor grade classification. Biomed. Signal
Process. Control 2018, 39, 139–161. [CrossRef]
2. Işın, A.; Direkoğlu, C.; Şah, M. Review of MRI-based brain tumor image segmentation using deep learning methods.
Procedia Comput. Sci. 2016, 102, 317–324. [CrossRef]
3. Menze, B.J.; Jakab, A.; Bauer, S.; Kalpathy-Cramer, J.; Farahani, K.; Kirby, J.; Burren, Y.; Porz, N.; Slotboom, J.; Wiest, R.; et al.
The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 2015, 34, 1993–2024. [CrossRef]
[PubMed]
4. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Arindra, A.; Setio, A.; Ciompi, F.; Ghafoorian, M.; van der Laak, J.A.W.M.; van Ginneken, B.;
Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88. [CrossRef] [PubMed]
5. Cheng, J.; Huang, W.; Cao, S.; Yang, R.; Yang, W.; Yun, Z.; Wang, Z.; Feng, Q. Enhanced performance of brain tumor classification
via tumor region augmentation and partition. PLoS ONE 2015, 10, e0140381.
6. Sachdeva, J.; Kumar, V.; Gupta, I.; Khandelwal, N.; Ahuja, C.K. A package-SFERCB-“Segmentation, features extraction, reduction
and classification analysis by both SVM and ANN for brain tumors”. Appl. Soft Comput. 2016, 47, 151–167. [CrossRef]
7. Sharma, Y.; Chhabra, M. An improved automatic brain tumor detection system. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2015, 5,
11–15.
8. Iftekharuddin, K.M.; Zheng, J.; Islam, M.A.; Ogg, R.J. Fractal-based brain tumor detection in multimodal MRI. Appl. Math. Comput.
2009, 207, 23–41. [CrossRef]
9. Havaei, M.; Jodoin, P.-M.; Larochelle, H. Efficient Interactive Brain Tumor Segmentation as Within-Brain kNN Classification.
In Proceedings of the 22nd International Conference on Pattern Recognition, Stockholm, Sweden, 24–28 August 2014; IEEE:
Los Alamitos, CA, USA, 2014; pp. 556–561.
10. Havaei, M.; Davy, A.; Warde-Farley, D.; Biard, A.; Courville, A.; Bengio, Y.; Pal, C.; Jodoin, P.-M.; Larochelle, H. Brain tumor
segmentation with deep neural networks. Med. Image Anal. 2017, 35, 18–31. [CrossRef]
11. Moeskops, P.; de Bresser, J.; Kuijf, H.J.; Mendrik, A.M.; Biessels, G.J.; Pluim, J.P.W.; Išgum, I. Evaluation of a deep learning approach
for the segmentation of brain tissues and white matter hyperintensities of presumed vascular origin in MRI. NeuroImage Clin.
2018, 17, 251–262. [CrossRef]
12. MRBrainS: Evaluation Framework for MR Brain Image Segmentation. Available online: https://mrbrains13.isi.uu.nl/ (accessed
on 12 December 2020).
13. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the
18th International Conference on Medical Image Computing and Computer-Assisted Intervention—MICCAI, Munich, Germany,
5–9 October 2015; Navab, N., Hornegger, J., Wells, W., Frangi, A., Eds.; LNCS. Springer: Cham, Switzerland, 2015; Volume 9351,
pp. 234–241.
14. ISBI Challenge: Segmentation of Neuronal Structures in EM Stacks. Available online: http://brainiac2.mit.edu/isbi_challenge
(accessed on 15 December 2020).
15. Chaddad, A.; Tanougast, C. Quantitative evaluation of robust skull stripping and tumor detection applied to axial MR images.
Brain Inf. 2016, 3, 53–61. [CrossRef] [PubMed]
16. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86,
2278–2324. [CrossRef]
17. Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell.
2017, 39, 640–651. [CrossRef] [PubMed]
18. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation.
IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [CrossRef]
19. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Comms. ACM 2017,
60, 84–90. [CrossRef]
20. Al-Antari, M.A.; Al-Masni, M.A.; Choi, M.-T.; Han, S.M.; Kim, T.-S. A fully integrated computer-aided diagnosis system for
digital X-ray mammograms via deep learning detection, segmentation, and classification. Int. J. Med. Inform. 2018, 117, 44–54.
[CrossRef]
21. Mazurowski, M.A.; Buda, M.; Saha, A.; Bashir, M.R. Deep learning in radiology: An overview of the concepts and a survey of the
state of the art with focus on MRI. J. Magn. Reson. Imaging 2019, 49, 939–954. [CrossRef]
22. Chartrand, G.; Cheng, P.M.; Vorontsov, E.; Drozdzal, M.; Turcotte, S.; Christopher, J.P.; Kadoury, S.; Tang, A. Deep learning:
A primer for radiologists. Radiographics 2017, 37, 2113–2131. [CrossRef]
23. Mohsen, H.; El-Dahshan, E.-S.A.; El-Horbaty, E.-S.M.; Salem, A.-B.M. Classification using deep learning neural networks for
brain tumors. Future Comput. Inform. J. 2018, 3, 68–71. [CrossRef]
24. Abiwinanda, N.; Hanif, M.; Hesaputra, S.T.; Handayani, A.; Mengko, T.R. Brain tumor classification using convolutional neural
network. In Proceedings of the World Congress on Medical Physics and Biomedical Engineering, Prague, Czech Republic,
3–8 June 2018; Springer: Singapore, 2019; pp. 183–189.
25. Pashaei, A.; Sajedi, H.; Jazayeri, N. Brain Tumor Classification via Convolutional Neural Network and Extreme Learning
Machines. In Proceedings of the 8th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad,
Iran, 25–26 October 2018; Curran Associates: New York, NY, USA, 2018; pp. 314–319.
26. Sultan, H.H.; Salem, N.M.; Al-Atabany, W. Multi-classification of brain tumor images using deep neural network. IEEE Access
2019, 7, 69215–69225. [CrossRef]
27. Anaraki, A.K.; Ayati, M.; Kazemi, F. Magnetic resonance imaging-based brain tumor grades classification and grading via
convolutional neural networks and genetic algorithms. Biocybern. Biomed. Eng. 2019, 39, 63–74. [CrossRef]
28. Díaz-Pernas, F.J.; Martínez-Zarzuela, M.; Antón-Rodríguez, M.; González-Ortega, D. Learning and surface boundary feedbacks
for color natural scene perception. Appl. Soft Comput. 2017, 61, 30–41. [CrossRef]
29. Julesz, B.; Bergen, J.R. Textons, the fundamental elements in preattentive vision and perception of textures. In Readings in Computer
Vision: Issues, Problems, Principles, and Paradigms; Fischler, M.A., Firschein, O., Eds.; Morgan Kaufmann: Los Altos, CA, USA, 1987;
pp. 243–256.
30. Kruger, N.; Janssen, P.; Kalkan, S.; Lappe, M.; Leonardis, A.; Piater, J.; Rodriguez-Sanchez, A.J.; Wiskott, L. Deep hierarchies in
the primate visual cortex: What can we learn for computer vision? IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1847–1871.
[CrossRef] [PubMed]
31. Simard, P.Y.; Steinkraus, D.; Platt, J.C. Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis.
In Proceedings of the 7th International Conference on Document Analysis and Recognition, Edinburgh, Scotland, UK, 3–6 August
2003; IEEE: Los Alamitos, CA, USA, 2003; pp. 958–963.
32. Cheng, J.; Yang, W.; Huang, M.; Huang, W.; Jiang, J.; Zhou, Y.; Yang, R.; Zhao, J.; Feng, Q.; Chen, W. Retrieval of brain tumors by
adaptive spatial pooling and fisher vector representation. PLoS ONE 2016, 11, e0157112. [CrossRef] [PubMed]
