Expert Systems With Applications 238 (2024) 122159


Development of hybrid models based on deep learning and optimized machine learning algorithms for brain tumor Multi-Classification
Muhammed Celik *, Ozkan Inik
Tokat Gaziosmanpasa University, Engineering and Architecture Faculty, Computer Engineering Department, Tokat 60250, Turkey

A R T I C L E I N F O

Keywords: Brain tumors; Deep learning; Machine learning; Hyperparameter optimization; Classification

A B S T R A C T

Accurate classification of magnetic resonance imaging (MRI) images of brain tumors is crucial for early diagnosis and effective treatment in clinical studies. In these studies, many models supported by artificial intelligence (AI) have been proposed as assistant systems for experts. In particular, state-of-the-art deep learning (DL) models that have proven themselves in different fields have been effectively used in the classification of brain MRI images. However, the low accuracy of multiple classification of these images still leads researchers to conduct different studies in this field. In particular, there is a need to develop models that achieve high accuracy on original images, and it is believed that this need can be met not only by DL models but also by classical machine learning (ML) algorithms. However, it is critical to choose the hyperparameters correctly for the hybrid use of ML algorithms with DL models. This study proposes a powerful new hybrid method to perform multiple classification of brain tumors with high accuracy. This method uses a novel convolutional neural network (CNN) model for feature extraction, and ML algorithms are used for feature classification. In addition, nine state-of-the-art CNN models are used for CNN performance comparison. The Bayesian optimization algorithm is used to obtain the optimal hyperparameter values of the ML algorithms. The results obtained from the experimental studies show that the proposed hybrid model achieved 97.15% mean classification accuracy and 97% recall, precision, and F1-score values. Other hybrid models, including DarkNet19-SVM, DarkNet53-SVM, DenseNet201-SVM, EfficientNetB0-SVM, InceptionV3-SVM, NasNetMobile-SVM, ResNet50-SVM, ResNet101-SVM, and Xception-SVM, achieved mean classification accuracies of 95.01%, 95.58%, 96.87%, 97.01%, 95.3%, 95.01%, 96.3%, 95.87%, and 96.23%, respectively. Additionally, the proposed hybrid model exhibited remarkable time efficiency, accomplishing the classification process in a mere 67 min. Conversely, the model that exhibited the lowest time efficiency was InceptionV3, with a processing time of 370 min. In terms of computational complexity, the EfficientNetB0 model is the most efficient. Despite the higher computational complexity of the proposed CNN model compared to some other models, it achieves the second-best classification accuracy. These results show that the proposed method performs better than previous studies on the same dataset. Especially in the classification problem, the optimized ML algorithms were superior to CNN classifiers. Finally, except for one, the proposed CNN model achieved better classification accuracies than the state-of-the-art CNN models.

1. Introduction

The brain is a complex and essential organ that controls numerous vital functions and is composed of billions of neurons, synapses, and nerve cells. However, like other organs, the brain is also prone to developing abnormalities known as brain tumors. These tumors refer to the abnormal growth of cells of different sizes and at different locations within the brain and can be fatal by killing healthy cells (Abd-Ellah et al., 2018). Brain tumors are categorized into two types: benign and malignant. Benign tumors are usually not fatal as they do not spread to other tissues. In contrast, malignant tumors are fast-growing and can spread to other tissues, severely affecting the quality of life (Kumar and Mankame, 2020). Brain tumors comprise several common types, including glioma, meningioma, and pituitary tumors, which can have fatal outcomes (Badža and Barjaktarović, 2020). Gliomas refer to a group of tumors that develop in the glial cells, the supportive tissue of the brain. They account for approximately 30 % of all brain tumors and are often malignant.

* Corresponding author.
E-mail addresses: muhammed.celik@gop.edu.tr (M. Celik), ozkan.inik@gop.edu.tr (O. Inik).

https://doi.org/10.1016/j.eswa.2023.122159
Received 15 February 2023; Received in revised form 10 September 2023; Accepted 12 October 2023
Available online 18 October 2023

Meningiomas are primarily benign and slow-growing tumors that arise from the brain's membranes. Although they do not originate from inside the brain, they cause symptoms by exerting pressure on the brain as they grow. Pituitary tumors are unusual growths that occur in the pituitary gland, located at the base of the brain behind the nose. Some of these tumors can cause the pituitary gland to produce excessive amounts of the hormones that regulate essential bodily functions; pituitary tumors are typically benign (Sharif et al., 2020).

The most common methods used to detect cancerous cells and tissues in the body are medical imaging methods, with MRI and computed tomography scans being the most frequently used (Khan et al., 2020). MRI is an imaging method that uses magnetic fields and radio waves to obtain high-resolution images. It is especially useful in visualizing the brain, spinal cord, joints, and internal organs, as well as in diagnosing and monitoring diseases such as cancer. MRI provides more precise and detailed results compared to other imaging methods, which makes it a useful tool in detecting and diagnosing brain tumors (Díaz-Pernas et al., 2021; Kumar and Mankame, 2020). Experts evaluate the MRI images to determine whether the patient has a brain tumor and, if so, which type of tumor it is. However, manual examination of MRI images can be time-consuming, costly, and prone to errors, especially with a large volume of data (Gómez-Guzmán et al., 2023). This can result in a longer diagnostic process and increased risk to the patient's life. To overcome these issues, computer-aided diagnosis systems have been developed, and studies (Díaz-Pernas et al., 2021; Gómez-Guzmán et al., 2023; Khan et al., 2020; Sharif et al., 2020) have shown that ML and DL are successful methods for classifying, segmenting, and detecting brain tumors.

CNN models and DL approaches have been widely used to analyze medical images of malignant tumors and various diseases (Das et al., 2019), such as pneumoconiosis (Devnath et al., 2022), skin cancer (Alam et al., 2022), and brain tumors (Srinivas et al., 2022). In addition, there are hybrid frameworks in which ML and DL methods are used together (El-Dahshan et al., 2010; Zhang et al., 2011). To achieve this, the transfer learning (TL) method is utilized (Deepak and Ameer, 2019). By utilizing the weights of a previously trained network, TL can speed up the process by providing a shorter training time for the new problem and increasing accuracy (Aurna et al., 2022). Studies on brain MRI images in the literature are presented in the next section.

1.1. Related works

In this section, studies on the classification of brain tumors in MRI images are presented. A thorough search was conducted for recent articles that achieved high classification accuracy in the ScienceDirect, IEEE Xplore, and Google Scholar databases. The search was performed using the keywords "Brain MRI Images," "Deep Learning," and "Machine Learning". Important studies found with these keywords are summarised and presented in Table 1. The table includes the problem that each study addresses, the methods used to solve the problem, and the dataset used. The rightmost column of the table reports the highest accuracy value achieved in the test phase of each study.

Table 1. Recent studies for the classification of brain MRI images.
Authors | Problem | Method | Dataset | Accuracy (%)
Cheng et al. (2015) | Multi-classification | SVM | Figshare (Cheng, 2017) | 91.28
Zacharaki et al. (2009) | Multi-classification | SVM and KNN | Special | 85
El-Dahshan et al. (2010) | Binary classification | ANN and KNN | Harvard Medical School Dataset (Johnson and Becker, 1999) | 98
Zhang et al. (2011) | Binary classification | ANN | Harvard Medical School Dataset | 100
Sultan et al. (2019) | Multi-classification | CNN | Figshare | 96.13
Aamir et al. (2022) | Multi-classification | CNN | Figshare | 98.95
Chattopadhyay and Maitra (2022) | Binary classification | CNN and SVM | BraTS 2020 (Menze et al., 2015) | 99.74
Nayak et al. (2022) | Multi-classification | CNN | Figshare | 98.78
Wahlang et al. (2022) | Binary classification | CNN, CNN-DNN and SVM | 1. Figshare; 2. Brainweb (Collins et al., 1998); 3. Radiopaedia (radiopaedia.org) | 88
Al-Badarneh et al. (2012) | Binary classification | ANN and KNN | Harvard Medical School Dataset | 100
Raza et al. (2022) | Multi-classification | CNN | Figshare | 99.6
Maqsood et al. (2022) | Multi-classification | CNN-SVM | 1. BraTS 2018 (Menze et al., 2015); 2. Figshare | 98.92
Amran et al. (2022) | Binary classification | CNN | Br35H (Hamada, 2020) | 99.51
Samee et al. (2022) | Multi-classification | CNN | Figshare | 99.51
Latif et al. (2022) | Multi-classification | CNN-SVM | BraTS 2018 | 96.19
Gómez-Guzmán et al. (2023) | Multi-classification | CNN | 1. Figshare; 2. SARTAJ (Bhuvaji et al., 2020); 3. Br35H (combination of the 3 datasets) | 81.05
Yazdan et al. (2022) | Multi-classification | Multi-Scale CNN | SARTAJ | 91.2
Díaz-Pernas et al. (2021) | Multi-classification | Multi-Scale CNN | Figshare | 97.3
Jibon et al. (2022) | Binary classification | CNN | Harvard Medical School Dataset | 96
Das et al. (2019) | Multi-classification | CNN | Figshare | 94.39
Deepak and Ameer (2019) | Multi-classification | CNN-SVM, CNN-KNN | Figshare | 98
Deepak and Ameer (2021) | Multi-classification | CNN-SVM | Figshare | 95.82
Aurna et al. (2022) | Multi-classification | CNN ensemble | Figshare, SARTAJ | 98.96
Ullah et al. (2022b) | Binary classification | CNN | Abhranta Panigrahi (Panigrahi, 2021) | 99.33
Mohsen et al. (2018) | Multi-classification | DNN | Harvard Medical School Dataset | 96.97
Srinivas et al. (2022) | Binary classification | CNN | Kaggle (Hemanth et al., 2019) | 96
Ananda Kumar et al. (2022) | Multi-classification | CNN | BraTS (Bhupendra et al., 2022) | 99.57

DL models and ML algorithms have been widely used for the classification of brain tumors. Cheng et al. (2015) proposed to improve brain tumor classification performance by enlarging the tumor region through image expansion and then subdividing it into subregions. They used three methods, namely the intensity histogram, the Gray Level Co-occurrence Matrix (GLCM), and Bag of Words (BOW), for feature extraction, and used annular segmentation in addition to tumor region expansion. They achieved the best accuracy of 91.28 % using an SVM classifier. Zacharaki et al. (2009) introduced a system for glioma grade classification that utilized SVM and k-Nearest Neighbor (KNN) methods. The system performed both multi-classification of different glioma grades and binary classification to differentiate between high and low grades. The study yielded an accuracy of 85 % for multi-classification and 88 % for binary classification. El-Dahshan et al. (2010) proposed a method for classifying brain MRI images as normal or abnormal using artificial neural network (ANN) and KNN classifiers. They used Discrete Wavelet Transform (DWT) and Principal Component Analysis (PCA) methods in the image preprocessing stages. In the study, the highest accuracy of 98 % was achieved with the KNN classifier.
Zhang et al. (2011) proposed an ANN for the binary classification of brain tumor MRI images. They used DWT and PCA for feature extraction and feature reduction, respectively. As a result, they achieved 100 % accuracy. Al-Badarneh et al. (2012) used ANN and KNN methods for the automatic binary classification of MRI images. According to the test results, ANN and KNN achieved 98.92 % and 100 % accuracy, respectively.

The success of CNNs in classification problems is high in medical data as well as in other data. Sultan et al. (2019) proposed a CNN architecture for the classification of multi-class brain tumors from MRI images. With the proposed architecture, they were able to classify images with 96.13 % accuracy. Aamir et al. (2022) proposed three DL models for the multiple classification of brain tumors. Two of them were used for feature extraction, and one for classification. As a result, the data was classified with 98.95 % accuracy. Chattopadhyay and Maitra (2022) proposed a CNN model for classifying MRI images into tumor and non-tumor for detecting brain tumors. In the proposed method, SVM, softmax, and sigmoid functions are used in the last layer of the model to diversify the activation function, and the RMSProp algorithm was used for optimization. In the tests, the method using the softmax classifier and the RMSProp optimizer gave the best result, with 99.74 % accuracy.

In the classification of brain tumors, CNN-based hybrid structures also provide effective results. These structures perform the classification process on a new model created by combining two or more CNN models. Nayak et al. (2022) developed a CNN model for the classification of brain tumors. This model was derived from the EfficientNet architecture and was used to classify multiple brain tumor MRI images. According to the test results, the model achieved 98.78 % accuracy. In a study conducted by Yazdan et al. (2022), a Multi-Scale (MS) CNN model was proposed for performing multi-classification on a four-class brain MRI dataset. According to the results of the study, the proposed model performed better than AlexNet and ResNet models, achieving 91.2 % accuracy. Díaz-Pernas et al. (2021) used an MS-CNN model for the segmentation and multi-classification of brain tumors. The model was designed by taking inspiration from the processing pathways in the human visual system and includes the analysis of input images at three different spatial scales. In the study, the proposed model achieved a tumor classification accuracy of 97.3 %. Wahlang et al. (2022) proposed a CNN model based on the LeNet architecture for detecting brain tumors in MRI images. Factors such as age and gender were also added to the model's criteria in this study. The proposed method was compared with CNN-DNN, SVM, and AlexNet methods and stood out with the highest accuracy rate of 88 %. Raza et al. (2022) proposed a model called DeepTumorNet using the GoogLeNet architecture to perform multi-classification of three different types of brain tumors. In the proposed method, 15 additional layers were added to the GoogLeNet architecture. This model achieved high classification success with a 99.6 % accuracy rate. Maqsood et al. (2022) developed a method for the multi-classification of brain tumors. This method involves a five-stage approach. In the first stage, the edges of the input images were detected. In the second stage, brain tumors were segmented using a 17-layer CNN model. In the third stage, feature extraction was performed using the MobileNetV2 CNN model. In the fourth stage, the best features were selected, and in the final stage, classification was performed using SVM. The classification process was performed on two separate datasets. As a result of the experiments conducted, an accuracy of 97.47 % was obtained for the BraTS 2018 dataset and 98.92 % for the Figshare dataset. Amran et al. (2022) conducted a study using the GoogLeNet architecture for the classification of brain tumors. However, unlike previous studies, the last 5 layers were deleted, and 14 layers that perform automatic feature extraction were added in their place. The developed model was compared with various pre-trained CNN models and machine learning algorithms. As a result, the proposed model classified brain tumors with 99.51 % accuracy.

In addition to these studies, there are also studies in the literature that use TL. Latif et al. (2022) proposed a hybrid method for classifying glioma tumors. The method used a CNN model for feature extraction, and classification was then performed with SVM. The success rate obtained from the classification process was calculated as 96.16 %. Deepak and Ameer (2019) proposed a hybrid method for classifying brain tumors. The method used an MRI dataset containing three different brain tumors. GoogLeNet was used as the feature extractor in the proposed method, and these features were then classified using SVM and KNN algorithms. The results achieved an accuracy of 98 % with the CNN-KNN structure. Deepak and Ameer (2021) used a CNN-SVM hybrid structure for brain tumor classification, as in their previous study. The classification result obtained was 95.82 % accuracy. Jibon et al. (2022) proposed a method for the binary classification of brain MRI images. This method uses the log-polar transform (LPT) and a CNN for feature extraction and classification. The proposed method achieved 96 % accuracy. Mohsen et al. (2018) proposed combining DWT and D-CNN methods for classifying brain tumors. DWT was used for feature extraction in this study, and the D-CNN algorithm was then used to classify the obtained features. The classification result obtained was 96.97 % accuracy. Srinivas et al. (2022) demonstrated the effectiveness of the transfer learning approach for the binary classification of brain tumors. The study used pre-trained VGG16, ResNet50, and InceptionV3 models. The highest success rate in the experimental study was achieved with the VGG16 model, at 96 % accuracy. Ananda Kumar et al. (2022) conducted a study using the ResNet152 pre-trained model to detect and classify benign, malignant, and healthy brain tissues. Additionally, the CoV-19 OA optimization algorithm was used for weight parameter optimization, and a DCNN model was used in the data preprocessing stage of the dataset. The proposed approach achieved 99.57 % accuracy in the test studies.

Studies in the literature demonstrate the effectiveness of using pre-trained models for brain tumor classification. Gómez-Guzmán et al. (2023) developed a CNN model for the multi-classification of brain tumors. The performance of various pre-trained models was also compared with the proposed model. The InceptionV3 CNN model achieved the highest accuracy on the applied dataset, reaching 97.12 %, while the accuracy of the proposed generic CNN model was 81.05 %. Das et al. (2019) used the histogram equalization method in the pre-processing stage of the images in the dataset used for classification and classified the resulting features with a CNN model they developed. They achieved an accuracy of 94.39 % in classification. Aurna et al. (2022) used five different pre-trained CNN models and their proposed CNN model to demonstrate classification success. The pre-trained models used were VGG19, EfficientNetB0, InceptionV3, ResNet50, and Xception. As a result of experiments on the merged dataset, the proposed model exhibited superior performance compared to the other models, with an accuracy of 98.96 %. Ullah et al. (2022b) proposed a new CNN model called TumorResnet for the binary classification of brain tumors. The proposed model achieved a classification accuracy of 99.33 %.

1.2. Motivation

Early diagnosis of brain tumors is a critical process, and in recent years, computer vision systems have been used for this diagnosis. Many studies have been conducted on the classification of brain MRI images, and DL-based architectures have generally been used in these studies. However, in some studies, feature extraction has been performed by CNNs, and classical ML algorithms such as SVM and KNN have been used for classification. Nevertheless, it has been observed that there are some shortcomings in the multiple classification of brain MRI images in the literature. These shortcomings are listed below:


1. The studies on the same dataset show that CNN models are more successful when used with ML algorithms. However, a single study should determine which method (CNN or ML) is more successful.
2. In previous studies, the hyperparameters of ML algorithms are usually used as defaults, so the impact of parameter optimization on algorithm performance should be investigated.
3. CNNs are typically used with KNN and SVM algorithms, but the classification performance of other machine learning algorithms (such as Decision Tree (DT) and Naive Bayes (NB)) should be explored.
4. State-of-the-art models require long training times when used with ML algorithms. Therefore, it is necessary to design a new CNN model with a shorter training time, so that more efficient results can be obtained when it is used with ML.

The main motivation of this study is to investigate each of the above points one by one and to verify the results with experimental studies.

1.3. Contributions

This study proposes a new approach for classifying four types of brain tumors (glioma, meningioma, no tumor, and pituitary) using optimized ML algorithms with a new CNN and state-of-the-art CNN models. A publicly available dataset (Nickparvar, 2021) was used for the training and testing processes of the models. This dataset is a combination of three different datasets, namely Figshare, SARTAJ, and Br35H, and contains 7023 images. The contributions of the proposed method are presented below:

1. Features obtained from nine state-of-the-art CNN models (DarkNet19, DarkNet53, DenseNet201, EfficientNetB0, InceptionV3, NasNetMobile, ResNet50, ResNet101, and Xception) were fed into ML algorithms optimized using Bayesian optimization for the classification task. In addition to feature extraction, these models were also used for direct classification.
2. To minimize the time complexity of the proposed method, a novel CNN model was designed in addition to the state-of-the-art models. The implementation of the developed CNN model with the KNN algorithm resulted in the shortest training time among all the evaluated approaches.
3. The effectiveness of CNN-ML models in MRI classification is compared in several aspects. One of these aspects is the Model Size versus Model Accuracy graph, which shows the effect of model size on accuracy as well as other performance measures.
4. The main goal of the study is to discover the optimal hybrid classifier for the quadruple brain tumor classification problem, which involves a higher complexity, and this classifier is intended to be used for solving other multi-class problems.
5. Unlike previous studies on the same or similar datasets, high classification accuracy was achieved without applying data augmentation techniques.

The subsequent sections of this paper are structured as follows: Section 2 elucidates the materials and methods employed in the study. Section 3 presents a detailed account of the proposed method. Section 4 reports the experimental results. A discussion of the findings is presented in Section 5, and finally, a conclusion is drawn in Section 6.

2. Material and methods

In this section, the dataset used for brain tumor classification and the ML algorithms (SVM, KNN, NB, and DT) and CNN models used in the proposed method are discussed. The method, including the proposed CNN model, is thoroughly explained and visually illustrated in Fig. 1. The steps summarized in the block diagram are explained in more detail in the following sections.

Fig. 1. Brief of the methodology.

2.1. Dataset

In this study, a publicly available MRI dataset is used to train, validate, and test the CNN-ML based techniques and models. The brain tumor MRI dataset (Nickparvar, 2021) is a combination of the Figshare, SARTAJ, and Br35H datasets and consists of 4 different classes. The dataset, detailed in Table 2, is in grayscale JPG format and contains 7023 human brain MRI images of different types. There are four classes of brain tumors in this dataset: Glioma (1297 images for training and 324 images for testing), Meningioma (1316 images for training and 329 images for testing), No tumor (1600 images for training and 400 images for testing), and Pituitary (1406 images for training and 351 images for testing). For training and testing, 80 % and 20 % of the images were used, respectively. Resizing was applied to the dataset in the preprocessing stage to provide the required input size for each model. Furthermore, the grayscale images were converted to RGB format before the training and testing phases.
Table 2. Number of training and test samples for the classes in the dataset.
Class | Train | Test | Total
Meningioma | 1316 | 329 | 1645
Pituitary | 1406 | 351 | 1757
Glioma | 1297 | 324 | 1621
No tumor | 1600 | 400 | 2000
Total | 5619 | 1404 | 7023

Images belonging to the no-tumor class were taken from the Br35H dataset. Since the images belonging to the Glioma class in the SARTAJ dataset were misclassified, these images were taken into account and replaced when merging the three datasets. Fig. 2 shows some sample images from the dataset.

Fig. 2. Some samples of the dataset.
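The preprocessing described above (resizing the grayscale scans and converting them to three-channel RGB) can be sketched as follows. The paper performs this step in MATLAB; the snippet below is an illustrative Python/Pillow equivalent, the folder names are hypothetical placeholders for the public dataset's Training/Testing directories, and 224x224 is used as an example target size for the models in Table 3 that expect that input.

```python
from pathlib import Path
from PIL import Image

# Hypothetical folder layout: one sub-directory per class
# (glioma, meningioma, notumor, pituitary) under Training/ and Testing/.
SRC = Path("brain_tumor_mri/Training")
DST = Path("preprocessed/Training")
TARGET_SIZE = (224, 224)  # example input size; other models need 256x256 or 299x299

for img_path in SRC.rglob("*.jpg"):
    img = Image.open(img_path).convert("RGB")   # grayscale MRI -> 3-channel RGB
    img = img.resize(TARGET_SIZE)                # resize to the model input size
    out_path = DST / img_path.relative_to(SRC)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    img.save(out_path)
```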



2.2. State-of-the-art CNN models

In this study, nine state-of-the-art CNN models were used. These models are widely used and achieve high accuracy rates in various image classification tasks. Model selection was based on low parameter counts and high accuracy values; thus, the goal was to achieve high accuracy with a small model size in MRI image classification. The input image size, depth, and number of parameters for each model are presented in Table 3. This information is helpful in understanding the different characteristics of these models and evaluating their suitability for a specific classification task.

Table 3. Information about model parameters (Shaukat et al., 2023).
CNN Model | Input Size | Depth | Number of Parameters (Million)
DarkNet19 | 256x256 | 19 | 20.8
DarkNet53 | 256x256 | 53 | 41.6
DenseNet201 | 224x224 | 201 | 18.9
EfficientNetB0 | 224x224 | 82 | 5.3
InceptionV3 | 299x299 | 48 | 23.9
NasNetMobile | 224x224 | 389 | 4.4
ResNet50 | 224x224 | 50 | 25.6
ResNet101 | 224x224 | 101 | 44.6
Xception | 299x299 | 71 | 22.9

The DarkNet19 model (Redmon and Farhadi, 2016) is a DL model with 19 layers. The model is specifically designed for use in systems with limited resources, such as mobile devices, and provides successful results for object detection, classification, and other image processing tasks. The input of the model is a 224x224 image and the output is a softmax layer used to identify and classify objects from the ImageNet (Deng et al., 2010) dataset containing 1000 class labels.

DarkNet53 (Redmon and Farhadi, 2018) is a 53-layer CNN model that utilizes a residual network structure, created using convolutional layers and skip-connection blocks that combine the outputs of previous layers. This approach helps to reduce overfitting problems and allows for faster training, despite the deeper network structure. DarkNet53 is commonly used as the base model for You Only Look Once V3 (YOLOv3) in many object detection and classification tasks, with variations trained on different datasets for similar tasks.

DenseNet201 (Huang et al., 2016) is a 201-layer CNN model that heavily utilizes skip-connection blocks to create deeper networks while minimizing overfitting problems. The model includes structures called dense blocks, in which each block learns rich features using all the outputs of the previous block as input. DenseNet201 is trained on the ImageNet dataset and can recognize 1000 different objects.

EfficientNetB0 (Tan and Le, 2019) is a CNN architecture designed using the compound scaling technique, which increases the depth, width, and resolution of the network simultaneously to provide high accuracy and efficiency. The EfficientNetB0 model consists of 8 recurrent blocks, each containing multiple convolutional layers optimized to learn more features using fewer parameters.

InceptionV3 (Szegedy et al., 2015) is a CNN architecture that learns better features with fewer parameters using a special architecture called the Inception module. This module allows the network to learn different features using filters of different sizes. Additional classification layers called auxiliary classifiers also enable faster network training, and the dropout technique helps prevent overfitting. InceptionV3 can recognize 1000 different objects and is a high-performance model. There are also lighter versions of the model that are optimized for use on mobile devices.

NasNetMobile (Zoph et al., 2017) is a CNN architecture built using the automated technique known as Neural Architecture Search, which provides high accuracy. The model consists of multiple convolutional layers and output layers optimized to learn more features using fewer parameters. The network is made lighter and faster using a technique called separable convolution.

ResNet50 (He et al., 2015) is a CNN architecture that includes convolutional layers in blocks to avoid overfitting problems, using a technique called skip connections. This technique allows the network to learn more easily by directly adding the outputs of some layers to deeper layers. Additionally, batch normalization is used to train the network faster and more effectively. ResNet50 has a network structure with 50 convolutional layers and can recognize 1000 different objects, having been trained on the ImageNet dataset.

ResNet101 (He et al., 2015) is a deeper and more complex CNN model derived from ResNet50. This model uses skip connections as a solution to the overfitting problems experienced in previous ResNet models. ResNet101 has a network structure with 101 convolutional layers, consisting of convolutional, pooling, and normalization layers. Additionally, the softmax function is used at the end of the network to predict the class of the input image.

Xception (Chollet, 2016) is a deeper and more complex CNN model using convolutional layers and the depthwise separable convolution technique. This technique reduces the computational cost and allows for better results by performing the convolution in two stages. Xception has an architecture with 126 convolutional layers. It also uses batch normalization to ensure faster and better training of the network. The output layer of the network estimates the class of the input image with the softmax function.

2.3. Feature extraction

Feature extraction is a process used to transform the information in a dataset into a more understandable format. In other words, it is a fundamental pre-processing step for ML algorithms, and it usually requires human intervention. With DL architectures, however, automatic feature extraction is performed on raw data, and this is one of the biggest advantages of DL architectures. While manual extraction requires specialized knowledge, automatic extraction does not require human intervention to extract features from signals or images. In particular, CNNs are used to learn features in images. Each convolution layer consists of specialized filters that detect different features in the image. The first convolution layers usually learn edges, corners, and other simple features, while later layers learn more complex features. This process happens automatically, without the need for human intervention, and helps make images more understandable.

In this study, nine state-of-the-art CNN models and a new CNN model were utilized to extract features from images. The CNN models consist of two main components: feature extraction and classification. Specific layers of the state-of-the-art CNN models were utilized, as shown in Table 4, generating 1000 features per image, which helped reduce the computational cost. The names of the feature layers in the table belong to the models in the deep learning library used.

Table 4. Information about the feature layers of the models.
CNN Model | Layer Type | Feature Layer Name | Number of Extracted Features
DarkNet19 | Average Pooling Layer | avg1 | 1000
DarkNet53 | Convolutional Layer | conv53 | 1000
DenseNet201 | Fully Connected Layer | fc1000 | 1000
EfficientNetB0 | Fully Connected Layer | head dense MatMul | 1000
InceptionV3 | Fully Connected Layer | predictions | 1000
NasNetMobile | Fully Connected Layer | predictions | 1000
ResNet50 | Fully Connected Layer | fc1000 | 1000
ResNet101 | Fully Connected Layer | fc1000 | 1000
Xception | Fully Connected Layer | predictions | 1000
Proposed CNN | Fully Connected Layer | fc_1 | 1000
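As an illustration of this feature-extraction step, the sketch below passes the preprocessed images through a pre-trained ResNet50 and stores its 1000-dimensional output (the counterpart of the fc1000 layer in Table 4) as the feature matrix for the ML classifiers. It assumes PyTorch/torchvision (0.13 or later) rather than the MATLAB toolbox used by the authors, and the dataset path is a hypothetical placeholder.

```python
import numpy as np
import torch
from torchvision import models, transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

# Preprocessing: resize and force three identical channels for grayscale MRI slices.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Grayscale(num_output_channels=3),
    transforms.ToTensor(),
])

dataset = ImageFolder("preprocessed/Training", transform=preprocess)  # hypothetical path
loader = DataLoader(dataset, batch_size=64, shuffle=False)

# ImageNet-pretrained ResNet50; its final layer yields a 1000-dimensional vector per image.
backbone = models.resnet50(weights="IMAGENET1K_V1").eval()

features, labels = [], []
with torch.no_grad():
    for images, targets in loader:
        out = backbone(images)        # shape: (batch, 1000)
        features.append(out.numpy())
        labels.append(targets.numpy())

X = np.concatenate(features)          # feature matrix handed to SVM/KNN/DT/NB
y = np.concatenate(labels)
```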
2.4. Machine learning algorithms

AI is a field within computer science that focuses on developing techniques, theories, and applications to enable machines to perform intelligent tasks. ML, a sub-branch of AI, involves the development of algorithms that can build models based on training data, enabling them to make predictions or decisions about new data without explicit programming (Shaukat et al., 2020).


ML has numerous applications, including fault detection (Alharbi et al., 2023) and the detection of diseases (Ali et al., 2022; Kumar et al., 2022).

In the field of machine learning, algorithms can serve three main purposes: clustering, prediction, and classification. The choice of algorithm depends on the specific task it is intended to perform. For instance, SVMs are typically used for classification, while linear regression is employed for prediction. This study employed four machine learning algorithms, namely SVM, KNN, NB, and DT, which are elaborated on in this section.

KNN is a non-parametric, instance-based learning algorithm used for classification and clustering problems. It measures how different two pieces of data are by using a distance function. It takes less time to train than other classifiers, but it requires substantial computation at classification time. The classifier assumes that similar data points lie closer to each other than dissimilar data points. There are two main categories of KNN based on anomaly scores: the anomaly scores can be calculated by finding the difference between the kth neighbor and a data point, or by calculating the density of each data instance. The value of k affects the overall performance of the classifier. The classifier is sensitive to noisy data and to the distance function used to calculate the difference between data points. KNN requires a lot of storage space and is computationally expensive. The most commonly used distance function is the Euclidean distance d(x, y) between data points x and y (Shaukat et al., 2020).

SVM is a popular and successful ML technique used for classification problems. It separates and classifies two different data classes based on the margin on either side of the hyperplane. The classifier can achieve high accuracy in classifying a data point by maximizing the margin and the distances between hyperplanes. The data points that lie on the border of the hyperplane are called support vectors. SVM is categorized into two types, linear and non-linear, depending on the kernel function used. Additionally, SVM can be one-class or multi-class, depending on the detection type. However, SVM requires a lot of memory and training time. To learn a dynamic user's behavior, SVM needs training at different time intervals for better results. The performance of the classifier is also affected by the kernel function and its parameters (Shaukat et al., 2020).

DT is a supervised ML technique that uses a tree-like structure. The tree is made up of three parts: a root or intermediate node, a path, and a leaf node. The root/intermediate node represents an object/attribute, and each path that diverges from it represents the possible values of the parent node (object). The leaf node represents the predicted category/classified attribute. The resulting tree is then represented in the form of if-then rules. To construct the tree, entropy and information gain measures are used to select the best possible intermediate node (Shaukat et al., 2020).

NB is a type of classifier based on Bayes' theorem, which breaks down the conditional probability of the problem being analyzed. This classifier works best with discrete attributes and is considered straightforward, with a fast detection speed. There are three main techniques under Naive Bayes: multinomial, Bernoulli, and Gaussian. Multinomial Naive Bayes is used for discrete values, where feature vectors represent the number of occurrences of an event. Bernoulli Naive Bayes is used for binary feature vectors; an example is the "bag of words" technique. Gaussian Naive Bayes is a classifier used for continuous values, where the values follow a Gaussian distribution (Shaukat et al., 2020).
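For illustration, the four classifiers can be instantiated and evaluated on the extracted CNN features with scikit-learn as sketched below. This is only a rough analogue of the MATLAB classifiers used in the paper: the kernel, metric, and depth settings shown here are placeholders rather than the optimized values reported in Tables 6 and 7, and X, y refer to the feature matrix and labels from the earlier feature-extraction sketch.

```python
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Illustrative counterparts of the SVM, KNN, DT, and NB classifiers.
classifiers = {
    "SVM": SVC(kernel="poly", degree=2, C=1.0),                       # quadratic-kernel analogue
    "KNN": KNeighborsClassifier(n_neighbors=1, metric="cosine",
                                algorithm="brute"),
    "DT":  DecisionTreeClassifier(),
    "NB":  GaussianNB(),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=8)    # eight-fold cross-validation, as in the text
    print(f"{name}: mean accuracy = {scores.mean():.4f}")
```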
3. Proposed method

In this section, the five-step workflow of the proposed CNN-ML method for the multiclass classification of brain tumors is described, as shown in Fig. 3. The first step involves providing a dataset consisting of three different public datasets to the system. In the second step, the dataset is resized to fit the input dimensions of the CNN models. The third step involves training and testing the proposed CNN model and the state-of-the-art CNN models to demonstrate their classification performance. In the fourth step, the features extracted from the CNN models are classified by ML algorithms whose hyperparameters are optimized using the Bayesian optimization algorithm. Finally, the trained models are tested and their performance is evaluated using standard classification metrics such as Accuracy, Precision, Recall, F1-score, and AUC. Each step of the proposed method is explained in detail in the following sections.

Fig. 3. Flowchart of proposed method.

3.1. Proposed CNN model

The purpose of this study is to classify brain MRI images into four classes using a newly developed CNN model. The model identifies Glioma (0), Meningioma (1), No tumor (2), or Pituitary (3) from an input image. It consists of 25 layers, including the input layer, convolution layers applied to the preprocessed (resized) images with ReLU activation functions, batch normalization, maximum pooling, two fully connected layers, and finally the softmax activation layer. The model architecture is illustrated in Fig. 4, and the parameter values for each layer are presented in Table 5.

Fig. 4. Proposed CNN model.

The input layer of the model takes an image of size 224x224 pixels with 3 RGB color channels. The first convolution layer of the model uses 32 filters of size 1x1 with a stride of 1 and a padding of 0. The second convolution layer has 64 filters of size 1x1 and is followed by a pooling layer of size 3x3 with a stride of 2. The third convolution layer has 64 filters of size 2x2 with a stride of 1 and a padding of 0. The fourth convolution layer has 64 filters of size 3x3, again with a stride of 1 and a padding of 0. The fifth convolution layer has 128 filters of size 3x3 with a stride of 1 and a padding of 0, and is followed by another pooling layer, this time of size 5x5 with a stride of 2. The sixth convolution layer has 128 filters of size 3x3 with a stride of 1 and a padding of 0. The seventh convolution layer has 128 filters of size 4x4, again with a stride of 1 and a padding of 0, and is followed by another pooling layer of size 3x3 with a stride of 5. Finally, there are two fully connected (FC) layers, the first with an output size of 1x1x1000 and the second with an output size of 1x1x4.

Table 5. Parameter values at each layer of the proposed CNN model.
Layer Name | Activation Maps | Learnable Parameters | Total Learnable Parameters
Input | 224x224x3 | – | 0
Conv2D-1 | 224x224x32 | Weights: 1x1x3x32, Bias: 1x1x32 | 128
ReLU-1 | 224x224x32 | – | 0
Conv2D-2 | 224x224x64 | Weights: 1x1x32x64, Bias: 1x1x64 | 2112
ReLU-2 | 224x224x64 | – | 0
MaxPool2D-1 | 111x111x64 | – | 0
BatchNorm-1 | 111x111x64 | Offset: 1x1x64, Scale: 1x1x64 | 128
Conv2D-3 | 110x110x64 | Weights: 2x2x64x64, Bias: 1x1x64 | 16,448
ReLU-3 | 110x110x64 | – | 0
Conv2D-4 | 108x108x64 | Weights: 3x3x64x64, Bias: 1x1x64 | 36,928
ReLU-4 | 108x108x64 | – | 0
Conv2D-5 | 106x106x128 | Weights: 3x3x64x128, Bias: 1x1x128 | 73,856
ReLU-5 | 106x106x128 | – | 0
MaxPool2D-2 | 51x51x128 | – | 0
BatchNorm-2 | 51x51x128 | Offset: 1x1x128, Scale: 1x1x128 | 256
Conv2D-6 | 49x49x128 | Weights: 3x3x128x128, Bias: 1x1x128 | 147,584
ReLU-6 | 49x49x128 | – | 0
Conv2D-7 | 46x46x128 | Weights: 4x4x128x128, Bias: 1x1x128 | 262,272
ReLU-7 | 46x46x128 | – | 0
MaxPool2D-3 | 9x9x128 | – | 0
BatchNorm-3 | 9x9x128 | Offset: 1x1x128, Scale: 1x1x128 | 256
FC-1 | 1x1x1000 | Weights: 1000x10368, Bias: 1000x1 | 10,369,000
FC-2 | 1x1x4 | Weights: 4x1000, Bias: 4x1 | 4004
SoftMax | 1x1x4 | – | 0
Classification Output | 1x1x4 | – | 0
Number of total learnable parameters: 10,912,972
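A PyTorch sketch of this architecture, following the kernel sizes, strides, and channel counts of Table 5, is given below for illustration; the authors implemented the model in MATLAB, and the softmax is left to the loss function, as is idiomatic in PyTorch. The comments list the expected activation-map sizes.

```python
import torch
from torch import nn

proposed_cnn = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=1), nn.ReLU(),       # 224x224x32
    nn.Conv2d(32, 64, kernel_size=1), nn.ReLU(),      # 224x224x64
    nn.MaxPool2d(kernel_size=3, stride=2),            # 111x111x64
    nn.BatchNorm2d(64),
    nn.Conv2d(64, 64, kernel_size=2), nn.ReLU(),      # 110x110x64
    nn.Conv2d(64, 64, kernel_size=3), nn.ReLU(),      # 108x108x64
    nn.Conv2d(64, 128, kernel_size=3), nn.ReLU(),     # 106x106x128
    nn.MaxPool2d(kernel_size=5, stride=2),            # 51x51x128
    nn.BatchNorm2d(128),
    nn.Conv2d(128, 128, kernel_size=3), nn.ReLU(),    # 49x49x128
    nn.Conv2d(128, 128, kernel_size=4), nn.ReLU(),    # 46x46x128
    nn.MaxPool2d(kernel_size=3, stride=5),            # 9x9x128
    nn.BatchNorm2d(128),
    nn.Flatten(),                                     # 9 * 9 * 128 = 10368
    nn.Linear(9 * 9 * 128, 1000),                     # FC-1
    nn.Linear(1000, 4),                               # FC-2: four tumor classes
)

logits = proposed_cnn(torch.randn(1, 3, 224, 224))
print(logits.shape)   # torch.Size([1, 4]); softmax is applied inside the loss at training time
```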
3.2. ML optimization with Bayesian optimization algorithm

The machine learning algorithms use the features extracted from each CNN model's feature layer to make classifications.
To achieve the highest possible accuracy in classification, it is important to identify the factors that affect the accuracy of the algorithms and choose the best values based on the selected features. In the literature, there are studies on the parameter optimisation of DL models and ML algorithms (Inik, 2023; Inik and Ulker, 2022; Inik et al., 2021). In this study, the hyperparameters of each ML algorithm were selected through Bayesian optimization and a standard eight-fold cross-validation procedure during the training process. Different ML algorithms have unique hyperparameters that can be optimized. For example, KNN has hyperparameters such as the distance metric, distance weight, and number of neighbors; SVM has hyperparameters such as the box constraint level, multiclass method, and kernel type; DT has hyperparameters such as the maximum number of splits and the split criterion; and NB has hyperparameters such as the kernel type and distribution name.

The selection of the split criterion and the maximum number of splits plays a crucial role in the accuracy of the DT classifier. The maximum number of splits controls the depth of the tree and influences the accuracy and generalization of the model. While deeper trees can have high accuracy rates, they can also be prone to overfitting and may not perform well on new test sets. On the other hand, shallower trees are more robust and tend to have training accuracy rates similar to those on representative test sets. The split criterion is used to determine the optimal splitting point of a node in the tree and can be based on measures such as the Gini diversity index, the Twoing rule, or maximum deviance reduction. The objective of DT is to produce pure nodes with a single class, and impurities in nodes can be quantified using metrics such as the Gini index or deviance criteria. The Twoing rule is an alternative criterion that maximizes node purity by maximizing the Twoing rule expression.

The type of kernel used in an NB algorithm affects how accurate the predictions are. There are four options: Gaussian, Box, Epanechnikov, and Triangle. Each kernel has its own strengths and weaknesses. Gaussian Naive Bayes models use a Gaussian distribution to calculate the probabilities of each class, while Kernel Naive Bayes models use a chosen kernel distribution. The choice of kernel type should depend on the data and the desired properties of the model, and it is important to consider the balance between accuracy and computational complexity when deciding which kernel to use.

The SVM algorithm has three hyperparameters: kernel type, box constraint level, and multiclass method. The kernel type decides how the data is transformed before classification and can be linear, Gaussian Radial Basis Function (RBF), quadratic, or cubic. The box constraint level controls the range of the Lagrange multiplier values, which impacts the number of support vectors and the training time. Selecting these hyperparameters carefully is essential for creating an efficient classifier. The multiclass method converts the multiclass classification problem into several binary classification subproblems using one-vs-one or one-vs-all approaches. The one-vs-one method trains a separate SVM learner for each class pair, while the one-vs-all method trains a single SVM learner for each class to differentiate it from all other classes.

The KNN algorithm has three important settings to be optimized for achieving the best results: the number of neighbors, the distance metric, and the distance weight. The number of neighbors represents how many nearby neighbors should be considered for classifying each observation. The distance metric calculates the distance between data points, and there are different options available, such as Euclidean, City block, and others. The distance weight changes the importance of each distance in the calculation, and it can be set to Equal, Inverse, or Squared inverse weighting. To determine the optimal number of neighbors, a logarithmic search is conducted in the range [1, max(2, round(n/2))], where n is the number of observations. The algorithm searches for the best setting of the distance weight as well (Aslan et al., 2022).

Tables 6 and 7 show the hyperparameter values that were found using Bayesian optimization. These hyperparameter values were then used to train and test the machine learning algorithms. The training phase of each algorithm was set to terminate after 20 iterations.
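As an illustration of how such a search can be set up, the sketch below tunes the SVM hyperparameters (kernel type and box constraint level) with Bayesian optimization over 20 iterations and eight-fold cross-validation. It assumes scikit-optimize's BayesSearchCV as a stand-in for the MATLAB Bayesian optimization routine used in the paper, and the search ranges are illustrative only.

```python
from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer
from sklearn.svm import SVC

# Search space loosely mirroring the SVM settings in Tables 6 and 7:
# kernel type and box constraint level (C); degree covers quadratic/cubic kernels.
search_space = {
    "kernel": Categorical(["linear", "rbf", "poly"]),
    "degree": Integer(2, 3),
    "C": Real(1e-3, 1e3, prior="log-uniform"),
}

opt = BayesSearchCV(
    estimator=SVC(),
    search_spaces=search_space,
    n_iter=20,           # 20 optimization iterations, as described in the text
    cv=8,                # eight-fold cross-validation
    scoring="accuracy",
    random_state=0,
)
opt.fit(X, y)            # X, y: CNN features and labels from the earlier sketches
print(opt.best_params_, opt.best_score_)
```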


Bayesian optimization is a commonly used and effective method for hyperparameter optimization. Eq. (1) (Wu et al., 2019) represents the fundamental formula of the Bayesian optimization technique:

x* = argmax f(x), x ∈ X    (1)

Eq. (1) is used to find the best value of a function f(x) by finding the value x* that gives the best result for the function. The variable x can include hyperparameters that are real numbers, integers, or categorical variables with discrete labels. In hyperparameter optimization, f(x) is usually an objective function or a Gaussian Process (GP) model evaluated on a validation set, and X is the set of possible values for the hyperparameters. Bayesian optimization is based on Bayes' theorem, which is shown in Eq. (2). Bayes' theorem uses previous information to make an inference. Eq. (2) shows that the posterior distribution is proportional to the likelihood P(Z|X) and the prior distribution P(X), and the belief or forecast is represented by P(X|Z) (Aslan et al., 2022).

P(X|Z) ∝ P(Z|X) P(X)    (2)

The process of finding the best hyperparameters in a search space can be time-consuming when the objective function of a machine learning algorithm is evaluated directly. To make the optimization process more efficient, a proxy (surrogate) model is often used to estimate the objective function. Instead of evaluating the objective function at all points in the search space, iterative methods are used to focus on areas that are more likely to contain the optimal solution, which reduces the number of function evaluations needed to find the best hyperparameters (Aslan et al., 2022). Bayesian optimization is a technique that incorporates prior models into the search for optimal solutions. It can use different types of probability models, but the most commonly used one is the Gaussian Process (GP). The GP is used as a surrogate model for the objective function in Bayesian optimization, as it represents a stochastic process that collects random variables. These variables are often indexed by space or time, and their distribution is described by a multivariate Gaussian distribution (Frazier, 2018). A GP can be expressed using Eq. (3) in terms of a mean function m(x) and a covariance function k(x, x′). Eq. (4) shows how the function f maps inputs x to outputs y, and Eq. (5) models the noise ε with a Gaussian distribution (Bishnoi et al., 2021). The GP is updated by combining the newly obtained sample point y with the previously acquired samples.

f(x) ~ GP(m(x), k(x, x′))    (3)

y = f(x) + ε    (4)

ε ~ N(0, σ²)    (5)
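The surrogate-model loop behind Eqs. (1)-(5) can be illustrated with a few lines of Python: a Gaussian Process is fitted to a handful of observed (hyperparameter, validation-accuracy) pairs, and an expected-improvement acquisition proposes the next candidate to evaluate. The observation values below are invented purely for demonstration and are not results from the paper.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Hypothetical observed box-constraint values C and their validation accuracies.
X_obs = np.array([[0.01], [0.1], [1.0], [10.0]])
y_obs = np.array([0.72, 0.81, 0.88, 0.84])

# GP surrogate f(x) ~ GP(m(x), k(x, x')) of Eq. (3), fitted in log10(C) space.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6)
gp.fit(np.log10(X_obs), y_obs)

candidates = np.linspace(-3, 3, 200).reshape(-1, 1)   # log10(C) in [-3, 3]
mu, sigma = gp.predict(candidates, return_std=True)   # posterior mean and std

# Expected improvement over the best observation so far; its argmax plays the
# role of x* in Eq. (1) for the next iteration.
best = y_obs.max()
imp = mu - best
z = imp / np.maximum(sigma, 1e-9)
ei = imp * norm.cdf(z) + sigma * norm.pdf(z)

next_C = 10 ** candidates[np.argmax(ei)]
print("next box constraint value to evaluate:", next_C)
```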

4. Experimental results

This section presents the performance metrics, the parameter optimization results, the results of the proposed model, the comparison results with pre-trained CNN models, the time complexity of the models, and the comparison with previous works.

Table 6
Results of hyperparameter optimization (1).
ML Algorithm Hyperparameters CNN Models
Darknet19 Darknet53 Densenet201 Efficientnetb0 Inceptionv3

DT Max. Number of Splits 190 256 427 641 263


Split Criterion MDR Twoing rule MDR MDR MDR
Total Elapsed Time(s) 259.37 240.54 249.54 240.15 243.26

NB Distribution Name Kernel Kernel Kernel Kernel Kernel


Kernel Type Gaussian Gaussian Gaussian Gaussian Gaussian
Total Elapsed Time(s) 14,256 14,416 14,293 13,581 14,738

KNN Number of Neighbors 1 1 2 1 1


Distance Metric Correlation Cosine Correlation Spearman Spearman
Distance Weight Inverse Inverse Inverse Inverse Inverse
Total Elapsed Time(s) 366.33 352.12 463.32 1143.5 398.98

SVM Kernel Type Quadratic Quadratic Quadratic Quadratic Quadratic


Box Constraint Level 173.4434 344.7146 0.0010016 999.6984 3.581
Multiclass Method One-vs-All One-vs-All One-vs-All One-vs-All One-vs-All
Total Elapsed Time(s) 17,499 14,495 14,132 15,677 22,218


Table 7
Results of hyperparameter optimization (2).
ML Algorithm Hyperparameters CNN Models
Nasnetmobile Resnet50 Resnet101 Xception Proposed

DT Max. Number of Splits 316 441 347 529 510


Split Criterion MDR MDR MDR MDR MDR
Total Elapsed Time(s) 258.35 245.99 251.29 255.5 278.49

NB Distribution Name Kernel Kernel Kernel Kernel Kernel


Kernel Type Gaussian Gaussian Gaussian Gaussian Gaussian
Total Elapsed Time(s) 14,259 16,958 14,141 14,931 14,557

KNN Number of Neighbors 1 1 1 1 1


Distance Metric Cosine Spearman Correlation Spearman City block
Distance Weight Squared inverse Inverse Squared inverse Inverse Inverse
Total Elapsed Time(s) 657.05 780.39 1009.6 394.36 593.71

SVM Kernel Type Quadratic Quadratic Gaussian Quadratic Linear


Box Constraint Level 0.0010033 1.0010013 953.0343 0.0010028 27.2379
Multiclass Method One-vs-All One-vs-One One-vs-All One-vs-All One-vs-One
Total Elapsed Time(s) 15,856 13,077 16,411 18,863 9665.8

MATLAB 2021b software and its deep learning and machine learning libraries were used to implement the experiments. The computer used for the experiments had an Intel(R) Core(TM) i5-8400 CPU @ 2.80 GHz (6 cores), 16 GB RAM, and an NVIDIA GeForce GTX 1080 Ti GPU with 11 GB of memory.

4.1. Performance metrics

Various metrics are used to evaluate the performance of ML and DL models. Different learning tasks may require different performance metrics to be emphasized (Shaukat et al., 2020). A confusion matrix is one way to formally present the performance of a learning model. A confusion matrix, also known as an error matrix, is a table summarizing the performance of a classification or prediction model, and a number of classification performance metrics can be defined based on the confusion matrix (Deng et al., 2016). Below are the commonly used metrics that were selected as performance metrics for this study.

Accuracy is defined as the ratio of correctly classified instances to the total number of predictions (Eq. (6)). This is a key performance metric that measures how accurately the classification model performs (Deng et al., 2016). A high accuracy value reflects the reliability of the classification model, and therefore a higher accuracy value is preferred (Shaukat et al., 2020).

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (6)

Precision (Eq. (7)) is a measure of accuracy, in particular for accurately predicting a class (Deng et al., 2016).

Precision = TP / (TP + FP)    (7)

Recall (Eq. (8)) is a measure of the ability of a prediction model to accurately detect instances of a given class in a dataset (Deng et al., 2016). Recall indicates how complete the detection of a class is.

Recall = TP / (TP + FN)    (8)

The F1-score is a metric commonly used to evaluate the performance of machine learning or deep learning models. It is calculated by combining the precision and recall of the model (Eq. (9)). The F1-score can be particularly useful when the class distribution is unbalanced (i.e., when the number of samples in one class is significantly different from another) and the user is seeking a trade-off between precision and recall. A higher F1-score indicates that the model performs better than other models (Shaukat et al., 2020).

F1-score = 2TP / (2TP + FP + FN)    (9)

True Positive (TP) refers to the number of positive samples correctly classified by the model. True Negative (TN) is the number of instances that the model classifies as negative and that are actually negative; that is, TN is the number of instances in the negative class that are correctly classified. False Positive (FP) refers to the number of negative instances that are incorrectly classified as positive by the model. False Negative (FN) refers to the number of positive samples misclassified as negative by the model (Shaukat et al., 2020). In addition to these metrics, a complexity analysis of the models is also included in Section 4.5 to compare model performances.
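For reference, Eqs. (6)-(9) can be computed directly from a confusion matrix; the short scikit-learn sketch below does this with macro-averaging over the four classes. The toy labels are illustrative only; in practice, y_true and y_pred would be the test labels and the predictions of one of the CNN-ML hybrids.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

def report(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)
    accuracy = np.trace(cm) / cm.sum()                      # Eq. (6)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro")                     # Eqs. (7)-(9), macro-averaged
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = report([0, 1, 2, 3, 1], [0, 1, 2, 3, 2])  # toy example
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```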

4.2. Hyperparameter optimization results

The previous section provided a detailed discussion of the Bayesian optimization of the hyperparameters. In this section, the optimization outcomes are presented in Tables 6 and 7. As evident in these tables, each ML algorithm has distinct hyperparameters, and these values were obtained through 20 iterations. However, the total computation times varied across the algorithms. Notably, the DT algorithm was the fastest, concluding the optimization process in 240.15 s for the EfficientNetB0 CNN model and 278.49 s for the proposed CNN model. Nonetheless, as indicated in Table 8, DT had lower accuracy rates than the other algorithms in most cases. On the other hand, the SVM algorithm was the slowest for most runs, requiring 22,218 s for the InceptionV3 CNN model and 9665.8 s for the proposed CNN model. However, SVM demonstrated the highest classification accuracy among all the algorithms.

4.3. Results of proposed CNN model

The proposed CNN model was trained for 20 epochs using a mini-batch size of 64 and 1740 iterations, taking 56 min and 50 s. The Stochastic Gradient Descent with Momentum (SGDM) solver was used with an initial learning rate of 0.001. Fig. 5 displays the training and validation graphs for the proposed CNN model, depicting the loss values and accuracy. The training accuracy was 100 % with a loss of 0, while the validation accuracy was 95.66 % with a loss of 0.13. Additionally, the confusion matrices for the proposed CNN and CNN-ML models are presented in Fig. 6.
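A minimal PyTorch sketch of this training configuration (SGDM solver, initial learning rate 0.001, mini-batch size 64, 20 epochs) is shown below; the momentum value of 0.9 is an assumption (the common SGDM default), since the paper does not state it, and `proposed_cnn` and `loader` refer to the earlier sketches.

```python
import torch
from torch import nn

optimizer = torch.optim.SGD(proposed_cnn.parameters(), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()   # applies softmax internally, matching the model's output layer

for epoch in range(20):                      # 20 epochs
    for images, targets in loader:           # DataLoader with batch_size=64
        optimizer.zero_grad()
        loss = criterion(proposed_cnn(images), targets)
        loss.backward()
        optimizer.step()
```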
Fig. 6 shows that the CNN-KNN architecture had the lowest misclassification error rate among all the models. The misclassification errors for glioma, meningioma, no tumor, and pituitary were 9, 20, 4, and


Table 8 4.4. Comparison results with state-of-the-art CNN models


Performance metrics obtained from confusion matrices.
Model Method Accuracy (%) Precision Recall F1-Score Brain MRI image classification was performed using both CNN and
ML algorithms. The ML algorithms utilized features derived from State-
Fig. 6 shows that the CNN-KNN architecture had the lowest misclassification error rate among all the models: the misclassification errors for glioma, meningioma, no tumor, and pituitary were 9, 20, 4, and 7, respectively. The CNN architecture was the second-best model, with misclassification errors for glioma, meningioma, no tumor, and pituitary of 14, 28, 9, and 10, respectively. This suggests that meningiomas are the most difficult type of brain tumor to classify accurately. The accuracies of the CNN model and the CNN-ML structures were 95.66 %, 71.72 %, 75.57 %, 97.15 %, and 93.86 % for the CNN, DT, NB, KNN, and SVM classifiers, respectively.

Table 8
Performance metrics obtained from confusion matrices.

Model           Method  Accuracy (%)  Precision  Recall  F1-Score
Darknet19       CNN     92.95         0.93       0.93    0.93
                DT      76.00         0.75       0.76    0.76
                NB      75.95         0.75       0.75    0.74
                KNN     94.66         0.95       0.95    0.95
                SVM     95.58         0.96       0.96    0.96
Darknet53       CNN     93.80         0.94       0.94    0.94
                DT      76.00         0.75       0.76    0.75
                NB      78.42         0.78       0.78    0.77
                KNN     95.66         0.96       0.96    0.96
                SVM     96.87         0.96       0.96    0.96
Densenet201     CNN     93.16         0.93       0.93    0.93
                DT      78.77         0.78       0.78    0.78
                NB      80.84         0.80       0.80    0.80
                KNN     96.72         0.97       0.97    0.97
                SVM     97.01         0.97       0.97    0.97
Efficientnetb0  CNN     95.51         0.95       0.96    0.95
                DT      82.30         0.82       0.82    0.82
                NB      82.50         0.82       0.83    0.82
                KNN     97.00         0.97       0.97    0.97
                SVM     97.93         0.98       0.98    0.98
Inceptionv3     CNN     91.74         0.92       0.92    0.92
                DT      72.65         0.72       0.72    0.72
                NB      77.35         0.77       0.77    0.76
                KNN     94.52         0.94       0.95    0.94
                SVM     95.30         0.95       0.96    0.95
Nasnetmobile    CNN     90.03         0.90       0.90    0.90
                DT      71.37         0.71       0.70    0.70
                NB      73.72         0.73       0.73    0.72
                KNN     93.09         0.93       0.93    0.93
                SVM     95.01         0.95       0.95    0.95
Resnet50        CNN     95.09         0.95       0.95    0.95
                DT      80.48         0.80       0.80    0.80
                NB      78.70         0.78       0.79    0.78
                KNN     96.15         0.96       0.96    0.96
                SVM     96.30         0.96       0.97    0.96
Resnet101       CNN     95.44         0.95       0.95    0.95
                DT      76.85         0.76       0.76    0.76
                NB      75.43         0.75       0.75    0.75
                KNN     94.09         0.94       0.94    0.94
                SVM     95.87         0.96       0.96    0.96
Xception        CNN     92.24         0.92       0.92    0.92
                DT      74.29         0.74       0.74    0.74
                NB      79.49         0.79       0.79    0.79
                KNN     95.51         0.95       0.95    0.96
                SVM     96.23         0.96       0.96    0.96
Proposed CNN    CNN     95.66         0.96       0.96    0.96
                DT      71.72         0.71       0.71    0.71
                NB      75.57         0.75       0.75    0.75
                KNN     97.15         0.97       0.97    0.97
                SVM     93.86         0.94       0.94    0.94

4.4. Comparison results with state-of-the-art CNN models

Brain MRI image classification was performed using both CNN and ML algorithms. The ML algorithms utilized features derived from the state-of-the-art CNN models, and Bayesian optimization was applied to tune the hyperparameters of the ML algorithms and enhance their classification accuracy. The hyperparameters obtained for each model are shown in Tables 6 and 7 and were not altered during the training and testing phases. Four ML techniques were employed to classify the features of each model. The performance metrics obtained from the confusion matrices of all CNN models are presented in Table 8, which indicates that the features extracted from each CNN model were accurately classified by the ML techniques using the hyperparameters generated via Bayesian optimization. The proposed CNN model achieved the highest classification accuracy of 95.66 % among the CNN models. However, SVM emerged as the strongest ML method for classification, achieving the highest accuracy of 97.93 % when features extracted from the brain MRI images with EfficientNetB0 were classified with the SVM. After feature extraction, the proposed CNN model achieved its highest accuracy of 97.15 % when its features were classified using the KNN algorithm. The EfficientNetB0-SVM structure had precision, recall, and F1-score values of 0.98, 0.98, and 0.98, respectively, which are displayed in Table 8 and are higher than those obtained with the alternative CNN models. The ROC curves of all CNN-ML structures are illustrated in Figs. 8 and 9. The average accuracy values across all CNN models were 93.56 %, 95.99 %, 95.46 %, 77.78 %, and 76.04 % for the CNN, SVM, KNN, NB, and DT classification methods, respectively, indicating that the SVM- and KNN-based structures produce the most accurate overall predictions.
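The pipeline described above — deep features extracted by a CNN and classified by an ML algorithm whose hyperparameters are tuned with Bayesian optimization and then frozen — can be sketched as follows. This is only an illustration under assumed tools (scikit-learn and scikit-optimize), not the authors' exact setup; the feature and label arrays and their file names are placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from skopt import BayesSearchCV
from skopt.space import Real, Categorical

# features: (n_samples, n_features) CNN activations, labels: (n_samples,) class ids.
features = np.load("cnn_features.npy")   # hypothetical file names
labels = np.load("labels.npy")
X_tr, X_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0
)

# Bayesian search over SVM hyperparameters (box constraint C, kernel, gamma).
search = BayesSearchCV(
    SVC(),
    {
        "C": Real(1e-3, 1e3, prior="log-uniform"),
        "gamma": Real(1e-4, 1e1, prior="log-uniform"),
        "kernel": Categorical(["rbf", "poly", "linear"]),
    },
    n_iter=30, cv=5, n_jobs=-1, random_state=0,
)
search.fit(X_tr, y_tr)                    # tuned hyperparameters are then kept fixed
print("best hyperparameters:", search.best_params_)
print("test accuracy:", search.best_estimator_.score(X_te, y_te))
```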
Fig. 7 displays the confusion matrices resulting from the classification outcomes of all pre-trained CNN models. The confusion matrices demonstrate that the no tumor class is the best-predicted class for each model, while the worst-predicted class varies among the models. Across all CNN models, meningioma is the worst-predicted class, and it is also the worst-predicted class for the CNN-DT, CNN-KNN, and CNN-SVM architectures. The worst-predicted class for the CNN-NB structures varies between meningioma and pituitary: the Darknet19-NB, Darknet53-NB, Inceptionv3-NB, and Nasnetmobile-NB structures perform poorly on the pituitary class, while the other CNN-NB structures struggle with the meningioma class.

According to the confusion matrices, the hybrid models displayed different numbers of misclassifications. The DarkNet19-SVM hybrid model misclassified 62 images in total: 10 Glioma, 32 Meningioma, 4 No tumor, and 16 Pituitary labels. The DarkNet53-SVM model misclassified 7 Glioma, 33 Meningioma, 4 No tumor, and 14 Pituitary labels. The DenseNet201-SVM model misclassified 8 Glioma, 25 Meningioma, 1 No tumor, and 8 Pituitary labels. The EfficientNetB0-SVM model misclassified 4 Glioma, 20 Meningioma, 1 No tumor, and 4 Pituitary labels. The InceptionV3-SVM model misclassified 16 Glioma, 36 Meningioma, 3 No tumor, and 11 Pituitary labels. The NasNetMobile-SVM model misclassified 22 Glioma, 25 Meningioma, 9 No tumor, and 14 Pituitary labels. The ResNet50-SVM model misclassified 6 Glioma, 28 Meningioma, 5 No tumor, and 12 Pituitary labels. The ResNet101-SVM model misclassified 9 Glioma, 34 Meningioma, 3 No tumor, and 12 Pituitary labels. The Xception-SVM model misclassified 9 Glioma, 31 Meningioma, 4 No tumor, and 9 Pituitary labels. The proposed CNN-KNN hybrid model misclassified 9 Glioma, 20 Meningioma, 4 No tumor, and 7 Pituitary labels.

Fig. 5. Accuracy and loss curves for proposed CNN model.

Fig. 6. Confusion matrices obtained from proposed CNN model and ML algorithms. Glioma = 0, Meningioma = 1, No tumor = 2, Pituitary = 3. The vertical axis of each confusion matrix represents the actual classes, while the horizontal axis represents the predicted classes.

The AUC-ROC curve, a critical evaluation metric in ML and DL, is essential for assessing the effectiveness of a classification task, particularly for imbalanced datasets. It represents the model's ability to separate the classes, with the ROC curve reflecting a probability curve for each class; the x-axis denotes the False Positive Rate (FPR), while the y-axis denotes the True Positive Rate (TPR). Figs. 8 and 9 illustrate the ROC curves and AUC values of the CNN-ML structures. The proposed CNN-KNN structure has AUC values of 0.98, 0.97, 0.98, and 1 for the four classes, respectively, and all CNN-SVM structures have AUC values of 1, indicating excellent predictive accuracy.
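For a four-class problem, these ROC curves and AUC values are typically computed in a one-vs-rest fashion. A brief scikit-learn sketch is shown below for illustration only; `y_true` and `y_score` are placeholders for the test labels and the classifier's per-class scores:

```python
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, auc

classes = [0, 1, 2, 3]                      # glioma, meningioma, no tumor, pituitary
y_true = np.array([0, 2, 1, 3, 2, 0])       # placeholder test labels
y_score = np.random.rand(len(y_true), 4)    # placeholder per-class scores/probabilities

y_bin = label_binarize(y_true, classes=classes)          # one-vs-rest binarization
for c in classes:
    fpr, tpr, _ = roc_curve(y_bin[:, c], y_score[:, c])  # FPR on x-axis, TPR on y-axis
    print(f"class {c}: AUC = {auc(fpr, tpr):.3f}")
```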
4.5. Complexity of models

In addition to accuracy, the complexity of deep learning models should also be considered. For this reason, this study includes graphs that depict accuracy versus the number of parameters, accuracy versus floating-point operations (FLOPs), and the training time of each model, to better express the complexity of the models.

The first of these, the model size and model accuracy diagram, which helps to identify the most efficient model, is shown in Fig. 10. This graph evaluates the size and performance of a model together; size is important because it determines the computational cost. This type of graph has been used in studies comparing state-of-the-art models (Tan and Le, 2019). In the graph, the x-axis is the number of parameters and the y-axis is the accuracy; the models are ordered by increasing number of parameters, and the accuracy of each model is plotted in this order. The graph therefore makes it easy to see how accurate the models are relative to their number of parameters and to study the relationship between model size and accuracy. In this respect, EfficientNetB0 and NasNetMobile are remarkable for their small size, but when model accuracy and model size are considered together, EfficientNetB0 and the proposed CNN model attract more attention. Moreover, the performance of these models is better than that of the other models on the dataset used, and their lower model sizes provide the advantage of lower computational cost.
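A graph of this kind can be reproduced with a few lines of matplotlib. The sketch below is illustrative only: the CNN accuracies are taken from Table 8, while the parameter counts are approximate, commonly quoted figures for these backbones rather than values reported in this section:

```python
import matplotlib.pyplot as plt

# Approximate parameter counts (millions) and Table 8 CNN accuracies (%).
models = ["EfficientNetB0", "NasNetMobile", "DenseNet201", "ResNet50", "InceptionV3"]
params_m = [5.3, 5.3, 20.0, 25.6, 23.9]     # approximate, for illustration
acc = [95.51, 90.03, 93.16, 95.09, 91.74]

order = sorted(range(len(models)), key=lambda i: params_m[i])  # ascending by size
plt.plot([params_m[i] for i in order], [acc[i] for i in order], "o-")
for i in order:
    plt.annotate(models[i], (params_m[i], acc[i]))
plt.xlabel("Number of parameters (millions)")
plt.ylabel("Accuracy (%)")
plt.title("Model size vs. accuracy (cf. Fig. 10)")
plt.show()
```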
Fig. 7. Confusion matrices of models.

Secondly, the time complexity of the models used for brain tumor classification was examined. All models took a significant amount of time to train, depending on their complexity and architectural design. The time was measured in minutes, and Fig. 11 shows the
training times of the models. The InceptionV3-SVM model gave good classification results despite having the longest training time (370 min). The proposed CNN-KNN model had the second-best classification results and completed the classification process in the shortest time (67 min). The training of the ResNet50-SVM and the proposed CNN-SVM models took 218 min, which is the shortest time among the CNN-SVM classifiers. The EfficientNetB0-SVM model achieved the highest classification accuracy and completed training in 261 min. In addition, EfficientNetB0-KNN has both a low training time (79.06 min) and a high level of accuracy. The DenseNet201-SVM model performed well in terms of both training time and classification accuracy, with a training time of 236 min and the third-best performance. The Xception-SVM model was the second model to require more than 300 min of training (314 min). The ResNet101-SVM model lagged behind the ResNet50-SVM model in terms of both training time and classification performance and took 274 min to train. Finally, NasNetMobile-SVM has a training time of 244 min, DarkNet53-SVM 242 min, and DarkNet19-SVM 292 min. The results show that as the depth of the network increases, the computational complexity, and thus the training time, increases, which ultimately affects the efficiency of the network. In addition, this study concludes that EfficientNetB0 and the proposed CNN models are the best approaches for classifying brain tumors.

Fig. 8. ROC curves and AUC values of CNN-ML structures (a).

Fig. 9. ROC curves and AUC values of CNN-ML structures (b).

Fig. 10. Model size and model accuracy comparison of DL models used in the proposed method.

Finally, the accuracy of the models in relation to the FLOPs value, an important parameter expressing computational complexity, is presented in Fig. 12. The FLOPs value of a model is related to the amount of GPU hardware resources that the model uses. A high FLOPs value indicates a high computational cost, which can be a disadvantage; models with high FLOPs may not be suitable for running on lower-end hardware such as mobile devices. Therefore, it is advantageous to develop models with lower FLOPs while still achieving high accuracy. Fig. 12 shows that although the model with the lowest FLOPs is NasNetMobile, its accuracy is low. On the other hand, even though the FLOPs value of the proposed CNN model is relatively high, its accuracy is still higher than that of the NasNetMobile
model. Looking across all the models, EfficientNetB0 offers the most favorable trade-off between FLOPs and accuracy.
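Parameter counts and FLOPs such as those compared in Figs. 10 and 12 can be obtained programmatically. The sketch below is illustrative only and assumes that torchvision and fvcore are available; the resulting values depend on the exact model definitions, the input size, and the counting convention of the tool used:

```python
import torch
from torchvision import models
from fvcore.nn import FlopCountAnalysis

def complexity(model: torch.nn.Module) -> tuple[float, float]:
    """Return (parameters in millions, FLOPs in GFLOPs) for a 224x224 RGB input."""
    n_params = sum(p.numel() for p in model.parameters()) / 1e6
    flops = FlopCountAnalysis(model, torch.randn(1, 3, 224, 224)).total() / 1e9
    return n_params, flops

for name, ctor in [("efficientnet_b0", models.efficientnet_b0),
                   ("resnet50", models.resnet50),
                   ("densenet201", models.densenet201)]:
    net = ctor(weights=None).eval()   # untrained weights; only the architecture matters here
    p, f = complexity(net)
    print(f"{name}: {p:.1f} M parameters, {f:.1f} GFLOPs")
```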

4.6. Comparison with previous studies

Table 9 compares the proposed method with existing studies on the classification of brain MRI images, covering both ML and DL methods. The table reports accuracy, the performance measure most commonly used across these studies. The first column lists the previous studies, and the second column describes the method used in each study, which can be a model proposed by the authors or a state-of-the-art model. The third column describes the dataset used in each study; the dataset is emphasized because it significantly affects the results. In this study, three different datasets were combined, providing a substantial amount of data. Some studies did not consider the healthy (no tumor) class in the classification task, whereas this study includes it. In addition, since data augmentation also affects classification accuracy, its presence or absence is indicated in the table. To compare and evaluate the best available approaches, the best method and the best result for each study are included in the table.

Fig. 11. Training time of models.

Fig. 12. Accuracy comparison of DL models used in the proposed method according to FLOPs value.


Table 9
Comparison of the proposed method with previous studies.

Reference                    Model                            Dataset                                     Classes  Data Augmentation  Best Model             Accuracy
Díaz-Pernas et al. (2021)    Multi-scale CNN                  Figshare                                    3        Yes                –                      0.9730
Sultan et al. (2019)         CNN                              Figshare                                    3        Yes                –                      0.9613
Ullah et al. (2022a)         TL                               SARTAJ                                      3        Yes                InceptionResNetV2      0.9891
Deepak and Ameer (2021)      CNN-SVM                          Figshare                                    3        Yes                CNN-SVM                0.9582
Nayak et al. (2022)          CNN                              Figshare                                    3        Yes                Dense Efficient-Net    0.9997
Alanazi et al. (2022)        TL-CNN                           Figshare, SARTAJ, Br35H                     3        Yes                Proposed TL-CNN        0.9575
Wahlang et al. (2022)        LeNet-inspired model             Figshare, Brainweb, Radiopaedia combination 3        No                 –                      0.8800
Raza et al. (2022)           TL-DeepTumorNet                  Figshare                                    3        No                 DeepTumorNet           0.9967
Maqsood et al. (2022)        Hybrid MobileNetV2-M-SVM         Figshare and BraTS 2018                     3        No                 Hybrid MobileNetV2     0.9892
Amran et al. (2022)          Hybrid GoogLeNet and TL          Br35H                                       3        Yes                Hybrid GoogLeNet       0.9910
Ullah et al. (2022b)         TumorResNet and TL               Abhranta Panigrahi                          2        Yes                TumorResNet            0.9933
Deepak and Ameer (2019)      CNN-SVM, CNN-KNN                 Figshare                                    3        Yes                CNN-KNN                0.9800
Latif et al. (2022)          CNN-ML                           BraTS 2018 (Glioma)                         3        No                 CNN-SVM                0.9616
Gómez-Guzmán et al. (2023)   CNN with TL                      Figshare, Br35H and SARTAJ combination      4        Yes                Proposed Generic CNN   0.8105
                                                                                                                                      Inception V3           0.9712
Proposed Method              State-of-the-art CNN-optimized ML  Figshare, Br35H and SARTAJ combination    4        No                 EfficientNetB0-SVM     0.9793
Proposed Method              Proposed CNN-optimized ML        Figshare, Br35H and SARTAJ combination      4        No                 CNN-KNN                0.9715

According to Table 9, the EfficientNetB0-SVM and the proposed CNN-KNN models outperform several state-of-the-art models. These methods are capable of extracting stronger and more distinguishable deep features and give the best results for classification. Moreover, since the dataset used is balanced, there is a sufficient number of images to train the network. In contrast, the datasets used in some studies have only two classes and not enough data for classification. Therefore, the dataset used in this study is more complex in terms of the number of images and classes and poses a more difficult classification problem than the others. When Table 9 is analyzed, it can be seen that the highest classification accuracy was obtained for the dataset utilized in this study, and the second-highest classification accuracy was achieved by the proposed CNN method. It is noteworthy that while the state-of-the-art models had previously been trained on millions of images, the proposed CNN model was trained solely on this dataset. This outcome implies that the proposed model can be utilized effectively for the classification of brain MRI images. Furthermore, despite not using the same datasets, the proposed CNN has been observed to achieve superior outcomes compared with studies that perform three-class classification to address the same problem.

5. Discussion

The task of classifying brain tumors is a challenging and time-consuming process. Radiologists typically use brain MRI images to diagnose and categorize malignant tumors, but the possibility of human error remains. To overcome this issue, DL and ML algorithms have been employed to accurately classify brain tumors, as well as other internal abnormalities. In this field of study, both ML algorithms and DL models are commonly used. DL is preferred for its ability to extract features directly from images and classify them via deep neural networks; however, this approach demands large datasets and a considerable number of parameters in the deep network. In contrast, traditional ML algorithms only accept specific image features as inputs, which can lead to time-consuming preprocessing. As a result, researchers have focused on combining DL and ML techniques to enhance classification accuracy.

In this section, the proposed model is assessed and compared with existing state-of-the-art models, acknowledging that direct comparisons may be challenging due to variations in datasets, analytical methods, and simulation settings. CNN-based models (Díaz-Pernas et al., 2021; Nayak et al., 2022; Sultan et al., 2019) have been proposed as an effective approach for comprehensive learning through the isolation of handcrafted feature extraction components; these models have mainly utilized TL for training. Furthermore, various studies (Deepak and Ameer, 2019, 2021; Latif et al., 2022) have been conducted wherein the features derived from CNN models are classified using conventional ML algorithms. Previous studies have certain limitations, including the use of default hyperparameters for traditional ML methods, which can have a considerable impact on classification accuracy. This paper introduces a novel approach to brain tumor classification by presenting a newly developed CNN model that is trained from scratch, without relying on transfer learning techniques. The proposed methodology offers a hybrid architecture that integrates DL and ML methods and differs from previous research on brain tumor classification: the extracted deep features are fed into ML algorithms whose hyperparameters are optimized with a Bayesian optimization algorithm to improve classification accuracy. By leveraging the strengths of DL and ML methods, the proposed approach achieves higher accuracy in the classification of brain tumors. The hybrid model proved efficient, rapid, and reliable, producing promising results both through direct use of the CNN model and through ML algorithms utilizing features extracted from the CNN model. Overall, the proposed method offers several benefits over existing state-of-the-art models, including its training approach, the elimination of the need for a separate feature-extraction engine, and the use of a hybrid model that combines CNN and ML techniques.

Based on the findings presented in Figs. 7, 8, and 9, as well as Tables 8 and 9, it is evident that the proposed CNN and CNN-ML structures exhibit exceptional performance in classifying brain tumors. The results indicate that the proposed CNN-KNN structure outperforms the other CNN-ML structures, except for the EfficientNetB0-SVM structure, which demonstrated the highest accuracy in image classification. Furthermore, the proposed CNN model achieved the highest accuracy rate among the standalone CNN models and surpassed the state-of-the-art CNN models in terms of accuracy when combined with the KNN algorithm. These results highlight the potential of the proposed CNN and CNN-ML models for accurate classification of brain tumors. In addition, in Section 4.5, the model complexities are analyzed and evaluated in terms of time and computation, and the classification accuracies and parameter numbers of the models are also compared.


Fig. 10 shows the accuracy of the different models in relation to the number of parameters and is useful for analyzing the relationship between model size and accuracy. EfficientNetB0 and NasNetMobile are notable for their small size, but when both size and accuracy are considered, EfficientNetB0 and the proposed CNN model stand out. These models perform better than the others on the dataset used, and their smaller size makes them computationally efficient. Fig. 11 shows the training times of the models, which were evaluated in terms of both training time and classification accuracy. The InceptionV3-SVM model had the longest training time (370 min) but gave good classification results. The proposed CNN-KNN model had the second-best classification results and completed the classification process in the shortest time (67 min). The EfficientNetB0-SVM model achieved the highest classification accuracy and completed training in 261 min. As the depth of the network increased, the training time also increased, which ultimately affected the efficiency of the network. The research findings suggest that for the classification of brain tumors, the most effective approaches are the EfficientNetB0 and the proposed CNN models. In addition, Fig. 12 provides information on the accuracy of the models in relation to the FLOPs value, a crucial parameter for determining computational complexity. NasNetMobile exhibited the lowest FLOPs value but poor accuracy, whereas the proposed CNN model has a relatively high FLOPs value and still achieved higher accuracy than NasNetMobile. Notably, the EfficientNetB0 model demonstrated the most favorable FLOPs-accuracy trade-off among all the models evaluated.

6. Conclusion

This study demonstrates the potential of state-of-the-art CNN models and a novel CNN model for accurately classifying brain tumors from brain MRI images. The ML algorithms SVM, KNN, DT, and NB, optimized with the Bayesian optimization algorithm, were used efficiently to classify the features extracted from the models. Among the models tested, the EfficientNetB0-SVM structure achieved the highest classification accuracy of 97.93 %. While SVM outperformed the other three ML algorithms, the proposed CNN model achieved its highest accuracy of 97.15 % when paired with the KNN algorithm. Furthermore, the ten different CNN models built on SVM had an average accuracy of 95.99 %. EfficientNetB0 and the proposed CNN model performed better than the other models in terms of accuracy and computational efficiency, making them the most suitable models for classifying brain tumors. It is worth noting that the proposed CNN-KNN model completed the classification process in the shortest time of 67 min while achieving the second-best classification results. The proposed CNN model has a relatively high FLOPs value but still achieved higher accuracy than the NasNetMobile model. EfficientNetB0 offered the most favorable FLOPs-accuracy trade-off among all the models, indicating its superiority in terms of computational complexity and accuracy.

Overall, the findings of this study highlight the reliability and potential of the proposed method in accurately classifying brain tumors, which can lead to better patient outcomes and more effective treatment.

CRediT authorship contribution statement

Muhammed Celik: Methodology, Conceptualization, Data curation, Software, Visualization, Writing – original draft. Ozkan Inik: Methodology, Supervision, Conceptualization, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

References

Aamir, M., Rahman, Z., Dayo, Z. A., Abro, W. A., Uddin, M. I., Khan, I., Imran, A. S., Ali, Z., Ishfaq, M., Guan, Y., & Hu, Z. (2022). A deep learning approach for brain tumor classification using MRI images. Computers and Electrical Engineering, 101, Article 108105. https://doi.org/10.1016/J.COMPELECENG.2022.108105
Abd-Ellah, M. K., Awad, A. I., Khalaf, A. A. M., & Hamed, H. F. A. (2018). Two-phase multi-model automatic brain tumour diagnosis system from magnetic resonance images using convolutional neural networks. Eurasip Journal on Image and Video Processing, 2018(1), 1–10. https://doi.org/10.1186/S13640-018-0332-4/TABLES/7
Alam, T. M., Shaukat, K., Khan, W. A., Hameed, I. A., Almuqren, L. A., Raza, M. A., Aslam, M., & Luo, S. (2022). An Efficient Deep Learning-Based Skin Cancer Classifier for an Imbalanced Dataset. Diagnostics, 12(9), 2115. https://doi.org/10.3390/DIAGNOSTICS12092115
Alanazi, M. F., Ali, M. U., Hussain, S. J., Zafar, A., Mohatram, M., Irfan, M., Alruwaili, R., Alruwaili, M., Ali, N. H., & Albarrak, A. M. (2022). Brain Tumor/Mass Classification Framework Using Magnetic-Resonance-Imaging-Based Isolated and Developed Transfer Deep-Learning Model. Sensors, 22(1), 372. https://doi.org/10.3390/S22010372
Al-Badarneh, A., Najadat, H., & Alraziqi, A. M. (2012). A classifier to detect tumor disease in MRI brain images. Proceedings of the 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2012, 784–787. https://doi.org/10.1109/ASONAM.2012.142
Alharbi, F., Luo, S., Zhang, H., Shaukat, K., Yang, G., Wheeler, C. A., & Chen, Z. (2023). A Brief Review of Acoustic and Vibration Signal-Based Fault Detection for Belt Conveyor Idlers Using Machine Learning Models. Sensors, 23(4), 1902. https://doi.org/10.3390/S23041902
Ali, Z., Hayat, M. F., Shaukat, K., Alam, T. M., Hameed, I. A., Luo, S., Basheer, S., Ayadi, M., & Ksibi, A. (2022). A Proposed Framework for Early Prediction of Schistosomiasis. Diagnostics, 12(12), 3138. https://doi.org/10.3390/DIAGNOSTICS12123138
Amran, G. A., Alsharam, M. S., Blajam, A. O. A., Hasan, A. A., Alfaifi, M. Y., Amran, M. H., Gumaei, A., & Eldin, S. M. (2022). Brain Tumor Classification and Detection Using Hybrid Deep Tumor Network. Electronics, 11(21), 3457. https://doi.org/10.3390/ELECTRONICS11213457
Ananda Kumar, K. S., Prasad, A. Y., & Metan, J. (2022). A hybrid deep CNN-Cov-19-Res-Net Transfer learning architype for an enhanced Brain tumor Detection and Classification scheme in medical image processing. Biomedical Signal Processing and Control, 76, Article 103631. https://doi.org/10.1016/J.BSPC.2022.103631
Aslan, M. F., Sabanci, K., Durdu, A., & Unlersen, M. F. (2022). COVID-19 diagnosis using state-of-the-art CNN architecture features and Bayesian Optimization. Computers in Biology and Medicine, 142, Article 105244. https://doi.org/10.1016/J.COMPBIOMED.2022.105244
Aurna, N. F., Yousuf, M. A., Taher, K. A., Azad, A. K. M., & Moni, M. A. (2022). A classification of MRI brain tumor based on two stage feature level ensemble of deep CNN models. Computers in Biology and Medicine, 146, Article 105539. https://doi.org/10.1016/J.COMPBIOMED.2022.105539
Badža, M. M., & Barjaktarović, M. C. (2020). Classification of Brain Tumors from MRI Images Using a Convolutional Neural Network. Applied Sciences, 10(6), 1999. https://doi.org/10.3390/APP10061999
Bhupendra, Moses, K., Miglani, A., & Kumar Kankar, P. (2022). Deep CNN-based damage classification of milled rice grains using a high-magnification image dataset. Computers and Electronics in Agriculture, 195, 106811. https://doi.org/10.1016/J.COMPAG.2022.106811
Bhuvaji, S., Kadam, A., Bhumkar, P., & Dedge, S. (2020). Brain Tumor Classification (MRI) | Kaggle. https://doi.org/10.34740/kaggle/dsv/1183165
Bishnoi, S., Ravinder, R., Grover, H. S., Kodamana, H., & Krishnan, N. M. A. (2021). Scalable Gaussian processes for predicting the optical, physical, thermal, and mechanical properties of inorganic glasses with large datasets. Materials Advances, 2(1), 477–487. https://doi.org/10.1039/D0MA00764A
Cases | radiopaedia.org. (n.d.). Radiopaedia. Retrieved March 31, 2023, from https://radiopaedia.org/cases
Chattopadhyay, A., & Maitra, M. (2022). MRI-based brain tumour image detection using CNN based deep learning method. Neuroscience Informatics, 2(4), Article 100060. https://doi.org/10.1016/J.NEURI.2022.100060
Cheng, J. (2017). Brain Tumor MRI Dataset. https://figshare.com/articles/dataset/brain_tumor_dataset/1512427
Cheng, J., Huang, W., Cao, S., Yang, R., Yang, W., Yun, Z., Wang, Z., & Feng, Q. (2015). Enhanced Performance of Brain Tumor Classification via Tumor Region Augmentation and Partition. PLoS One, 10(10), e0140381.
Chollet, F. (2016). Xception: Deep Learning with Depthwise Separable Convolutions. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 1800–1807. https://doi.org/10.48550/arxiv.1610.02357
Collins, D. L., Zijdenbos, A. P., Kollokian, V., Sled, J. G., Kabani, N. J., Holmes, C. J., & Evans, A. C. (1998). Design and construction of a realistic digital brain phantom. IEEE Transactions on Medical Imaging, 17(3), 463–468. https://doi.org/10.1109/42.712135

Das, S., Aranya, O. F. M. R. R., & Labiba, N. N. (2019). Brain Tumor Classification Using Convolutional Neural Network. 1st International Conference on Advances in Science, Engineering and Robotics Technology 2019, ICASERT 2019. https://doi.org/10.1109/ICASERT.2019.8934603
Deepak, S., & Ameer, P. M. (2019). Brain tumor classification using deep CNN features via transfer learning. Computers in Biology and Medicine, 111, Article 103345. https://doi.org/10.1016/J.COMPBIOMED.2019.103345
Deepak, S., & Ameer, P. M. (2021). Automated Categorization of Brain Tumor from MRI Using CNN features and SVM. Journal of Ambient Intelligence and Humanized Computing, 12(8), 8357–8369. https://doi.org/10.1007/S12652-020-02568-W/FIGURES/7
Deng, J., Dong, W., Socher, R., Li, L.-J., Kai Li, & Li Fei-Fei. (2010). ImageNet: A large-scale hierarchical image database. 248–255. https://doi.org/10.1109/CVPR.2009.5206848
Deng, X., Liu, Q., Deng, Y., & Mahadevan, S. (2016). An improved method to construct basic probability assignment based on the confusion matrix for classification problem. Information Sciences, 340–341, 250–261. https://doi.org/10.1016/J.INS.2016.01.033
Devnath, L., Luo, S., Summons, P., Wang, D., Shaukat, K., Hameed, I. A., & Alrayes, F. S. (2022). Deep Ensemble Learning for the Automatic Detection of Pneumoconiosis in Coal Worker's Chest X-ray Radiography. Journal of Clinical Medicine, 11(18), 5342. https://doi.org/10.3390/JCM11185342
Díaz-Pernas, F. J., Martínez-Zarzuela, M., González-Ortega, D., & Antón-Rodríguez, M. (2021). A Deep Learning Approach for Brain Tumor Classification and Segmentation Using a Multiscale Convolutional Neural Network. Healthcare, 9(2), 153. https://doi.org/10.3390/HEALTHCARE9020153
El-Dahshan, E. S. A., Hosny, T., & Salem, A. B. M. (2010). Hybrid intelligent techniques for MRI brain images classification. Digital Signal Processing, 20(2), 433–441. https://doi.org/10.1016/J.DSP.2009.07.002
Frazier, P. I. (2018). A Tutorial on Bayesian Optimization. https://doi.org/10.48550/arxiv.1807.02811
Gómez-Guzmán, M. A., Jiménez-Beristaín, L., García-Guerrero, E. E., López-Bonilla, O. R., Tamayo-Perez, U. J., Esqueda-Elizondo, J. J., Palomino-Vizcaino, K., & Inzunza-González, E. (2023). Classifying Brain Tumors on Magnetic Resonance Imaging by Using Convolutional Neural Networks. Electronics, 12(4), 955. https://doi.org/10.3390/ELECTRONICS12040955
Hamada, A. (2020). Br35H :: Brain Tumor Detection 2020 | Kaggle. https://www.kaggle.com/datasets/ahmedhamada0/brain-tumor-detection?select=no
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-December, 770–778. https://doi.org/10.48550/arxiv.1512.03385
Hemanth, G., Janardhan, M., & Sujihelen, L. (2019). Design and implementing brain tumor detection using machine learning approach. Proceedings of the International Conference on Trends in Electronics and Informatics, ICOEI 2019, 2019-April, 1289–1294. https://doi.org/10.1109/ICOEI.2019.8862553
Huang, G., Liu, Z., van der Maaten, L., & Weinberger, K. Q. (2016). Densely Connected Convolutional Networks. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017-January, 2261–2269. https://doi.org/10.48550/arxiv.1608.06993
Inik, O. (2023). CNN hyper-parameter optimization for environmental sound classification. Applied Acoustics, 202, Article 109168. https://doi.org/10.1016/J.APACOUST.2022.109168
Inik, O., & Ulker, E. (2022). Optimization of deep learning based segmentation method. Soft Computing, 26(7), 3329–3344. https://doi.org/10.1007/S00500-021-06711-3/TABLES/9
Inik, O., Altıok, M., Ulker, E., & Kocer, B. (2021). MODE-CNN: A fast converging multi-objective optimization algorithm for CNN-based models. Applied Soft Computing, 109, 107582. https://doi.org/10.1016/J.ASOC.2021.107582
Jibon, F. A., Khandaker, M. U., Miraz, M. H., Thakur, H., Rabby, F., Tamam, N., Sulieman, A., Itas, Y. S., & Osman, H. (2022). Cancerous and Non-Cancerous Brain MRI Classification Method Based on Convolutional Neural Network and Log-Polar Transformation. Healthcare, 10(9), 1801. https://doi.org/10.3390/HEALTHCARE10091801
Johnson, K. A., & Becker, J. A. (1999). The Whole Brain Atlas. https://www.med.harvard.edu/AANLIB/
Khan, H. A., Jue, W., Mushtaq, M., & Mushtaq, M. U. (2020). Brain tumor classification in MRI image using convolutional neural network. Mathematical Biosciences and Engineering: MBE, 17(5), 6203–6216. https://doi.org/10.3934/MBE.2020328
Kumar, M. R., Vekkot, S., Lalitha, S., Gupta, D., Govindraj, V. J., Shaukat, K., Alotaibi, Y. A., & Zakariah, M. (2022). Dementia Detection from Speech Using Machine Learning and Deep Learning Architectures. Sensors, 22(23), 9311. https://doi.org/10.3390/S22239311
Kumar, S., & Mankame, D. P. (2020). Optimization driven Deep Convolution Neural Network for brain tumor classification. Biocybernetics and Biomedical Engineering, 40(3), 1190–1204. https://doi.org/10.1016/J.BBE.2020.05.009
Latif, G., Brahim, G. Ben, Awang Iskandar, D. N. F., Bashar, A., & Alghazo, J. (2022). Glioma Tumors' Classification Using Deep-Neural-Network-Based Features with SVM Classifier. Diagnostics, 12(4), 1018. https://doi.org/10.3390/DIAGNOSTICS12041018
Maqsood, S., Damaševičius, R., & Maskeliūnas, R. (2022). Multi-Modal Brain Tumor Detection Using Deep Neural Network and Multiclass SVM. Medicina, 58(8), 1090. https://doi.org/10.3390/MEDICINA58081090
Menze, B. H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., … Van Leemput, K. (2015). The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Transactions on Medical Imaging, 34(10), 1993–2024. https://doi.org/10.1109/TMI.2014.2377694
Mohsen, H., El-Dahshan, E.-S.-A., El-Horbaty, E.-S.-M., & Salem, A.-B.-M. (2018). Classification using deep learning neural networks for brain tumors. Future Computing and Informatics Journal, 3(1), 68–71. https://doi.org/10.1016/j.fcij.2017.12.001
Nayak, D. R., Padhy, N., Mallick, P. K., Zymbler, M., & Kumar, S. (2022). Brain Tumor Classification Using Dense Efficient-Net. Axioms, 11(1), 34. https://doi.org/10.3390/AXIOMS11010034
Nickparvar, M. (2021). Brain Tumor MRI Dataset | Kaggle. https://doi.org/10.34740/kaggle/dsv/2645886
Panigrahi, A. (2021). Brain_Tumor_Detection_MRI | Kaggle. https://www.kaggle.com/datasets/abhranta/brain-tumor-detection-mri
Raza, A., Ayub, H., Khan, J. A., Ahmad, I., Salama, A. S., Daradkeh, Y. I., Javeed, D., Rehman, A. U., & Hamam, H. (2022). A Hybrid Deep Learning-Based Approach for Brain Tumor Classification. Electronics, 11(7), 1146. https://doi.org/10.3390/ELECTRONICS11071146
Redmon, J., & Farhadi, A. (2016). YOLO9000: Better, Faster, Stronger. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017-January, 6517–6525. https://doi.org/10.1109/CVPR.2017.690
Redmon, J., & Farhadi, A. (2018). YOLOv3: An Incremental Improvement. https://doi.org/10.48550/arxiv.1804.02767
Samee, N. A., Mahmoud, N. F., Atteia, G., Abdallah, H. A., Alabdulhafith, M., Al-Gaashani, M. S. A. M., Ahmad, S., & Muthanna, M. S. A. (2022). Classification Framework for Medical Diagnosis of Brain Tumor with an Effective Hybrid Transfer Learning Model. Diagnostics, 12(10), 2541. https://doi.org/10.3390/DIAGNOSTICS12102541
Sharif, M. I., Li, J. P., Khan, M. A., & Saleem, M. A. (2020). Active deep neural network features selection for segmentation and recognition of brain tumors using MRI images. Pattern Recognition Letters, 129, 181–189. https://doi.org/10.1016/J.PATREC.2019.11.019
Shaukat, K., Luo, S., & Varadharajan, V. (2023). A novel deep learning-based approach for malware detection. Engineering Applications of Artificial Intelligence, 122, Article 106030. https://doi.org/10.1016/J.ENGAPPAI.2023.106030
Shaukat, K., Luo, S., Varadharajan, V., Hameed, I. A., & Xu, M. (2020). A Survey on Machine Learning Techniques for Cyber Security in the Last Decade. IEEE Access, 8, 222310–222354. https://doi.org/10.1109/ACCESS.2020.3041951
Srinivas, C., Nandini, N. P., Zakariah, M., Alothaibi, Y. A., Shaukat, K., Partibane, B., & Awal, H. (2022). Deep Transfer Learning Approaches in Performance Analysis of Brain Tumor Classification Using MRI Images. Journal of Healthcare Engineering, 2022. https://doi.org/10.1155/2022/3264367
Sultan, H. H., Salem, N. M., & Al-Atabany, W. (2019). Multi-Classification of Brain Tumor Images Using Deep Neural Network. IEEE Access, 7, 69215–69225. https://doi.org/10.1109/ACCESS.2019.2919122
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2015). Rethinking the Inception Architecture for Computer Vision. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-December, 2818–2826. https://doi.org/10.48550/arxiv.1512.00567
Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. 36th International Conference on Machine Learning, ICML 2019, 2019-June, 10691–10700. https://doi.org/10.48550/arxiv.1905.11946
Ullah, N., Khan, J. A., Khan, M. S., Khan, W., Hassan, I., Obayya, M., Negm, N., & Salama, A. S. (2022a). An Effective Approach to Detect and Identify Brain Tumors Using Transfer Learning. Applied Sciences, 12(11), 5645. https://doi.org/10.3390/APP12115645
Ullah, N., Khan, M. S., Khan, J. A., Choi, A., & Anwar, M. S. (2022b). A Robust End-to-End Deep Learning-Based Approach for Effective and Reliable BTD Using MR Images. Sensors, 22(19), 7575. https://doi.org/10.3390/S22197575
Wahlang, I., Maji, A. K., Saha, G., Chakrabarti, P., Jasinski, M., Leonowicz, Z., & Jasinska, E. (2022). Brain Magnetic Resonance Imaging Classification Using Deep Learning Architectures with Gender and Age. Sensors, 22(5), 1766. https://doi.org/10.3390/S22051766
Wu, J., Chen, X. Y., Zhang, H., Xiong, L. D., Lei, H., & Deng, S. H. (2019). Hyperparameter Optimization for Machine Learning Models Based on Bayesian Optimization. Journal of Electronic Science and Technology, 17(1), 26–40. https://doi.org/10.11989/JEST.1674-862X.80904120
Yazdan, S. A., Ahmad, R., Iqbal, N., Rizwan, A., Khan, A. N., & Kim, D. H. (2022). An Efficient Multi-Scale Convolutional Neural Network Based Multi-Class Brain MRI Classification for SaMD. Tomography, 8(4), 1905–1927. https://doi.org/10.3390/TOMOGRAPHY8040161
Zacharaki, E. I., Wang, S., Chawla, S., Yoo, D. S., Wolf, R., Melhem, E. R., & Davatzikos, C. (2009). Classification of brain tumor type and grade using MRI texture and shape in a machine learning scheme. Magnetic Resonance in Medicine, 62(6), 1609–1618. https://doi.org/10.1002/MRM.22147
Zhang, Y., Dong, Z., Wu, L., & Wang, S. (2011). A hybrid method for MRI brain image classification. Expert Systems with Applications, 38(8), 10049–10053. https://doi.org/10.1016/J.ESWA.2011.02.012
Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2017). Learning Transferable Architectures for Scalable Image Recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 8697–8710. https://doi.org/10.48550/arxiv.1707.07012