
Computers in Biology and Medicine 152 (2023) 106426


Computers in Biology and Medicine



RAAGR2-Net: A brain tumor segmentation network using parallel processing of multiple spatial frames

Mobeen Ur Rehman a,1, Jihyoung Ryu b,1, Imran Fareed Nizami c, Kil To Chong a,d,∗

a Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
b Electronics and Telecommunications Research Institute, 176-11 Cheomdan Gwagi-ro, Buk-gu, Gwangju 61012, Republic of Korea
c Department of Electrical Engineering, Bahria University, Islamabad, Pakistan
d Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea

ARTICLE INFO

Keywords:
Magnetic Resonance Imaging (MRI)
Multimodal brain tumor image segmentation benchmark (BraTS)
Residual Atrous Spatial Pyramid Pooling (RASPP)
Attention gate (AG)
Recursive residual (R2) block

ABSTRACT

Brain tumors are one of the most fatal cancers. Magnetic Resonance Imaging (MRI) is a non-invasive method that provides multi-modal images containing important information regarding the tumor. Many contemporary techniques employ four modalities: T1-weighted (T1), T1-weighted with contrast (T1c), T2-weighted (T2), and fluid-attenuation-inversion-recovery (FLAIR), each of which provides unique and important characteristics for the location of each tumor. Although several modern procedures provide decent segmentation results on the multimodal brain tumor image segmentation benchmark (BraTS) dataset, they lack performance when evaluated simultaneously on all the regions of MRI images. Furthermore, there is still room for improvement due to parameter limitations and computational complexity. Therefore, in this work, a novel encoder–decoder-based architecture is proposed for the effective segmentation of brain tumor regions. Data pre-processing is performed by applying N4 bias field correction, z-score normalization, and 0-to-1 re-sampling to facilitate model training. To minimize the loss of location information in the different modules, a residual atrous spatial pyramid pooling (RASPP) module is proposed. RASPP is a set of parallel layers using dilated convolution. In addition, an attention gate (AG) module is used to efficiently emphasize and restore the segmented output from the extracted feature maps. The proposed modules attempt to acquire rich feature representations by combining knowledge from diverse feature maps and retaining their local information. The performance of the proposed deep network based on RASPP, AG, and recursive residual (R2) blocks, termed RAAGR2-Net, is evaluated on the BraTS benchmarks. The experimental results show that the suggested network outperforms existing networks, which exhibits the usefulness of the proposed modules for ‘‘fine’’ segmentation. The code for this work is made available online at: https://github.com/Rehman1995/RAAGR2-Net.

1. Introduction

Brain tumors are formed when the brain starts developing abnormal cells. The present rate of malignant brain tumors is rather high, which has a significant impact on individuals and society [1]. The ability to segment tumors accurately is critical, since it provides the information required for the analysis and diagnosis of malignant tumors, along with mapping out therapy choices and tracking disease progression. Brain tumors are among the most fatal malignancies, and they are divided into two categories, i.e., primary and secondary tumors; the category is assigned based on the origin of the tumor [2]. Glioma, which arises from brain glial cells, accounts for 80% of all malignant brain tumors and is the most frequent histopathological type of primary brain cancer [3,4]. Gliomas are divided into two types: low-grade glioma (LGG) and high-grade glioma (HGG). LGG progresses slowly and is therefore comparatively easy to treat. HGG is aggressive and progresses rapidly, and therefore requires rapid treatment [5]. Despite maximum surgical and medicinal care, the life expectancy of a patient with an HGG tumor is approximately 14 months, with a 5-year survival rate near zero [6]. As a result, prompt diagnosis becomes critical for optimal patient care.

∗ Corresponding author at: Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea.
E-mail addresses: cmobeenrahman@jbnu.ac.kr (M.U. Rehman), jihyoung@etri.re.kr (J. Ryu), imnizami.buic@bahria.edu.pk (I.F. Nizami), kitchong@jbnu.ac.kr (K.T. Chong).
1 Mobeen Ur Rehman and Jihyoung Ryu contributed equally.

https://doi.org/10.1016/j.compbiomed.2022.106426
Received 29 August 2022; Received in revised form 16 November 2022; Accepted 13 December 2022
Available online 20 December 2022
0010-4825/© 2022 Elsevier Ltd. All rights reserved.

To obtain strong soft-tissue contrast without radiation, magnetic resonance imaging (MRI) is an extensively utilized imaging tool to detect and diagnose brain tumors [2]. The four MRI modalities are T1-weighted (T1), T1-weighted with contrast enhancement (T1ce), T2-weighted (T2), and fluid-attenuation-inversion-recovery (FLAIR). These modalities depend on the degree of excitation and the number of repetitions, and together they provide complementary information to assess the multiple subregions of gliomas [7]. T2 and FLAIR emphasize the tumor with peritumoral edema, which is referred to as the whole tumor. T1 and T1ce reveal the tumor core, which is free of peritumoral edema [8]. In T1ce, the enhancing tumor core can also be seen.

Brain tumor segmentation has emerged as an essential element in the area of medical imaging, and it is therefore an active area of research to fulfill clinical demands. The objective of brain tumor segmentation is to separate the tumor region from the healthy tissues in the MRI image. It can be performed either manually or automatically with the help of computer-aided diagnosis (CAD). Manual segmentation suffers from intra-person and inter-person variation: during manual segmentation, intra-personal and inter-personal variability influence the accuracy of the delineation and cause variation within the same image. Therefore, computer-aided segmentation of tumors is an important step towards the efficient detection and treatment of brain tumors. In recent times, due to advancements in artificial intelligence, many image-related research problems are addressed in a better manner [9–11]. Consequently, automated segmentation approaches have gained more importance in comparison to manual segmentation [12–14]. Despite the benefits, automated segmentation of brain tumors is still a difficult task in terms of robustness, due to the dense and contiguous spread of lesion regions with irregular boundaries.

There are only a few networks that can segment and classify the lesion regions simultaneously. Furthermore, such networks often require substantial memory and computational overhead, because the complexity of segmentation techniques is high [15,16]. Additionally, these networks confront training and optimization challenges, which are often caused by high tuning costs and wide design ranges for hyper-parameter tuning [17]. In order to address the aforementioned drawbacks, in this work we have constructed a network that is readily scalable and can provide parallel processing of multiple spatial frames. In comparison to current state-of-the-art models, this allows the network to reach a good minimum in a shorter period. The major contributions of this work are as follows:

• A pre-processing algorithm is proposed that enables the model to learn the insightful and important features of the tumor regions from the dataset.
• A segmentation model utilizing optimized modules for the proposed CNN architecture, based on an encoder and decoder configuration, is introduced.
• Taking into account the high variation between the different brain tumor regions, the modules are designed to minimize the information loss during deep feature extraction in the proposed RAAGR2-Net.
• Depthwise convolution for brain tumor segmentation is explored and proposed in this work.
• It is of paramount importance to keep track of the precise location of the tumor regions during the image reconstruction process, so the modules employed help prevent the loss of this information.

The remainder of the work is organized as follows. Section 2 discusses the related work, and Section 3 gives the details of the datasets utilized in this work. Section 4 discusses the pre-processing unit. Section 5 provides details about the architecture of the proposed methodology, Section 6 gives details about the experimental setup, Section 7 discusses and analyzes the results of the proposed methodology, and Section 8 concludes the work.

2. Related work

In recent years, conventional brain tumor segmentation approaches like conditional random fields [18], random forests [19], kernel-based feature extraction [20], support vector machines [21], and statistical models [22] have shown improvements. On the other hand, brain tumor segmentation remains a difficult process, particularly when multi-modality data is involved. Three major reasons impede the performance of the aforementioned methods: (i) the variations in brain anatomy across patients, (ii) the variations in gliomas' sizes, and (iii) the variations in MRI image characteristics, i.e., the tumor boundary is blurry due to poor contrast.

Deep learning-based techniques have recently gained popularity as a result of their excellent feature learning capabilities [23,24]. Many techniques based on neural networks have been proposed recently for brain tumor segmentation [25,26]. The one-pass multi-task network (OM-Net) is a brain tumor segmentation algorithm that is specifically designed to address the problem of class imbalance [27]. A cascaded network with two sub-networks is proposed in [28]: the first network uses an MRI slice to identify the tumor region, while the second network labels the identified region into the sub-regions of a tumor. Havaei et al. designed an architecture with multiple pathways, where local features of the tumor are learnt along with deep contextual features [29]. In [30], the authors explored a framework with three networks to separate brain tumor fragments hierarchically and sequentially. EMMA (Ensembles of Multiple Models and Architectures) was developed in [31] and was ranked top in the BraTS 2017 competition. Zhang et al. introduced a unique cross-modality feature learning approach for multi-modal brain tumor segmentation, based on minimal medical images that had abundant details in the modality property [32]. It comprises two processes: a cross-modality feature transfer and a cross-modality feature merging.

The U-Net architecture is considered a significant contribution to the semantic segmentation of biomedical imagery [33]. Recently, many frameworks based on the U-Net architecture have been proposed in the literature for brain tumor segmentation. In [34], the authors utilized the original U-Net architecture for brain tumor detection and segmentation. Chen et al. presented a network for brain tumor segmentation that combines multi-scale predictions from the decoder section of U-Net [35]. Aboelenein et al. modified the U-Net architecture into a two-track model [36]; each track has a different number of layers and a distinct kernel size, and the final brain segmentation is achieved by merging the two tracks.

MRI usually consists of three-dimensional data and is used to predict the type of brain tumor through data learning. In the learning process, the data is used either as a three-dimensional image or divided into two-dimensional images that are stacked. In the case of learning from 3D data, the restoration performance is relatively high and the prediction performance is excellent, because the restoration part is wider than that of a 2D image due to the three-dimensional location information [37]. Most brain tumors are segmented using U-Net [33] through 3D convolution. Conversely, the loss of positional information is greater in the two-dimensional case; moreover, for 3D processing a large-capacity computing device is required because of the large amount of computation [38]. Since the image data of brain tumors is limited, the amount of data available for learning is insufficient. To alleviate this problem, artificial intelligence (AI) models are often trained using two-dimensional data.

Research related to the detection of brain tumors through MRI and AI is embodied in the U-Net model. In the case of the fully convolutional network (FCN) model, brain tumors are predicted by upsampling the MRI images to the original size, where the final feature map is obtained through the compression stage of the image. The FCN model restores only a specific layer and does not update the layer by applying convolution, so many pixels lose their location information, creating a relatively blurred image. In contrast, the U-Net architecture concatenates the output image of the pre-compression layer at each restoration step, playing a crucial role in supplementing the location information of the original image.


As a result, the image restored by the U-Net architecture is more similar to the original image when compared with the existing FCN; therefore, the image segmented by U-Net shows better performance.

In [39], an FCN model is applied to 3D data using 3D convolution, and the network involves a dual-pathway design that concurrently analyzes the incoming images at various scales. Furthermore, the performance was increased by applying ensemble learning over the dual-pathway design [31]. The autoencoder regularization method was used with 3D convolution to perform brain tumor detection; the images are cropped to 160 × 192 × 128 using a random crop, and the batch size was set to 1 [40]. The model showed good performance; however, such a model has a high computational expense and requires a high-performance computational device. Furthermore, it takes a large amount of time in the training process and requires a large amount of memory due to the large number of parameters [41].

In [42], a 2D segmentation model is applied by transforming the 3D data into 2D data, in such a manner that an attention gate (AG) is used in the restoration layer to restore and decode an image that is more emphasized than the image restored by U-Net through convolution of the existing feature map. This resulted in a performance improvement. Furthermore, through the skip connection, a correction value is continuously added in order not to lose the current value each time it passes through a layer. This technique suffers from the drawback of the added computational expense of converting the 3D data to 2D data, which may also cause a loss of information.

DeepLab [43] is a model that performs segmentation using dilated convolution, a new method rather than the existing U-Net type. The first phase of this model was constructed by applying a simple VGG-16 [44] model, but in later phases, ASPP (Atrous Spatial Pyramid Pooling) was applied along with a Res-Net [45] model to emphasize the feature points. The performance is increased due to the method of combining multiple convolutional layers in parallel. However, there is a limitation to the performance improvement, because the technique applied to each parallel branch is different and the characteristics required in each layer are different. In [46], a prediction image is generated using feature synthesis through ensemble learning after splitting the model along the axial, coronal, and sagittal axes. Since this model uses each axis, the prediction parameters become three-fold. The accuracy shows an improvement when the conditional random field (CRF) approach and a post-processing method are used. As a result, artificial intelligence segmentation models are being investigated utilizing a variety of methodologies, but they all require high computing power and computational resources.

In this work, the RAAGR2-Net architecture is proposed, which helps in solving the major drawback of state-of-the-art techniques, i.e., the loss of location information in the reconstruction of the segmented image. The proposed architecture consists of residual atrous spatial pyramid pooling (RASPP), attention gate (AG), and recursive residual (R2) modules. The RASPP and R2 modules help in obtaining deep features without loss of location information. They are also capable of learning the discriminant features belonging to the lesion areas. At the image regeneration step, an attention approach is used to reduce the loss of position information by highlighting feature points.

3. Dataset

In composing medical image data, it is usually difficult to obtain permission to use patients' personal information and data. Therefore, in this work, three datasets, i.e., BraTS2017, BraTS2018, and BraTS2019, are utilized for the performance evaluation of the proposed RAAGR2-Net methodology. BraTS (Brain Tumor Segmentation) uses data provided in a challenge hosted by the University of Pennsylvania. The BraTS2017 dataset consists of MRI images of 285 patients suffering from primary tumor glioma. Among them, 210 cases correspond to high-grade glioma (HGG), and the remaining 75 cases correspond to low-grade glioma (LGG). The BraTS2018 dataset is similar to the BraTS2017 dataset; the difference between the BraTS 2017 and BraTS 2018 datasets is the validation data. In BraTS2018, an additional 66 MRI imaging cases are provided, which can be used for the validation process.

Unlike the BraTS2017 and BraTS2018 datasets, the BraTS2019 dataset consists of 259 glioblastoma cases and 76 LGG cases, i.e., the dataset is larger in comparison to the BraTS2017 and BraTS2018 datasets. The dataset images have 4 channels, which include the T1, T2, T1ce, and FLAIR data of the MRI. Every channel of the input data has the same size, which is 240 × 240 × 155 pixels. The provided data is interpolated to the same resolution (1 mm3) and skull-stripped. The slice thickness for T1 is 1–6 mm, while for T2 and FLAIR it is 2–6 mm [47]. The labeled data provided by BraTS is defined into three classes, which consist of necrotic & non-enhancing tumor (NCR & NET), enhancing tumor (ET), and edema.

Fig. 1. Brain tumor data labeling and classification of brain tumor regions.

Fig. 1 illustrates the visualization of the labeled data and detailed information regarding the different regions involved in brain tumor segmentation. The enhanced core (EC) is a region that shows high intensity in T1ce when compared with the T1-weighted image data. The tumor core (TC) generally represents most of the primary tumor and is defined as a class that includes the necrotic, non-enhanced and enhanced tumor cores. The objective of this work is also to identify the whole tumor (WT), i.e., the region including the tumor core and the surrounding external edema, which is expressed as a high-intensity signal in FLAIR. In the BraTS (2017, 2018 and 2019) datasets, the region occupancy difference between the different tumor classes is very high. Therefore, data imbalance causes a problem, and the imbalance problem is alleviated by the addition of two more classes, i.e., one class for the background region and the other for the non-tumor region. The correct label data is defined by dividing the brain into five classes and reconstructing TC, EC, and WT according to the inclusion relationship shown in Fig. 1.
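For readers who want to reproduce the data handling, the following is a minimal sketch of loading one BraTS case and slicing it axially; it assumes the standard BraTS file naming (`*_t1.nii.gz`, `*_t1ce.nii.gz`, `*_t2.nii.gz`, `*_flair.nii.gz`) and the nibabel library, since the exact loader used in this work is not specified in the text.

```python
import numpy as np
import nibabel as nib

MODALITIES = ("t1", "t1ce", "t2", "flair")

def load_case(case_dir: str, case_id: str) -> np.ndarray:
    """Stack the four MRI modalities of one BraTS case into a
    240 x 240 x 155 x 4 array (file layout assumed per BraTS convention)."""
    vols = [nib.load(f"{case_dir}/{case_id}_{m}.nii.gz").get_fdata()
            for m in MODALITIES]
    return np.stack(vols, axis=-1)

def axial_slices(volume4d: np.ndarray) -> np.ndarray:
    """Split the multi-channel volume into 2D axial slices
    (155 x 240 x 240 x 4), one slice per training sample."""
    return np.transpose(volume4d, (2, 0, 1, 3))
```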


Fig. 2. Proposed framework using novel RAAGR2-net architecture for brain tumor segmentation.

4. Pre-processing

Generalized and normalized data is required to train any neural network architecture well. The reason is that the learning process of neural networks optimizes the values of each node using pattern analysis and the different features extracted from the training data. If the data values are irregular, it is more difficult to predict the pattern, resulting in reduced performance. Therefore, data pre-processing is required for generalizing and normalizing the MRI images before segmenting brain tumors. The pre-processing consists of N4 bias-field correction, z-score normalization and re-sampling, and data augmentation, as shown in Fig. 2.

4.1. N4 bias-field correction

The MRI images may contain artifacts due to patient movement or the MR scanner hardware itself. Usually, the brightness of the images is locally adjusted, i.e., the images are made brighter or darker by low-frequency information [48]. This is called the intensity non-uniformity problem of MRI; in the data captured through anatomical signals of the human body, it causes the characteristics of the lesion to be blurred, which can cause problems in the learning process of the neural network segmentation architecture [49]. N4 bias field correction is applied to represent the MRI uniformly despite the intensity non-uniformity problem. The method builds on the non-parametric non-uniform intensity normalization (N3) method, which configures the bias field by applying a B-spline fit of the intensity histogram under a Gaussian model and corrects the image accordingly. However, the N3 method applies a wide B-spline distance, resulting in higher-frequency modulation. N4 solves this problem by reducing the distance of the spline, parallelizing it over multiple resolutions, and applying an overlapping spline regularization of the intensity imbalance, and its performance improves compared to the N3 regularization. Therefore, the dataset applied to training was normalized by N4 bias field correction.
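As an illustration, N4 correction can be applied with SimpleITK's `N4BiasFieldCorrection`; this is a hedged sketch rather than the exact pipeline of this work, and the file names are hypothetical.

```python
import SimpleITK as sitk

def n4_correct(in_path: str, out_path: str) -> None:
    """Apply N4 bias field correction to one MRI volume."""
    image = sitk.Cast(sitk.ReadImage(in_path), sitk.sitkFloat32)
    # Rough head mask via Otsu thresholding; N4 estimates the bias
    # field only inside this mask.
    mask = sitk.OtsuThreshold(image, 0, 1, 200)
    corrected = sitk.N4BiasFieldCorrection(image, mask)
    sitk.WriteImage(corrected, out_path)

n4_correct("BraTS_case_flair.nii.gz", "BraTS_case_flair_n4.nii.gz")
```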
stage where firstly it is sliced into 2D data and then normalization
4.2. Z-score and re-sampling

The MRI images corrected by the N4 algorithm solve the problem of local intensity imbalance, but the issue of edges and artifacts introduced by noise is still present. To solve this problem, standardization of the data values is required. Therefore, the values are normalized by utilizing the z-score method, which is given as follows,

$$z = \frac{x_i - \frac{1}{N}\sum_{i=1}^{N} x_i}{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}} \tag{1}$$

where $x_i$ is the pixel value, $\mu$ is the mean value of the image, and $N$ represents the total number of pixels of the image. After z-score normalization, the values of the image are scaled and adjusted so that they lie in the range between 0 and 1. In this way, the pre-processing reduces the learning time of the architecture, protects the model from learning from irregular data, and suppresses the problem of converging to a local minimum instead of the global minimum during the learning process [50].
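A minimal NumPy sketch of Eq. (1) followed by the 0-to-1 re-sampling is given below; restricting the statistics to non-zero (brain) voxels is an assumption, since the text does not state how the background is treated.

```python
import numpy as np

def zscore_rescale(volume: np.ndarray) -> np.ndarray:
    """Z-score normalize a volume (Eq. (1)) and rescale the result to [0, 1]."""
    brain = volume[volume > 0]                     # assumption: ignore zero background
    z = (volume - brain.mean()) / (brain.std() + 1e-8)
    return (z - z.min()) / (z.max() - z.min() + 1e-8)
```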


Fig. 3. Network Architecture of proposed RAAGR2-Net.

4.3. Data augmentation

The availability of annotated medical data depends on multiple factors, which include the number of people suffering from a particular ailment, the number of patients volunteering for a study, and the availability of specialist doctors for that ailment. Therefore, the amount of data available for a particular study varies. In order to address this data scarcity, data augmentation is used to increase the training data. In data augmentation, transformations are applied to the data, including image enlargement, image shrinking, image shifting, image inversion, image rotation, image distortion, and image brightness adjustment. The CNN model is then trained on the extended dataset. The augmented data is fed to the convolutional neural network architecture in an undamaged state, making sure that no information is deformed or lost. In this work, the ranges of enlargement, reduction, width shift, and height shift were applied at a rate of 0.2 of the existing data [51], as sketched below.
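A sketch of such an augmentation setup with the Keras `ImageDataGenerator` follows; the 0.2 zoom and shift ranges come from the text, while the flip flag and the paired image/mask generators are illustrative assumptions, and `x_train`/`y_train` are placeholders.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# The same transform must hit image and mask, so two generators share one seed.
data_gen_args = dict(
    zoom_range=0.2,          # enlargement / reduction
    width_shift_range=0.2,   # width movement
    height_shift_range=0.2,  # height movement
    horizontal_flip=True,    # image inversion (assumption)
)
image_gen = ImageDataGenerator(**data_gen_args)
mask_gen = ImageDataGenerator(**data_gen_args)

image_flow = image_gen.flow(x_train, batch_size=16, seed=1)
mask_flow = mask_gen.flow(y_train, batch_size=16, seed=1)
train_flow = zip(image_flow, mask_flow)
```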
5. Network architecture and methods

Fig. 2 shows the framework of the proposed methodology for brain tumor segmentation. The 3D data first undergoes the pre-processing stage, where it is sliced into 2D data and then normalized using N4 correction and the z-score. Data augmentation is carried out in order to overcome the limitation associated with the small dataset available to train the convolutional neural network. Once the data is pre-processed, it is given as input to the proposed RAAGR2-Net. RAAGR2-Net is an encoder–decoder-based architecture that comprises three modules: the RASPP, AG and R2 modules. These modules play an important role in the efficient and robust segmentation of the different brain tumor regions. Fig. 3 shows the proposed RAAGR2-Net architecture. The encoder performs feature extraction, while the decoder is applied to restore the original image using the extracted features. RASPP is utilized in the encoder part, since it is necessary to extract efficient features that can be used for restoration in the decoder part. Compared to the existing U-Net, RASPP requires a large amount of memory and computational expense, because its layers are multi-connected in parallel. Therefore, the number of parameters is reduced by applying separable Conv2D rather than Conv2D in the RASPP module. Furthermore, by applying the AG and R2 modules, the parameters of the existing U-Net are reduced from about 34,000,000 to 30,026,061. Consequently, the computational expense of the proposed RAAGR2-Net is reduced in comparison to U-Net.
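The parameter saving from depthwise-separable convolution can be verified with a toy comparison such as the following sketch (the layer sizes are illustrative, not those of RAAGR2-Net):

```python
from tensorflow.keras import Input, Model, layers

inp = Input((240, 240, 64))
standard = layers.Conv2D(128, 3, padding="same")(inp)            # 64*9*128 + 128 = 73,856
separable = layers.SeparableConv2D(128, 3, padding="same")(inp)  # 64*9 + 64*128 + 128 = 8,896

print(Model(inp, standard).count_params(),
      Model(inp, separable).count_params())
```

For a 3 × 3 kernel over 64 input and 128 output channels, the standard convolution holds 73,856 weights against 8,896 for the separable one, roughly an eight-fold reduction.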
The location information and pixel values are not well preserved when the data passes through each layer, because the layers are stacked. Therefore, the loss of location information becomes more adverse and affects the performance of the system as the data passes through more layers. In the existing U-Net, the image convolved in each encoder layer is delivered to the corresponding layer of the decoder through concatenation to maintain the restoration performance. However, the predictive performance is reduced due to poor feature extraction through convolution or the loss of pixel location information during restoration.

The proposed RAAGR2-Net resolves the aforementioned drawbacks by using the AG module instead of concatenation. The decoder part is focused on restoration: the features extracted in the encoder part are passed through an AG and the R2 module to restore the image. Usually, such modules are applied by stacking convolutional layers, which has the disadvantage that the input may vanish in some cases. This drawback can be overcome using the R2 module, since it uses a recurrent technique in which the image transmitted by the AG is added to each convolutional layer for correction. Using the RAAGR2-Net method, an image of the original size can be restored through upsampling. In the last layer, all images are combined and convolved into a 5-channel image, where each channel refers to one of the 5 classes, i.e., background, non-tumor, tumor core, enhancing tumor, and whole tumor, and the prediction data is computed using a sigmoid function with the output values thresholded to 0 or 1.
5.1. Residual atrous spatial pyramid pooling (RASPP) module

Fig. 4. Residual ASPP Module (RASPP).

The ASPP module proposed in DeepLab extracts important texture features by applying a parallelization of multi-ratio dilated convolutional layers. Once the features are extracted, two 1 × 1 convolutional layers are applied to enhance the features at the pixel level. However, each time the data in the aforementioned module undergoes convolution, the original values undergo a transition while the features are enhanced. Therefore, the newly proposed RASPP (residual atrous spatial pyramid pooling) in this paper is applied in the form of the module shown in Fig. 4. In atrous spatial pyramid pooling (ASPP), feature maps are obtained at multiple dilation ratios and then concatenated. The ASPP module comprises four branches with an increasing dilation ratio; the dilation ratios of the four branches are 6, 12, 18, and 24, i.e., multiples of 6. The influence of the peripheral context over a wide area is significant for brain tumor segmentation, which is the reason for using a high dilation ratio, i.e., to apply the characteristics of a wide area to a particular region. However, to classify images at the pixel level, the architecture also needs to find locally narrow classes. Consequently, it is advantageous to maintain an appropriate ratio for the separable convolutions, and therefore in the RASPP module the ratios were reduced towards multiples of 3: the dilation ratios of the four branches are 1, 3, 6, and 9, respectively. Through the last 1 × 1 convolution layer, the complex feature image of the previous layer is maintained at a predetermined channel count and transmitted to the subsequent layer.

Though the features are strengthened through parallelization in ASPP, the original values may be lost as the architecture grows deeper and the data passes through deep layers. To overcome this problem, the skip-connection method is applied to each branch in the RASPP module. Since the existing values are corrected while reinforcing the features of each layer, the possibility of data transformation is reduced, which is advantageous for maintaining the features.



Through this method, all the features of the existing ASPP module can be used and, at the same time, data loss can be prevented. Unlike ASPP, the RASPP module comprises an additional residual branch that concatenates the raw features to the features extracted at the different dilation ratios. Furthermore, there are two 1 × 1 convolution layers in the ASPP module, whereas in the RASPP module there is only one 1 × 1 convolution layer, which minimizes the amount of computation and the loss of data.
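A minimal Keras sketch of one reading of the RASPP module follows: four parallel separable convolutions with dilation rates 1, 3, 6 and 9, a residual branch carrying the raw features, and a single 1 × 1 fusion convolution. The filter counts and activations are assumptions, as Fig. 4 is not reproduced here.

```python
from tensorflow.keras import layers

def raspp_block(x, filters=256):
    """RASPP sketch: parallel dilated separable convolutions plus a
    residual branch, fused by one 1x1 convolution."""
    branches = [
        layers.SeparableConv2D(filters, 3, padding="same",
                               dilation_rate=r, activation="relu")(x)
        for r in (1, 3, 6, 9)
    ]
    # Residual branch: concatenate the unmodified input features as well.
    merged = layers.Concatenate()(branches + [x])
    return layers.Conv2D(filters, 1, padding="same", activation="relu")(merged)
```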

5.2. Attention gate (AG) module

AG is modeled after the human visual system, which focuses on a targeted location; it learns to minimize the redundant features in the feature maps while emphasizing the feature information that is important for performing a specific task. Deep learning models trained using AG inherently increase the network performance [52,53]. Fig. 5 illustrates the schematic of the AG module used in RAAGR2-Net.

Fig. 5. Attention Gate (AG) Module.

The output equation of the AG module can be defined as

$$Output = x_f \ast a_i \tag{2}$$

where $x_f$ is the feature map obtained from the encoder and $g_i$ is the gate signal ($i$ represents a particular pixel). $a_i$ is the attention coefficient (it takes on values between 0 and 1) of the corresponding pixel. The attention coefficient assigns a higher value to the pixels that are more relevant to the given problem. The feature map $x_f$ is element-wise multiplied with the attention coefficients $a_i$, giving the focus regions as the output. This process retains the features that are relevant to the specific task and suppresses irrelevant feature data.
In RAAGR2-Net, the attention coefficients are calculated using additive attention rather than multiplicative attention, which produces more encouraging segmentation results [54]. We use a multi-dimensional attention coefficient to focus on a selection of target locations, since brain tumor identification is a task with numerous semantic classes [54]. The multi-dimensional attention coefficient is calculated as follows:

$$a_i = \sigma_2\!\left(\psi^{T}\left(\sigma_1\left(W_x^{T} x_f + W_g^{T} g_i + b_g\right)\right) + b_\psi\right) \tag{3}$$

where $\sigma_1$ is the ReLU function and $\sigma_2$ is a sigmoid function. $b_g$ and $b_\psi$ are the bias terms, while the linear transformations are represented by $W_x$, $W_g$ and $\psi$. A 1 × 1 convolution is applied to the input feature map and the gating signal to execute the linear transformations.
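A hedged Keras sketch of the additive attention gate of Eqs. (2) and (3) is given below; it assumes the encoder features and the gate signal have already been brought to the same spatial size, and the intermediate channel count is arbitrary.

```python
from tensorflow.keras import layers

def attention_gate(x_f, g, inter_channels=64):
    """Additive attention gate: 1x1 convolutions implement W_x and W_g,
    ReLU and sigmoid give sigma_1 and sigma_2, and the coefficients a_i
    rescale the encoder features x_f (Eqs. (2) and (3))."""
    theta_x = layers.Conv2D(inter_channels, 1)(x_f)   # W_x^T x_f
    phi_g = layers.Conv2D(inter_channels, 1)(g)       # W_g^T g_i + b_g
    act = layers.Activation("relu")(layers.Add()([theta_x, phi_g]))
    psi = layers.Conv2D(1, 1)(act)                    # psi^T(...) + b_psi
    a = layers.Activation("sigmoid")(psi)             # attention coefficients a_i
    return layers.Multiply()([x_f, a])                # Output = x_f * a_i
```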
5.3. Recursive residual (R2) block

Fig. 6. Recursive Residual (R2) Block.

In U-Net [33], the original image is restored by applying upsampling layers and convolutional layers, and in DeepLab, the segmented image is predicted by applying an upsampling layer and a CRF [55]. However, these methods have limits in correcting the location information, which results in data loss. Therefore, in this work, we propose a modified R2 block-based methodology to prevent information loss [56]. The R2 block is a neural network method that applies a recursive scheme to the convolutional layers, as illustrated in Fig. 6. The conventional R2 block consists of a residual connection after two convolution layers, while in this work a residual connection after every convolution layer is used. Furthermore, the conventional R2 block consists of 2 convolution layers in each recursive block, but this is increased to 4 convolution layers in this work, preserving the important information by utilizing more residual connections.

Most of the existing neural networks are feed-forward neural networks, in which the coefficients of a hidden layer are applied only in the direction of the output layer. However, a recursive neural network has the characteristic of not only passing the coefficients to the output layer through the activation function but also passing them to the input of the next operation of the hidden layer, so it has an excellent ability to correct the existing image before convolution is applied. The relationship between the input and output of the recursive block in Fig. 6 can be expressed as

$$H^{u} = F\left(H^{u-1}, W\right) + H^{u-1} \tag{4}$$

where $u$ is the residual unit under consideration, $H^{u}$ and $H^{u-1}$ are the output and input of the $u$th residual unit respectively, $W$ is the set of weights shared between the residual units belonging to a recursive block, and $F$ represents the residual function. The R2 block used in the RAAGR2-Net architecture consists of 3 residual units in each recursive block, and a total of 2 recursive blocks are present in the R2 block.


Firstly, a 1 × 1 convolution is performed to match the number of filters for the inter-layer synthesis in the R2 block. Since the values are continuously corrected by synthesizing the input data at each layer through the recursive block, the drift of the hidden-layer coefficients is minimized through back-propagation. The applied recursive block is stacked twice to form a residual block and synthesized with the input value that matches the number of filters, to output the precisely restored data.
stacked twice to form a residual block and synthesized with the input implementation of the proposed work is made available on GitHub at:
value that matches the number of filters to output the precisely restored https://github.com/Rehman1995/RAAGR2-Net.
data.
6.2. Evaluation metric
5.4. Loss functions
The quantitative evaluation of the proposed RAAGR2-Net on dif-
The RAAGR2-Net architecture has a customized objective function ferent BraTS datasets is carried out for performance evaluation. The
𝐿𝑡𝑜𝑡𝑎𝑙 which is the sum of DiceScore and weighted cross entropy (WCE) performance evaluation in terms of statistics is covered in quantitative
as defined by the following equation, analysis. The DiceScore coefficient is utilized as the evaluation metric
for measuring the performance of the proposed model. Since DiceScore
𝐿𝑡𝑜𝑡𝑎𝑙 = 𝑊 𝐶𝐸 + 𝐷𝑖𝑐𝑒𝑆𝑐𝑜𝑟𝑒 (5) coefficient is usually employed in state-of-the-art methodologies, it
helps in conducting a fair quantitative performance comparison be-
Brain tumor represent a small part of the entire brain area. To
tween state-of-the-art methodologies and the proposed RAAGR2-Net
segment a tumor in a brain image, the boundary of an object needs to be
architecture. The DiceScore coefficient is given as,
detected, but when using the existing cross-entropy loss, the boundary
is ambiguous and cannot be distinguished. To solve this problem, a loss 2 × |𝑋 ∩ 𝐺|
𝐷𝑖𝑐𝑒𝑆𝑐𝑜𝑟𝑒 𝑐𝑜𝑒𝑓 𝑓 𝑖𝑐𝑖𝑒𝑛𝑡 = (8)
function called DiceScore is applied which is mathematically defined |𝑋| + |𝐺|
as, where, 𝑋 is the predicted output and G is the ground truth. The
∑ DiceScore coefficient ranges between 0 and 1 where 0 is considered
2 𝑁 𝑝 𝑖 𝑘𝑖
𝐷𝑖𝑐𝑒𝑆𝑐𝑜𝑟𝑒 = ∑𝑁 𝑖 ∑𝑁 (6) to be a poor performance and 1 is considered to be ideal performance.
2 2
𝑖 𝑝𝑖 + 𝑖 𝑘𝑖

While WCE is mathematically expressed as, 7. Results and discussion


𝑁
Extensive experiments are performed to evaluate the performance of
𝑊 𝐶𝐸 = − 𝑤𝑖 ∗ 𝑘𝑖 ∗ 𝑙𝑜𝑔(𝑝𝑖 ) (7)
𝑖
the proposed architecture. Firstly, performance evaluation and compar-
ison is performed based on improvement in MRI brain image segmenta-
where, 𝑝𝑖 and 𝑘𝑖 represent the corresponding predicted and ground
tion using the pre-processing block regardless of the architecture being
truth pixel values and 𝑤𝑖 represents the allocated weight to 𝑖th label.
used. Secondly, an ablation study of the proposed architecture is carried
The pixel values can either be 0 or 1. The numerator is the sum of all the
out which is followed by its comparison with existing state-of-the-art
boundary pixels that are correctly identified, whereas, the denominator
techniques.
is the sum of all the boundary pixels i.e., both predicted and ground
truth. 7.1. Performance evaluation utilizing pre-processing module
The DiceScore offers the advantage of higher precision in detecting
the boundary pixels in comparison to the existing loss function. Further- The performance improvement is assessed by utilizing the pre-
more, the DiceScore loss-based function alleviates the data imbalance processing module. The results of base-line U-Net architecture com-
problem between the existing classes to a certain extent. The average puted on the original dataset are compared as the effect of each
composition of each class for the brain MRI is shown in Table 1. It can pre-processing module on the dataset is evaluated on the performance
be observed that non-tumor region consists of approximately 98.46% of base-line U-Net. First, the brain MRI segmentation results on the
of the overall brain area. Edema consists of 1.02%, ET consists of BraTs 2017 dataset are obtained without applying any pre-processing to
0.29% and NCR & NET consists of 0.23% of the overall brain area. WT the dataset and results are obtained. Secondly, the results are computed
includes edema, necrotic and non-enhanced tumors (NCR & NET), and when the dataset is normalized and resampled using z-score normal-
enhanced tumors (ET). The tumor class includes NCR & NET and ET. ization and N4 bias field correction. Thirdly, the results are computed
Since ET represents reinforcing tumors, the problem of class imbalance when the dataset is normalized and data augmentation is applied.
appears due to the area occupied by each class. To solve this problem, Fig. 7 shows the segmentation results achieved using original data,
weighted cross entropy (WCE) & Dice loss are combined. When only normalized data and, augmented and normalized data using the U-
the DiceScore is considered, the system performs well, only for the Net architecture on the BraTs 2017 dataset. Column (a) and (b) in
entire tumor (WT) i.e., tumor with a large area. Whereas, the prediction Fig. 7 represent the ground truth and predicted output, respectively.
performance of ET i.e., tumor with a small area is relatively low and the Whereas, columns (c)–(e) represents different class regions of the tumor
prediction performance of the model decreases. However, when only i.e., TC, ET and WT respectively. The first row in Fig. 7 represents the
the WCE loss function is considered, the biased class is alleviated by segmentation results using U-Net and the original data i.e., data with no
giving high weight to the class with a relatively smaller area. pre-processing. The second row shows the segmentation results using
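A sketch of the combined objective in TensorFlow is given below; Eq. (5) is interpreted as adding the WCE to a Dice-based loss (1 − DiceScore), and the per-class weights are left as an input since their values are not reported in the text.

```python
import tensorflow as tf

def dice_loss(y_true, y_pred, eps=1e-6):
    """Soft Dice loss: 1 - 2*sum(p*k) / (sum(p^2) + sum(k^2)), per Eq. (6)."""
    num = 2.0 * tf.reduce_sum(y_true * y_pred, axis=(1, 2))
    den = (tf.reduce_sum(tf.square(y_pred), axis=(1, 2)) +
           tf.reduce_sum(tf.square(y_true), axis=(1, 2)))
    return 1.0 - tf.reduce_mean((num + eps) / (den + eps))

def weighted_ce(y_true, y_pred, weights, eps=1e-7):
    """Weighted cross entropy of Eq. (7); `weights` holds one value per class."""
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    return -tf.reduce_mean(
        tf.reduce_sum(weights * y_true * tf.math.log(y_pred), axis=-1))

def total_loss(y_true, y_pred, weights):
    """L_total = WCE + Dice-based loss (Eq. (5))."""
    return weighted_ce(y_true, y_pred, weights) + dice_loss(y_true, y_pred)
```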


6. Experimental setup

6.1. Implementation details

The environment used in this study is Keras, a TensorFlow-based deep learning library, in a Linux OS environment. The architecture is implemented with the same structure for each dataset, i.e., BraTS 2017, BraTS 2018 and BraTS 2019 separately, and the data is used by dividing each dataset into two-dimensional slices along the axial axis. For model training, an NVIDIA GeForce RTX 2080 Ti (10 GB RAM) with TensorFlow-GPU 2.2.0, CUDA v8.0 and cuDNN v10.1 is used. The learning parameters are selected by gradually decreasing the learning rate from 0.001 to 0.00001, with a batch size of 16, 40 epochs, and a learning-rate-reduce callback function. We have also developed a front-end API for RAAGR2-Net, which can be easily used by biomedical professionals. The implementation of the proposed work is made available on GitHub at: https://github.com/Rehman1995/RAAGR2-Net.
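A hedged sketch of this training configuration in Keras follows; the optimizer, the plateau patience, and the use of `ReduceLROnPlateau` as the learning-rate-reduce callback are assumptions, and `model`, `combined_loss`, `x_train` and `y_train` are placeholders.

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam

# combined_loss: the WCE + Dice objective of Section 5.4 (placeholder).
model.compile(optimizer=Adam(learning_rate=1e-3), loss=combined_loss)

# Decay the learning rate from 0.001 towards 0.00001 when the
# validation loss plateaus.
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.1,
                              patience=5, min_lr=1e-5)

model.fit(x_train, y_train, validation_split=0.1,
          batch_size=16, epochs=40, callbacks=[reduce_lr])
```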

6.2. Evaluation metric

A quantitative evaluation of the proposed RAAGR2-Net on the different BraTS datasets is carried out for performance assessment. The DiceScore coefficient is utilized as the evaluation metric for measuring the performance of the proposed model. Since the DiceScore coefficient is usually employed in state-of-the-art methodologies, it enables a fair quantitative performance comparison between state-of-the-art methodologies and the proposed RAAGR2-Net architecture. The DiceScore coefficient is given as,

$$DiceScore\ coefficient = \frac{2 \times |X \cap G|}{|X| + |G|} \tag{8}$$

where $X$ is the predicted output and $G$ is the ground truth. The DiceScore coefficient ranges between 0 and 1, where 0 is considered poor performance and 1 is considered ideal performance.
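For binary masks, Eq. (8) can be computed directly as in the following sketch; `WT_LABEL` is a hypothetical class index.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient of Eq. (8) for binary masks X (prediction)
    and G (ground truth)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * inter / denom if denom else 1.0

# Example: Dice for the whole-tumor mask of one slice.
# score = dice_coefficient(prediction == WT_LABEL, ground_truth == WT_LABEL)
```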

7. Results and discussion

Extensive experiments are performed to evaluate the performance of the proposed architecture. Firstly, a performance evaluation and comparison is carried out based on the improvement in MRI brain image segmentation achieved by the pre-processing block, regardless of the architecture being used. Secondly, an ablation study of the proposed architecture is carried out, which is followed by its comparison with existing state-of-the-art techniques.

7.1. Performance evaluation utilizing the pre-processing module

The performance improvement is assessed by utilizing the pre-processing module. The results of the baseline U-Net architecture computed on the original dataset are used for comparison as the effect of each pre-processing module on the dataset is evaluated on the performance of the baseline U-Net. First, the brain MRI segmentation results on the BraTS 2017 dataset are obtained without applying any pre-processing to the dataset. Secondly, the results are computed when the dataset is normalized and resampled using z-score normalization and N4 bias field correction. Thirdly, the results are computed when the dataset is normalized and data augmentation is applied.

Fig. 7. Segmentation and reconstruction results of U-Net architecture performance over different pre-processing stages.

Fig. 7 shows the segmentation results achieved using original data, normalized data, and augmented and normalized data with the U-Net architecture on the BraTS 2017 dataset. Columns (a) and (b) in Fig. 7 represent the ground truth and predicted output, respectively, whereas columns (c)–(e) represent the different class regions of the tumor, i.e., TC, ET and WT respectively. The first row in Fig. 7 represents the segmentation results using U-Net and the original data, i.e., data with no pre-processing. The second row shows the segmentation results using U-Net and normalized and resampled data, i.e., data obtained using z-score normalization and N4 bias field correction. The third row shows the segmentation results using U-Net and normalized, resampled and augmented data. The normalization of the data has shown vital importance, as can be visualized from Fig. 7, where the model trained on original data has learned some unwanted features, specifically for the edema region, which can be seen in column (b) of row 1. The normalization process has enabled the deep learning model to concentrate on the important features only, which resulted in fewer false positives. Further, the NCR tumor region identification is improved by augmenting the normalized data, which can be observed by comparing the row 2, column (b) image with the row 3, column (b) image of Fig. 7.

Table 2
Performance comparison of U-Net architecture in terms of DiceScore, evaluated on the BraTS 2017 dataset using different pre-processing modules.

Condition                TC     ET     WT
Original data            0.546  0.387  0.742
Normalized data          0.721  0.659  0.823
Normalized + Augmented   0.762  0.700  0.870

In Table 2, the segmentation results on the BraTS 2017 dataset are presented under various conditions, namely, using original data, normalized data, and augmented and normalized data, based on the U-Net architecture. It can be observed from Table 2 that the segmentation results improve once normalization is applied to the dataset. The segmentation results improve further when data augmentation is applied along with normalization. The results in Table 2 demonstrate that the performance improves when the data provided by BraTS is pre-processed, depicting the importance of data pre-processing.

7.2. Ablation study

To improve the performance based on the pre-processed data, several modules are utilized. The applied modules include the AG, the R2 block, and RASPP. To determine the importance of each module, an ablation study is carried out. For comparison purposes, the U-Net architecture is used in the ablation study, since it is usually utilized as a baseline model in the literature.

Table 3
Analysis of the impact of each module of the RAAGR2-Net on the performance of U-Net (ablation study) in terms of dice coefficient on the BraTS 2017 database.

Architecture/Module  TC     ET     WT
U-Net [33]           0.762  0.700  0.870
Attention Gate       0.772  0.711  0.874
R2 Block             0.773  0.721  0.870
RASPP Module         0.779  0.722  0.880
RAAGR2-Net           0.821  0.776  0.896

Table 3 evaluates the performance achieved by utilizing the different modules that build up the proposed RAAGR2-Net architecture on the BraTS 2017 database. The first column in Table 3 represents the architecture or the module used with the U-Net architecture for the performance analysis of each module in terms of dice coefficient. The second to fourth columns show the results on the TC, ET and WT regions of the brain MRI images, respectively. It can be observed that the performance over the TC region improves when the AG is used with the U-Net architecture. When the R2 block is used with U-Net, the performance improves for the TC and ET regions, while it remains the same for the WT region. When the RASPP module is used, the dice coefficient for all the regions is improved. Finally, when all the modules are cascaded together to form the RAAGR2-Net architecture, the dice coefficient shows an improvement from 0.762 to 0.821 for the TC region. When considering the ET region, the dice coefficient improves from 0.700 to 0.776, and when the WT region is considered, the dice coefficient shows an improvement from 0.870 to 0.896. It can be observed from the overall results that each module helps in improving the performance of brain segmentation. It can also be observed that when all the modules are cascaded in the proposed RAAGR2-Net architecture, it shows the best performance in all the regions.

Fig. 8. Performance comparison of different modules using reconstructed and segmentation results in terms of the ablation study.

The results reported in Table 3 can also be observed in Fig. 8. The first row, Fig. 8(1), shows the prediction results for the whole area, i.e., WT, of the brain MRI image. The second row, Fig. 8(2), shows a closer view of the predicted output. The third row, Fig. 8(3), represents a closer view of the edema region predictions made by the different architectures. Fig. 8 shows the results with different modules and at different zoom levels so that a comparison can be made visually. It can be observed in Fig. 8 that for the baseline a large amount of extra region is identified as tumor, indicating high false positives, but RAAGR2-Net shows a prediction that is very similar to the ground truth. From the results in Fig. 8, it can be observed that each module has contributed towards an improvement in the performance. In the WT region, it can be observed that as more modules are added to the architecture, the segmentation predictions improve. It can be observed visually that the red region due to the miscalculation of TC is widely distributed in the case of the baseline architecture, i.e., Fig. 8(b) U-Net. However, when evaluating Fig. 8(f) RAAGR2-Net, it can be confirmed that the erroneous results are reduced. It can also be noticed that the area where the WT part is miscalculated in Fig. 8(3) is more precise in Fig. 8(f) compared to the existing baseline model in Fig. 8(b).

7.3. Performance comparison of RAAGR2-Net with existing state-of-the-art architectures

The performance comparison of the proposed RAAGR2-Net architecture for brain tumor segmentation is carried out with existing state-of-the-art architectures utilizing the three BraTS (2017, 2018 and 2019) datasets, using the dice coefficient as the evaluation parameter.


Table 4
Performance comparison of proposed RAAGR2-Net with state-of-the-art techniques on the BraTS 2017 dataset in terms of dice coefficient.

Architecture    TC     ET     WT
U-Net [33]      0.762  0.700  0.870
AttU-Net [42]   0.772  0.711  0.874
ResU-Net [57]   0.768  0.716  0.873
Novel Net [58]  0.763  0.642  0.876
BU-Net [26]     0.783  0.736  0.892
RAAGR2-Net      0.821  0.776  0.896

Table 5
Performance comparison of proposed RAAGR2-Net with state-of-the-art techniques on the BraTS 2018 dataset in terms of dice coefficient.

Architecture       TC     ET     WT
U-Net [33]         0.793  0.716  0.867
TTA [60]           0.783  0.754  0.873
Ensemble Net [59]  0.738  0.760  0.885
RAAGR2-Net         0.814  0.767  0.869

Table 6
Performance comparison of proposed RAAGR2-Net with state-of-the-art techniques on the BraTS 2019 dataset in terms of dice coefficient.

Architecture   TC     ET     WT
U-Net [33]     0.746  0.696  0.864
AttU-Net [42]  0.772  0.711  0.874
ResU-Net [57]  0.760  0.704  0.867
MC-Net [61]    0.813  0.771  0.886
GAN [62]       0.790  0.766  0.896
RAAGR2-Net     0.814  0.763  0.884

Table 4 shows the performance comparison of existing state-of-the-art techniques and the proposed RAAGR2-Net architecture on the BraTS 2017 dataset in terms of dice coefficient. It can be observed that the proposed RAAGR2-Net is ranked top, with the highest dice coefficients over the TC, ET and WT regions.

Table 5 shows the performance comparison of existing state-of-the-art techniques and the RAAGR2-Net architecture on the BraTS 2018 dataset. The RAAGR2-Net architecture performed best in the TC and ET regions, but for the WT region the best performance is demonstrated by Ensemble Net [59]. It can be observed that the proposed RAAGR2-Net is ranked top over the TC region with a dice coefficient of 0.814, while Ensemble Net [59] achieves 0.738. When considering the ET region, the proposed RAAGR2-Net is ranked at the top again with a dice coefficient of 0.767, and Ensemble Net [59] is ranked second with a dice coefficient of 0.760. Ensemble Net is ranked at the top for the WT region with a dice coefficient of 0.885, and TTA [60] is ranked second with a dice coefficient of 0.873, whereas the proposed RAAGR2-Net is ranked third with a dice coefficient of 0.869. It must be considered here that Ensemble Net and TTA are both models that work on 3D data directly, and therefore they will have a higher number of parameters and a higher computational time in comparison to the proposed RAAGR2-Net.

Table 6 shows the performance comparison of existing state-of-the-art techniques with the proposed RAAGR2-Net architecture on the BraTS 2019 validation dataset. For BraTS 2019, the segmentation results were compared with 3D models and a special hybrid model, a GAN. The RAAGR2-Net architecture performed best over the TC region with a dice coefficient of 0.814, MC-Net [61] is ranked second with a dice coefficient of 0.813, and GAN is ranked third with a dice coefficient of 0.790. For the ET region, MC-Net [61] is ranked at the top with a dice coefficient of 0.771, GAN [62] is ranked second with a dice coefficient of 0.766, and the proposed RAAGR2-Net is ranked third with a dice coefficient of 0.763. For the WT region, GAN [62] is ranked at the top with a dice coefficient of 0.896, MC-Net [61] is ranked second with a dice coefficient of 0.886, and the proposed RAAGR2-Net is ranked third with a dice coefficient of 0.884. Moreover, it is important to mention that GAN [62] and MC-Net [61] are 3D models with a higher number of training parameters, and therefore they will require more computational resources when compared with the proposed RAAGR2-Net architecture.

8. Conclusion

In this study, a new architecture termed RAAGR2-Net is proposed to segment brain tumors by slicing the MRI volumes into two-dimensional images. The proposed deep model is based on an encoder–decoder architecture, where the encoder extracts feature maps and the decoder restores the original-size image from the emphasized features. Current techniques suffer from the drawback that the process of extracting the feature map in the encoding stage leads to the loss of location information, which results in a performance reduction. Another drawback of state-of-the-art techniques is that as the model becomes larger, the parameters increase and the batch size decreases in the process of training the model, resulting in a decrease in performance. To solve the above-mentioned problems, the RASPP module is proposed, which minimizes the loss of location information. At the decoding stage, an attention technique is applied to minimize the loss of the position information by emphasizing the feature points through the layers of the existing image. In addition, instead of using a general Conv2D in the convolution layers, depthwise-separable Conv2D was used to improve the learning speed by greatly reducing the number of parameters while minimizing the batch size. The proposed architecture is evaluated by utilizing the data augmentation and pre-processing techniques. The experimental results show that the proposed RAAGR2-Net architecture achieves good performance and can be used to segment brain MRI images by slicing the 3D data into 2D images, which helps in reducing the number of parameters, the batch size, and the computational expense. In future work, we intend to extend the scope of this architecture to the 3D segmentation of MRI.

CRediT authorship contribution statement

Mobeen Ur Rehman: Conceptualization, Methodology, Software, Writing – original draft, Writing – review & editing. Jihyoung Ryu: Conceptualization, Methodology, Software, Writing – original draft, Writing – review & editing. Imran Fareed Nizami: Conceptualization, Methodology, Validation, Supervision, Writing – review & editing. Kil To Chong: Conceptualization, Validation, Supervision, Writing – review & editing, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C2005612).

References

[1] A. Işın, C. Direkoğlu, M. Şah, Review of MRI-based brain tumor image segmentation using deep learning methods, Procedia Comput. Sci. 102 (2016) 317–324.
[2] S. Bauer, R. Wiest, L.-P. Nolte, M. Reyes, A survey of MRI-based medical image analysis for brain tumor studies, Phys. Med. Biol. 58 (13) (2013) R97.

