
Multimedia Tools and Applications

https://doi.org/10.1007/s11042-020-09087-y

A deep feature-based real-time system for Alzheimer disease stage detection

Hina Nawaz 1 & Muazzam Maqsood 1 & Sitara Afzal 1 & Farhan Aadil 1 & Irfan Mehmood 2 &
Seungmin Rho 3

Received: 10 February 2020 / Revised: 28 April 2020 / Accepted: 15 May 2020

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Dementia can be largely attributed to Alzheimer’s disease (AD). The progressive nature of AD causes brain cell deterioration that eventually leads to physical dependency and mental disability, which hinders a person’s normal life. A computer-aided diagnostic system is required that can aid physicians in diagnosing AD in real time. AD stage classification remains an important research area. To extract deep features, traditional machine learning-based and deep learning-based methods often require a large dataset, which leads to class imbalance and overfitting issues. To overcome this problem, we use an efficient transfer learning architecture to extract deep features, which are then used for AD stage classification. In this study, an Alzheimer’s stage detection system is proposed based on deep features from a pre-trained AlexNet model: the initial layers of the pre-trained AlexNet model are transferred and deep features are extracted from the Convolutional Neural Network (CNN). To classify the extracted deep features, we use widely used machine learning algorithms, including support vector machine (SVM), k-nearest neighbor (KNN), and Random Forest (RF). The evaluation results show that the deep feature-based model outperforms the handcrafted and deep learning methods, with 99.21% accuracy. The proposed model also outperforms existing state-of-the-art methods.

Keywords Alzheimer stage detection · Dementia · CNN · AlexNet · Deep features

* Muazzam Maqsood
muazzam.maqsood@cuiatk.edu.pk

1 Department of Computer Science, COMSATS University Islamabad, Attock Campus, Attock, Pakistan
2 Department of Media Design and Technology, Faculty of Engineering & Informatics, University of Bradford, Bradford, UK
3 Department of Software, Sejong University, Seoul 05006, South Korea

1 Introduction

Alzheimer’s disease (Alzheimer’s or AD) is a progressive, degenerative cognitive disease leading to dementia, which is marked by physical and mental disability. The disease comprises multiple stages, from subjects with no signs of dementia to very mild, mild, and moderate dementia. It hinders normal behavior and thought processes and causes dysfunction in daily routine work [8, 23]. The disease progressively degenerates brain cells, leading to short-term memory loss and causing linguistic and motor impairment due to brain damage. Every year, millions of people around the globe suffer from AD. One study showed that around 5 million people in the US alone are suffering from AD and that, if the ratio remains the same, the number will triple by 2050 [7]. Despite enormous research efforts, AD still lacks an effective treatment [9, 36]. Because AD is a progressive disease comprising multiple stages, diagnosing it in its early stages can save many Alzheimer’s patients from a fatal outcome [5]. Contemporary neuroimaging-based studies show promising signs for early and reliable Alzheimer’s detection [2, 10, 37]. These studies are mostly based on MRI, as it is easily available and offers higher contrast and spatial resolution [16, 24, 25].
Alzheimer’s disease is a multiclass problem, but most of the available studies are based on binary classification, which only indicates whether or not a subject has Alzheimer’s disease. This is of limited value in AD diagnosis, as it is more important to diagnose the stage of the disease. The diagnosis of Alzheimer’s disease requires different clinical assessments, which lead to a large number of data samples, so it is not feasible to analyze the data manually for AD stage detection. Although machine learning and deep learning are relatively recent terms, their techniques have been utilized in various medical imaging problems, especially in the domain of computer-aided diagnosis (CAD). The overall performance of the model is enhanced by utilizing machine learning algorithms. To extract and classify AD, many studies have utilized deep learning and machine learning techniques. Over time, the image processing and machine learning domains have progressed dramatically. For the diagnosis of AD using a convolutional neural network, a huge number of labeled data samples is required to train the network, and the availability of large labeled datasets remains an issue. In image processing, ImageNet is one of the most widely used and easily available datasets, as it provides over one million natural images in a thousand distinct classes [13]. A CNN trained on such a huge amount of data is likely to produce highly accurate and efficient results in medical imaging [15, 29]. CNNs can be used with diverse techniques in medical imaging, including taking a network pre-trained on an enormous dataset and making a few tweaks to the classification model. Traditional image description techniques with a small amount of feature fusion have also produced promising results. A state-of-the-art strategy for the classification of CT brain images was presented in [17], applying a deep learning-based approach to 2D CT images and 3D segmented axial images. A comparative study of mild cognitive impairment (MCI) versus AD was presented in [12]. The existing work is based on handpicked features, whose selection is a time-consuming process that needs very sound knowledge of computer vision to understand which features suit the problem. There is a strong need to design a real-time computer-aided diagnostic system that can process the input MRI images and classify the patients as healthy or AD patients. For this, the system should be able to efficiently classify the AD stage of a patient in real time, which can also prove helpful to physicians.
In this paper, we propose a model that uses a transfer learning method to extract discriminative deep features. The class imbalance problem can be overcome by transferring the initial layers of the pre-trained AlexNet model to our AD dataset. We propose three separate models for the early detection of AD in real time. The first is a handcrafted feature extraction method based on textural and statistical features, assessed with multiple classifiers. The second is a deep learning CNN model for automatic analysis of brain MRI images for Alzheimer’s prediction. The third is a transfer learning model that analyzes deep features extracted from a pre-trained AlexNet model. We evaluated the models by training them on the OASIS dataset and classified the dataset according to the progression of AD. The dataset comprises four stages: subjects without Alzheimer’s (normal), and very mild, mild, and moderate dementia. The results show that the proposed deep feature-based method outperforms all other models and published work.
The proposed research study makes the following contributions:

• We propose and evaluate the performance of deep features using a transfer learning-based approach for Alzheimer’s disease stage detection.
• We analyze the images based on their Clinical Dementia Rating (CDR), as it represents the progressive stages of Alzheimer’s.
• We compare the performance of deep features and handcrafted features for AD stage detection using multiple classifiers.

The rest of the paper is organized as follows. Section 2 presents related work, Section 3 presents the proposed methodology, Section 4 presents the results, and Section 5 concludes the study.

2 Related work

In recent decades, many classification-based techniques for Alzheimer’s disease detection have been proposed. After analysis, we have grouped these techniques by classification type, covering both binary and multiclass classification schemes.

2.1 Techniques based on binary classification

In one study [8, 23], a multi-level technique was proposed for Alzheimer’s detection. In the preprocessing stage, this method partitioned the input images into cerebrospinal fluid (CSF), white matter (WM), and gray matter (GM). A region-of-interest (ROI) based model was proposed, with which statistical features were extracted. These features were later used together with clinical features to classify subjects with and without AD symptoms. The focus of this study was on the volume contraction of GM. With the help of voxel-based morphometry (VBM), it was observed that the gray matter diminishes both locally and globally. The areas that showed a reduction in volume were later used as sub-divisions of the volume of interest (VOI). After VOI-based feature extraction, the VOIs were optimized with a genetic algorithm. These optimized features were then classified with an SVM, which produced 84.17% accuracy. Likewise, in a prior study [7], VOIs were selected based
on areas with reduced volumes in GM. With the use of feature selection techniques, the voxel values of the VOIs were used as raw features. SVM classification was then applied to the selected features, which produced 92.48% accuracy. Another study [18] presented a feature extraction technique based on inter-subject variability. Another study presented a feature model based on the sulcal medial surface, which was used to distinguish between subjects with Alzheimer’s and normal subjects [31]. The data used were gathered from patients of diverse backgrounds, and the method produced an accuracy of 87.9%. Another study [32] was based on the shape variation of the corpus callosum, segmented from T1-weighted MRI. Morphological features were extracted with the Laplace-Beltrami eigenvalue shape descriptor, ranked by information gain (IG), and then classified with KNN and SVM. KNN outperformed SVM and produced an accuracy of 93.7%. A deep learning study based on a CNN was proposed in [33]. The CNN, which was used with an auto-encoder, showed a promising accuracy of 98.4%.

2.2 Techniques based on multiclass classification

A heterogeneous feature vector combining shape and texture attributes of the hippocampus region with cortical thickness was proposed and evaluated in [3]. Linear discriminant analysis was used to classify the MRI images with this feature vector. The method used the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset and produced an accuracy of 62.7%. Another study [6] integrated features from different parts of the brain, including CSF, GM, and WM. The extracted textural and statistical features were then used to determine the AD progression stage, either normal or MCI. The ADNI dataset was used for this study [3], which provided an accuracy of 79.8%. D. Chitradevi et al. [11] proposed an approach for diagnosing Alzheimer’s disease by automatically segmenting the sub-regions of the brain. They segmented the brain into white matter, grey matter, and hippocampus regions. After segmentation, they applied distinct machine learning techniques, including particle swarm optimization (PSO) and Grey-Wolf optimization, for the diagnosis of AD and attained the highest accuracy, 98%, with the Grey-Wolf optimization approach. Hao et al. [20] proposed a multi-class study to detect AD. They used ADNI data samples to distinguish MCI, NC, and AD. By applying thresholding and selecting points, they attained an accuracy of 95% for multimodal AD classification. Chihun Park et al. [30] proposed a multi-class study to predict AD using large expression and DNA data samples, attaining an accuracy of 82.3%. Arifa et al. [35] proposed an approach for diagnosing AD by extracting hybrid features and using a CNN. They attained performance accuracy for their findings.
The above techniques seem to perform well in both binary and multiclass classification of Alzheimer’s disease. These computer-aided diagnostic (CAD) systems were designed using MRI images with traditional machine learning architectures from the field of medical or general image processing, and they produced encouraging results. However, these approaches have some limitations, such as overfitting and class imbalance, which is the major problem in multiclass classification of AD. Moreover, insufficient data samples make it challenging for researchers to classify Alzheimer’s effectively. Our proposed method addresses these limitations by transferring layers from a pre-trained neural network to extract deep features, which are more discriminative than simple deep learning features and achieve high accuracy in multiclass AD classification.

3 Proposed methodology

To determine the early stage of Alzheimer’s disease based on multiclass classification, we propose the models shown in Fig. 1. The first is a handcrafted features model based on textural and statistical features. We took the preprocessed image dataset, extracted the handcrafted features from the images, and fed them to classifiers such as KNN, SVM, and Random Forest for multiclass classification. The second is a CNN-based deep learning model, for which the data were labeled for multiclass classification according to their CDR, which distinguishes the progression stages of AD. We trained and tested this model from scratch on the preprocessed dataset to determine the accuracy of the multiclass classification model. The third is an AlexNet-based deep features model: we extracted deep features from the dataset and passed them to classifiers such as KNN, SVM, and RF to determine the best classifier for deep features. In this model, we passed the preprocessed image data to a CNN, i.e. AlexNet, for automatic deep feature extraction, which helps to detect the early onset of AD. The deep features were extracted from the fully connected layers (FCL6, FCL7, and FCL8) and the convolutional layers (Conv4 and Conv5) of the pre-trained AlexNet model. We then applied the KNN, SVM, and RF classifiers to the extracted features for the early detection of Alzheimer’s disease.

3.1 Alzheimer’s detection using handcrafted features

Initially, we extracted key features that characterize the phenomenon under consideration, the early detection of AD. A few features were selected, each representing a textural or statistical property. The textural and statistical features we used are described in Table 1. Our data was sparse, so we preprocessed the dataset to tackle the sparsity issue. In the last step, we restricted the feature space by selecting the most useful features and passed them to the classifiers for AD detection.

Fig. 1 The methodology of the proposed model. A detailed description of the model is given in the sections below

3.1.1 Data preprocessing

In this step, MRI images were sliced to obtain views of the brain along different planes, i.e. axial, coronal, and sagittal. Feature vectors for each of these views were formed individually on the basis of their CDR and later concatenated before being passed to the different classifiers for classification.

3.1.2 Feature extraction

Feature extraction is the derivation of compact and concise descriptions of an image, such as color, texture, consistency, boundary information, contour, and edge details [19]. These features give a complete picture of an image, on which different computational, statistical, and textural analyses can be performed. Feature extraction strategies reduce the resource overhead, i.e. memory and computational cost, as they represent complex information in simple feature vectors. Further details are given in the following sections.

Gray level co-occurrence matrix (GLCM) The GLCM method for texture assessment was initially presented by Haralick [21] in 1973. The algorithm was later developed further, and 14 new statistical features were proposed, which are also known as GLCM features. Based on the image texture, these features are used for texture-based extraction, edge identification, boundary detection, and other estimation, calculation, and classification tasks [34]. To compute the GLCM, the distinct gray levels in an image are first identified. In most grayscale images, the pixels of a specific region are highly
Table 1 Description of handcrafted features

Features            Description

Statistical Features
Mean                Mean is the central tendency of the data under probe
Median              It measures the inclination of centrality
Geo-mean            It is also an average value of data but unlike the arithmetic mean, it multiplies items to get output.
Entropy             It is a measure of image disorder.
Inter-qualitative
Textural Features
Skewness            It measures how “lopsided” the distributions of pixel values are
Contrast            An aggregate measure of square divergence
Correlation         Associated probability of specified concurrent pairs of pixels
HOG                 Measures how many times the adjacent gradients occur in a specific region.
Kurtosis            Interpreted in combination with noise and resolution measurement.
Homogeneity         Proximity measure of the components, i.e. how close the GLCM components are to its diagonal.
Graycomatrix        It scales the input image.
Graycoprops         It normalizes GLCM.

correlated, as they lie in the same vicinity. Hence such images are likely to produce a co-occurrence matrix whose entries are concentrated around the diagonal. Each GLCM entry counts how often a pixel with gray level ‘i’ occurs in a given spatial relationship with a pixel with gray level ‘p’ within a specific area [27]. The GLCM is a square matrix whose size equals the number of gray levels in the image. Each element ‘j’ of the co-occurrence matrix gives the probability that gray levels ‘i’ and ‘p’ co-occur in the given spatial relationship. In MRI images the gray-level pixel values change gradually, so the GLCM is likely to produce a co-occurrence matrix that enables better classification.
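
As a rough illustration of how a co-occurrence matrix is built, the following Python sketch counts horizontally adjacent gray-level pairs in a tiny image. The function name `glcm`, the offset `(0, 1)`, and the toy image are assumptions made for illustration; this is not the implementation used in the study.

```python
def glcm(image, levels, offset=(0, 1), normalize=True):
    """Count how often gray level a has gray level b at the given offset."""
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    if not normalize:
        return counts
    # Divide by the number of counted pairs to obtain probabilities
    # (falls back to 1 to avoid dividing by zero on degenerate images).
    total = sum(map(sum, counts)) or 1
    return [[v / total for v in row] for row in counts]


# Hypothetical 4x4 image with 4 gray levels (0..3).
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
counts = glcm(toy_image, levels=4, normalize=False)
probs = glcm(toy_image, levels=4)
```

Because neighboring pixels in the toy image mostly share a gray level, most of the mass lands on the diagonal of `counts`, which mirrors the behavior described above for smoothly varying MRI images.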

Contrast Contrast is the ‘aggregate of square divergence’. As the magnitude of (k − l) increases, the contrast also increases, whereas entries with k = l leave the contrast unchanged. In terms of the GLCM entries Q_{k,l}, the contrast is measured as follows

$$\sum_{k,l=0}^{n-1} Q_{k,l}\,(k-l)^2 \tag{1}$$

Correlation Correlation is the associated probability of specified concurrent pairs of pixels. Images with the same correlation have the same means μ1, μ2 and the same standard deviations σ1, σ2. Correlation is a joint probability measure and is computed as

$$\sum_{a,b} \frac{(a-\mu_1)(b-\mu_2)\,\Pr_\theta(a,b)}{\sigma_1\sigma_2} \tag{2}$$

Homogeneity Homogeneity measures how close the GLCM components are to its diagonal. It behaves inversely to the contrast measure and can be computed as

$$\sum_{a,b} \frac{\Pr_\theta(a,b)}{1+|a-b|^2} \tag{3}$$

Entropy Entropy is a measure of image disorder. When an image has an uneven texture, its entropy is high. Entropy and the GLCM values are usually inversely related: as entropy increases, the GLCM values decrease. Entropy is given by

$$-\sum_{a,b}^{n} \Pr_\theta(a,b)\,\log \Pr_\theta(a,b) \tag{4}$$
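
The four GLCM measures of Eqs. (1) to (4) can be sketched in a few lines of Python. The sketch below assumes a normalized co-occurrence matrix `p` (entries summing to 1), uses the natural logarithm, and skips zero entries in the entropy sum; the function name `glcm_features` is hypothetical, and a constant image (zero standard deviation) would need separate handling.

```python
import math


def glcm_features(p):
    """Contrast, correlation, homogeneity and entropy of a normalized GLCM p."""
    idx = range(len(p))
    pairs = [(a, b) for a in idx for b in idx]
    mu1 = sum(a * p[a][b] for a, b in pairs)
    mu2 = sum(b * p[a][b] for a, b in pairs)
    s1 = math.sqrt(sum((a - mu1) ** 2 * p[a][b] for a, b in pairs))
    s2 = math.sqrt(sum((b - mu2) ** 2 * p[a][b] for a, b in pairs))
    return {
        # Eq. (1): squared gray-level differences weighted by probability
        "contrast": sum(p[a][b] * (a - b) ** 2 for a, b in pairs),
        # Eq. (2): normalized covariance of the co-occurring gray levels
        "correlation": sum((a - mu1) * (b - mu2) * p[a][b] for a, b in pairs) / (s1 * s2),
        # Eq. (3): mass close to the diagonal counts most
        "homogeneity": sum(p[a][b] / (1 + abs(a - b) ** 2) for a, b in pairs),
        # Eq. (4): zero entries are skipped because 0 * log(0) is taken as 0
        "entropy": -sum(p[a][b] * math.log(p[a][b]) for a, b in pairs if p[a][b] > 0),
    }


# A purely diagonal matrix: neighboring pixels always share a gray level.
diagonal = [[0.5, 0.0], [0.0, 0.5]]
features = glcm_features(diagonal)
```

For the diagonal matrix the contrast is 0 and the homogeneity is 1, which illustrates the inverse behavior of the two measures noted above.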

HOG The histogram of oriented gradients (HOG) measures the frequency of the adjacent gradients located in a specific region. HOG identifies boundary information and object shapes for classification. The HOG descriptor extracts a feature vector that preserves the regional shape information of the image. The default HOG cell size is 8*8. MRI images carry similar information within a compact area, so, given this property, a larger window produced better results [28].
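
To make the idea of a gradient histogram concrete, the sketch below computes a magnitude-weighted orientation histogram for a single 8*8 cell. It is a simplified illustration under several assumptions: central-difference gradients on interior pixels only, 9 unsigned orientation bins, and no block normalization or bin interpolation, all of which a full HOG implementation would normally include. The name `cell_histogram` is hypothetical.

```python
import math


def cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)."""
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * bins
    width = 180.0 / bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]  # horizontal central difference
            gy = cell[r + 1][c] - cell[r - 1][c]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(angle // width), bins - 1)] += magnitude
    return hist


# Hypothetical 8x8 cell containing a vertical edge: intensity changes only along x.
cell = [[0, 0, 0, 0, 10, 10, 10, 10] for _ in range(8)]
hist = cell_histogram(cell)
```

All gradient energy of the vertical edge falls into the first orientation bin, which is how the descriptor encodes boundary direction.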

Statistical features The statistical features are general statistics-based features that are not related to the image’s texture but are mathematical measures of the statistical properties of the images. Descriptions of the statistical features used can be found in Table 1.
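
A minimal sketch of some of the statistical measures named in Table 1, computed over a flat list of pixel values with Python’s standard `statistics` module. The function name and the sample values are hypothetical, the skewness and kurtosis use the population (moment-based) definitions as an assumption, and the geometric mean requires strictly positive values.

```python
import statistics


def statistical_features(pixels):
    """Mean, median, geometric mean, skewness and kurtosis of pixel values."""
    n = len(pixels)
    mean = statistics.fmean(pixels)
    sd = statistics.pstdev(pixels)  # population standard deviation (non-zero assumed)
    return {
        "mean": mean,
        "median": statistics.median(pixels),
        "geo_mean": statistics.geometric_mean(pixels),
        "skewness": sum((x - mean) ** 3 for x in pixels) / (n * sd ** 3),
        "kurtosis": sum((x - mean) ** 4 for x in pixels) / (n * sd ** 4),
    }


features = statistical_features([1, 2, 4, 8])
```

The sample is skewed toward large values, so its skewness is positive and its geometric mean lies below the arithmetic mean.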

3.1.3 Classification

Multiclass classification was performed on the Alzheimer’s dataset. Different classifiers, namely SVM, KNN, and RF, were trained and analyzed to validate the performance of the extracted features. Each classifier was trained and tested individually on data labeled by CDR value with the stages normal, very mild, mild, and moderate for AD detection.

SVM An SVM is a machine learning algorithm that is broadly used for classification. To locate the best hyperplane, every feature vector is plotted as a point in an N-dimensional space, and the hyperplane is then used to separate the training data points. Once the algorithm is trained, it is tested on the testing data, which it assigns to the relevant classes according to the hyperplane. To classify input data, the support vectors ci are first identified, together with their weights wgi and the bias b. The following kernel is used for the classification of data

$$K(c, c_i) = \left(\gamma\, c^{T} c_i + m\right)^{d} \tag{5}$$

where γ > 0 for a polynomial function. The classifier output is

$$C = \sum_{i} wg_i\, K(c_i, c) + b \tag{6}$$

Here K is the kernel function, wgi is the weight of the i-th support vector ci, c is the input data passed to the classifier, and b is the bias.
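
Eqs. (5) and (6) can be illustrated with a small Python sketch. The support vectors, weights, and kernel parameters below are made-up values for illustration only; in practice they come out of SVM training, and a four-class problem would combine several such binary decision values (for example one-vs-one or one-vs-rest, which the text does not specify).

```python
def poly_kernel(c, ci, gamma=1.0, m=1.0, d=2):
    """Polynomial kernel of Eq. (5): (gamma * <c, ci> + m) ** d."""
    dot = sum(a * b for a, b in zip(c, ci))
    return (gamma * dot + m) ** d


def decision_value(c, support_vectors, weights, bias, **kernel_args):
    """Weighted kernel sum of Eq. (6) for one input vector c."""
    return sum(
        w * poly_kernel(sv, c, **kernel_args)
        for sv, w in zip(support_vectors, weights)
    ) + bias


# Hypothetical trained model with two support vectors.
support_vectors = [[1.0, 0.0], [0.0, 1.0]]
weights = [0.5, -0.5]
score = decision_value([2.0, 0.0], support_vectors, weights, bias=0.0)
```

The sign of `score` decides which side of the hyperplane the input falls on.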

K-nearest neighbor The KNN classifier classifies a test sample based on its k nearest neighbors. The output class of a sample is determined by the proximity of its feature vector to those of the other samples. KNN is a machine learning technique that uses a distance measure such as the Euclidean distance, Eq. (7), to determine the class of a sample.

$$\lvert q_1 - q_2 \rvert = \sqrt{\sum_{i=1}^{n} \left(q_{1,i} - q_{2,i}\right)^2} \tag{7}$$

The distance between two points q1 and q2 can be measured with the Euclidean distance shown in Eq. (7), where n is the number of features of q1 and q2 and the subscript i denotes the i-th feature. We used k = 9 for feature classification: KNN finds the nine nearest neighbors using the Euclidean distance and predicts the result.
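
The procedure can be sketched in plain Python as a distance computation followed by a majority vote among the k = 9 nearest training samples. The toy training set and the function names are assumptions for illustration; tie-breaking simply follows `Counter.most_common`.

```python
import math
from collections import Counter


def euclidean(q1, q2):
    """Euclidean distance of Eq. (7) between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)))


def knn_predict(train_x, train_y, query, k=9):
    """Majority vote among the k training samples closest to the query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D features: six "normal" samples near 0, six "mild" samples near 10.
train_x = [[float(i), 0.0] for i in range(6)] + [[10.0 + i, 0.0] for i in range(6)]
train_y = ["normal"] * 6 + ["mild"] * 6
prediction = knn_predict(train_x, train_y, [11.0, 0.0])
```

With k = 9 the vote here includes all six "mild" samples and only three "normal" ones, so the query is labeled "mild".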

Random Forest RF is an ensemble learning method for classification and regression. It consists of multiple trees that are not related to each other, each fitted on a different subset of the data samples, and it averages their outputs to improve prediction accuracy. It also limits and controls the problem of overfitting. The RF prediction can be computed as

$$M(x) = \arg\max_{y \in L} \frac{1}{T} \sum_{i=1}^{T} p_i(y \mid x) \tag{8}$$

Here T is the number of trees in the forest, i indexes the trees, and pi(y|x) is the probability, estimated during forest training, that tree i assigns to label y ∈ L. The classification rule selects the label with the maximum probability, as given by Eq. (8). Using fewer trees in the forest can be more efficient in terms of energy consumption, but it degrades the classification results.
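
The voting rule of Eq. (8) amounts to averaging the per-tree class probabilities and taking the arg max. The sketch below assumes each tree has already produced a probability for every stage; the three probability tables are invented for illustration and `forest_predict` is a hypothetical name.

```python
def forest_predict(tree_probs):
    """Average p_i(y|x) over all trees and return the most probable label."""
    labels = tree_probs[0].keys()
    n_trees = len(tree_probs)
    averaged = {y: sum(p[y] for p in tree_probs) / n_trees for y in labels}
    return max(averaged, key=averaged.get), averaged


# Hypothetical outputs of T = 3 trees for one MRI feature vector.
tree_probs = [
    {"normal": 0.6, "very mild": 0.3, "mild": 0.1, "moderate": 0.0},
    {"normal": 0.2, "very mild": 0.5, "mild": 0.2, "moderate": 0.1},
    {"normal": 0.1, "very mild": 0.6, "mild": 0.2, "moderate": 0.1},
]
label, averaged = forest_predict(tree_probs)
```

Although the first tree prefers "normal", the averaged probabilities favor "very mild", which shows how the ensemble smooths out individual trees.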

3.2 Alzheimer’s detection: A deep learning CNN model

3.2.1 Data preprocessing

Preprocessing is an important data preparation step for achieving a desirable outcome. The images from the OASIS dataset were sliced to obtain three views of each image, i.e. axial, coronal, and sagittal, each with dimensions of 256*256. As the input images of the CNN-based network should be 227*227*3, the images were resized accordingly. These three views of the images were then used for the extraction of a feature vector. After this preprocessing step, we applied the deep learning CNN model for classification, as the deep learning CNN is an efficient model for classification in medical imaging.
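
The resizing step can be sketched as follows. The interpolation method is not stated in the text, so nearest-neighbor sampling is assumed here purely for simplicity, and the synthetic 256*256 slice stands in for a real MRI slice; how the third (channel) dimension is filled is likewise left out because the text does not specify it.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbor resize of a 2-D image stored as a list of rows."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Synthetic 256x256 slice used only to exercise the function.
slice_256 = [[(r + c) % 256 for c in range(256)] for r in range(256)]
slice_227 = resize_nearest(slice_256, 227, 227)
```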

3.2.2 Convolutional neural network

The input layer In the image input layer, the size of the images is specified as 227*227*3. These numbers correspond to the height, width, and channel size of the images. We do not need to shuffle the data ourselves, as ‘train network’ automatically shuffles the dataset at the beginning of training and at the start of each epoch.

The convolutional layer The convolutional layer is the most integral part of a CNN. Its utility centers on its learnable kernels. It is also known as the master layer of the network. It contains the parameters and a defined number of feature maps. Padding is specified as a name-value pair and is applied to the input feature map. For a convolution layer (CL) with the pre-set stride of 1, ‘same’ padding guarantees that the spatial output size equals the input size.
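
The effect of ‘same’ padding can be checked with a small stride-1 convolution written in plain Python. This is an illustrative sketch, assuming zero padding and an odd, square kernel; like most CNN libraries it computes a cross-correlation, and the names are hypothetical.

```python
def conv2d_same(image, kernel):
    """Stride-1 2-D convolution with zero 'same' padding (odd square kernel)."""
    k = len(kernel)
    pad = k // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    # Positions outside the image count as zeros (the padding).
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += image[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


image = [[1.0] * 5 for _ in range(4)]   # 4x5 input of ones
box = [[1.0] * 3 for _ in range(3)]     # 3x3 kernel of ones
result = conv2d_same(image, box)
```

The output keeps the 4*5 shape of the input; border values are smaller only because part of the kernel overlaps the zero padding.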

Batch normalization layer The batch normalization layer makes network training an easier optimization problem. It normalizes the activations and gradients propagating through the network. Batch normalization layers are used between convolutional layers and nonlinearities, such as Rectified Linear Unit (ReLU) layers, to accelerate the training of the network.
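
A minimal sketch of the normalization itself, applied to the activations of one channel across a mini-batch. The learnable scale and shift are fixed to 1 and 0 here, and `eps` is a small constant assumed for numerical stability; these values are illustrative, not taken from the study.

```python
import math


def batch_normalize(values, eps=1e-5, scale=1.0, shift=0.0):
    """Normalize a mini-batch to zero mean and (approximately) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [scale * (v - mean) / math.sqrt(var + eps) + shift for v in values]


normalized = batch_normalize([2.0, 4.0, 6.0, 8.0])
```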

ReLU layer ReLU stands for “Rectified Linear Unit”. It is an activation function that is very widely used in neural network frameworks. The nonlinear ReLU operation is placed after the batch normalization layer.
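
In code, the ReLU operation is a single comparison per activation, as this short sketch with made-up input values shows.

```python
def relu(x):
    """Rectified Linear Unit: pass positive values, clamp the rest to zero."""
    return max(x, 0.0)


activations = [relu(v) for v in [-2.0, -0.5, 0.0, 1.5]]
```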

Max-pooling layer This layer reduces the spatial size of the feature map and discards redundant spatial information. Because of this down-sampling, the number of filters in deeper convolutional layers can be increased without increasing the computation required per layer. The layer returns the maximum value of each rectangular region of its input.
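
The pooling operation can be sketched as follows, assuming a 2*2 window with stride 2 (the window size of this particular network is not stated in the text) and a small invented feature map.

```python
def max_pool(feature_map, size=2, stride=2):
    """Return the maximum of each size x size window of a 2-D feature map."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, cols - size + 1, stride)
        ]
        for r in range(0, rows - size + 1, stride)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 6],
]
pooled = max_pool(feature_map)
```

The 4*4 map shrinks to 2*2, keeping only the strongest response in each region.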

Fully connected layer As the name suggests, the neurons of this layer are connected to all the neurons in the preceding layer. The fully connected layer combines all the features learned by the previous layers across the image to identify larger patterns. The last fully connected layer combines the features for classification. The number of outputs of this fully connected layer equals the number of classes in the image dataset, which here is 4.

Softmax layer The softmax layer is used to normalize the output of the fully connected layer. Its output consists of positive numbers that sum to 1, which the classification layer can use as classification probabilities.
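
A short sketch of the softmax normalization for four class scores. The scores are invented, and subtracting the maximum before exponentiation is a common numerical-stability step assumed here rather than something stated in the text.

```python
import math


def softmax(scores):
    """Map raw class scores to positive values that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by the max for stability
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.5, -1.0])
```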

Classification layer The final layer is the classification layer. It uses the probabilities returned by the softmax layer for each input to assign the input to one of the mutually exclusive classes and to calculate the loss.
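
The two tasks of this layer, picking a class and computing the loss, can be sketched as below. The cross-entropy form of the loss is assumed here, and the probability vector is a made-up softmax output for the four stages.

```python
import math


def cross_entropy(probs, true_class):
    """Cross-entropy loss for one sample: negative log-probability of the true class."""
    return -math.log(probs[true_class])


stages = ["normal", "very mild", "mild", "moderate"]
probs = [0.7, 0.2, 0.05, 0.05]          # hypothetical softmax output
predicted = stages[probs.index(max(probs))]
loss_if_normal = cross_entropy(probs, stages.index("normal"))
loss_if_mild = cross_entropy(probs, stages.index("mild"))
```

The loss is small when the true stage received a high probability and grows quickly when it received a low one.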

Training options After the network architecture is defined, the training options are specified. Stochastic Gradient Descent with Momentum (SGDM) is used for network training with a maximum of 10 epochs, where an epoch is one full pass over the entire training data. The data is shuffled at the start of each epoch. The network is trained on the training dataset, and accuracy is calculated on the validation data at regular intervals.
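
The SGDM update rule can be sketched on a toy problem. The learning rate of 0.01 and momentum of 0.9 are assumed values chosen for illustration (the text does not list them here), and the quadratic objective stands in for the network loss.

```python
def sgdm_step(weights, grads, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update: the velocity accumulates past gradients."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy objective f(w) = w^2 with gradient 2w, minimized from w = 1.
first_weights, first_velocity = sgdm_step([1.0], [2.0], [0.0])
weights, velocity = [1.0], [0.0]
for _ in range(200):
    grads = [2.0 * w for w in weights]
    weights, velocity = sgdm_step(weights, grads, velocity)
```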

Network training The network is trained using the above-mentioned architecture and the specified training options for the early detection of Alzheimer’s. These options comprise the number of epochs, learning rate, batch size, and validation frequency. The mini-batch loss, validation loss, and accuracy can be observed in the training plot, where the loss is the cross-entropy loss. The accuracy is the proportion of images that the network classifies correctly.

Network testing To measure the performance and efficiency of the trained network, the remaining data was fed to the network as testing data. The performance of the network is assessed using accuracy. The test results show how well the network has classified the onset of AD.

3.3 Deep features model for Alzheimer’s detection

The complete sequence of steps for early detection of Alzheimer’s disease via deep feature extraction from a CNN is shown in Figs. 2 and 3. The first figure explains the transfer learning approach, while the second explains the feature extraction process. First, we collected the MRI dataset from the OASIS repository. Next, to automate feature extraction, we used AlexNet, a pre-trained convolutional neural network, and extracted features from it. For this purpose, we preprocessed the data and passed it to the CNN for feature extraction.
For feature extraction, we used multiple layers of the CNN, i.e. the fully connected layers 6, 7, and 8 and the convolutional layers 4 and 5 of the model. This feature extraction took considerable processing time, as the CNN layers are high-dimensional. The final layer is the classification layer, which is created with the ‘classification layer’ function. This layer uses the probabilities returned by the softmax layer for each input to assign the input to one of the mutually exclusive classes and to calculate the loss.

Fig. 2 Transfer Learning Architecture for Alzheimer’s Detection

3.3.1 Data preprocessing

Preprocessing is an important data preparation step for achieving a desirable outcome. The images taken from the OASIS dataset have dimensions of 256*256, whereas the pre-trained AlexNet CNN only works with an input image size of 227*227*3, so we resized all the MRI scans to fit the AlexNet input layer. After resizing, we fed these images to AlexNet, a network pre-trained on the ImageNet repository, to acquire deep features.

Fig. 3 Alzheimer’s detection using deep feature extraction and classification model

3.3.2 Convolutional neural network: Feature extraction

In the next step after preprocessing, we extracted deep features from AlexNet, a CNN model pre-trained on the ImageNet dataset that is broadly used for classification problems. AlexNet is used here with the transfer learning method for network training and testing. AlexNet comprises five convolutional layers, three max-pooling layers, and three fully connected layers. The success of AlexNet is attributed to the ReLU nonlinearity layer and to its dropout regularization scheme. ReLU, given in Eq. (9), is a half-wave rectifier that accelerates training and helps avert overfitting.

$$f(x) = \max(x, 0) \tag{9}$$

Dropout randomly sets some hidden or visible neurons to 0 to lower the effects of co-adaptation between neurons. A comprehensive overview of the AlexNet architecture is given in the sections below:

The input layer In this step, the whole preprocessed MRI images, including all three slices, namely sagittal, coronal, and axial, of size 227*227*3 are fed to the input layer.

The convolutional layer (CL1) The convolutional layer is the most vital part of the
convolutional neural network. The whole idea of this layer revolves around the use of
learnable kernels. The convolutional layer is the main layer for the feature extraction
process and is usually composed of feature maps. Each kernel gathers useful features from
the image according to its size, which is why the choice of kernel is important for
extracting features that eventually improve the performance of the convolutional neural
network. The kernels of CL1 have a size of exactly 11*11, so each kernel covers an 11*11
window of its input. This first convolutional layer contains a total of 96 kernels, one for
each of the 96 feature maps passed on to the first max-pooling layer. The functioning of the
convolutional layer is presented below.
 
$$x_k^l = f\left(\sum_{i \in M_k} x_i^{l-1} * Q_{ik}^l + b_k^l\right) \tag{10}$$

Here $k$ indexes the convolution feature map of layer $l$, $M_k$ is the selection of input
maps, $Q_{ik}^l$ is the filter, $b_k^l$ is the bias of the feature map, and $f$ is the
activation function.
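
As a hedged illustration of eq. (10), the sketch below computes a single output feature map from a set of input maps. It assumes stride 1, no padding, the sliding-window form of convolution used by most deep learning libraries (no kernel flip), and ReLU from eq. (9) as the activation $f$. The function names and the toy numbers are hypothetical.

```python
def relu(x):
    """Rectifier from eq. (9): f(x) = max(x, 0)."""
    return max(x, 0.0)


def conv_feature_map(input_maps, kernels, bias):
    """One output map x_k = f(sum_i x_i * Q_ik + b_k), stride 1, no padding."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    out_h = len(input_maps[0]) - kh + 1
    out_w = len(input_maps[0][0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = bias
            # Sum the contribution of every selected input map i in M_k.
            for x_i, q_ik in zip(input_maps, kernels):
                for dr in range(kh):
                    for dc in range(kw):
                        total += x_i[r + dr][c + dc] * q_ik[dr][dc]
            row.append(relu(total))
        output.append(row)
    return output


# Hypothetical toy data: two 3 x 3 input maps and one 2 x 2 kernel per map.
x = [
    [[1, 2, 0], [0, 1, 3], [4, 0, 1]],
    [[0, 1, 1], [2, 0, 0], [1, 1, 2]],
]
q = [
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
]
print(conv_feature_map(x, q, bias=-0.5))
```

The negative pre-activation at the top-right position is clipped to zero by the rectifier, which is the only non-linear step in the computation.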

Max-pooling layer 1 (PL1) This layer operates on the convolutional feature maps using a
max-pooling scheme applied over a local neighborhood window. The output of this max-pooling
layer usually consists of 96 feature maps with a size of 27*27. The main objective of the
max-pooling layer is to identify the highest value in each pooling region.
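
A minimal sketch of max-pooling follows. The 3*3 window with stride 2 is the setting commonly reported for AlexNet and is an assumption here, as are the function name `max_pool` and the toy feature map.

```python
def max_pool(feature_map, size=3, stride=2):
    """Slide a size x size window with the given stride and keep each maximum."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[r * stride + dr][c * stride + dc]
                for dr in range(size)
                for dc in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 5 x 5 feature map.
fmap = [
    [1, 3, 2, 0, 1],
    [4, 6, 5, 2, 0],
    [7, 8, 1, 1, 3],
    [0, 2, 1, 4, 6],
    [3, 1, 0, 5, 2],
]
print(max_pool(fmap))
```

Under the same assumed settings, a 55*55 map is reduced to 27*27, which matches the feature-map size stated for PL1.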

Convolutional layer 2 (CL2) The next layer after max-pooling for feature extraction is
convolutional layer 2, which has both differences from and commonalities with CL1. The CL2
layer takes the pooled output of CL1 as its input and processes it with convolutional
kernels of size 5*5. This second convolutional layer contains a total of 256 kernels, which
produce 256 feature maps of size 27*27. The feature maps of CL2 are not obtained directly
but by combining all or a few of the PL1 feature maps. The remaining convolutional layers
contain 384, 384, and 256 kernels.

Remaining max-pooling & convolutional layers These max-pooling and convolutional layers have
distinct feature maps, yet they are quite similar to the initial two layers. The feature
maps of CL5 have a size of 13*13 after convolution. Max-pooling layers follow CL2 and CL5
with a kernel size of 3*3. The accuracy is only 40% when only the initial two layers of the
convolutional neural network are evaluated. With deeper convolutions, the features become
more specific and distinct. This shows that the deeper convolutional layers have an
exceptional impact on the performance of the convolutional neural network.
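
The spatial sizes quoted above (27*27 after the first pooling stage and 13*13 for CL5) can be checked with the usual output-size formula. The strides and paddings in this sketch are those commonly reported for AlexNet and are assumptions, not values taken from this paper; `PL2` and `PL5` are hypothetical labels for the pooling stages after CL2 and CL5.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding); assumed standard AlexNet settings.
LAYERS = [
    ("CL1", 11, 4, 0),
    ("PL1", 3, 2, 0),
    ("CL2", 5, 1, 2),
    ("PL2", 3, 2, 0),
    ("CL3", 3, 1, 1),
    ("CL4", 3, 1, 1),
    ("CL5", 3, 1, 1),
    ("PL5", 3, 2, 0),
]

size = 227
sizes = {}
for name, kernel, stride, padding in LAYERS:
    size = out_size(size, kernel, stride, padding)
    sizes[name] = size
    print(f"{name}: {size} x {size}")

# The 256 maps of the last pooling stage are flattened before the first
# fully connected layer.
flattened = 256 * sizes["PL5"] ** 2
print("flattened length:", flattened)
```

Under these assumptions the last pooling stage yields 256 maps of 6*6, that is 9216 values, which the fully connected layers then map to 4096-dimensional vectors.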

Fully connected layer (FCL) The neurons of these layers are connected to the neurons of the
neighboring layers, and their arrangement is similar to that of an artificial neural network
(ANN). In AlexNet, three layers are fully connected. These layers are combined to obtain a
feature vector, which in our case has 4096 dimensions for each of the 382 samples. The
features are extracted directly from these fully connected layers to be used for prediction
and classification. Here the features are extracted from the convolutional layers conv4 and
conv5 and from the fully connected layers fc6 and fc7. The outputs of layers 6 and 7 each
have a size of 1*4096. This feature vector is then fed to different classification
algorithms.

Replace the last layer The last fully connected layers of AlexNet learn the target
classification, whereas the early layers of the network contain low-level features. Since we
train with our own processed dataset, the classification layers need to be replaced with new
layers for feature classification. We have replaced the classification layers and
transferred the initial five layers of AlexNet. The parameters used to create the fully
connected layers are the output size, the bias learn-rate factor, and the weight learn-rate
factor. The output size of the fully connected layer equals the number of output classes.
The bias learn-rate factor controls the learning rate of the layer's bias, and the weight
learn-rate factor controls the learning rate of the layer's weights.

Network training The AlexNet network comprises layers transferred from a pre-trained network
together with 3 fully connected layers, a softmax layer, and a classification layer, which
are newly added. This brings the total to five transferred layers. Only the newly added
layers are trained with the Alzheimer’s data for accurate classification. For network
training, only 77% of the whole MRI data is passed to the network. The training options of
the network include mini-batch size, number of epochs, learning rate, and validation
frequency. A maximum of 10 epochs has been used for training. The algorithm updates the bias
and weight parameters by minimizing the loss as it applies the training options to the
training data. The final added layer of the network learns the features of the multiclass
Alzheimer’s data. During network training, we varied the learning rate, the weight
learn-rate factor, the bias learn-rate factor, and especially the number of epochs to check
the impact of these changes and to obtain optimal results. The learning rate ranges from
1e-1 to 1e-10. Likewise, the bias learn-rate factor and the weight learn-rate factor are set
to 1e-4 and 10–100, respectively, to get optimal results. Here the MiniBatchSize is equal to
10.

Network testing To assess the training performance, the remaining 23% of the dataset was
passed to the trained network as test data. We base our assessment of the trained network on
the accuracy metric. The test results show how well the network is trained for Alzheimer’s
classification.

3.3.3 Classification algorithms

We used a CNN for feature vector extraction. This feature vector is later used to train a
model that predicts the class of each sample. For training, we used SVM, KNN, and RF. These
classification algorithms have already been described in Section 3.

4 Experimental results

This section describes the dataset, the experiments conducted with the proposed framework,
and their outcomes.

4.1 Dataset details

The dataset for Alzheimer’s disease is taken from the publicly available OASIS repository,
which consists of MRI scans. A detailed description of the dataset is given in Table 2 and
Table 3. The dataset consists of cross-sectional MRI scans with three main views of the 3D
brain volume: coronal, axial, and sagittal. These images were later split into training and
testing samples, with 77% of the images used for training and 23% for testing. Both sets
were made sure to contain all four stages.
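
One way to obtain a 77/23 split in which every stage appears on both sides is a per-class (stratified) split. The sketch below is an assumption about how such a split could be produced, not the authors' procedure; the function name `stratified_split` and the fixed seed are hypothetical, while the class sizes are the image counts listed in Table 2.

```python
import random


def stratified_split(labels, train_fraction=0.77, seed=0):
    """Return (train_idx, test_idx) with every class present in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


# Class sizes per CDR value as listed in Table 2.
counts = {0: 167, 0.5: 87, 1: 105, 2: 23}
labels = [cdr for cdr, n in counts.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting each class separately keeps the small CDR 2 group represented in the test set, which a purely random split over all 382 images would not guarantee.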

4.2 Performance evaluation

For the performance analysis of the proposed models, the accuracy was computed. The
accuracy metric can be defined as

$$\text{Accuracy} = \frac{A_a}{A_c} \times 100 \tag{11}$$

where $A_a$ is the number of accurately classified results and $A_c$ is the total number of classified results.
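
A minimal sketch of the accuracy computation in eq. (11) is given here. The function name and the eight labels are hypothetical and are not results from the paper.

```python
def accuracy(true_labels, predicted_labels):
    """Eq. (11): correctly classified results over all results, in percent."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels) * 100


# Hypothetical CDR labels for eight scans; not results from the paper.
y_true = [0, 0, 0.5, 0.5, 1, 1, 2, 2]
y_pred = [0, 0, 0.5, 1, 1, 1, 2, 0.5]
print(accuracy(y_true, y_pred))
```

Six of the eight hypothetical predictions match, so the sketch reports 75.0.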


The accuracies of the machine learning algorithms are compared below.

Table 2 Dataset description

Clinical Dementia Rating (CDR)    No. of Images    Corresponding Mental State
0                                 167              No Dementia
0.5                               87               Very Mild Dementia
1                                 105              Mild Dementia
2                                 23               Moderate Dementia

Table 3 Statistical summary of the dataset

Attributes                Normal         Early Stages
Age (years)               76.4 ± 7.8     76 ± 7.5
Gender (M/F)              29/46          20/55
Education (years)         3.16 ± 1.2     20.85 ± 1.3
Mini-Mental Exam Score    28.89 ± 1.3    24 ± 4.0
CDR                       0              1

4.3 Results for handcrafted features based model

The handcrafted features are extracted and used for the early detection of Alzheimer’s
disease. These features are textural and statistical, e.g., contrast, correlation,
homogeneity, HOG, mean, kurtosis, and skewness. After feature vector extraction, we pass the
vectors individually to the SVM, KNN, and RF classifiers based on their CDR value. The
average accuracy over all CDR values was 79.63% for SVM, 80.31% for KNN, and 84.93% for RF.
All classifiers were tested with 10-fold cross-validation and a batch size of 100; for the
KNN classifier, k = 9. The average values were then used to compute the accuracy. The
performance of the classifiers in terms of accuracy is presented in Fig. 4. The experimental
results show that the RF classifier performed best on the handcrafted features for
Alzheimer’s detection, with an accuracy of 84.93% in comparison with the remaining
classifiers KNN and SVM, which we attribute to its generalization ability and its averaging
over randomly drawn samples.
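
The statistical part of the handcrafted features (mean, skewness, and kurtosis) can be sketched as follows. The sketch assumes population moments and non-excess kurtosis; the function name and the sample intensities are hypothetical, and the textural features (contrast, correlation, homogeneity, HOG) are not covered.

```python
import math


def statistical_features(pixels):
    """Return mean, skewness and kurtosis of a flat list of intensities."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(variance)
    # Standardized third and fourth central moments (population form).
    skewness = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3)
    kurtosis = sum((p - mean) ** 4 for p in pixels) / (n * std ** 4)
    return {"mean": mean, "skewness": skewness, "kurtosis": kurtosis}


# Hypothetical intensities from a small image patch.
patch = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistical_features(patch))
```

Each scan would contribute one such set of values to its feature vector before the vector is passed to a classifier.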

4.4 Results of a deep learning-based model

Fig. 4 Classification analysis of handcrafted features performance

For the early detection of Alzheimer’s disease with multiclass classification, we created a
deep learning model based on the widely used CNN. For this, we used a CNN architecture that
is built from scratch. CNNs are an essential tool for deep learning and are especially
suited to the recognition and classification of images. The classification procedure is
completely automated in a deep learning CNN: the model produces the desired output
automatically while learning from the given input.
While experimenting, we varied the network’s training settings. We changed the learning
rate, the weight learn-rate factor, the bias learn-rate factor, and especially the number of
epochs to check the impact of these changes and to obtain optimal results. The learning rate
ranges from 1e-1 to 1e-10. Likewise, the bias learn-rate factor and the weight learn-rate
factor are set to 1e-4 and 10–100, respectively, to get optimal results. Here the mini-batch
size is equal to 10. This deep learning approach for multiclass Alzheimer’s detection was
tested with 6, 8, 10, and 15 epochs to determine the optimal number of epochs. We achieved a
classification accuracy of 92.85% with 10 epochs.

4.5 Results of CNN: Deep features based model

The parameters of the CNN are pre-trained on the ImageNet dataset for deep feature
extraction and classification. The AlexNet model is pre-trained with over one million
labeled images covering around 1000 different categories. As a result, the model has learned
rich feature representations from a very large set of images, and the large number of
labeled images in this dataset helps the classifier learn low-level features. In this deep
feature, transfer learning-based model, both feature extraction and classification are
automated: MRI images are passed as input, and Alzheimer’s detection results are obtained as
output. For classification, we used the same three machine learning classifiers as for the
handcrafted feature model (SVM, KNN, and RF) to assess the performance, but not the same
feature vectors. Instead, here we used deep features extracted from AlexNet. Table 4
presents the results obtained with these features, which are taken from AlexNet’s fully
connected layer FC7, layer 7 of the network. The feature vectors have 4096 dimensions for a
sample of 382. The analysis showed that SVM performed best with an accuracy of 99.21%, while
RF achieved 93.97% and KNN 57.32%.
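
To make the classification step on deep features concrete, the following is a minimal sketch of a k-nearest-neighbour vote with k = 9, the value reported for KNN. It assumes Euclidean distance and a simple majority vote; the function name and the 2-D toy vectors, which stand in for 4096-dimensional FC7 features, are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_vectors, train_labels, query, k=9):
    """Label the query by majority vote among its k nearest training vectors."""
    neighbours = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D stand-ins for the 4096-dimensional deep feature vectors.
train_vectors = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8),
    (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0), (5.5, 5.5), (5.2, 5.8),
]
train_labels = ["CDR 0"] * 6 + ["CDR 1"] * 6

print(knn_predict(train_vectors, train_labels, (0.4, 0.4)))
print(knn_predict(train_vectors, train_labels, (5.6, 5.4)))
```

The same interface applies unchanged to longer vectors, since `math.dist` accepts points of any equal dimension.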
Table 4 Results of deep feature classifiers

Classifiers      Accuracy   MAE      RMSE     Relative-Absolute Error   Root-Relative Absolute Error
SVM              99.21%     0.2507   0.3124   73.8602%                  75.8777%
KNN (K = 9)      57.32%     0.2369   0.3409   69.8116%                  82.7922%
Random Forest    93.97%     0.2041   0.2628   60.1295%                  63.8364%

If we compare handcrafted and deep features with each other, it is clear that deep features
achieved higher accuracy; the handcrafted features achieve an accuracy of 84.93%. We have
drawn results from a comparative study of three models, each designed for the early
detection of Alzheimer’s disease. Our observations show that the deep features based on
transfer learning outperformed the remaining models, obtaining an accuracy of 99.21% on the
given dataset, as shown in Fig. 5. The deep learning CNN-based model ranked second with an
accuracy of 92.85% for multiclass classification. The last model achieved an accuracy of
84.93% for handcrafted feature classification. For classification, we used SVM, KNN, and RF
for both deep features and handcrafted features. Figure 4 presents the performance of the
handcrafted features. This side-by-side review of multiclass classification showed that, for
the early detection of AD, the deep feature model based on transfer learning is more
accurate than the remaining models. We attribute the superiority of the proposed transfer
learning assisted deep feature-based model to the fact that it combines the low-level
features learned through transfer learning with further learning and fine-tuning on the
actual dataset. The comparative analysis with existing work is presented in Table 5.

Fig. 5 Comparison of proposed models in terms of accuracy

Table 5 Comparison between state-of-the-art techniques and their respective accuracies

Author                    Techniques            Dataset   Targets                     Accuracy
Beheshti et al. [8, 23]   Image Segmentation    ADNI      AD vs NC                    84.07%
S. Wang et al. [9, 36]    SVM                   OASIS     3D Displacement             93.05%
Muazzam et al. [26]       Transfer Learning     OASIS     Multi-classification AD     92.85%
Altaf et al. [5]          Bag of words          ADNI      AD vs NC                    79.08%
Islam et al. [22]         Deep learning         OASIS     AD                          73.75%
Alkabawi et al. [4]       Deep learning         OASIS     AD                          74.93%
Farouk et al. [14]        SVM                   ADNI      Voxel-based morphometry     88%
Choi et al. [12]          Deep learning         ADNI      MCI vs AD                   84.2%
Sitara et al. [1]         Transfer Learning     OASIS     AD                          98.41%
Proposed Techniques
Method 1                  Handcrafted Feature   OASIS     Multiclass Classification   84.93%
Method 2                  Deep Features         OASIS     Multiclass Classification   99.21%
Method 3                  Deep learning         OASIS     Multiclass Classification   92.85%

5 Conclusions

The detection of Alzheimer’s disease in its early stages using multiclass classification is
a challenging task, as AD stage detection usually achieves only ordinary results. In this
study, we have proposed real-time feature extraction and classification approaches based on
deep and transfer learning that efficiently perform the multiclass classification of
Alzheimer’s disease. For transfer learning assisted deep feature extraction, we used a
pre-trained AlexNet network. We optimized and modified these models to meet the requirements
of our problem. Handcrafted features comprise textural and statistical features. SVM, KNN,
and RF are used as classifiers to assess both the deep feature model and the handcrafted
feature model. We achieved the best accuracy of 99.21% for deep features and 92.85% for the
deep learning CNN. The results showed that the models based on transfer learning performed
well in comparison with the other methods. These models are pre-trained on large amounts of
data, which is reflected in their results, as they achieved the highest accuracies. Our
proposed models showed quite promising results, as they produced high accuracy in multiclass
classification for the early detection of Alzheimer’s disease.

Acknowledgments This work was supported by the National Research Foundation of Korea (NRF) grant
funded by the Korea government (MSIT) (NRF-2019R1F1A1060668).

Compliance with ethical standards

Conflict of interest The authors declare no competing interest.

References

1. Afzal S, Maqsood M, Nazir F, Khan U, Aadil F, Awan KM, Mehmood I, Song OY (2019) A data
augmentation-based framework to handle class imbalance problem for Alzheimer’s stage detection. IEEE
Access 7:115528–115539
2. Ahmed OB et al (2015) Classification of Alzheimer’s disease subjects from MRI using hippocampal visual
features. Multimed Tools Appl 74(4):1249–1266
3. Ahmed OB et al (2015) Alzheimer's disease diagnosis on structural MR images using circular harmonic
functions descriptors on hippocampus and posterior cingulate cortex. Comput Med Imaging Graph 44:13–25
4. Alkabawi EM, Hilal AR, Basir OA (2017) Computer-aided classification of multi-types of dementia
via convolutional neural networks. In: 2017 IEEE International Symposium on Medical Measurements and
Applications (MeMeA). IEEE
5. Altaf T et al (2017) Multi-class Alzheimer disease classification using hybrid features. In: IEEE Future
Technologies Conference
6. Altaf T, Anwar SM, Gul N, Majeed MN, Majid M (2018) Multi-class Alzheimer's disease classification
using image and clinical features. Biomed. Signal Process. Control 43:64–74
7. Beheshti I, Demirel H, Alzheimer's Disease Neuroimaging Initiative (2016) Feature-ranking-based
Alzheimer’s disease classification from structural MRI. Magn Reson Imaging 34(3):252–263
8. Beheshti I, Demirel H, Matsuda H, Alzheimer's Disease Neuroimaging Initiative (2017) Classification of
Alzheimer's disease and prediction of mild cognitive impairment-to-Alzheimer's conversion from structural
magnetic resonance imaging using feature ranking and a genetic algorithm. Comput Biol Med 83:109–119
9. Belleville S et al (2014) Detecting early preclinical Alzheimer's disease via cognition, neuropsychiatry, and
neuroimaging: qualitative review and recommendations for testing. J Alzheimers Dis 42(s4):S375–S382
10. Chincarini A, Bosco P, Calvini P, Gemme G, Esposito M, Olivieri C, Rei L, Squarcia S, Rodriguez G, Bellotti
R, Cerello P, de Mitri I, Retico A, Nobili F, Alzheimer's Disease Neuroimaging Initiative (2011) Local MRI
analysis approach in the diagnosis of early and prodromal Alzheimer's disease. Neuroimage 58(2):469–480
11. Chitradevi D, Prabha S (2020) Analysis of brain sub regions using optimization techniques and deep
learning method in Alzheimer disease. Appl Soft Comput 86:105857

12. Choi H, Jin KH, Alzheimer's Disease Neuroimaging Initiative (2018) Predicting cognitive decline with
deep learning of brain metabolism and amyloid imaging. Behav Brain Res 344:103–109
13. Deng J et al (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on
Computer Vision and Pattern Recognition. IEEE
14. Farouk Y, Rady S, Faheem H (2018) Statistical features and voxel-based morphometry for Alzheimer's
disease classification. In: 2018 9th International Conference on Information and Communication Systems
(ICICS). IEEE
15. Frid-Adar M, Diamant I, Klang E, Amitai M, Goldberger J, Greenspan H (2018) GAN-based synthetic
medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing
321:321–331
16. Fung G, Stoeckel J (2007) SVM feature selection for classification of SPECT images of Alzheimer's disease
using spatial information. Knowl Inf Syst 11(2):243–258
17. Gao XW, Hui R, Tian Z (2017) Classification of CT brain images based on deep learning networks.
Comput Methods Prog Biomed 138:49–56
18. Guerrero R, Wolz R, Rao AW, Rueckert D (2014) Manifold population modeling as a neuro-imaging
biomarker: application to ADNI and ADNI-GO. NeuroImage 94:275–286
19. Guyon I et al (2008) Feature extraction: foundations and applications, vol 207. Springer
20. Hao X, Bao Y, Guo Y, Yu M, Zhang D, Risacher SL, Saykin AJ, Yao X, Shen L, Alzheimer's Disease
Neuroimaging Initiative (2020) Multi-modal neuroimaging feature selection with consistent metric con-
straint for diagnosis of Alzheimer's disease. Med Image Anal 60:101625
21. Haralick RM, Shanmugam K, Dinstein IH (1973) Textural features for image classification. IEEE Trans
Syst Man Cybern 6:610–621
22. Islam J, Zhang Y (2017) A novel deep learning based multi-class classification method for Alzheimer’s
disease detection using brain MRI data. In: International Conference on Brain Informatics. Springer
23. Klöppel S et al (2008) Automatic classification of MR scans in Alzheimer's disease. Brain 131(3):681–689
24. Lao Z, Shen D, Xue Z, Karacali B, Resnick SM, Davatzikos C (2004) Morphological classification of brains
via high-dimensional shape transformations and machine learning methods. Neuroimage 21(1):46–57
25. Liu Y et al (2004) Discriminative MR image feature analysis for automatic schizophrenia and Alzheimer’s
disease classification. In: International Conference on Medical Image Computing and Computer-Assisted
Intervention. Springer
26. Maqsood M, Nazir F, Khan U, Aadil F, Jamal H, Mehmood I, Song OY (2019) Transfer learning assisted
classification and detection of Alzheimer’s disease stages using 3D MRI scans. Sensors 19(11):2645
27. Mishra S, Majhi B, Sa PK, Sharma L (2017) Gray level co-occurrence matrix and random forest based acute
lymphoblastic leukemia detection. Biomed Signal Process Control 33:272–280
28. Nanni L, Salvatore C, Cerasa A, Castiglioni I (2016) Combining multiple approaches for the early diagnosis
of Alzheimer's disease. Pattern Recogn Lett 84:259–266
29. Noothout JM et al (2018) CNN-based landmark detection in cardiac CTA scans. arXiv preprint
arXiv:1804.04963
30. Park C, Ha J, Park S (2020) Prediction of Alzheimer's disease based on deep neural network by integrating
gene expression and DNA methylation dataset. Expert Syst Appl 140:112873
31. Plocharski M, Østergaard LR, Alzheimer's Disease Neuroimaging Initiative (2016) Extraction of sulcal
medial surface and classification of Alzheimer's disease using sulcal features. Comput Methods Prog Biomed 133:35–44
32. Ramaniharan AK, Manoharan SC, Swaminathan R (2016) Laplace Beltrami eigen value based classification of
normal and Alzheimer MR images using parametric and non-parametric classifiers. Expert Syst Appl 59:208–216
33. Sarraf S, Tofighi G (2016) DeepAD: Alzheimer’s disease classification via deep convolutional neural
networks using MRI and fMRI. BioRxiv:070441
34. Shi YQ, Kim H-J, Perez-Gonzalez F (2012) Digital forensics and watermarking: 10th International
Workshop, IWDW 2011, Atlantic City, NJ, USA, Oct. 23–26, 2011, Revised Selected Papers, vol 7128. Springer
35. Shikalgar A, Sonavane S (2020) Hybrid Deep Learning Approach for Classifying Alzheimer Disease Based
on Multimodal Data. In: Computing in Engineering and Technology. Springer, pp 511–520
36. Wang S, Zhang Y, Liu G, Phillips P, Yuan TF (2016) Detection of Alzheimer’s disease by three-
dimensional displacement field estimation in structural magnetic resonance imaging. J Alzheimers Dis
50(1):233–248
37. Westman E, Cavallin L, Muehlboeck JS, Zhang Y, Mecocci P, Vellas B, Tsolaki M, Kłoszewska I,
Soininen H, Spenger C, Lovestone S, Simmons A, Wahlund LO, for the AddNeuroMed consortium
(2011) Sensitivity and specificity of medial temporal lobe visual ratings and multivariate regional MRI
classification in Alzheimer's disease. PLoS One 6(7):e22506

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
