
Engineering Applications of Artificial Intelligence 123 (2023) 106254


The promise of convolutional neural networks for the early diagnosis of the
Alzheimer’s disease
Pakize Erdogmus ∗, Abdullah Talha Kabakus
Department of Computer Engineering, Faculty of Engineering, Duzce University, Duzce, Turkiye
∗ Corresponding author. E-mail address: pakizeerdogmus@duzce.edu.tr (P. Erdogmus).

ARTICLE INFO

Keywords: Deep neural network, Convolutional neural network, Computer-aided diagnosis, Computer vision, Alzheimer's disease

ABSTRACT

Alzheimer's Disease (AD) is one of the most devastating neurologic disorders, if not the most, as there is no cure for this disease, and its symptoms eventually become severe enough to interfere with daily tasks. The early diagnosis of AD, which might be possible up to 8 years before the onset of dementia symptoms, comes with many promises. To this end, we propose a novel Convolutional Neural Network (CNN) as a cheap, fast, yet accurate solution. First, a gold-standard dataset, namely DARWIN, which was proposed for the detection of AD through handwriting and consists of 1D features, was used to generate the 2D features that were fed into the proposed novel model. Then, the proposed novel model was trained and evaluated on this dataset. According to the experimental results, the proposed novel model obtained an accuracy as high as 90.4%, which was higher than the accuracies obtained by the state-of-the-art baselines, covering a total of 17 widely-used classifiers.

1. Introduction

Alzheimer's Disease (AD) is a progressive neurologic disorder: there is no cure for this disease, and its symptoms develop gradually over many years, eventually becoming severe enough to interfere with daily tasks as the disease affects memory, thinking, and behavior (Singhal et al., 2012). According to a recent study (Baek et al., 2022), AD (as well as other dementias) affects 50 million people worldwide. It has been reported that the global number of people with dementia increased to 43.8 million in 2016, an increase of 117% compared with 20.3 million in 1990 (Nichols et al., 2019). As a result of the worldwide lengthening of lifespans, it is expected that the incidence of neurodegenerative disorders will dramatically increase in the coming decades (Cilia et al., 2022b). It is estimated that there will be 152 million people with AD and other dementias by 2050 (Nichols et al., 2022). The worldwide trend of AD is visualized in Fig. 1.

The progress of AD can be described more precisely per the 7-stage model (Reisberg et al., 1982), whose stages are listed in Table 1. The early diagnosis of AD comes with many promises (Rasmussen and Langerman, 2019), as the patient is still functionally independent (Stage 2 or 3 of the 7-stage model) and free of dementia. This "earliness" might be up to 8 years before the onset of dementia symptoms (Saxton et al., 2004).

Toward the promise of the early diagnosis of Alzheimer's Disease, we propose a novel model based on deep learning, a subset of machine learning that tries to mimic the human brain, to provide a cheap, fast, yet accurate solution. Deep learning is a set of various types of neural networks used to derive insights from massive data sets (Goodfellow et al., 2016). This type of neural network is called "deep" because it has multiple hidden layers, where the output of a lower layer is the input of a higher layer. As a result of this architecture, low-level features are transformed into high-level abstract features, and the model eventually learns the complex relationship between the input and the output. Because of this characteristic, deep neural networks are known to be better than traditional machine learning techniques at feature representation (Du et al., 2016a). From this point of view, we employed deep learning for the early, yet accurate, diagnosis of Alzheimer's Disease, which is the main motivation for this study. Specifically, we propose a novel deep neural network that was trained on a gold-standard labeled dataset proposed for the detection of Alzheimer's Disease. The proposed model was intentionally designed to be lightweight in order to infer a classification result for a given input in the shortest time, even when deployed on devices with limited processing power (e.g., mobile devices). To benchmark the classification performance of the proposed novel model, we employed a wide range of state-of-the-art classifiers covering both traditional machine learning algorithms and deep neural networks. The main contributions of this study can be summarized as follows:

• A novel, yet highly accurate model. We propose a novel model, a Convolutional Neural Network (CNN), that is lightweight, fast, yet accurate, as it outperformed the state-of-the-art baselines.


• A lightweight model. The proposed model was able to classify a given test image in an average of 2 ms, which was faster by a large margin than even the lightweight state-of-the-art baselines.
• 1D to 2D feature conversion. We converted the 1-dimensional (1D) feature vectors into 2-dimensional (2D) data to make them ready to be fed into the proposed novel 2D CNN model as well as into the pre-trained state-of-the-art deep neural networks. This conversion also demonstrates a practical example of how 1D features can be converted into 2D features.
• An extensive set of baselines. We employed a wide range of state-of-the-art classifiers, a total of 17 classifiers, as the baselines to benchmark the efficiency of the proposed novel model.
• An extensive hyperparameter optimization task. We employed an extensive hyperparameter optimization task in an automated way, covering a total of 470 trials, to reveal the best value for each hyperparameter of the proposed model. Consequently, we ensured that we concluded with the optimal model in terms of classification accuracy.

The rest of the paper is structured as follows: Section 2 overviews the related work. Section 3 describes the material and method used for this study. Section 4 presents the experimental results and discussion. Finally, Section 5 concludes the paper by summarizing the findings with directions for future work.

Fig. 1. A visualization of the worldwide trend of AD. The number of AD patients is estimated to reach 152 million by 2050.

Table 1
The 7-stage model of the progress of AD (Reisberg et al., 1982).
Stage | Level of impairment
1 | No impairment
2 | Very mild cognitive decline
3 | Mild cognitive decline
4 | Moderate cognitive decline (early-stage dementia)
5 | Moderately severe cognitive decline (early mid-stage dementia)
6 | Severe cognitive decline (late mid-stage dementia)
7 | Very severe cognitive decline (late-stage dementia)

2. Related work

Kinematic analysis of handwriting reveals pathologies in the handwriting process through the characterization of handwriting movements (Accardo et al., 2007). One of the important abilities affected by neurodegenerative diseases (NDs) is handwriting (de Stefano et al., 2019). For this reason, handwriting analysis for supporting the diagnosis of Alzheimer's and Parkinson's disease has been studied by many researchers.

Loconsole et al. proposed a model-free technique for the classification of Parkinson's Disease (PD). Selected static and dynamic features from the handwriting analysis were used for the classification. They presented five different ANN models optimized with GA, each with 2-3 hidden layers. The numbers of selected features and the corresponding classification accuracies were 8 (75.75%), 9 (82.32%), and 44 (86.15%) (Loconsole et al., 2019).

Gupta et al. proposed a sex-specific and age-dependent Support Vector Machine (SVM) classifier. An accuracy of 83.75% with the female-specific classifier and an accuracy of 79.55% with the old-age-dependent classifier were observed, in comparison to an accuracy of 75.76% with the generalized classifier (Gupta et al., 2020).

Diaz et al. proposed a novel classification model based on 1D convolutions and bidirectional GRUs (Bi-GRUs). The proposed method outperformed the state-of-the-art approaches on the PaHaW and NewHandPD datasets with 93.75% and 94.44% accuracy for the spiral task, respectively (Diaz et al., 2021).

Pereira et al. transformed the 1D signals provided by a smart pen into 2D images to classify PD with CNNs. The outputs of six different CNN classifiers were combined over the individual exam in a majority-voting-based schema to produce the results. The proposed model obtained an accuracy of nearly 93.50%, which outperforms the previous works stated in the paper (Pereira et al., 2018).

Taleb et al. also transformed 1D time series into 2D images for CNN classification. They proposed an automatic classification system for PD detection based on online handwriting. Two deep learning models, a CNN and a CNN-BLSTM, were proposed. They used each task as an input for a CNN classifying for PD; since there are seven tasks, the fourteen outputs were used as the inputs of two Multilayer Perceptrons (MLPs), and the four outputs were used as an MLP input. They evaluated the models both with and without data augmentation; the accuracy improved from 83.33% (no data augmentation) to 97.62% (Taleb et al., 2020).

As stated in their article, the studies published in the literature generally employ deep learning techniques for AD diagnosis with MRI as the input signal. Cilia et al. classified AD using three sets of features: handcrafted features and CNN features extracted from color and binary images drawn by participants. Two sets of features taken from the RGB and binary images were extracted by VGG19, ResNet50, InceptionV3, and InceptionResNetV2. To verify the importance of the dynamic features captured in the color and binary images, they classified these three types of features with the same machine learning classification algorithms. They employed k-Nearest Neighbors (kNN), MLP, Random Forest, and SVM. According to the experimental results, the CNN features were found to be more discriminative than the handcrafted ones (Cilia et al., 2021a).

In a recent study published by the same authors, the sets of features were enlarged. Instead of using the binary images as input, they used multi-channel TIFF images and added extra drawing tasks. They also used transfer learning and showed that the classification accuracy with the features extracted from the colored images outperforms the other features (Cilia et al., 2022a).

In a recent study, Carfora et al. used Archimedes spiral images, embedding dynamic parameters such as pen pressure, pen altitude, pen velocity, and pen acceleration into the images. Features extracted at different layers of AlexNet were used for the classification of AD with an SVM. They compared raw images to hybrid images; the classification accuracy increased by 7% (Carfora et al., 2022).

Cilia et al. presented a feature analysis performed on the data acquired using the protocol proposed by the same researchers (Cilia et al., 2021b).

Meng et al. used Archimedes spiral and labyrinth lattice drawings and applied a 2D discrete Fourier transform, corner detection, and gray-level co-occurrence matrix computation to the corresponding handwriting images. They employed the Decision Tree algorithm with 5-fold cross-validation and obtained a mean AUC of 0.94 (Meng et al., 2022).


Table 2
A comparison of the related work.
Related work | Employed method(s) | Accuracy (%) | Contributions | Limitations | Dataset name/size
Loconsole et al. (2019) | ANN (PD) | 86.15 | Model-free technique | Handcrafted static and dynamic features | -/11 PD, 7 HC
Gupta et al. (2020) | SVM (PD) | 83.75 | Sex-specific and age-dependent | Class imbalance problem | PaHaW/37 PD, 38 HC
Diaz et al. (2021) | Bi-GRU (PD) | 94.44 | 1D convolution | Small size of dataset | PaHaW/37 PD, 38 HC
Pereira et al. (2018) | CNN (PD) | 93.5 | Deep learning-oriented approach | Feature size | HandPD/59M + 15F PD, 6M + 12F HC
Taleb et al. (2020) | CNN + BiLSTM + MLP (PD) | 97.62 | CNN-BiLSTM performance | Small size of dataset | HandPD MultiMC/21 PD, 21 HC
Cilia et al. (2021a) | CNN (AD) | 74.65 | Largest publicly available dataset | Low accuracy, and usage of handcrafted features | -/90 AD, 90 HC
Carfora et al. (2022) | AlexNet + SVM (AD) | 81.5 | Spiral images discriminating dynamic and static features | Low accuracy, and some pre-trained models were not employed | -/30 AD, 45 HC
Meng et al. (2022) | Decision Tree (AD) | 94 (AUC) | Archimedes spirals and labyrinth lattice | Small size of dataset | Not stated in the article
Mwamsojo et al. (2022) | CNN, RNN, BiLSTM (AD) | 83, 85, 88 | Energy efficiency | Low accuracy | HW/27 AD, 27 HC
Cilia et al. (2022b) | Random Forest (AD) | 85.29 | Novel large dataset | No comparison was provided | DARWIN/89 AD, 85 HC

Stefano et al. presented a literature review of handwriting analysis for the diagnosis of Alzheimer's and Parkinson's disease, as well as of mild cognitive impairments (MCI) (de Stefano et al., 2019). Because NDs affect the movement of the patients, the analysis of handwriting dynamics can help to support the early diagnosis of these diseases. With this aim, Cilia et al. proposed a protocol specifically designed for the early detection of Alzheimer's (Cilia et al., 2018). Cilia et al. also introduced the DARWIN (Diagnosis AlzheimeR WIth haNdwriting) dataset, which contains 174 participants' handwriting samples. The handwriting data were acquired during the execution of handwriting tasks from participants affected by Alzheimer's and from a healthy control group (Cilia et al., 2022b).

In one of the recent studies, Mwamsojo et al. proposed an RNN; they also implemented Bidirectional Long Short-Term Memory (BiLSTM) and CNN methods for comparison. Besides accuracy, they also considered energy costs to assess the accuracy-efficiency trade-off. The RNN obtained a classification accuracy of 85%, which is 3% worse than that of the BiLSTM and 2% better than that of the CNN (Mwamsojo et al., 2022). A comparison of the related work in terms of employed method(s), obtained accuracy, contributions, limitations, and the used dataset is given in Table 2.

3. Material and method

In this section, (i) the dataset construction, (ii) the software stack of the implemented end-to-end software, (iii) the proposed novel model, (iv) the baseline models, and (v) the training of the proposed model are described in detail in the following subsections, respectively.

3.1. Dataset construction

We used a gold-standard dataset, namely DARWIN (Cilia et al., 2022b), which was proposed for the diagnosis of AD and contains data from 174 participants, 89 AD patients and 85 healthy people (a.k.a. the control group); to the best of our knowledge, it is the largest publicly available dataset proposed for the detection of AD. The DARWIN dataset consists of 1D features that were extracted from the analysis of handwriting through a proposed protocol (Cilia et al., 2018), which comprised 25 tasks belonging to four categories as follows: (1) graphic tasks that investigate the participant's ability in writing elementary traits, (2) copy tasks that investigate the participant's ability in repeating complex graphic gestures, (3) memory tasks that investigate the changes in writing previously memorized content, and (4) dictation tasks that investigate how handwriting varies when the working memory is used. These tasks target different areas of the brain. The 1D features available in DARWIN were converted into 2D features to prepare them to be fed into the proposed model as well as into the state-of-the-art pre-trained DNNs through transfer learning. Another motivation for this conversion process was the success of the related works (Taleb et al., 2020; Pereira et al., 2018), which applied the same conversion approach. The process of converting the 1D features into 2D images is illustrated in Fig. 2.

Fig. 2. The illustration of the process of converting 1D features into 2D images. First, the raw data were normalized to the uint8 data type. Then, the normalized data were converted to 2D RGB images.
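As a concrete illustration of this conversion, the following minimal Python sketch normalizes one participant's raw feature vector (25 tasks × 18 features = 450 values in DARWIN) to the uint8 range and fills a (75 × 75 × 3) RGB grid. The normalization and the target shape follow the description of Fig. 2; the tiling step (np.resize) is our assumption, since the paper does not specify how the 450 values populate the 16,875-pixel grid.

```python
import numpy as np

def features_to_image(features: np.ndarray, size: int = 75) -> np.ndarray:
    """Convert a 1D feature vector into a (size, size, 3) uint8 RGB image."""
    # Min-max normalization to the uint8 data type, as described for Fig. 2.
    f_min, f_max = features.min(), features.max()
    normalized = ((features - f_min) / (f_max - f_min) * 255).astype(np.uint8)

    # Assumption: cycle the normalized vector until it fills the RGB grid.
    n_values = size * size * 3
    repeated = np.resize(normalized, n_values)  # repeats values to length n_values
    return repeated.reshape(size, size, 3)

# Example: one DARWIN sample with 450 handwriting features.
sample = np.random.rand(450)
image = features_to_image(sample)
print(image.shape, image.dtype)  # (75, 75, 3) uint8
```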


As a result of this feature dimension conversion process, a dataset consisting of a total of 173 images in the shape of (75 × 75 × 3) was constructed. Some samples from the constructed dataset are presented in Fig. 3. As the distribution of the constructed dataset presented in Fig. 4 shows, it was a balanced dataset, consisting of 85 normal samples and 88 samples with AD. 70% and 30% of this dataset were split as the training set and the test set, respectively. The details of the dataset constructed for the proposed study are given in Table 3.

Fig. 3. Some samples from the constructed dataset.

Fig. 4. The distribution of the constructed dataset, which comprised 85 healthy (normal) and 88 samples with AD.

Table 3
The details of the dataset constructed for the proposed study.
Set | Number of samples (percentage) | Shape of samples
Training | 121 (70%) | (75 × 75 × 3)
Test | 52 (30%) | (75 × 75 × 3)

3.2. Software stack

End-to-end software powered by the Python 3 programming language was designed and implemented for this study. During the implementation of this software, several state-of-the-art open-source Python libraries were employed in addition to the Python SDK. The proposed novel model, as well as the baseline models based on deep neural networks, were implemented using Keras (Chollet, 2015), a high-level API for the implementation of deep neural networks on several deep learning backends such as TensorFlow (Abadi et al., 2016) and Theano. Since it is recommended by the developer of Keras (Chollet, 2015) and is the default backend option of Keras, TensorFlow, a state-of-the-art deep learning backend owned by Google, was opted for as the deep learning backend for the proposed model. To employ widely-used traditional machine learning models as the baselines for the proposed model and to apply several operations on the dataset, such as cross-validation and splitting into subsets, a gold-standard Python library, namely scikit-learn (Pedregosa et al., 2011), was used. It is worth mentioning that, despite providing implementations for a wide range of traditional machine learning algorithms, scikit-learn does not provide implementations of the Global Learning Vector Quantization (GLVQ), Light Gradient Boosting Machine (LGBM), and eXtreme Gradient Boosting (XGBoost) algorithms, which were included in the employed baselines of the proposed novel model as described in the following subsections. Therefore, the implementations of these algorithms were acquired from the third-party scikit-learning vector quantization (sklvq) (van Veen, 2022), LightGBM ("Welcome To LightGBM's Documentation", 2022), and XGBoost ("XGBoost Documentation", 2022) Python packages, respectively. NumPy (Harris et al., 2020) was employed for the operations on large, multi-dimensional arrays and matrices. pandas (The pandas development team, 2020) was employed for data manipulation on numerical tables (a.k.a. data frames). Both NumPy and pandas are also prerequisites for Keras. The plots visualizing the results of the conducted experiments were implemented using gold-standard Python libraries, namely matplotlib (Hunter, 2007) and seaborn (Waskom, 2021). The software stack of the implemented end-to-end software for this study is given in Table 4.

Table 4
The software stack of the implemented end-to-end software for this study.
Software | Version
Operating system | macOS Ventura 13.0.1
Programming language | Python 3.10
TensorFlow | 2.10
Keras | 2.10
scikit-learn | 1.1.3
sklvq | 0.1.2
LightGBM | 3.3.3
XGBoost | 1.7.1
NumPy | 1.23.5
pandas | 1.5.2
matplotlib | 3.6.2
seaborn | 0.12.1

3.3. Proposed novel model

The proposed model, a novel Convolutional Neural Network, consists of 12 layers and starts with an Input layer that accepts (75 × 75) RGB images, which are the 2D forms of the extracted features of the handwriting samples collected from the subjects, as the conversion process is described in detail in the previous subsections. Then, a Convolutional (denoted by Conv) layer with 16 filters, which accepts the input and performs convolution operations on it, was employed. As the result of the employed hyperparameter optimization task, which covered various widely-used activation functions such as the Rectified Linear Unit (ReLU) variants as well as softmax and tanh, the Exponential Linear Unit (eLU) (Clevert et al., 2016), a specific form of the ReLU (Agarap, 2018), was employed as the activation function of this Conv layer. A Batch Normalization (denoted by Batch Norm) (Ioffe and Szegedy, 2015) layer, which normalizes the output of each activation by the mean and standard deviation of the outputs calculated over the samples in the minibatch (Salimans and Kingma, 2016), followed the employed Conv layer. Then, a Max Pooling layer, a sub-sampling procedure, was employed to reduce the input size of the images by applying the maximum function over the input/pooling window (Chen et al., 2015; Christlein et al., 2019; Sun et al., 2017; Zheng et al., 2018), which not only reduces (1) the time required to train the CNN model and (2) the hardware space required to store tensors, as a natural consequence of the reduced size of the images, but also improves the performance (Shen et al., 2016). We employed a Max Pooling layer with a pool size of (2 × 2) that reduced the width and height by a factor of two and eventually discarded 75% of the activations from the previous layer.

Following the Max Pooling layer, a Dropout (Srivastava et al., 2014) layer was employed with a dropout rate of 0.3; Dropout is a widely-used technique to prevent the well-known "overfitting" problem, one of the biggest challenges of deep neural networks (Amin et al., 2019; Wang et al., 2019) resulting from their increased depth and complexity (Liu et al., 2008; Mollahosseini et al., 2016). Dropout randomly drops units (neurons) from the neural network during training in order to prevent co-adaptation. In order to prevent overfitting, several Dropout layers were employed in various positions. In addition, Max Pooling layers also help to control overfitting ("CS231n Convolutional Neural Networks for Visual Recognition", 2020). Then, another Conv layer, this time with 32 filters, followed. Another series of Batch Norm, Max Pooling with a pool size of (2 × 2), and Dropout with a dropout rate of 0.3 was employed. Then, a Flattening (a.k.a. Flatten) layer was employed as a connector between the Conv and Fully Connected (a.k.a. Dense) layers, which are deeply connected neural network components, in order to reshape the input into a vector (Du et al., 2016b). A Dense layer with 256 units followed the Flatten layer. As in the employed Conv layers, the eLU was employed as the activation function of this Dense layer. Then, another Dropout layer, this time with a dropout rate of 0.2, was employed. Finally, a Dense layer with 2 units and the sigmoid activation function was employed as the output layer of the proposed model, which is responsible for classifying the given image into two classes, namely (1) Alzheimer's diseased and (2) normal/healthy. The Glorot (a.k.a. Xavier) Uniform (Glorot and Bengio, 2010) initializer was employed as the initializer of the kernel weight matrices of the Conv and Dense layers of the proposed model. The bias vectors of the Conv and Dense layers were initialized with zeros. As the result of the employed hyperparameter optimization task, a filter (a.k.a. kernel) size of (3 × 3), which indicates the shape of the convolution window, was used for the Conv layers of the proposed model. A stride of (1 × 1), which indicates 1 step to be taken along the height and width, was employed for the Conv layers of the proposed model. No padding was applied during the employed convolution operations. An overview of the architecture of the proposed novel model is presented in Fig. 5.

Fig. 5. An overview of the architecture of the proposed novel model, which comprised 12 sequential layers.
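For reference, the following Keras sketch assembles the layer stack exactly as described above (Conv 16 → Batch Norm → Max Pooling → Dropout 0.3 → Conv 32 → Batch Norm → Max Pooling → Dropout 0.3 → Flatten → Dense 256 → Dropout 0.2 → Dense 2). It is a minimal reconstruction from the text, not the authors' released code; the L2 penalty, optimizer, and learning rate are taken from the best hyperparameter values reported in this section and in Table 5.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

reg = regularizers.l2(1e-7)  # kernel/bias penalty selected by the search

model = keras.Sequential([
    keras.Input(shape=(75, 75, 3)),
    layers.Conv2D(16, (3, 3), strides=(1, 1), padding="valid", activation="elu",
                  kernel_initializer="glorot_uniform", bias_initializer="zeros",
                  kernel_regularizer=reg, bias_regularizer=reg),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Dropout(0.3),
    layers.Conv2D(32, (3, 3), strides=(1, 1), padding="valid", activation="elu",
                  kernel_initializer="glorot_uniform", bias_initializer="zeros",
                  kernel_regularizer=reg, bias_regularizer=reg),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Dropout(0.3),
    layers.Flatten(),
    layers.Dense(256, activation="elu",
                 kernel_regularizer=reg, bias_regularizer=reg),
    layers.Dropout(0.2),
    layers.Dense(2, activation="sigmoid"),  # AD vs. normal/healthy
])

model.compile(optimizer=keras.optimizers.Adadelta(learning_rate=1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

Assembled this way, the Flatten output is 17 × 17 × 32 = 9248 units, so the 256-unit Dense layer alone contributes roughly 2.37M weights, consistent with the 2.4M total parameters reported for the proposed model in Table 6.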


Table 5
The list of the evaluated values for each hyperparameter during the employed hyperparameter optimization task. The obtained best values are marked with an asterisk (*).
Hyperparameter | Evaluated values
Dropout rate of Conv layers | 0.2, 0.3*, 0.4, 0.5, 0.6
Dropout rate of the final Dense layer | 0.2*, 0.3, 0.4, 0.5, 0.6
Activation function | ReLU, eLU*, PReLU, Leaky ReLU, tanh, softmax
Optimization algorithm | Adam, RMSprop, SGD, Adadelta*
Filter size of Conv layers | (3 × 3)*, (5 × 5), (7 × 7), (9 × 9)
Pooling type | Max Pooling*, Average Pooling
Kernel regularization penalty | 1e-5, 1e-6, 1e-7*, 1e-8
Bias regularization penalty | 1e-5, 1e-6, 1e-7*, 1e-8
Learning rate | 1e-3*, 5e-3, 1e-4, 1e-5, 1e-6
Batch size | 16, 32, 64, 128, 256
Number of folds (k) | 3, 4*, 5, 6, 7, 8, 9, 10
Test set ratio | 0.2, 0.3*, 0.4

Hyperparameters are the parameters of a DNN model that affect the learning process and are set empirically (Chollet, 2017; Zhang et al., 2019). For the optimization of the employed hyperparameters, as listed in Table 5, a wide range of commonly used values for each hyperparameter was evaluated to reveal the best hyperparameter combination. KerasTuner (O'Malley et al., 2019) is a scalable hyperparameter optimization framework provided by the Keras ecosystem. KerasTuner with a widely-used optimization technique, namely Hyperband (Li et al., 2018), was employed for the hyperparameter optimization task. An extensive hyperparameter optimization task was conducted by the employed KerasTuner, covering a total of 470 trials. The accuracy obtained on the validation set was selected as the objective of the employed optimization task. Thanks to the employed hyperparameter optimization task, (1) the dropout rate of the Conv layers, (2) the dropout rate of the final Dense layer, (3) the activation function of the Conv and Dense layers (except for the last Dense layer, which was fixed to the sigmoid (Han and Moraga, 1995) activation function since the handled problem is a binary classification problem), (4) the optimization algorithm, which is responsible for updating the weight matrix and bias vector based on the output of the loss function (the Binary Cross-Entropy for the proposed model, again since the handled problem is a binary classification problem), (5) the filter size of the Conv layers, (6) the pooling type, (7) the kernel and bias regularization penalties, (8) the learning rate of the employed optimization algorithm, (9) the batch size, which is the number of training samples used to estimate the gradient direction before the model's internal parameters are updated, (10) the number of folds of the employed cross-validation technique, and (11) the ratio of the test set were determined; they are marked in Table 5. According to the result of the employed hyperparameter optimization task, which covered various widely-used optimization algorithms such as Adaptive Moment Estimation (Adam), Root Mean Squared Propagation (RMSprop), and Stochastic Gradient Descent (SGD), Adadelta (Johny et al., 2018) was employed as the optimization algorithm of the proposed model. Regularization is a technique consisting of a collection of several strategies designed to reduce the test error (Goodfellow et al., 2016) by improving the generalization of DNNs. L2 regularization (a.k.a. weight decay), which adds the sum of the squared weights multiplied by a hyperparameter to the loss (Lane et al., 2019) and is the most common regularization technique (Adrian Rosebrock, 2017), was employed with a penalty value of 1e-7 to further prevent overfitting, in addition to the employed Dropout and Max Pooling layers.
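A minimal sketch of how such a search can be wired up with KerasTuner's Hyperband is shown below. The builder function and the subset of the search space shown are illustrative reconstructions from Table 5, not the authors' code.

```python
import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp: kt.HyperParameters) -> keras.Model:
    """Build a candidate CNN from a subset of the Table 5 search space."""
    activation = hp.Choice("activation", ["relu", "elu", "tanh", "softmax"])
    dropout_conv = hp.Choice("dropout_conv", [0.2, 0.3, 0.4, 0.5, 0.6])
    filter_size = hp.Choice("filter_size", [3, 5, 7, 9])
    learning_rate = hp.Choice("learning_rate", [1e-3, 5e-3, 1e-4, 1e-5, 1e-6])

    model = keras.Sequential([
        keras.Input(shape=(75, 75, 3)),
        layers.Conv2D(16, (filter_size, filter_size), activation=activation),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(dropout_conv),
        layers.Flatten(),
        layers.Dense(2, activation="sigmoid"),
    ])
    model.compile(optimizer=keras.optimizers.Adadelta(learning_rate=learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Hyperband maximizes validation accuracy, the objective used in this study.
tuner = kt.Hyperband(build_model, objective="val_accuracy", max_epochs=250)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val))
```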

3.4. Baseline models

A wide range of traditional machine learning and deep neural network models were employed as the baselines of the proposed novel model. To this end, ten widely-used traditional machine learning models, namely (1) SVM, (2) Logistic Regression, (3) Naïve Bayes, (4) Random Forest, (5) Decision Tree, (6) kNN, (7) Linear Discriminant Analysis (LDA), (8) GLVQ, (9) LGBM, and (10) XGBoost, were employed. To ensure standardization, the implementations of these algorithms available in the gold-standard scikit-learn library were used. Therefore, the default values of the hyperparameters of these algorithms, which are predefined by scikit-learn, were employed. In addition to these traditional machine learning models, seven state-of-the-art pre-trained deep neural networks, namely (1) InceptionV3, (2) ResNet152V2, (3) MobileNetV2, (4) Xception, (5) InceptionResNetV2, (6) VGG19, and (7) DenseNet201, were employed through the transfer learning technique, which is the process of improving a learner in one domain by transferring information from a related domain (Weiss et al., 2016). This information transfer is handled by not training the layers of the pre-trained models, a.k.a. the frozen layers. These pre-trained models were trained on the widely-used ImageNet (Deng et al., 2009) dataset, which consists of over 15 million labeled images belonging to roughly 22,000 categories. Therefore, the weights of the layers that were calculated for ImageNet were used (in other words, the information was transferred). For the same standardization concern as with the traditional machine learning models, the implementations of these pre-trained models available in Keras were used. Each pre-trained model was intentionally opted to be the latest and most enhanced version of its own family. A comparison of the employed pre-trained models in terms of (i) the top-1 accuracy obtained on ImageNet, (ii) depth, (iii) the number of parameters, and (iv) the minimum acceptable input shape is given in Table 6.

Since these pre-trained models were originally designed for their own classification tasks, the layers responsible for classification were excluded during the employed transfer learning process. Then, a Dense layer with 2 units and the sigmoid activation function, responsible for the classification, was sequentially added to each pre-trained model. Unlike the layers of the pre-trained models, this Dense layer was trainable. The weights of the layers of the pre-trained models were transferred from the ImageNet dataset, which significantly reduced the number of trainable parameters. For example, as given in Table 6, the original ResNet152V2 contains 60.4M trainable parameters; when ResNet152V2 is employed through the employed transfer learning technique, the constructed model based on ResNet152V2 consists of only 4.1K trainable parameters, which is 15.1K times fewer than the original model. From this point of view, Table 7 lists a comparison of the trainable parameters for the deep neural networks that were employed through transfer learning. The process of how the baseline models based on the pre-trained models were constructed is illustrated in Fig. 6.
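As an illustration, the following sketch constructs one such frozen-backbone baseline (ResNet152V2 shown). The global average pooling of the backbone output is our assumption, since the text only specifies the frozen layers and the trainable 2-unit sigmoid head; with it, the trainable-parameter count matches the 4.1K reported in Table 7.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_transfer_baseline(input_shape=(75, 75, 3)) -> keras.Model:
    """Frozen ImageNet backbone plus a trainable 2-unit sigmoid head."""
    backbone = keras.applications.ResNet152V2(
        include_top=False,        # drop the original ImageNet classifier layers
        weights="imagenet",       # transfer the ImageNet-trained weights
        input_shape=input_shape,
    )
    backbone.trainable = False    # freeze all transferred layers

    inputs = keras.Input(shape=input_shape)
    x = backbone(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)              # assumption: collapse feature maps
    outputs = layers.Dense(2, activation="sigmoid")(x)  # the only trainable layer
    return keras.Model(inputs, outputs)

model = build_transfer_baseline()
trainable = sum(int(keras.backend.count_params(w)) for w in model.trainable_weights)
print(f"Trainable parameters: {trainable}")  # 2048 features x 2 units + 2 biases = 4098, ~4.1K
```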
3.5. Model training

The training of the proposed model was started with the Early Stopping callback, which is responsible for stopping the training when the model stops learning. This callback is configured through two parameters: (1) the monitored criterion and (2) the number of epochs that the callback waits before stopping (a.k.a. patience). The validation loss and 10 epochs were defined as the monitored criterion and the patience of the employed Early Stopping callback, respectively. 30% of the whole dataset was split off as the test set, which consisted of 52 samples. The remaining set was used for both training and validation purposes thanks to the employed Stratified k-Fold Cross-Validation technique, a special type of k-Fold Cross-Validation (Mosteller and Tukey, 1968) that splits the whole set into k folds while preserving the percentage of samples of each class. k was set to 4 per the experimental result of the employed hyperparameter optimization, which means that the training set, 70% of the whole dataset, was split into 4 folds: the first fold was used as the validation set while the remaining 3 folds were used as the training set. This process was repeated 4 times so that the whole set was used for both training and validation purposes.
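This split-and-train procedure reads as follows in code; a minimal sketch assuming image arrays X and integer labels y, with a hypothetical build_model() returning the compiled CNN of Section 3.3. The random seeds are illustrative, not from the paper.

```python
from sklearn.model_selection import train_test_split, StratifiedKFold
from tensorflow import keras

# 70/30 stratified split into a training+validation set and a test set.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

early_stopping = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)

# Stratified 4-fold cross-validation over the remaining 70%.
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(X_trainval, y_trainval):
    model = build_model()  # hypothetical builder for the compiled CNN
    model.fit(
        X_trainval[train_idx], y_trainval[train_idx],
        validation_data=(X_trainval[val_idx], y_trainval[val_idx]),
        epochs=250, callbacks=[early_stopping],
    )
```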
4. Experimental results and discussion

The hardware of the machine used to conduct the experiments is key for the proposed study, as some experimental results are based on elapsed time (duration), which depends solely on the equipped hardware. The hardware specs of the machine on which the experiments were carried out are given in Table 8. It is worth mentioning that all conducted experiments were repeated 10 times, and the average of the numerical values obtained in these 10 trials was regarded as the final value.


Table 6
A comparison of the proposed model with the employed pre-trained models.
Pre-trained model | Top-1 accuracy | Depth | Number of parameters | Minimum input shape
InceptionV3 | 77.9% | 189 | 23.9M | (75 × 75)
ResNet152V2 | 78.0% | 307 | 60.4M | (32 × 32)
MobileNetV2 | 71.3% | 105 | 3.5M | (32 × 32)
Xception | 79.0% | 81 | 22.9M | (71 × 71)
InceptionResNetV2 | 80.3% | 449 | 55.9M | (75 × 75)
VGG19 | 71.3% | 19 | 143.7M | (32 × 32)
DenseNet201 | 77.3% | 402 | 20.2M | (32 × 32)
Proposed model | N/A | 12 | 2.4M | (75 × 75)

Fig. 6. An illustration of the process of how the baseline models based on the pre-trained deep neural networks were constructed. While the weights calculated on the ImageNet
were employed for the frozen layers, the final Dense layer was trained.

Table 7
A comparison of the number of trainable parameters for the original version and the transfer learning-applied version of each employed deep neural network.
Model | Number of trainable parameters | Number of trainable parameters after transfer learning
InceptionV3 | 23.9M | 4.1K
ResNet152V2 | 60.4M | 4.1K
MobileNetV2 | 3.5M | 2.6K
Xception | 22.9M | 4.1K
InceptionResNetV2 | 55.9M | 3.1K
VGG19 | 143.7M | 1K
DenseNet201 | 20.2M | 3.8K

De facto standard evaluation metrics for the evaluation of classifiers, namely Accuracy, Precision, Recall (a.k.a. Sensitivity), and F1-score, were employed to evaluate the classification performance of the proposed model. Let P and N denote the positives, the samples labeled with the target class, and the negatives, the samples labeled with the opposite of the target class, respectively. For the proposed study, P denotes the samples labeled with AD, while N denotes the normal samples. TP, TN, FP, and FN denote correctly predicted samples labeled with AD, correctly predicted normal samples, normal samples incorrectly predicted as AD, and samples with AD incorrectly predicted as normal, respectively. Accuracy is the most widely-used evaluation metric and is simply the ratio of the correctly predicted samples to all samples. Precision is the ratio of the correctly predicted positives to all observations predicted as positive. Recall is the ratio of the correctly predicted positives to all positives. F1-score is the harmonic mean of the precision and recall and is more useful than accuracy for imbalanced datasets. The equations of the employed metrics are given in Eqs. (1)-(4).

Accuracy = (TP + TN) / (P + N)    (1)
Precision = TP / (TP + FP)    (2)
Recall = TP / (TP + FN)    (3)
F1-score = 2 × (Precision × Recall) / (Precision + Recall)    (4)

The number of epochs to train the proposed model was set to 250, but the training of the proposed model continued for only 223 epochs, until the employed Early Stopping callback stopped it. The duration of the training of the proposed novel model was obtained as 39.6 s. The plot of the obtained accuracy values for the training and validation sets during the training of the proposed model is presented in Fig. 7. As can be seen in this figure, it is safe to say that overfitting is not the case for the proposed model, thanks to the employed techniques. All DNNs within this study were trained under the same training configuration (e.g., optimization algorithm, learning rate, and loss function) as the proposed novel model for 250 epochs, and the obtained accuracy plots are presented in Fig. 8.

Fig. 7. The plot of obtained accuracy values for the training and validation sets during the training of the proposed model.
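The metrics in Eqs. (1)-(4) can be derived directly from the entries of a binary confusion matrix, as the following sketch shows; scikit-learn's classification_report yields the same numbers.

```python
from sklearn.metrics import confusion_matrix

def evaluate(y_true, y_pred):
    """Compute Accuracy, Precision, Recall, and F1-score per Eqs. (1)-(4)."""
    # Binary labels with AD encoded as 1 (the positive class).
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    return accuracy, precision, recall, f1
```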


Table 8
The hardware specs of the machine on which the experiments were carried out.
Hardware | Specs
CPU | Apple M2 8-core (4 efficiency and 4 performance cores, 2.4 GHz base frequency, 3.5 GHz turbo frequency)
GPU | Apple M2 10-core (500 MHz base frequency, 1456 MHz turbo frequency)
Neural engine | Apple M2 16-core
Memory | 16 GB LPDDR5 Unified Memory (100 GB/s bandwidth)
Storage | 1 TB SSD

Fig. 8. The plots of obtained accuracy values for the training and validation sets during the training of the employed DNN models.

A confusion matrix (a.k.a. error matrix) is a gold-standard table that allows the visualization of the predictions of classifiers. We employed a confusion matrix while evaluating the classification performance of the proposed model. Each row of the confusion matrix represents the number of samples in the actual class, and each column represents the number of samples in the predicted class. As the visualization of the obtained confusion matrix for the evaluation of the proposed model on the test set presented in Fig. 9 shows, only 5 samples (1 normal and 4 samples with AD) were misclassified.

Fig. 9. The visualization of the obtained confusion matrix for the evaluation of the proposed model on the test set.

As the plot of the obtained accuracy values for the employed models presented in Fig. 10 shows, the proposed novel model provided the best accuracy, an accuracy as high as 90.4%. Following this, both Random Forest and LGBM provided the second-best accuracy, an accuracy of 88.46%. Following these classifiers, Cilia et al. (2022b) obtained an accuracy of 85.29% when they employed Random Forest as the classifier merging features from the best tasks. Regarding the other evaluation metrics, a precision of 92.0%, a recall of 90.4%, and an F1-score of 90.4% were obtained when the proposed model was evaluated on the test set. A scatter plot that visualizes the correlation between the obtained accuracy on the test set and the depth of the deep neural network is presented in Fig. 11. According to this experimental result, it can be concluded that there is no linearity between the obtained accuracy and the depth of the deep neural network. The comparison of the employed models in terms of all evaluation metrics is given in Table 9.

Inference time, the time required to classify a given input, is critical for the deployment of deep neural networks, especially when they are employed in a real-time system. Regarding the inference time of the employed DNNs, as the experimental result presented in Fig. 12 shows, the proposed novel model obtained the best performance by a large margin, as it was able to classify a given test image in an average of 2 ms, which demonstrated that the proposed novel model was almost 3.5 times faster than MobileNetV2, which is known to have a lightweight architecture (Shahi et al., 2022). This experimental result demonstrates the promise of the proposed novel model to be employed on a lightweight or real-time system. On the other hand, DenseNet201 was found to be the slowest model in terms of inference time, which makes it an unfavorable solution to be employed in a real-time system. A scatter plot that visualizes the correlation between the inference time and the depth of the deep neural network is presented in Fig. 13. According to this experimental result, it can be concluded that the shallower the deep neural network is, the lower the inference time is, in general.

Fig. 10. The plot of the obtained accuracy values of the employed models when they were evaluated on the test set. The proposed model obtained the best accuracy, an accuracy
of 90.4%.

Fig. 11. The correlation between the obtained accuracy on the test set and the depth of the deep neural network. It can be concluded that there is no linearity between the
obtained accuracy and the depth of the deep neural network.

Fig. 12. The obtained inference time for each employed DNN. The proposed model obtained the lowest inference time, 2.0 ms.
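Average inference time can be measured as below; a sketch assuming a loaded Keras model and a preprocessed test batch, repeating the measurement as in the 10-trial averaging protocol described in Section 4.

```python
import time
import numpy as np

def average_inference_ms(model, images: np.ndarray, trials: int = 10) -> float:
    """Average single-image inference time in milliseconds over repeated trials."""
    model.predict(images[:1], verbose=0)  # warm-up call, excluded from timing
    start = time.perf_counter()
    for _ in range(trials):
        for image in images:
            model.predict(image[np.newaxis, ...], verbose=0)
    elapsed = time.perf_counter() - start
    return elapsed / (trials * len(images)) * 1000.0
```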


Fig. 13. The correlation between the inference time and the depth of the deep neural network. It can be concluded that the shallower the deep neural network is, the lower the
inference time is in general.

Table 9
The comparison of the employed models in terms of all evaluation metrics.
Model | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%)
InceptionV3 | 76.9 | 80.2 | 76.9 | 76.6
ResNet152V2 | 76.9 | 78.8 | 76.9 | 76.8
MobileNetV2 | 59.6 | 66.3 | 59.6 | 57.0
Xception | 78.8 | 81.5 | 78.8 | 78.7
InceptionResNetV2 | 69.2 | 69.5 | 69.2 | 69.2
VGG19 | 75.0 | 76.3 | 75.0 | 75.0
DenseNet201 | 69.2 | 70.0 | 69.2 | 69.2
Proposed model | 90.4 | 92.0 | 90.4 | 90.4
SVM | 82.7 | 75.9 | 91.7 | 83.0
Logistic Regression | 86.5 | 81.5 | 91.7 | 86.3
Naïve Bayes | 84.6 | 75.0 | 100.0 | 85.7
Random Forest | 88.5 | 82.1 | 95.8 | 88.5
Decision Tree | 76.9 | 71.4 | 83.3 | 76.9
kNN | 76.9 | 100.0 | 50.0 | 66.7
LDA | 73.1 | 69.2 | 75.0 | 72.0
GLVQ | 75.0 | 73.9 | 70.8 | 72.3
LGBM | 88.5 | 82.1 | 95.8 | 88.5
XGBoost | 80.8 | 71.9 | 95.8 | 82.1

The training of deep neural networks is known to take a long time even on machines equipped with high processing power (e.g., a high-end GPU/TPU) and a large amount of memory, as a result of (i) having a complex, deep architecture, (ii) consisting of a large number of parameters, and (iii) processing a huge amount of data. Therefore, we shed light on the elapsed time to train each employed deep neural network, as presented in Fig. 14. According to the experimental result, MobileNetV2 was found to be the fastest model, completing the training in 35 s. InceptionV3 and the proposed model followed MobileNetV2 by completing the training in 35.4 and 39.6 s, respectively. InceptionResNetV2 was found to be the slowest model, which is reasonable since it is the deepest model, with a depth of 449, among the employed deep neural networks. A scatter plot that visualizes the correlation between the training time and the depth of the deep neural network is presented in Fig. 15. According to this experimental result, it can be concluded that the shallower the deep neural network is, the lower the training time is, in general.

The number of weights determines the memory size required to store the weights, and the number of operations determines the computational complexity of the convolution (Véstias, 2019). The complexity of an algorithm is expressed as a function of its input, generally using the O notation (O(N)). The computational complexity of a CNN is the sum of each layer's complexity. The computational complexity of a Conv layer depends on (i) the filter size (which is generally square), (ii) the number of filters, and (iii) the input size. The number of operations of a 2D convolution for a pixel with a k × k filter size is k × k + (k − 1). The computational complexity of the proposed novel CNN model is listed in Table 10.

Known limitations

DNNs tend to learn more patterns regarding the data when they are trained on large datasets. As discussed in the related section, to the best of our knowledge, we used the largest publicly available dataset proposed for the detection of AD. Even this largest dataset comprised a limited number of participants, a total of 174 participants, compared with the datasets that are typically used to train DNNs. As discussed by Cilia et al. (2022b), most of the publicly available datasets that contain handwriting samples were proposed for the detection of Parkinson's Disease, another devastating neurologic disorder.

Fig. 14. The elapsed time to train the employed deep neural networks.

Fig. 15. The correlation between the training time and the depth of the deep neural network. It can be concluded that the shallower the deep neural network is, the lower the
training time is in general.

Table 10
The computational complexity of the proposed novel CNN model.
Layer | Number of operations | Number of parameters | Layer output
Conv 1 | 75 × 75 × 3 × 3 × 3 × 16 | 3 × 3 × 3 × 16 + 16 | 73 × 73 × 16
Max Pooling 1 | 73 × 73 × 16 | 0 | 36 × 36 × 16
Dropout 1 | 36 × 36 × 16 | 0 | 36 × 36 × 16
Conv 2 | 36 × 36 × 16 × 3 × 3 × 32 | 3 × 3 × 16 × 32 + 32 | 34 × 34 × 32
Max Pooling 2 | 34 × 34 × 32 | 0 | 17 × 17 × 32
Dropout 2 | 17 × 17 × 32 | 0 | 17 × 17 × 32
Dense 1 | 17 × 17 × 32 × 256 | 17 × 17 × 32 × 256 + 256 | 256
Dense 2 | 256 × 2 | 256 × 2 + 2 | 2
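The per-layer counts in Table 10 can be reproduced mechanically. The sketch below does so under the table's own counting convention (operations as input size times filter volume times filter count); it is an interpretation of the table's entries, not a general FLOP counter.

```python
def conv_ops(h: int, w: int, c_in: int, k: int, c_out: int) -> int:
    """Operation count of a k x k convolution layer, per the Table 10 convention."""
    return h * w * c_in * k * k * c_out

def conv_params(c_in: int, k: int, c_out: int) -> int:
    """Weights plus biases of a Conv layer."""
    return k * k * c_in * c_out + c_out

# Conv 1: (75 x 75 x 3) input, 16 filters of size 3 x 3 -> (73 x 73 x 16) output.
print(conv_ops(75, 75, 3, 3, 16))    # 2,430,000 operations
print(conv_params(3, 3, 16))         # 448 parameters
# Conv 2: (36 x 36 x 16) input, 32 filters of size 3 x 3 -> (34 x 34 x 32) output.
print(conv_ops(36, 36, 16, 3, 32))   # 5,971,968 operations
print(conv_params(16, 3, 32))        # 4,640 parameters
```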

5. Conclusion

AD is one of the most devastating neurologic disorders, if not the most, as there is no cure for this disease, and its symptoms eventually become severe enough to interfere with daily tasks. According to a recent study (Baek et al., 2022), AD (as well as other dementias) affects 50 million people worldwide. As a result of the worldwide lengthening of lifespans, it is expected that the incidence of neurodegenerative disorders will dramatically increase in the coming decades. It is estimated that there will be 152 million people with AD and other dementias by 2050 (Nichols et al., 2022). Since the early diagnosis of AD comes with many promises, it is critical to diagnose it as early as possible. Owing to the available tools, this earliness might be up to 8 years before the onset of dementia symptoms. To this end, a novel CNN is proposed in this study, providing a cheap, fast, yet accurate solution. First, we used a gold-standard dataset, namely DARWIN, which was proposed for the detection of AD and contains data from 174 participants: 89 AD patients and 85 healthy people. To the best of our knowledge, DARWIN is the largest publicly available dataset proposed for the detection of AD. The DARWIN dataset consists of 1D features extracted from the analysis of handwriting. These 1D features were then converted into 2D features. The proposed novel model was trained and evaluated on this constructed dataset. According to the experimental result, the proposed model obtained an accuracy as high as 90.4%. To benchmark the efficiency of the proposed novel model, a total of 17 state-of-the-art traditional machine learning algorithms and DNNs were employed as the baselines of the proposed novel model. The accuracy obtained by the proposed model, 90.4%, was higher than the accuracies obtained by these baselines. The proposed novel model's inference time was obtained as an average of 2 ms, which demonstrated that the proposed novel model was almost 3.5 times faster than MobileNetV2, a widely-used CNN that is well known to have a lightweight architecture (Shahi et al., 2022). This experimental result proves the promise of the proposed model to be employed on a lightweight or real-time system.

In future work, we would like to employ different techniques during the 1D to 2D feature conversion to reveal the effect of the feature conversion technique on the accuracy of the proposed model. Also, we would like to propose another model based on the Transformer architecture, which has started to show its promise in computer vision after natural language processing. Finally, even though, to the best of our knowledge, we used the largest publicly available dataset proposed for the detection of AD, we would like to train the proposed novel model on a larger dataset to possibly further improve the learning ability of the model.

CRediT authorship contribution statement

Pakize Erdogmus: Data curation, Methodology, Conceptualization, Formal analysis, Investigation, Writing – original draft, Reviewing, Editing. Abdullah Talha Kabakus: Methodology, Investigation, Writing – original draft, Reviewing, Editing, Software.


Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data used in the study is publicly available at the UCI repository.

References

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D.G., Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., Zheng, X., 2016. TensorFlow: A system for large-scale machine learning. In: Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2016). Savannah, GA, USA, pp. 265–283.
Accardo, A., Chiap, A., Borean, M., Bravar, L., Zoia, S., Carrozzi, M., Scabar, A., 2007. A device for quantitative kinematic analysis of children's handwriting movements. In: Proceedings of the 11th Mediterranean Conference on Medical and Biological Engineering and Computing 2007 (MEDICON 2007). Springer, Ljubljana, Slovenia, pp. 445–448. http://dx.doi.org/10.1007/978-3-540-73044-6_114.
Adrian Rosebrock, 2017. Deep Learning for Computer Vision with Python. PyImageSearch.
Agarap, A.F., 2018. Deep learning using rectified linear units (ReLU). pp. 1–7. arXiv:1803.08375.
Amin, M., Shah, B., Sharif, A., Ali, T., Kim, K.l.L., Anwar, S., 2019. Android malware detection through generative adversarial networks. Trans. Emerg. Telecommun. Technol. e3675, 1–29. http://dx.doi.org/10.1002/ett.3675.
Baek, M.S., Kim, H.K., Han, K., Kwon, H.S., Na, H.K., Lyoo, C.H., Cho, H., 2022. Annual trends in the incidence and prevalence of Alzheimer's Disease in South Korea: A nationwide cohort study. Front. Neurol. 13, 1–8. http://dx.doi.org/10.3389/fneur.2022.883549.
Carfora, D., Kim, S., Houmani, N., Garcia-Salicetti, S., Rigaud, A.S., 2022. On extracting digitized spiral dynamics' representations: A study on transfer learning for early Alzheimer's detection. Bioengineering 9, 1–14. http://dx.doi.org/10.3390/bioengineering9080375.
Chen, Y., Xu, L., Liu, K., Zeng, D., Zhao, J., 2015. Event extraction via dynamic multi-pooling convolutional neural networks. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2015). ACL, Beijing, China, pp. 167–176.
Chollet, F., 2015. Keras: The Python deep learning API [WWW Document]. URL https://keras.io (accessed 12.2.22).
Chollet, F., 2017. Deep Learning with Python. Manning Publications.
Christlein, V., Spranger, L., Seuret, M., Nicolaou, A., Král, P., Maier, A., 2019. Deep generalized max pooling. In: Proceedings of the 15th International Conference on Document Analysis and Recognition (ICDAR 2019). Sydney, Australia, pp. 1–7.
Cilia, N.D., D'Alessandro, T., de Stefano, C., Fontanella, F., 2022a. Deep transfer learning algorithms applied to synthetic drawing images as a tool for supporting Alzheimer's disease prediction. Mach. Vis. Appl. 33, 1–17. http://dx.doi.org/10.1007/s00138-022-01297-8.
Cilia, N.D., D'Alessandro, T., de Stefano, C., Fontanella, F., Molinara, M., 2021a. From online handwriting to synthetic images for Alzheimer's Disease detection using a deep transfer learning approach. IEEE J. Biomed. Health Inform. 25, 4243–4254. http://dx.doi.org/10.1109/JBHI.2021.3101982.
Cilia, N.D., de Gregorio, G., de Stefano, C., Fontanella, F., Marcelli, A., Parziale, A., 2022b. Diagnosing Alzheimer's disease from on-line handwriting: A novel dataset and performance benchmarking. Eng. Appl. Artif. Intell. 111, 1–12. http://dx.doi.org/10.1016/j.engappai.2022.104822.
Cilia, N.D., de Stefano, C., Fontanella, F., di Freca, A.S., 2018. An experimental protocol to support cognitive impairment diagnosis by using handwriting analysis. Procedia Comput. Sci. 141, 466–471. http://dx.doi.org/10.1016/j.procs.2018.10.141.
Cilia, N.D., de Stefano, C., Fontanella, F., di Freca, A.S., 2021b. Feature selection as a tool to support the diagnosis of cognitive impairments through handwriting analysis. IEEE Access 9, 78226–78240. http://dx.doi.org/10.1109/ACCESS.2021.3083176.
Clevert, D.A., Unterthiner, T., Hochreiter, S., 2016. Fast and accurate deep network learning by exponential linear units (ELUs). In: Proceedings of the 4th International Conference on Learning Representations (ICLR 2016). San Juan, Puerto Rico, pp. 1–14.
CS231n Convolutional Neural Networks for Visual Recognition, 2020. [WWW Document]. Stanford University. URL https://cs231n.github.io/convolutional-networks (accessed 12.2.22).
de Stefano, C., Fontanella, F., Impedovo, D., Pirlo, G., Scotto di Freca, A., 2019. Handwriting analysis to support neurodegenerative diseases diagnosis: A review. Pattern Recognit. Lett. 121, 37–45. http://dx.doi.org/10.1016/j.patrec.2018.05.013.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, Kai, Fei-Fei, Li, 2009. ImageNet: A large-scale hierarchical image database. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009). IEEE, Miami, FL, USA, pp. 248–255. http://dx.doi.org/10.1109/cvpr.2009.5206848.
Diaz, M., Moetesum, M., Siddiqi, I., Vessio, G., 2021. Sequence-based dynamic handwriting analysis for Parkinson's disease detection with one-dimensional convolutions and BiGRUs. Expert Syst. Appl. 168, 1–12. http://dx.doi.org/10.1016/j.eswa.2020.114405.
Du, X., Cai, Y., Wang, S., Zhang, L., 2016a. Overview of deep learning. In: Proceedings of the 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC). IEEE, Wuhan, China, pp. 159–164. http://dx.doi.org/10.1109/YAC.2016.7804882.
Du, K., Deng, Y., Wang, R., Zhao, T., Li, N., 2016b. SAR ATR based on displacement- and rotation-insensitive CNN. Remote Sens. Lett. 7, 895–904. http://dx.doi.org/10.1080/2150704X.2016.1196837.
Glorot, X., Bengio, Y., 2010. Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (PMLR 9). Sardinia, Italy, pp. 249–256.
Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep Learning. The MIT Press, Cambridge, Massachusetts.
Gupta, U., Bansal, H., Joshi, D., 2020. An improved sex-specific and age-dependent classification model for Parkinson's diagnosis using handwriting measurement. Comput. Methods Progr. Biomed. 189, 1–10. http://dx.doi.org/10.1016/j.cmpb.2019.105305.
Han, J., Moraga, C., 1995. The influence of the sigmoid function parameters on the speed of backpropagation learning. In: Proceedings of the International Workshop on Artificial Neural Networks: From Natural To Artificial Neural Computation (IWANN '95). Malaga-Torremolinos, Spain, pp. 195–201. http://dx.doi.org/10.1007/3-540-59497-3_175.
Harris, C.R., Millman, K.J., van der Walt, S.J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N.J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M.H., Brett, M., Haldane, A., del Río, J.F., Wiebe, M., Peterson, P., Gérard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., Oliphant, T.E., 2020. Array programming with NumPy. Nature 585, 357–362. http://dx.doi.org/10.1038/s41586-020-2649-2.
Hunter, J.D., 2007. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 9, 90–95. http://dx.doi.org/10.1109/MCSE.2007.55.
Ioffe, S., Szegedy, C., 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015). Lille, France, pp. 448–456.
Johny, D.C., Assistant, A.J.S., Zhang, R., Xiao, X., Rashid, N., Iqbal, J., Mahmood, F., Abid, A., Khan, U.S., Tiwana, M.I., Fan, Z., Wen, C., Tao, L., Xiaochun, C., Haipeng, P., Yang, C., Jia, L., Chen, B.Q., Wen, H.Y., Zeiler, M.D., Hunt, J.E., Cooke, D.E., Forrest, S., Perelson, A.S., Allen, L., Cherukuri, R., 2018. ADADELTA: An adaptive learning rate method. IEEE Access 7.
Lane, H., Howard, C., Hapke, M.H., 2019. Natural Language Processing in Action. Manning.
Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., Talwalkar, A., 2018. Hyperband: A novel bandit-based approach to hyperparameter optimization. J. Mach. Learn. Res. 18, 6765–6816.
Liu, Y., Starzyk, J.A., Zhu, Z., 2008. Optimized approximation algorithm in neural networks without overfitting. IEEE Trans. Neural Netw. 19, 983–995. http://dx.doi.org/10.1109/TNN.2007.915114.
Loconsole, C., Cascarano, G.D., Brunetti, A., Trotta, G.F., Losavio, G., Bevilacqua, V., di Sciascio, E., 2019. A model-free technique based on computer vision and sEMG for classification in Parkinson's disease by using computer-assisted handwriting analysis. Pattern Recognit. Lett. 121, 28–36. http://dx.doi.org/10.1016/j.patrec.2018.04.006.
Meng, J., Huo, X., Zhao, H., Zhang, L., Wang, X., Wang, Y., 2022. Image-based handwriting analysis for disease diagnosis. In: Proceedings of the 41st Chinese Control Conference (CCC 2022). IEEE, Hefei, China, pp. 4058–4062. http://dx.doi.org/10.23919/CCC55666.2022.9902136.
Mollahosseini, A., Chan, D., Mahoor, M.H., 2016. Going deeper in facial expression recognition using deep neural networks. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV 2016). IEEE, Lake Placid, NY, USA, pp. 1–10. http://dx.doi.org/10.1109/WACV.2016.7477450.
Mosteller, F., Tukey, J., 1968. Data analysis, including statistics. In: Lindzey, G., Aronson, E. (Eds.), Handbook of Social Psychology. Addison Wesley.
Mwamsojo, N., Lehmann, F., El-Yacoubi, M.A., Merghem, K., Frignac, Y., Benkelfat, B.E., Rigaud, A.S., 2022. Reservoir computing for early stage Alzheimer's Disease detection. IEEE Access 10, 59821–59831. http://dx.doi.org/10.1109/ACCESS.2022.3180045.
Nichols, E., Steinmetz, J.D., Vollset, S.E., Fukutaki, K., Chalek, J., Abd-Allah, F., Abdoli, A., Abualhasan, A., Abu-Gharbieh, E., Akram, T.T., Hamad, H.al., Alahdab, F., Alanezi, F.M., Alipour, V., Almustanyir, S., Amu, H., Ansari, I., Arabloo, J., Ashraf, T., Astell-Burt, T., Ayano, G., Ayuso-Mateos, J.L., Baig, A.A., Barnett, A., Barrow, A., Baune, B.T., Béjot, Y., Bezabhe, W.M.M., Bezabih, Y.M., Bhagavathula, A.S., Bhaskar, S., Bhattacharyya, K., Bijani, A., Biswas, A., Bolla, S.R., Boloor, A., Brayne, C., Brenner, H., Burkart, K., Burns, R.A., Cámera, L.A., Cao, C., Carvalho, F., Castro-de Araujo, L.F.S., Catalá-López, F., Cerin, E., Chavan, P.P., Cherbuin, N., Chu, D.T., Costa, V.M., Couto, R.A.S., Dadras, O., Dai, X., Dandona, L., Dandona, R., de la Cruz-Góngora, V., Dhamnetiya, D., Dias da Silva, D., Diaz, D., Douiri, A., Edvardsson, D., Ekholuenetale, M., Sayed, I.el., El-Jaafary, S.I.,
O'Malley, T., Bursztein, E., Long, J., Chollet, F., 2019. KerasTuner [WWW Document]. Keras. URL https://github.com/keras-team/keras-tuner (accessed 12.2.22).
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, É., 2011. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830.
Pereira, C.R., Pereira, D.R., Rosa, G.H., Albuquerque, V.H.C., Weber, S.A.T., Hook, C.,
Eskandari, K., Eskandarieh, S., Esmaeilnejad, S., Fares, J., Faro, A., Farooque, U.,
Papa, J.P., 2018. Handwritten dynamics assessment through convolutional neural
Feigin, V.L., Feng, X., Fereshtehnejad, S.M., Fernandes, E., Ferrara, P., Filip, I.,
networks: An application to Parkinson’s disease identification. Artif. Intell. Med.
Fillit, H., Fischer, F., Gaidhane, S., Galluzzo, L., Ghashghaee, A., Ghith, N.,
Gialluisi, A., Gilani, S.A., Glavan, I.R., Gnedovskaya, E.v., Golechha, M., Gupta, R., 87, 67–77. http://dx.doi.org/10.1016/j.artmed.2018.04.001.
Gupta, V.B., Gupta, V.K., Haider, M.R., Hall, B.J., Hamidi, S., Hanif, A., Han- Rasmussen, J., Langerman, H., 2019. Alzheimer’s disease – why we need early
key, G.J., Haque, S., Hartono, R.K., Hasaballah, A.I., Hasan, M.T., Hassan, A., diagnosis. Degener Neurol. Neuromuscul. Dis. 9, 123–130. http://dx.doi.org/10.
Hay, S.I., Hayat, K., Hegazy, M.I., Heidari, G., Heidari-Soureshjani, R., Herteliu, C., 2147/dnnd.s228939.
Househ, M., Hussain, R., Hwang, B.F., Iacoviello, L., Iavicoli, I., Ilesanmi, O.S., Reisberg, B., Ferris, S.H., de Leon, M.J., Crook, T., 1982. The global deterioration
Ilic, I.M., Ilic, M.D., Irvani, S.S.N., Iso, H., Iwagami, M., Jabbarinejad, R., Jacob, L., scale for assessment of primary degenerative dementia. Am. J. Psychiatry 139,
Jain, V., Jayapal, S.K., Jayawardena, R., Jha, R.P., Jonas, J.B., Joseph, N., 1136–1139. http://dx.doi.org/10.1176/ajp.139.9.1136.
Kalani, R., Kandel, A., Kandel, H., Karch, A., Kasa, A.S., Kassie, G.M., Keshavarz, P., Salimans, T., Kingma, D.P., 2016. Weight normalization: A simple reparameterization
Khan, M.A., Khatib, M.N., Khoja, T.A.M., Khubchandani, J., Kim, M.S., Kim, Y.J., to accelerate training of deep neural networks. In: Advances in Neural Information
Kisa, A., Kisa, S., Kivimäki, M., Koroshetz, W.J., Koyanagi, A., Kumar, G.A., Processing Systems, Vol. 29 (NIPS 2016). Barcelona, Spain, pp. 1–9.
Kumar, M., Lak, H.M., Leonardi, M., Li, B., Lim, S.S., Liu, X., Liu, Y., Logros-
Saxton, J., Lopez, O.L., Ratcliff, G., Dulberg, C., Fried, L.P., Carlson, M.C., New-
cino, G., Lorkowski, S., Lucchetti, G., Lutzky Saute, R., Magnani, F.G., Malik, A.A.,
man, A.B., Kuller, L., 2004. Preclinical Alzheimer disease: Neuropsychological
Massano, J., Mehndiratta, M.M., Menezes, R.G., Meretoja, A., Mohajer, B., Mo-
test performance 1.5 to 8 years prior to onset. Neurology 63, 2341–2347. http:
hamed Ibrahim, N., Mohammad, Y., Mohammed, A., Mokdad, A.H., Mondello, S.,
Moni, M.A.A., Moniruzzaman, M., Mossie, T.B., Nagel, G., Naveed, M., Nayak, V.C., //dx.doi.org/10.1212/01.WNL.0000147470.58328.50.
Neupane Kandel, S., Nguyen, T.H., Oancea, B., Otstavnov, N., Otstavnov, S.S., Shahi, T.B., Sitaula, C., Neupane, A., Guo, W., 2022. Fruit classification using attention-
Owolabi, M.O., Panda-Jonas, S., Pashazadeh Kan, F., Pasovic, M., Patel, U.K., based MobileNetV2 for industrial applications. PLoS One 17, 1–21. http://dx.doi.
Pathak, M., Peres, M.F.P., Perianayagam, A., Peterson, C.B., Phillips, M.R., Pin- org/10.1371/journal.pone.0264586.
heiro, M., Piradov, M.A., Pond, C.D., Potashman, M.H., Pottoo, F.H., Prada, S.I., Shen, F., Shen, C., Zhou, X., Yang, Y., Shen, H.T., 2016. Face image classification by
Radfar, A., Raggi, A., Rahim, F., Rahman, M., Ram, P., Ranasinghe, P., Rawaf, D.L., pooling raw features. Pattern Recognit. 54, 94–103. http://dx.doi.org/10.1016/j.
Rawaf, S., Rezaei, N., Rezapour, A., Robinson, S.R., Romoli, M., Roshandel, G., patcog.2016.01.010.
Sahathevan, R., Sahebkar, A., Sahraian, M.A., Sathian, B., Sattin, D., Sawhney, M., Singhal, A., Bangar, O., Naithani, V., 2012. Medicinal plants with a potential to treat
Saylan, M., Schiavolin, S., Seylani, A., Sha, F., Shaikh, M.A., Shaji, K.S., Shan- Alzheimer and associated symptoms. Int. J. Nutr. Pharmacol. Neurol. Dis. 2, 84–91.
nawaz, M., Shetty, J.K., Shigematsu, M., Shin, J. il, Shiri, R., Silva, D.A.S., http://dx.doi.org/10.4103/2231-0738.95927.
Silva, J.P., Silva, R., Singh, J.A., Skryabin, V.Y., Skryabina, A.A., Smith, A.E.,
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R., 2014.
Soshnikov, S., Spurlock, E.E., Stein, D.J., Sun, J., Tabarés-Seisdedos, R., Thakur, B.,
Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn.
Timalsina, B., Tovani-Palone, M.R., Tran, B.X., Tsegaye, G.W., Valadan Tahbaz, S.,
Res. 15, 1929–1958.
Valdez, P.R., Venketasubramanian, N., Vlassov, V., Vu, G.T., Vu, L.G., Wang, Y.P.,
Wimo, A., Winkler, A.S., Yadav, L., Yahyazadeh Jabbari, S.H., Yamagishi, K., Sun, M., Song, Z., Jiang, X., Pan, J., Pang, Y., 2017. Learning pooling for convolu-
Yang, L., Yano, Y., Yonemoto, N., Yu, C., Yunusa, I., Zadey, S., Zastrozhin, M.S., tional neural network. Neurocomputing 224, 96–104. http://dx.doi.org/10.1016/J.
Zastrozhina, A., Zhang, Z.J., Murray, C.J.L., Vos, T., 2022. Estimation of the global NEUCOM.2016.10.049.
prevalence of dementia in 2019 and forecasted prevalence in 2050: an analysis Taleb, C., Likforman-Sulem, L., Mokbel, C., Khachab, M., 2020. Detection of Parkinson’s
for the Global Burden of Disease Study 2019. Lancet Public Health 7, 105–125. disease from handwriting using deep learning: a comparative study. Evol. Intell.
http://dx.doi.org/10.1016/S2468-2667(21)00249-8. 1–12. http://dx.doi.org/10.1007/s12065-020-00470-0.
Nichols, E., Szoeke, C.E.I., Vollset, S.E., Abbasi, N., Abd-Allah, F., Abdela, J., The pandas development team, 2020. Pandas: Python data analysis library [www
Aichour, M.T.E., Akinyemi, R.O., Alahdab, F., Asgedom, S.W., Awasthi, A., Barker- document]. URL https://pandas.pydata.org (accessed 12.2.22).
Collo, S.L., Baune, B.T., Béjot, Y., Belachew, A.B., Bennett, D.A., Biadgo, B., van Veen, R., 2022. Welcome to scikit-learning vector quantization’s documentation
Bijani, A., bin Sayeed, M.S., Brayne, C., Carpenter, D.O., Carvalho, F., Catalá- [www document]. URL https://sklvq.readthedocs.io (accessed 12.2.22).
López, F., Cerin, E., Choi, J.Y.J., Dang, A.K., Degefa, M.G., Djalalinia, S.,
Véstias, M.P., 2019. A survey of convolutional neural networks on edge with
Dubey, M., Duken, E.E., Edvardsson, D., Endres, M., Eskandarieh, S., Faro, A.,
reconfigurable computing. Algorithms 12, 1–24. http://dx.doi.org/10.3390/
Farzadfar, F., Fereshtehnejad, S.M., Fernandes, E., Filip, I., Fischer, F., Gebre, A.K.,
a12080154.
Geremew, D., Ghasemi-Kasman, M., Gnedovskaya, E.v., Gupta, R., Hachinski, V.,
Hagos, T.B., Hamidi, S., Hankey, G.J., Haro, J.M., Hay, S.I., Irvani, S.S.N., Jha, R.P., Wang, W., Zhao, M., Wang, J., 2019. Effective android malware detection with a hybrid
Jonas, J.B., Kalani, R., Karch, A., Kasaeian, A., Khader, Y.S., Khalil, I.A., Khan, E.A., model based on deep autoencoder and convolutional neural network. J. Ambient
Khanna, T., Khoja, T.A.M., Khubchandani, J., Kisa, A., Kissimova-Skarbek, K., Intell. Humaniz. Comput. 10, 3035–3043. http://dx.doi.org/10.1007/s12652-018-
Kivimäki, M., Koyanagi, A., Krohn, K.J., Logroscino, G., Lorkowski, S., Majdan, M., 0803-6.
Malekzadeh, R., März, W., Massano, J., Mengistu, G., Meretoja, A., Mohammadi, M., Waskom, M.L., 2021. Seaborn: statistical data visualization. J. Open Source Softw. 6,
Mohammadi-Khanaposhtani, M., Mokdad, A.H., Mondello, S., Moradi, G., Nagel, G., 1–4. http://dx.doi.org/10.21105/joss.03021.
Naghavi, M., Naik, G., Nguyen, L.H., Nguyen, T.H., Nirayo, Y.L., Nixon, M.R., Weiss, K., Khoshgoftaar, T.M., Wang, D.D., 2016. A survey of transfer learning. J. Big
Ofori-Asenso, R., Ogbo, F.A., Olagunju, A.T., Owolabi, M.O., Panda-Jonas, S., Data 3, 1–40. http://dx.doi.org/10.1186/s40537-016-0043-6.
Passos, V.M.de A., Pereira, D.M., Pinilla-Monsalve, G.D., Piradov, M.A., Pond, C.D., 2022. Welcome To LightGBM’s Documentation [WWW Document]. Microsoft, URL
Poustchi, H., Qorbani, M., Radfar, A., Reiner, R.C., Robinson, S.R., Roshandel, G., https://lightgbm.readthedocs.io (accessed 12.2.22).
Rostami, A., Russ, T.C., Sachdev, P.S., Safari, H., Safiri, S., Sahathevan, R.,
2022. Xgboost documentation [WWW Document]. URL https://xgboost.readthedocs.io
Salimi, Y., Satpathy, M., Sawhney, M., Saylan, M., Sepanlou, S.G., Shafieesabet, A.,
(accessed 12.2.22).
Shaikh, M.A., Sahraian, M.A., Shigematsu, M., Shiri, R., Shiue, I., Silva, J.P.,
Zhang, X., Chen, X., Yao, L., Ge, C., Dong, M., 2019. Deep neural network hyperpa-
Smith, M., Sobhani, S., Stein, D.J., Tabarés-Seisdedos, R., Tovani-Palone, M.R.,
rameter optimization with orthogonal array tuning. In: International Conference on
Tran, B.X., Tran, T.T., Tsegay, A.T., Ullah, I., Venketasubramanian, N., Vlassov, V.,
Wang, Y.P., Weiss, J., Westerman, R., Wijeratne, T., Wyper, G.M.A., Yano, Y., Neural Information Processing (ICONIP 2019). Springer, Sydney, NSW, Australia,
Yimer, E.M., Yonemoto, N., Yousefifard, M., Zaidi, Z., Zare, Z., Vos, T., Feigin, V.L., pp. 287–295. http://dx.doi.org/10.1007/978-3-030-36808-1_31.
Murray, C.J.L., 2019. Global, regional, and national burden of Alzheimer’s disease Zheng, Y., Iwana, B.K., Uchida, S., 2018. Discovering class-wise trends of max-pooling
and other dementias, 1990–2016: a systematic analysis for the Global Burden of in subspace. In: Proceedings of 2018 16th International Conference on Frontiers
Disease Study 2016. Lancet Neurol. 18, 88–106. http://dx.doi.org/10.1016/S1474- in HandWriting Recognition. ICFHR, IEEE, Niagara Falls, NY, USA, pp. 98–103.
4422(18)30403-4. http://dx.doi.org/10.1109/ICFHR-2018.2018.00026.
