
Early Detection of Alzheimer’s Disease

Dr. Kamatchi Priya L, Jayant Harwalkar, Hemankith Reddy M, Shravani, Neelesh S
Department of Computer Science, PES University, Bangalore, India
priyal@pes.edu, jayanthharwalkar@gmail.com, hemankith@gmail.com, shravanim1250@gmail.com, neeleshsamptur@gmail.com
Abstract—Alzheimer’s disease is a progressive neurological disorder that causes the brain to shrink and brain cells to die. As of now, the disease is incurable, but medication and management strategies may temporarily improve symptoms. AD models play an important role in early diagnosis so that the patient has an opportunity to get treatment. This paper contains a comprehensive analysis of different machine learning methods used to detect AD.

Index Terms—component, formatting, style, styling, insert

I. INTRODUCTION
Alzheimer’s disease (AD), a brain disorder, is considered to be a form of dementia that slowly destroys memory, thinking, and eventually the ability to perform daily tasks. It is caused by the loss and degeneration of neurons in the brain, mostly in the cortex region. AD is caused by the formation of plaques, in which clumps of abnormal proteins form outside the neuron, blocking neuron connections and disrupting signals, which leads to impairment of the brain. AD can also be caused by tangles, in which a build-up of protein occurs inside the neuron and affects signal transmission. In AD, the brain starts to shrink: the gyri become narrow while the sulci widen. The risk of getting this disease increases with age and it is mostly seen in older people.
AD can be diagnosed by doing a brain autopsy and biopsy, and there is no complete cure for the disease. Early detection improves the chances of effective treatment and the ability of the individual to participate in a wide variety of clinical trials. Treatment is effective if given in the early stages. Currently, there are no treatments to reverse the damage already caused, but proper medication can halt the further progression of AD and prolong the patient’s life.
AD can be detected by performing scans like magnetic resonance imaging (MRI), computed tomography (CT), or positron emission tomography (PET)[3]. Researchers use raw MRI brain scans, demographic information, and clinical data, comparing them with normal cognition by using a developed machine learning and deep learning model which predicts AD risk.

Despite the currently available biomarkers, electronic healthcare data, health records, and the increasing digitization of data, there is not enough information on how to use this large-scale health data in the prediction of AD risk, but a few studies demonstrate that potential AD risk can be predicted when these resources are combined with data-driven machine learning models.[5]

The subsequent sections are divided into literature review, proposed method, implementation details and conclusion. In the proposed method we describe the architecture and discuss the model. The implementation details include a brief outline of the dataset and pre-processing, and end with a comparative analysis with respect to different models.
II. LITERATURE REVIEW

Considering the amount of research that has been done, machine learning techniques, a branch of artificial intelligence, are widely applied to this problem. Even with the existence of all these methods, there are no instruments for the detection. However, certain physical, neuropsychological, psychological and neurological tests can be used for identification of this disease.[12]

SVM was heavily researched for both feature selection and modelling. There are many variants of SVM used for classification; for example, it can be done using SPECT images, where SPECT perfusion imaging is used to classify healthy patients’ images from those having AD. The approach is based on a linear programming formulation with a linear hyperplane which performs simultaneous feature selection and classification. This method has a specificity of 90% and a sensitivity of 84%, and is also proven to be better than Fisher Linear Discriminant (FLD) and statistical parametric mapping (SPM), and also better than human experts.[6]
Using SVM to find atrophy patterns in AD as a feature selection method has also proven to be one of the best approaches. This method classifies whether the patient has AD or not based on the anatomical MRI. Even though this approach provided good results on Cohort 1, the results were not as good inter-cohort, as the accuracy dropped to 74%. This showed that the selected regions of the considered refined atlas did not have good generalisation ability[7].
SVM was also used for binary classification using the LIBSVM toolbox under MATLAB (the Scikit-learn library can also be used in Python to implement SVM via SVC, Support Vector Classification). SVM is less sensitive to the dimensionality of the problem and hence allows working with complex problems that involve a large number of variables. The Radial Basis Function kernel was chosen as it offers good asymptotic behaviour. However, the results in some conditions might be nontrivial[13].
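For illustration, the snippet below is a minimal scikit-learn sketch of such an RBF-kernel SVC classifier; the synthetic features and the hyper-parameter values (C, gamma) are stand-ins, not the settings used in the cited study.

```python
# Minimal sketch of an RBF-kernel SVM with scikit-learn's SVC.
# The data here is synthetic; in practice the feature vectors would come
# from MRI-derived measurements.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # RBF kernel, example settings
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```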
Decision trees are also one of the most popular methods. The ID3 decision tree, along with measures like entropy and information gain, has been used in this research. At each node in the decision tree, the attribute with the highest information gain is chosen as the splitting attribute.
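To make the splitting criterion concrete, the following is a small illustrative sketch (not the cited study's code) of how entropy and information gain can be computed for one candidate attribute.

```python
# Illustrative entropy / information-gain computation for one candidate split.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, attribute_values):
    # Gain = H(parent) - weighted sum of child entropies over attribute values.
    gain = entropy(labels)
    for v in np.unique(attribute_values):
        mask = attribute_values == v
        gain -= mask.mean() * entropy(labels[mask])
    return gain

# Toy example: a binary attribute against binary class labels.
y = np.array([0, 0, 1, 1, 1, 0])
a = np.array([0, 0, 0, 1, 1, 1])
print(information_gain(y, a))
```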
PSO is one of the well-known methods for optimising feature extraction as well as classification. Optimisation algorithms such as GA (Genetic Algorithm) for feature selection, PSO (Particle Swarm Optimisation) for performance optimisation, ELM (Extreme Learning Machine) for classification, and VBM (Voxel-Based Morphometry) for feature extraction, along with an ELM and PSO classifier, can be used to identify the class of AD among the three classes. Training and testing accuracy were 94.57% and 87.23% respectively for the GA-ELMPSO algorithm over 10 random trials.[18] PSO for feature reduction along with a Decision Tree classifier for classification achieved an accuracy of 91.24%, while the sensitivity was 91.24% with specificity being 93.10%. In this method, feature reduction by PSO gives a reduced set of variables instead of the original data, and classification is finally done using the decision tree technique. There are many parameters to be found, which takes a lot of time and energy, and the process of noise reduction is difficult as the images were degraded.[19]
CNN has also been prominent as a prediction and classification model. The classification is done using two methods: the first is building the CNN architecture from scratch based on 2D and 3D convolutions over MRI scans, and the second uses transfer learning techniques such as the VGG19 pre-trained model. A standard CNN contains feature extraction, feature reduction and classification, so there is no need to do the extraction manually. The weights in the initial layers act as feature extractors and their values can be further improved by iterative learning[9]. CNN was also used as a feature extractor, with Decision Tree, SVM, K-means clustering, and Linear Discriminant classifiers applied on top. The classification accuracy was seen to improve when a pre-processing method was included before the CNN models[15].
A deep learning architecture with a softmax regression layer and stacked sparse auto-encoders was also used to develop an early diagnosis technique for Alzheimer’s disease. The autoencoder learns the input representations. By selecting the highest predicted probabilities for each label, the softmax regression layer classifies instances. The accuracy, sensitivity and specificity of the model turned out to be 87.76%, 88.57%, and 87.22% for classification of AD vs NC, and showed an increase in accuracy compared to conventional methods such as SVM [14]. Another approach implements feature extraction of brain voxels from grey matter and classifies using the CNN algorithm. Voxels are enhanced with a Gaussian filter, and unnecessary tissues are deleted from the enhanced voxels. The CNN algorithm is then used for classification, and this method achieved an accuracy of 90.47%, precision of 92.59%, and recall of 86.66% in comparison to a system which uses physician decisions.[16]
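As an illustration of the general auto-encoder-plus-softmax idea (not the implementation from [14]), a minimal Keras sketch on synthetic feature vectors could look like the following; the layer widths, sparsity penalty and epoch counts are assumed example values.

```python
# Illustrative sparse auto-encoder + softmax classifier on synthetic features.
import numpy as np
from tensorflow.keras import layers, models, regularizers

n_features = 512                                          # stand-in feature size
x = np.random.rand(200, n_features).astype("float32")     # synthetic data
y = np.random.randint(0, 2, size=200)                     # toy labels (0 = NC, 1 = AD)

# Sparse auto-encoder: an L1 activity penalty encourages sparse codes.
inputs = layers.Input(shape=(n_features,))
code = layers.Dense(64, activation="relu",
                    activity_regularizer=regularizers.l1(1e-5))(inputs)
recon = layers.Dense(n_features, activation="sigmoid")(code)
autoencoder = models.Model(inputs, recon)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x, x, epochs=5, batch_size=32, verbose=0)

# Softmax regression layer on top of the learned representation.
encoder = models.Model(inputs, code)
codes = encoder.predict(x, verbose=0)
clf_in = layers.Input(shape=(64,))
probs = layers.Dense(2, activation="softmax")(clf_in)
classifier = models.Model(clf_in, probs)
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
classifier.fit(codes, y, epochs=5, batch_size=32, verbose=0)
```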
III. PROPOSED METHODOLOGY

A. Architecture

The images from the database are fed to a pipeline which consists of a series of pre-processing techniques. PSO performs feature selection on the pre-processed images. The resultant images are stored in a database and are used by PSO to obtain the optimal parameters of the Convolutional Neural Network. This produces an optimized architecture for the CNN. The CNN model is then trained, validated and tested.
B. CNN parameter optimisation using PSO

The training process is repetitive and continues until the stop criterion is met. The steps to optimize the CNN using PSO are:

1) Feed the pre-processed images as input to the CNN network. The images should be of the same size and characteristics; for example, they should have the same dimensions, scale, color gamma, etc.
2) Design the PSO parameters. The algorithm’s particle population is generated. This involves setting the values of the number of particles, number of iterations, inertial weight, social constant, cognitive constant, etc. Random values can be set, or they can be set according to some heuristic.
3) With the parameters obtained by the PSO, the parameters of the CNN are initialised (the parameters to be set are given in the table below). The CNN is now ready to be trained.
4) Training and validation of the CNN. The CNN reads, processes, validates and tests the input images. This step produces values for the objective functions. The objective functions are AIC and recognition rate. These values are returned to the PSO.
5) Calculate the objective function. The objective function is calculated by PSO to obtain the optimal values in the search space.
6) Update the PSO parameters. Both the position of the particles and the velocity that characterizes the particles are updated by taking Pbest and Gbest into consideration. Each particle is updated based on its own optimal position (Pbest) and the optimal position of the entire swarm in the search space (Gbest), as shown in the sketch after this list.
7) This process continues until the end criterion is met. The end criterion can be the number of iterations or a threshold value.
8) It is then determined which architecture is optimal. Here the Gbest particle represents the optimal architecture.
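The update in step 6 follows the standard PSO rule; the snippet below is an illustrative sketch of that velocity and position update, where the inertia and acceleration constants (w, c1, c2) are assumed example values rather than settings taken from this work.

```python
# Standard PSO velocity/position update (step 6), illustrated with numpy.
import numpy as np

rng = np.random.default_rng(0)
w, c1, c2 = 0.7, 1.5, 1.5          # inertia, cognitive and social constants (examples)

def update_particle(x, v, pbest, gbest):
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v_new, v_new

# Toy 10-dimensional particle matching the X1..X10 structure described below.
x = rng.uniform(0, 1, 10)
v = np.zeros(10)
x, v = update_particle(x, v, pbest=x.copy(), gbest=rng.uniform(0, 1, 10))
print(x)
```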
To elaborate further on how the algorithm works, an example is presented. The particle structure consists of 10 positions, as shown below. Each particle has these 10 positions and each position is responsible for tuning one hyper-parameter. The hyper-parameters to be optimized are given in the table below.

TABLE I: Structure of Particle

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
TABLE II: Example Particle Structure

Particle Coordinate   Hyper-Parameter             Search Space
X1                    Convolution layer no.       [1, 4]
X2                    Filter number (1st layer)   [32, 128]
X3                    Filter size (1st layer)     [1, 3]
X4                    Filter number (2nd layer)   [32, 128]
X5                    Filter size (2nd layer)     [1, 3]
X6                    Filter number (3rd layer)   [32, 128]
X7                    Filter size (3rd layer)     [1, 3]
X8                    Filter number (4th layer)   [32, 128]
X9                    Filter size (4th layer)     [1, 3]
X10                   Batch size                  [32, 128]

The X1 coordinate controls the hyper-parameter for the number of convolution layers. If X1 = 4, it means that there will be 4 convolution layers. X2 and X3 control the hyper-parameters filter number and filter size respectively. If X2 = 32 and X3 = 2, it implies that there will be 32 filters of size 5x5 (1 is mapped to 3x3, 2 to 5x5, 3 corresponds to 7x7, 4 implies 9x9). Similarly, X4 and X5 control the filter number and size for layer 2. The same goes for all the remaining coordinates. X10 represents the batch size for training.

TABLE III: Example particle generated by the algorithm

4 100 2 64 2 64 3 96 1 32
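To illustrate how a particle such as the one in Table III can be interpreted, the sketch below decodes the coordinate vector into per-layer filter counts, kernel sizes and a batch size using the mapping described above; this decoding helper is our own illustration, not code from the paper.

```python
# Decode a PSO particle (X1..X10) into a CNN description, per Section III-B.
SIZE_MAP = {1: 3, 2: 5, 3: 7, 4: 9}      # filter-size code -> kernel width

def decode_particle(p):
    n_layers, batch_size = p[0], p[9]
    conv_layers = []
    for i in range(n_layers):
        filters = p[1 + 2 * i]            # X2, X4, X6, X8
        kernel = SIZE_MAP[p[2 + 2 * i]]   # X3, X5, X7, X9
        conv_layers.append({"filters": filters, "kernel": (kernel, kernel)})
    return conv_layers, batch_size

# Example particle from Table III.
print(decode_particle([4, 100, 2, 64, 2, 64, 3, 96, 1, 32]))
```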
IV. IMPLEMENTATION DETAILS

A. Dataset

The data was obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI is a long-term study that uses indicators such as imaging, genetic, clinical, and biochemical markers to follow and detect Alzheimer’s disease early. The ADNI data repository has around 2220 patients’ imaging, clinical, and genetic data from four investigations (ADNI3, ADNI2, ADNI1 and ADNI GO). The image data (MRI scans) was used. ADNI provides researchers with data as they work on the progression of Alzheimer’s disease. PET images, MRI images, genetics, cognitive tests, CSF, and blood data are collected and validated, and these can be used by researchers as predictors of the disease. The first goal is to detect AD at the earliest stage (pre-dementia), identify biomarkers that can be used to track the disease’s progression, and support breakthroughs in Alzheimer’s disease intervention, prevention, and therapy by using innovative diagnostic tools at the earliest possible stage (when intervention may be most successful).

ADNI has a ground-breaking data-access policy, which makes data available to all scientists worldwide without restriction. We have acquired 2294 ADNI1 1.5T MRI scans which are in the NIfTI format. The images are pre-classified into CN, MCI or AD. Each of the images is of shape 192 x 192 x 160.
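As a practical note, NIfTI volumes like these can be read in Python with the nibabel library; the snippet below is a minimal sketch, and the file name is a placeholder rather than an actual ADNI path.

```python
# Load one NIfTI scan into a numpy array (file name is a placeholder).
import nibabel as nib

img = nib.load("subject_0001.nii")     # hypothetical local file
volume = img.get_fdata()               # float array, e.g. shape (192, 192, 160)
print(volume.shape, volume.dtype)
```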
B. Pre-processing

The pre-processing steps are:

(I) ADNI pipeline
(II) Registration
(III) Segmentation and Normalization
(IV) Skull Stripping
(V) Smoothing

(I) ADNI pipeline: The images in the dataset are obtained from MRI machines, and the machines use magnetic waves and radio waves to produce the scans. There are parameters such as radio frequency, magnetic frequency and uniformity of the coil which can cause variations in the MRI scans. To correct such variations in the images, the ADNI pipeline is used. The following is done on the MRI image as part of this pipeline:
(i) Post-Acquisition Correction: Scanners with different acquisition parameters present considerable hurdles. Small changes in acquisition parameters for quantitative sequences have a significant impact on machine learning models, thus rectifying these inconsistencies is critical.
(ii) B1 Intensity Variation: B1 errors are one of the problems in measuring MTR, which expands to magnetization transfer ratio, since the MTR value changes with changes in the magnetization transfer (MT) pulse amplitude. These errors can also be caused by nonuniformity in the radiofrequency and incorrect transmitter output settings when accounting for changing levels of RF coil loading. These mistakes need to be corrected in order to obtain images with no variations and no loss of crucial data.
(iii) Intensity Non-Uniformity: The quality of acquired data can be affected by intensity non-uniformity. The term "intensity non-uniformity" refers to anatomically unrelated intensity variance in the data. It can be caused by the radio-frequency coil used, the acquisition pulse sequence used, and the sample’s composition and geometry. As a result, it is critical to correct this variation, and a variety of approaches have been offered to do so.
(II) Registration: The act of aligning the images to be analyzed is called image registration, and it is a critical phase in which data from several images must be integrated. The images can be taken at various times, from various perspectives, and with various sensors. Registration in medical imaging allows you to merge data from multiple modalities, such as CT, MR, SPECT, or PET, to get a full picture of the patient. In our case, since the MRI scans are taken from different angles, it is the process of geometrically aligning all the images for further analysis. It can be used to create correspondence between features in a set of images and then infer correspondence away from those features using a transformation model.
(III) Segmentation and Normalization: The division of brain tissue into sections such as cerebrospinal fluid (CSF), which cushions the brain, grey matter (GM), where the actual processing is done, and white matter (WM), which provides communication between different GM areas, is the major focus of the brain magnetic resonance imaging (MRI) image segmentation approach. Various significant brain regions that could be useful in identifying Alzheimer’s disease are found and kept during image segmentation. Normalization is the process of shifting and scaling an image so that the pixels have a zero mean and unit variance. By removing scale invariance, the model can converge faster.
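The normalization described above amounts to standardizing each volume to zero mean and unit variance; a minimal sketch (on a stand-in random volume) is:

```python
# Zero-mean, unit-variance intensity normalization of one MRI volume.
import numpy as np

def normalize(volume):
    volume = volume.astype("float32")
    return (volume - volume.mean()) / (volume.std() + 1e-8)  # avoid divide-by-zero

vol = np.random.rand(192, 192, 160).astype("float32")        # stand-in volume
norm = normalize(vol)
print(norm.mean(), norm.std())
```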
(IV) Skull Stripping: Skull stripping is a process wherein the skull and the non-brain regions of the image are removed and only the brain portion of the image is retained, as we deal with only this region for the analysis of Alzheimer’s disease. Skull stripping is one of the first steps in the process of diagnosing brain disorders. In a brain MRI scan, it is the method for differentiating brain tissue from non-brain tissue. Even for experienced radiologists, separating the brain from the skull is a time-consuming process, with results that vary widely from person to person. This is a pipeline that only needs a raw MRI image as input and should produce a segmented image of the brain after the necessary preprocessing.
(V) Smoothing: Smoothing involves removing redundant information and noise from the images. It helps in easy identification of trends and patterns in the images. When the image is produced in an MRI machine, it contains different kinds of noise which need to be removed in order to obtain a clean image without loss of any crucial information.
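Such smoothing is commonly implemented with a Gaussian kernel; the snippet below is an illustrative sketch using scipy, where the sigma value is an assumed example rather than a setting from the paper.

```python
# Gaussian smoothing of a 3D MRI volume; sigma here is an example value.
import numpy as np
from scipy.ndimage import gaussian_filter

vol = np.random.rand(192, 192, 160).astype("float32")   # stand-in volume
smoothed = gaussian_filter(vol, sigma=1.0)
print(smoothed.shape)
```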
