This document discusses early detection of Alzheimer's disease through machine learning and deep learning models. It provides an overview of different machine learning methods that have been used for Alzheimer's detection, including support vector machines, decision trees, convolutional neural networks, and ensemble methods. The proposed method uses a pipeline of preprocessing techniques and a machine learning model to classify Alzheimer's patients' brain scans and compare them to normal cognition data in order to predict the risk of Alzheimer's disease.
Dr. Kamatchi Priya L, Jayant Harwalkar, Hemankith Reddy M, Shravani, Neelesh S
Department of Computer Science, PES University, Bangalore, India
priyal@pes.edu, jayanthharwalkar@gmail.com, hemankith@gmail.com, shravanim1250@gmail.com, neeleshsamptur@gmail.com

Abstract—Alzheimer's disease is a progressive neurological disorder that causes the brain to shrink and brain cells to die. As of now, the disease is incurable, but medication and management strategies may temporarily improve symptoms. AD models play an important role in early diagnosis, so that the patient has an opportunity to get treatment. This paper contains a comprehensive analysis of different machine learning methods used to detect AD.

Index Terms—component, formatting, style, styling, insert

I. INTRODUCTION

Alzheimer's disease (AD), a brain disorder, is considered to be a form of dementia that slowly destroys memory, thinking, and eventually the ability to perform daily tasks. It is caused by the loss and degeneration of neurons in the brain, mostly in the cortex region. AD is caused by the formation of plaques, in which clumps of abnormal proteins form outside the neuron, blocking neuron connections and disrupting signals, which leads to impairment of the brain. AD can also be caused by tangles, in which a build-up of protein occurs inside the neuron and affects signal transmission. In AD, the brain starts to shrink: the gyri become narrow while the sulci widen. The risk of getting this disease increases with age, and it is mostly seen in older people.

AD can be diagnosed by doing a brain autopsy and biopsy, and there is no complete cure for the disease. Early detection improves the chances of effective treatment and the ability of the individual to participate in a wide variety of clinical trials. Treatment is effective if given in the early stages. Currently, there are no treatments to reverse the damage already caused, but proper medication can halt the further progression of AD and prolong the patient's life.

AD can be detected by performing scans like magnetic resonance imaging (MRI), computed tomography (CT), or positron emission tomography (PET)[3]. Researchers use raw MRI brain scans, demographic images, and clinical data and compare them with normal cognition data by using a developed machine learning and deep learning model which predicts AD risk.

Despite the currently available biomarkers, electronic healthcare data, health records, and the increasing digitization of data, there is not enough information on how to use this large-scale health data in the prediction of AD risk, but a few studies demonstrate that potential AD risk can be predicted when these resources are combined with data-driven machine learning models.[5]

The subsequent sections are divided into literature review, proposed method, implementation details and conclusion. In the proposed method we describe the architecture and discuss the model. The implementation details include a brief description of the data set and pre-processing, and end with a comparative analysis with respect to different models.

II. LITERATURE REVIEW

Considering the amount of research that has been done, machine learning techniques, a branch of artificial intelligence, are widely applied to this problem. Even with the existence of all these methods, there are no instruments for the detection. However, certain physical, neuropsychological, psychological, and neurological tests can be used for identification of this disease.[12]

SVM was heavily researched for both feature selection and modelling. There are many variants of SVM used for classification; for example, it can be done using SPECT images, using SPECT perfusion imaging to separate healthy patients' images from those having AD. The approach is based on a linear programming formulation with a linear hyperplane which performs simultaneous feature selection and classification. This method has a sensitivity of 90% and a specificity of 84%, and is also proven to be better than Fisher Linear Discriminant (FLD) and statistical parametric mapping (SPM), and even better than human experts.[6] Using SVM to find atrophy patterns in AD as feature selection is also proven to be one of the best methods. This method classifies whether the patient has AD or not based on the anatomical MRI. Even though this approach provided good results on Cohort 1, the results weren't great inter-cohort, as the accuracy dropped to 74%. This showed that the selected regions of the considered refined atlas did not have good generalisation ability[7].

SVM was also used for binary classification using the LIBSVM toolbox under MATLAB (the Scikit-learn library can also be used in Python to implement SVM (SVC, Support Vector Classification)). SVM is less sensitive to the dimensionality of the problem and hence allows working with complex problems that involve a large number of variables. The Radial Basis Function kernel was chosen as it offers good asymptotic behaviour. However, the results in some conditions might be nontrivial[13].

Decision trees are also one of the most popular methods. The ID3 Decision Tree, along with measures like Entropy and Information Gain, has been used in this research. At each node in the decision tree, the attribute with the highest information gain is chosen as the splitting attribute.

PSO is one of the well-known methods for optimising feature extraction as well as classification. Optimisation algorithms like GA (Genetic Algorithm) for feature selection, PSO (Particle Swarm Optimisation) for performance optimisation, ELM (Extreme Learning Machine) for classification, and VBM (Voxel-Based Morphometry) for feature extraction, along with an ELM and PSO classifier, can be used to identify the class of AD among the three classes. Training and testing accuracy were 94.57% and 87.23% respectively for the GA-ELM-PSO algorithm over 10 random trials.[18] PSO for feature reduction along with a Decision Tree classifier for classification achieved an accuracy of 91.24%, while the sensitivity was 91.24% with specificity being 93.10%. In this method, feature reduction by PSO gave a reduced set of variables instead of the original data, and classification is finally done using the decision tree technique. There are many parameters to be found, which takes a lot of time and energy, and the process of noise reduction is difficult as the images were degraded.[19]

CNN was also prominent as a prediction and classification model. The classification is done using two methods: the first is building the CNN architecture from scratch based on MRI scans with 2D and 3D convolutions, and the second is using transfer learning techniques like the VGG19 pre-trained model. A standard CNN contains feature extraction, feature reduction and classification, so there is no need to do extraction manually. The weights in the initial layers act as feature extractors, and their values can be further improved by iterative learning[9]. CNN was also used as a feature extractor, after which Decision Tree, SVM, K-means clustering, and Linear Discriminant classifiers are applied. The classification accuracy was seen to improve by including a pre-processing method before the considered CNN models[15].

A deep learning architecture with a softmax regression layer and stacked sparse auto-encoders was also used to develop an early diagnosis technique for Alzheimer's disease. The autoencoder learns the input representations. By selecting the highest predicted probabilities for each label, the softmax regression layer classifies instances. The accuracy, sensitivity and specificity of the model turned out to be 87.76%, 88.57%, and 87.22% for classification of AD vs NC, showing an increase in accuracy compared to conventional methods such as SVM [14]. Another approach implements feature extraction of brain voxels from grey matter and classifies using the CNN algorithm. Voxels are enhanced with a Gaussian filter, and unnecessary tissues are deleted from the enhanced voxels. The CNN algorithm is then used for classification; this method achieved an accuracy of 90.47%, precision of 92.59%, and recall of 86.66% in comparison to a system which uses physician decision.[16]

III. PROPOSED METHODOLOGY

A. Architecture

The images from the database are fed to a pipeline which consists of a series of pre-processing techniques. PSO performs feature selection on the pre-processed images. The resultant images will be stored in a database and will be used by PSO to get optimal parameters of the Convolutional Neural Network. This produces an optimized architecture for the CNN. The CNN model is trained, validated and tested.

B. CNN parameter optimisation using PSO

The process of training is repetitive and continues until the stop criterion is met. The steps to optimize the CNN using PSO are:

1) Feed the pre-processed images as input to the CNN network. The images should be of the same size and characteristics; for example, they should be of the same dimensions, scale, color gamma, etc.
2) Design the PSO parameters. The algorithm's particle population is generated. This involves setting the values of the number of particles, number of iterations, inertial weight, social constant, cognitive constant, etc. Random values can be set, or they can be set according to some heuristic.
3) With the parameters obtained by the PSO, the parameters of the CNN are initialised (the parameters to be set are given in the table below). The CNN is now ready to be trained.
4) Training and validation of the CNN. The CNN reads, processes, validates and tests the input images. This step produces values for the objective functions. The objective functions are AIC and recognition rate. These values are returned to the PSO.
5) Calculate the objective function. The objective function is calculated by PSO to obtain the optimal values in the search space.
6) PSO parameters are updated. Both the position and the velocity that characterize the particles are updated by taking into consideration Pbest and Gbest: each particle is updated based on its own optimal position (Pbest) and the optimal position of the entire swarm in the search space (Gbest).
7) This process continues until the end criterion is met. The end criterion can be the number of iterations or a threshold value.
8) It is then determined which architecture is optimal. Here, the Gbest particle represents the optimal architecture.

To elaborate further on how the algorithm works, an example is presented. The particle structure consists of 10 positions as shown below, and each of these positions is responsible for tuning one hyper-parameter. The hyper-parameters to be optimized are given in the table below.

TABLE I: Structure of Particle
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10

TABLE II: Example Particle Structure
Particle Coordinate | Hyper-Parameter | Search Space
X1 | Convolution layer no. | [1, 4]
X2 | Filter no. (1st layer) | [32, 128]
X3 | Filter size (1st layer) | [1, 3]
X4 | Filter no. (2nd layer) | [32, 128]
X5 | Filter size (2nd layer) | [1, 3]
X6 | Filter no. (3rd layer) | [32, 128]
X7 | Filter size (3rd layer) | [1, 3]
X8 | Filter no. (4th layer) | [32, 128]
X9 | Filter size (4th layer) | [1, 3]
X10 | Batch size | [32, 128]

The X1 coordinate controls the hyper-parameter for the number of convolution layers. If X1 = 4, it means that there will be 4 convolution layers. X2 and X3 control the hyper-parameters filter number and filter size respectively. If X2 = 32 and X3 = 2, it implies that there will be 32 filters of size 5x5 (1 is mapped to 3x3, 2 to 5x5, 3 corresponds to 7x7, 4 implies 9x9). Similarly, X4 and X5 control the filter number and size for layer 2, and the same goes for all the remaining coordinates. X10 represents the batch size for training.

TABLE III: Example particle generated by the algorithm
4 100 2 64 2 64 3 96 1 32

IV. IMPLEMENTATION DETAILS

A. Dataset

The data was obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI is a long-term study that uses indicators such as imaging, genetic, clinical, and biochemical markers to follow and detect Alzheimer's disease early. The ADNI data repository has imaging, clinical, and genetic data of around 2220 patients from four investigations (ADNI3, ADNI2, ADNI1 and ADNI GO). The image data (MRI scans) was used. ADNI provides researchers with data as they work on the progression of Alzheimer's disease. PET images, MRI images, genetics, cognitive tests, CSF, and blood data are collected and validated, and these can be used by researchers as predictors of the disease. The first goal is to detect AD at the earliest stage (pre-dementia), identify biomarkers that can be used to track the disease's progression, and support breakthroughs in Alzheimer's disease intervention, prevention, and therapy by using innovative diagnostic tools at the earliest possible stage (when intervention may be most successful). ADNI has a ground-breaking data-access policy, which makes data available to all scientists worldwide without restriction.

We have acquired 2294 ADNI1 1.5T MRI scans, which are in the NIfTI format. The images are pre-classified into CN, MCI or AD. Each of the images is of the shape 192 x 192 x 160.

B. Pre-processing

The pre-processing steps are:
(I) ADNI pipeline
(II) Registration
(III) Segmentation and Normalization
(IV) Skull Stripping
(V) Smoothing

(I) ADNI pipeline: The images in the dataset are obtained from MRI machines, and the machines use magnetic waves and radio waves to produce the scans. There are parameters such as radio frequency, magnetic frequency and uniformity of the coil which can cause variations in the MRI scans. To correct such variations in the images, the ADNI pipeline is used. The following is done on the MRI image as part of this pipeline:

(i) Post-Acquisition Correction: Scanners with different acquisition parameters present considerable hurdles. Small changes in acquisition parameters for quantitative sequences have a significant impact on machine learning models, thus rectifying these inconsistencies is critical.

(ii) B1 Intensity Variation: B1 errors are one of the problems in measuring MTR, which expands to magnetization transfer ratio, since the MTR value changes with the magnetization transfer (MT) pulse amplitude. These errors can also be caused by nonuniformity in the radiofrequency and incorrect transmitter output settings when accounting for changing levels of RF coil loading. These mistakes need to be corrected in order to obtain images with no variations and no loss of crucial data.

(iii) Intensity Non-Uniformity: The quality of acquired data can be affected by intensity non-uniformity. The term "intensity non-uniformity" refers to anatomically unrelated intensity variance in the data. It can be caused by the radio-frequency coil used, the acquisition pulse sequence used, and the sample's composition and geometry. As a result, it is critical to correct this variation, and a variety of approaches have been offered to do so.

(II) Registration: The act of aligning images to be analyzed is called image registration, and it is a critical phase in which data from several images must be integrated. The images can be taken at various times, from various perspectives, and with various sensors. Registration in medical imaging allows you to merge data from multiple modalities, such as CT, MR, SPECT, or PET, to get a full picture of the patient. In our case, since the MRI scans are taken from different angles, it is the process of geometrically aligning all the images for further analysis. It can be used to create correspondence between features in a set of images and then infer correspondence away from those features using a transformation model.

(III) Segmentation and Normalization: The division of brain tissue into tissue sections such as cerebrospinal fluid (CSF), which cushions the brain, grey matter (GM), where the actual processing is done, and white matter (WM), which provides communication between different GM areas, is the major focus of the brain magnetic resonance imaging (MRI) image segmentation approach. Various significant brain regions that could be useful in identifying Alzheimer's disease are found and kept during image segmentation. Normalization is the process of shifting and scaling an image so that the pixels have a zero mean and unit variance. By removing scale invariance, the model can converge faster.

(IV) Skull Stripping: Skull stripping is a process wherein the skull and the non-brain region of the image are removed and only the brain portion of the image is retained, as we deal with only this region for the analysis of Alzheimer's disease. Skull stripping is one of the first steps in the process of diagnosing brain disorders. In a brain MRI scan, it is the method for differentiating brain tissue from non-brain tissue. Even for experienced radiologists, separating the brain from the skull is a time-consuming process, with results that vary widely from person to person. This is a pipeline that only needs a raw MRI picture as input and should produce a segmented image of the brain after the necessary preprocessing.

(V) Smoothing: Smoothing involves removing redundant information and noise from the images. It helps in easy identification of trends and patterns in the images. When the image is produced in an MRI machine, it contains different kinds of noise which need to be removed in order to obtain a clean image without loss of any crucial information.
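As an illustrative sketch of the RBF-kernel SVM classification discussed in the literature review (Section II), the setup below uses scikit-learn's SVC, which the review itself names as a Python alternative to LIBSVM. The synthetic feature vectors and the toy labelling rule are placeholders for real MRI-derived features, not the cited studies' data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# 200 synthetic "subjects" with 10 features each; in practice these would be
# features extracted from MRI scans. Label 1 = AD, 0 = normal cognition (toy rule).
X = rng.normal(size=(200, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)

# Radial Basis Function kernel, as chosen in [13]
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(scaler.transform(X_train), y_train)
acc = clf.score(scaler.transform(X_test), y_test)
print(f"held-out accuracy: {acc:.2f}")
```

Standardising features before fitting matters for the RBF kernel, since its distance computation is scale-sensitive.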
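The ID3 splitting rule mentioned in Section II (choose the attribute with the highest information gain) can be made concrete with a minimal entropy/information-gain computation. The toy categorical dataset here is purely illustrative, not the paper's data.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(attr_values, labels):
    """Entropy reduction obtained by splitting `labels` on `attr_values`."""
    n = len(labels)
    split = {}
    for v, y in zip(attr_values, labels):
        split.setdefault(v, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

labels = ["AD", "AD", "CN", "CN"]
perfect = ["a", "a", "b", "b"]   # splits the classes perfectly
useless = ["a", "b", "a", "b"]   # carries no class information
print(information_gain(perfect, labels))  # 1.0
print(information_gain(useless, labels))  # 0.0
```

ID3 evaluates this gain for every candidate attribute at each node and splits on the maximum.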
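The particle encoding of Tables I–III can be sketched as a small decoder: X1 gives the number of convolution layers, the pairs (X2, X3), (X4, X5), ... give the filter count and filter-size code per layer (1→3x3, 2→5x5, 3→7x7, 4→9x9), and X10 is the batch size. The dictionary format of the decoded architecture is our own choice for illustration.

```python
# Filter-size codes from the text: 1 -> 3x3, 2 -> 5x5, 3 -> 7x7, 4 -> 9x9.
SIZE_CODE = {1: 3, 2: 5, 3: 7, 4: 9}

def decode_particle(p):
    """Map a 10-coordinate PSO particle to CNN hyper-parameters."""
    n_layers = p[0]                       # X1: number of convolution layers
    layers = []
    for i in range(n_layers):             # (X2,X3), (X4,X5), ... per layer
        filters, size_code = p[1 + 2 * i], p[2 + 2 * i]
        layers.append({"filters": filters, "kernel": SIZE_CODE[size_code]})
    return {"conv_layers": layers, "batch_size": p[9]}  # X10: batch size

# The example particle from Table III: 4 conv layers
# (100@5x5, 64@5x5, 64@7x7, 96@3x3) and batch size 32.
arch = decode_particle([4, 100, 2, 64, 2, 64, 3, 96, 1, 32])
print(arch)
```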
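The PSO loop of Section III-B (initialise a swarm, evaluate an objective, update velocities and positions from Pbest and Gbest, stop after a fixed number of iterations) can be sketched generically. The sphere function below is only a stand-in for the expensive step of training the CNN and returning AIC / recognition rate; the inertia and acceleration constants are common textbook defaults, not values from the paper.

```python
import random

def pso(objective, dim, n_particles=10, n_iters=50,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over R^dim with a basic particle swarm."""
    rnd = random.Random(seed)
    pos = [[rnd.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # each particle's best position
    pbest_val = [objective(p) for p in pos]
    gbest_val = min(pbest_val)                    # swarm's best so far
    gbest = pbest[pbest_val.index(gbest_val)][:]
    for _ in range(n_iters):                      # step 7: iteration end criterion
        for i in range(n_particles):
            for d in range(dim):                  # step 6: velocity/position update
                r1, r2 = rnd.random(), rnd.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])               # step 5: evaluate objective
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val                       # step 8: Gbest is the optimum

# Stand-in objective: sphere function, whose minimum is 0 at the origin.
best, best_val = pso(lambda x: sum(v * v for v in x), dim=3)
print(best_val)  # should approach 0
```

In the paper's setting, each objective evaluation would train and validate a CNN decoded from the particle, which is why the swarm size and iteration count are kept small.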
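Step 4 names AIC as one of the objective functions. For reference, the standard Akaike Information Criterion for a model with k parameters and maximised log-likelihood log L is AIC = 2k − 2 log L, with lower values preferred; how the paper computes the likelihood for its CNNs is not specified here.

```python
def aic(k, log_likelihood):
    """Akaike Information Criterion: 2k - 2*logL (lower is better)."""
    return 2 * k - 2 * log_likelihood

print(aic(3, -10.0))  # 26.0
```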
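Two of the pre-processing steps above can be sketched with NumPy/SciPy: intensity normalization to zero mean and unit variance (step III) and Gaussian smoothing to suppress scanner noise (step V). A small random volume stands in for a real 192 x 192 x 160 NIfTI scan, and `scipy.ndimage.gaussian_filter` is one common choice of smoother, not necessarily the paper's.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def normalize(volume):
    """Shift and scale voxel intensities to zero mean and unit variance (III)."""
    return (volume - volume.mean()) / volume.std()

def smooth(volume, sigma=1.0):
    """Gaussian smoothing to remove high-frequency noise (V)."""
    return gaussian_filter(volume, sigma=sigma)

# Placeholder volume standing in for a real 192 x 192 x 160 MRI scan.
scan = np.random.default_rng(0).normal(loc=100.0, scale=20.0, size=(32, 32, 16))
pre = smooth(normalize(scan))
print(pre.shape)
```

Both operations preserve the volume's shape, so they can be chained in any order inside the pipeline.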