
Early Alzheimer’s Disease Prediction in Machine Learning Setup: Empirical Analysis with Missing Value Computation
Sidra Minhas1, Aasia Khanum2, Farhan Riaz3, Atif Alvi4, Shoab A. Khan5,
Alzheimer’s Disease Neuroimaging Initiative
1, 3, 5 Computer Engineering Department, College of E&ME, National University of Sciences
and Technology (NUST), Islamabad, Pakistan
2, 4 Computer Science Department, Forman Christian College, Lahore, Pakistan

e-mail: sidra.minhas@ceme.nust.edu.pk

Abstract: Alzheimer’s Disease (AD) is the most prevalent progressive neurodegenerative
disorder of the elderly. Prospective treatments for slowing down or pausing the process
of AD require identification of the disease at an early stage. Many patients with mild
cognitive impairment (MCI) may eventually develop AD. In this study, we evaluate the
significance of using longitudinal data for efficiently predicting MCI-to-AD conversion
a few years ahead of clinical diagnosis. The use of longitudinal data is generally
restricted due to missing feature readings. We implement five different techniques to
compute missing feature values of neuropsychological predictors of AD. We use two
different summary measures to represent the artificially completed longitudinal
features. In a comparison with other recent techniques, our work presents an improved
accuracy of 71.16% in predicting pre-clinical AD. These results demonstrate the
feasibility of building AD staging and prognostic systems using longitudinal data
despite the presence of missing values.

Keywords: Machine learning, Alzheimer’s disease (AD), Mild Cognitive Impairment (MCI),
ADNI, longitudinal data, missing value, Support Vector Machine, classification, AUC

1 Introduction
Alzheimer’s Disease (AD) is the most common form of dementia diagnosed in elderly
persons. AD results from neurodegeneration, which leads to cognitive failure and other
behavioral deficiencies in the daily activities of patients. From 2000 to 2010, the
number of deaths due to AD increased by 68%, whereas deaths caused by other diseases
decreased (Alzheimer’s Association, 2014). Hence, there is a dire need to develop
systems for the early diagnosis of AD. For effective treatment, it is necessary to
identify the disease process at an early stage so that its progress can be halted or
slowed down. Mild Cognitive Impairment (MCI) is an early stage of deteriorating memory
and intellectual functionality. Though patients with MCI do not necessarily worsen,
they are at high risk of progressing to AD. To observe and monitor people affected by,
or at risk of being affected by, AD, a longitudinal study named the Alzheimer’s Disease
Neuroimaging Initiative (ADNI) [2] was launched in the USA in 2003 by the National
Institute on Aging (NIA), in association with the National Institute of Biomedical
Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA) and several
pharmaceutical companies.
MCI is a heterogeneous group consisting of instances which revert to normal cognition,
stay stable as MCI (MCIs) or progress to AD (MCIp). ADNI data, which consist of serial
neuropsychological measures (NM), images (MRI, PET scans), biochemical findings and
genetic factors, have been used in numerous machine learning setups to identify MCIp
from within MCI subjects. Most studies have used multimodal baseline data as input for
learning disease outcome prediction [3, 4]. Moradi et al. [5] used a semi-supervised
machine learning framework with baseline imaging and NM data. Mattila et al. [4]
developed a statistical Disease State Index method to estimate progression from healthy
to disease on the basis of various data sources. A good summary of multimodal
performance for AD prediction can be found in [5]. However, it is now recognized that,
for effective staging of the disease, it is useful to employ the information contained
in time-sampled longitudinal data [6]. A recent implementation by Runtti et al. [7]
used longitudinal heterogeneous multimodal predictors to differentiate between MCIp and
MCIs. In view of such research findings, [8, 9] recommend using both cognitive tests
and biomarkers for AD staging.
Longitudinal studies such as ADNI face the problem of missing data, which may be
significant for slowly progressing diseases such as AD. Missing data in these
initiatives arise when patients miss one or more follow up visits or drop out of the
study completely. Consequently, a reduction in sample size is observed, which may bias
classifier performance. Lo et al. [10] suggest that missing data in ADNI are not
Missing Completely at Random and hence may contribute to further understanding of AD
progression. Generally, machine learning packages resort to immediately dropping the
instances with missing values. Our hypothesis is that by imputing missing data values
instead of immediately dropping them, and by effectively utilizing longitudinal
features, important information differentiating MCIp from MCIs will be retained and
superior predictive performance will be delivered.
A number of studies utilize multimodal (mostly baseline) biomarkers for AD conversion
prediction in a machine learning framework [3, 4, 5, 6, 7]. The present paper, however,
employs unimodal longitudinal data for disease prediction. We select only the NM data
modality because it is cheap and non-invasive, as opposed to the more expensive and
complicated methods of imaging and biospecimen sampling. We incorporate longitudinal
data in our machine learning setup and attempt to enhance their prognostic power by
effectively handling missing values. Using our proposed approach, we predict which of
the MCI patients will progress to AD within 36 months of the baseline visit and which
of them will retain a stable diagnosis. Through extensive experimentation, we
demonstrate the efficacy of longitudinal data while accounting for missing values. The
paper is organized in four sections. Section 2 describes the materials and proposed
method. Experimental results are discussed in Section 3, followed by the conclusion in
Section 4.

2 Materials and methods


Figure 1 shows the major steps of this study. As a first step, missing data within the
dataset were identified and substituted using various techniques. The resulting
artificially completed longitudinal features were represented using two different
summary measures. A Support Vector Machine (SVM) classifier was employed for
quantitative performance evaluation. SVM is a commonly used technique in the domain
under study, preferred for its high accuracy. The summary measures were later used to
identify each feature’s relevance in MCIp vs. MCIs segregation (*). The impact of
sequentially reducing the number of features on grouping performance was also studied.

Fig. 1. Proposed method

2.1 Data
ADNI1 focused on MCI subjects, for whom follow ups were conducted biannually after the
baseline visit for the first two years. Afterwards, follow ups were conducted annually.
Over a period of 3 years, 200 of the MCI patients were reported to have progressed to
AD (MCIp) whereas 100 of them remained stable at MCI (MCIs). The remaining 100 patients
had an unstable diagnosis. Group-wise patient IDs used in this study were provided by
[5]. Neuropsychological test results of the MCI patients were downloaded from the ADNI
website. The sixteen neuropsychological biomarkers used in this study are the responses
to the tests listed in Table 1. For each feature, approximately 30% of the patients had
missed at least one follow up visit. It is worth noting that the features Geriatric
Depression Scale (GDS) and Immediate Recall Total Score (LIMM) are considered 100%
missing because they were recorded annually only; their 6th and 18th month readings
were missing.
2.2 Missing Data Computation
Each of the j features was arranged in an i × v matrix representing i training
instances, each with v follow up visits. The following techniques were used for data
enhancement through completion:
• Case Deletion (CD):
Using this technique, the instances which have one or more missing values are removed
from the training set. As a result, only those cases for which all follow up readings
of all features were recorded were retained for training.
• Case Ignorance (CI):
In this case, the missing values were ignored and all partially known data were
included in the training set.
• Group Mean Replacement (GMR):
The unknown feature values were replaced with the mean of the known values at that time
point.
Table 1. NM used and percentage of missing data
Feature no.  Test performed                            Abbreviation  Missing data (%)
1            AD Assessment Scale – 11 items            ADAS_11       30.71
2            AD Assessment Scale – 13 items            ADAS_13       35.58
3            Rey Auditory Verbal Test – Trial 1        AVTOT1        30.34
4            Rey Auditory Verbal Test – Trial 2        AVTOT2        30.34
5            Rey Auditory Verbal Test – Trial 3        AVTOT3        30.34
6            Rey Auditory Verbal Test – Trial 4        AVTOT4        30.71
7            Rey Auditory Verbal Test – Trial 5        AVTOT5        30.71
8            Clock Drawing Test                        CLOCKSCOR     30.34
9            Clock Copy Test                           COPYSCOR      31.09
10           Functional Assessment Questionnaire       FAQ           31.09
11           Geriatric Depression Scale                GDS           100
12           Immediate Recall Total Score              LIMM          100
13           Mini Mental State Examination             MMSE          29.59
14           Neuropsychiatric Inventory Questionnaire  NPIQ          29.21
15           Trail Making Test A                       TRAA          29.59
16           Trail Making Test B                       TRAB          37.45

• Last Observation Carried Forward (LOCF):
As ADNI reports the natural progression of MCI patients without any intervention, it
can be assumed that the patients retain their previous scores for missed follow up
visits. One possible scheme for filling in unavailable values is therefore to carry
forward the last known value. For cases where the baseline reading was missing, the
next known value was assigned to it.
• K Nearest Neighbor (KNN):
For each instance in the training set, every missing value is substituted by the value
recorded for its ‘k’ nearest neighbor at the same visit. The nearest neighbor is
defined as the instance having the most similar value at the previous visit.
Similarity, in this case, is measured using the absolute difference between feature
readings. Parameter ‘k’ is set to 1 in this scenario.
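
The completion techniques of Section 2.2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy matrix `rows`, the use of `None` for an unrecorded visit, and the function names are assumptions made for illustration, and the 1-nearest-neighbor variant makes one possible reading of the description (donors are instances with observed values at both the previous and the current visit). Case Ignorance needs no imputation step, since it simply keeps the partially known rows as they are.

```python
from statistics import mean

MISSING = None  # assumed placeholder for an unrecorded visit

# Toy i x v matrix for one feature: 4 instances, 3 visits (values are made up).
rows = [
    [10.0, 12.0, 14.0],
    [11.0, MISSING, 15.0],
    [20.0, 22.0, MISSING],
    [MISSING, 21.0, 25.0],
]

def case_deletion(rows):
    """CD: keep only the instances with no missing reading."""
    return [list(r) for r in rows if all(x is not MISSING for x in r)]

def group_mean_replacement(rows):
    """GMR: replace a missing value with the mean of known values at that visit."""
    col_means = []
    for t in range(len(rows[0])):
        known = [r[t] for r in rows if r[t] is not MISSING]
        # A visit with no known value at all cannot be completed this way.
        col_means.append(mean(known) if known else MISSING)
    return [[col_means[t] if x is MISSING else x for t, x in enumerate(r)]
            for r in rows]

def locf(rows):
    """LOCF: carry the last known value forward; a missing baseline
    takes the next known value instead."""
    out = []
    for r in rows:
        filled = list(r)
        for t in range(1, len(filled)):            # forward pass
            if filled[t] is MISSING:
                filled[t] = filled[t - 1]
        for t in range(len(filled) - 2, -1, -1):   # leading gaps only
            if filled[t] is MISSING:
                filled[t] = filled[t + 1]
        out.append(filled)
    return out

def knn1(rows):
    """KNN with k = 1: copy the value of the instance whose previous-visit
    reading is closest in absolute difference."""
    out = [list(r) for r in rows]
    for t in range(1, len(out[0])):
        for i, r in enumerate(out):
            if r[t] is not MISSING or r[t - 1] is MISSING:
                continue
            donors = [d for k, d in enumerate(rows)
                      if k != i and d[t] is not MISSING and d[t - 1] is not MISSING]
            if donors:
                nearest = min(donors, key=lambda d: abs(d[t - 1] - r[t - 1]))
                r[t] = nearest[t]
    return out
```

On the toy matrix, `locf(rows)[3]` gives `[21.0, 21.0, 25.0]` (missing baseline filled from the next visit), while `knn1` leaves that baseline unfilled because no previous visit exists to compare on.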
2.3 Experimental Setup
A 10-fold cross-validation method is employed for performance investigation. To
represent the longitudinal data cumulatively, two summary measures are devised in this
study. The summary measures project the longitudinal readings onto the baseline (t=0)
and are calculated according to (1) and (2):
$$SM1_{ji} = \frac{\sum_{n=0}^{v} i_n}{v} \qquad (1)$$

$$SM2_{ji} = \frac{1}{2}\left(i_{\frac{v}{2}} + i_{\frac{v}{2}+1}\right) \qquad (2)$$

SM1ji and SM2ji respectively correspond to the arithmetic mean and the median value of
the jth feature of the ith instance over v follow up visits, where i_n denotes the
reading at visit n (taken in ascending order for the median). The final data matrix is
of size i × j. The temporal factor of conversion is outside the scope of this study.
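
As an illustration of the two summary measures, the following Python sketch collapses each instance's visit readings into a mean (SM1) and a median (SM2). The function names, the `None` placeholder for a missing visit, and the example readings are assumptions for illustration only; skipping `None` values corresponds to one plausible reading of the Case Ignorance setup, in which missing values are ignored rather than filled.

```python
from statistics import mean, median

def sm1(readings):
    """SM1: arithmetic mean of the known readings of one feature for one instance."""
    return mean(x for x in readings if x is not None)

def sm2(readings):
    """SM2: median of the known readings; for an even count, statistics.median
    averages the two middle values of the sorted readings."""
    return median(x for x in readings if x is not None)

def summarize(features, measure):
    """Build the i x j data matrix: one row per instance, one column per feature.

    `features` maps a feature name to its i x v matrix of visit readings.
    """
    names = list(features)
    n_instances = len(features[names[0]])
    return [[measure(features[name][i]) for name in names]
            for i in range(n_instances)]

# Made-up readings of one feature over four visits for a single instance.
visits = [26.0, 24.0, 25.0, 21.0]
print(sm1(visits), sm2(visits))
```

The median is less sensitive than the mean to a single outlying visit, which is one reason the two measures can rank features differently.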
A standard SVM with a Radial Basis Function kernel and the soft margin parameter set to
1 is used for training and prediction. In order to study the impact of complete and
incomplete data on prediction performance, a total of eleven experiments are designed.
The first experiment is restricted to using only the baseline feature readings for
training. In the remaining ten experiments, partially known feature sets are completed
using the five computation techniques, each coupled with the two summary measures.
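
The experimental grid and the validation split can be sketched in plain Python as follows. This is a hedged illustration rather than the authors' code: the shuffling seed, the round-robin fold assignment, and the kernel width `gamma` are assumptions (the paper does not state them), and the kernel function only shows the standard RBF form K(x, z) = exp(-gamma * ||x - z||^2) that an SVM library would evaluate internally.

```python
import math
import random
from itertools import product

TECHNIQUES = ["CD", "CI", "GMR", "LOCF", "KNN"]
SUMMARIES = ["SM1", "SM2"]

# Baseline-only experiment plus every technique/summary-measure pair.
experiments = ["BL"] + [f"{t}-{s}" for t, s in product(TECHNIQUES, SUMMARIES)]

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    The seed and the round-robin assignment are assumptions of this sketch.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]

def rbf_kernel(x, z, gamma=0.1):
    """Standard RBF kernel; the value of gamma is hypothetical."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

print(len(experiments), experiments)
```

Each of the eleven experiment labels would be evaluated over the same ten folds, so that differences in AUC reflect the completion technique rather than the split.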
Finally, for performance optimization and removal of redundant features, feature
selection is performed. Each summarized feature is ranked according to a class
separability criterion. Classifier-independent feature evaluation is performed using
the Information Gain measure, which quantifies the reduction in entropy caused by
partitioning the data using that attribute. The higher the information gain, the higher
the rank. The same selection is applied to both the training and test sets. The lowest
ranked features are deleted one by one. The final feature set consists of the features
resulting in the maximum Area under Curve (AUC) of the Receiver Operating
Characteristic (ROC) curve.
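
The ranking and elimination procedure can be sketched as below. Several details are assumptions, since the paper does not specify them: continuous features are split at their median to obtain a binary partition, the feature names and values are toy data, and `score_fn` stands in for whatever cross-validated AUC estimate would be used in practice.

```python
from collections import Counter
from math import log2
from statistics import median

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction obtained by partitioning the data at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = sum(len(p) / n * entropy(p) for p in (left, right) if p)
    return entropy(labels) - remainder

def rank_features(columns, labels):
    """Rank feature names by information gain, highest first.
    Splitting at the median is an assumption of this sketch."""
    gains = {name: information_gain(vals, labels, median(vals))
             for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)

def select_top_k(ranked, score_fn):
    """Drop the lowest ranked features one by one and keep the subset
    with the best score (e.g. a cross-validated AUC)."""
    best_k = max(range(1, len(ranked) + 1), key=lambda k: score_fn(ranked[:k]))
    return ranked[:best_k]

# Toy data: feat_a separates the classes perfectly, feat_b not at all.
labels = ["MCIp", "MCIp", "MCIs", "MCIs"]
columns = {"feat_a": [1, 2, 8, 9], "feat_b": [1, 8, 2, 9]}
print(rank_features(columns, labels))
```

Because the ranking uses only the feature values and the class labels, it is independent of the classifier; the classifier enters only through `score_fn` when the size of the final subset is chosen.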

3 Results and Discussion


From a total of 400 MCI patients initially recruited in ADNI, only 167 instances of the
MCIp class and 100 members of the MCIs class had at least one reading for all the j
features. When CD was performed, the sample size fell by roughly 46% (from 267 to 145),
as only 78 samples of MCIp and 67 cases of MCIs had complete follow up readings for all
biomarkers. Because the dataset is imbalanced, performance quantification pivots on AUC
instead of accuracy. Table 2 provides the results of prediction without feature
selection. Using only baseline readings for conversion prediction produces lower AUC
and accuracy than most longitudinal setups. CD yielded the lowest accuracy of all
techniques (and, with SM2, the lowest AUC) and presented a high variance across
validation folds. The overall best performance (AUC: 74.43%, Accuracy: 68.16%) is
displayed by the GMR scheme using SM1 as the summary measure.
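
The preference for AUC on an imbalanced sample can be made concrete with a short sketch. With 167 MCIp and 100 MCIs instances, a classifier that always predicts MCIp already reaches about 62.5% accuracy while carrying no discriminative information, which AUC exposes as 0.5. The function names and the rank-based (Mann-Whitney) formulation of AUC are choices made for this illustration, not details taken from the paper.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive (label 1) scores higher
    than a random negative (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Class sizes from the text: 167 MCIp (label 1) and 100 MCIs (label 0).
labels = [1] * 167 + [0] * 100

# A trivial classifier that always predicts MCIp with the same score.
constant_predictions = [1] * len(labels)
constant_scores = [1.0] * len(labels)

print(round(accuracy(constant_predictions, labels), 4))
print(auc(constant_scores, labels))
```

Any accuracy reported on this sample is therefore best read against the 62.5% majority-class level rather than against 50%.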
Table 3 presents the prediction results after feature set purification. Table 3 (rows
3-18, columns 2-12) gives the feature ranks under each experiment. The top ranked
features required to provide the best AUC are highlighted, and the corresponding count
is given in row 19. N/A for LIMM and GDS indicates the absence of complete cases in the
CD, GMR and KNN setups. Rey’s Auditory Verbal Test scores emerge as the most
significant features in all experiments. The AUC achieved using the reduced feature set
is stated in row 20 and its variance over the 10 cross validation folds in row 21. The
last row of Table 3 gives the accuracy obtained using the reduced feature set. The
maximum AUC of 76.42% was displayed by LOCF-SM1, with a corresponding accuracy of
71.16%, using the top 13 features only. LOCF-SM2, CD-SM1 and CI-SM1 also deliver
competitive AUC results but display high variance in AUC values over validation folds
and poor accuracy, hence indicating their instability for AD prediction. However, the
overall best accuracy of 72.28% is obtained by employing baseline data only. This
signifies that artificially generated data do introduce minor impurities in
longitudinal data. Even though maximum accuracy is achieved using baseline data only,
this setup cannot be considered the most appropriate one because of its lower and
highly variable AUC. It can therefore be established that LOCF using SM1 as the summary
measure is a promising technique for MCIp vs. MCIs identification.
Table 2. AUC, Accuracy (Acc) and their variance (var.) over CV folds, without feature selection
Technique AUC (%) AUC var. (%) Acc (%) Acc var. (%)
BL 70.44 0.63 66.29 0.12
CD-SM1 72.53 2.16 60.69 0.24
CD-SM2 68.13 2.48 58.62 0.11
CI-SM1 73.23 0.84 67.79 0.01
CI-SM2 72.63 0.87 67.42 0.04
GMR-SM1 74.43 0.90 68.16 0.04
GMR-SM2 73.90 0.86 67.04 0.02
LOCF-SM1 72.69 0.86 67.42 0.04
LOCF-SM2 73.90 1.00 67.04 0.04
KNN-SM1 69.89 1.03 67.42 0.03
KNN-SM2 72.05 0.91 67.04 0.07

Table 3. Performance results of feature selection


Feature BL CD CI GMR LOCF KNN
Number - SM1 SM2 SM1 SM2 SM1 SM2 SM1 SM2 SM1 SM2
1 7 10 10 10 10 10 10 10 10 10 10
2 6 6 6 6 6 6 7 6 7 7 7
3 5 7 7 7 7 7 6 7 6 6 6
4 1 2 2 2 2 2 2 2 2 2 2
5 15 11 1 5 1 5 5 5 1 5 1
6 4 1 11 1 5 11 1 1 5 1 5
7 14 5 5 13 12 1 11 13 13 11 11
8 3 4 4 12 13 4 4 12 12 4 4
9 13 14 14 4 4 3 8 4 4 8 8
10 12 8 8 8 16 8 3 8 8 9 14
11 8 N/A N/A 16 8 N/A N/A 16 16 N/A N/A
12 11 N/A N/A 9 15 N/A N/A 9 15 N/A N/A
13 9 3 3 3 3 14 14 3 3 14 13
14 2 13 12 15 9 13 13 15 14 13 3
15 10 12 13 14 14 12 12 14 9 3 9
16 16 9 9 11 11 9 9 11 11 12 12
# of feat. 9 11 8 8 13 14 4 13 14 2 8
AUC 74.44 76.09 73.73 75.67 73.81 74.43 73.95 76.42 76.09 70.16 73.98
AUC Var. 1.73 1.69 1.01 1.63 1.38 0.94 1.15 0.60 0.79 0.71 0.50
Accuracy 72.28 66.21 67.59 68.91 66.67 68.16 65.54 71.16 70.79 62.17 64.79
C 1 (R 3-18): feature number corresponding to Table 1. C 2-12 (R 3-18): feature ranks under each experiment. R 19: number of top ranked
features (feat.) required for max AUC. R 20: AUC (%) achieved using the selected features. R 21: variance (Var.) of AUC (%) over 10 cross
validation folds. R 22: accuracy (%) corresponding to max AUC.

Fig. 2 shows the ROC curves for some selected experiments. Fig 2(a) shows the
ROC curves for competing techniques in terms of AUC whereas Fig 2(b) shows the
resulting curves of feature selection under the best concluded technique (LOCF-SM1).
Fig. 2. ROC Curves, (a) competing techniques, (b) feature selection under LOCF-SM1 scheme

A brief accuracy comparison between the proposed and previously published techniques
applied to unimodal ADNI data is given in Table 4. Varying data modalities, validation
methods and follow up times make direct comparison difficult. In terms of all these
variables, our work is most directly comparable with Ewers et al. [14] and Casanova et
al. [15]. Accuracy improves from 64.60% [14] and 65% [15] to 71.16% when complete,
longitudinal data are used. It can also be observed that, with effective handling of
missing longitudinal data, NM alone are competitive with studies using more advanced
and expensive modalities such as imaging (MRI).

Table 4. Accuracy (Acc) Comparison


Author Data Validation method Follow up (months) Acc (%)
Moradi et al.[5] MRI k fold cv 0 – 36 69.15
Cuingnet et al. [11] MRI Ind. Test set 0 – 18 67
Wolz et al. [12] MRI k fold cv 0 – 48 68
Ye et al. [13] MRI k fold cv 0 – 15 56.10
Ewers et al. [14] NM k fold cv 0 – 36 64.60
Casanova et al. [15] NM/ MRI k fold cv 0 – 36 65/ 62
Present work (LOCF-SM1) NM k fold cv 0 – 36 71.16

4 Conclusions and future work


In this paper, we presented a machine learning framework for prediction of AD in MCI
patients using longitudinal neuropsychological scores from ADNI. Proper selection of
relevant neuropsychological biomarkers, effective substitution of missing feature
readings and a strong summary measure for the longitudinal data help retain significant
differences between MCIp and MCIs. We aim to repeat this scheme with multimodal data
while considering longer follow up times to further enhance the predictive power of
this system.

Acknowledgements. We would like to thank all investigators of ADNI listed at:
https://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowedgement_List.pdf,
for developing and making their data publicly available.
References
1. Hebert, L. E., Weuve, J., Scherr, P. A., Evans, D. A.: AD in the United States (2010–
2050) estimated using the 2010 census, Neurology, vol. 80(19), 1778-83 (2013)
2. Alzheimer’s Disease Neuroimaging Initiative, http://adni.loni.usc.edu, last accessed April
2015.
3. Asrami, F.F.: AD Classification using K-OPLS and MRI, Masters’ Thesis, Department of
Biomedical Engineering, Linkoping University (2012)
4. Mattila, J., Koikkalainen, J., Virkki, A., Simonsen, A., van Gils, M., Waldemar G, Soininen,
H., Lötjönen, J, ADNI: A disease state fingerprint for evaluation of AD. J Alzheimer’s Dis-
ease 27, 163- 176 (2011)
5. Moradi, E., Pepe, A., Gaser, C., Huttunen, H., Tohka, J.: Machine learning framework for
early MRI-based Alzheimer’s conversion prediction in MCI subjects, NeuroImage, vol. 104,
398-412 (2015)
6. Zhang, D., Shen, D.: Predicting Future Clinical Changes of MCI Patients Using
Longitudinal and Multimodal Biomarkers, Plos One, vol. 7(3) (2012)
7. Runtti, H., Mattila, J., van Gils, M., Koikkalainen, J., Soininen, H., Lötjönen, J.:
Quantitative Evaluation of disease progression in a longitudinal mild cognitive
impairment cohort, J. of AD, vol. 39(1), 49-61 (2014)
8. Sperling, R.A., Aisen, P.S., Beckett, L.A., Bennett, D.A., Craft, S., Fagan, A.M., Iwatsubo,
T., Jack, C.R. Jr, Kaye, J., Montine, T.J., Park, D.C., Reiman, E.M., Rowe, C.C., Siemers,
E., Stern, Y., Yaffe, K., Carrillo, M.C., Thies, B., Morrison-Bogorad, M., Wagster, M.V.,
Phelps, C.H.: Toward defining the preclinical stages of AD: recommendations from the Na-
tional Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for
Alzheimer’s disease. Alzheimers Dement 7, 280-292 (2011)
9. Albert, M.S., DeKosky, S.T., Dickson, D., Dubois, B., Feldman, H.H., Fox, N.C., Gamst,
A., Holtzman, D.M., Jagust, W.J., Petersen, R.C., Snyder, P.J., Carrillo, M.C., Thies, B.,
Phelps, C.H.: The diagnosis of mild cognitive impairment due to AD: recommendations
from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic
guidelines for Alzheimer’s disease. Alzheimers Dement 7, 270-279 (2011)
10. Lo, R.Y., Jagust, W. J.: Predicting missing biomarker data in a longitudinal study of AD,
Neurology, vol. 78(18), 1376-1382 (2012)
11. Cuingnet, R., Gerardin, E., Tessieras, J., Auzias, G., Lehéricy, S., Habert, M. O., Chupin,
M.: Automatic classification of patients with AD from structural MRI: a comparison of ten
methods using the ADNI database, Neuroimage, vol. 56(2), 766-781 (2011)
12. Wolz, R., Julkunen, V., Koikkalainen, J., Niskanen, E., Zhang, D.P., Rueckert, D., Soininen,
H., Lötjönen, J.: Multi-method analysis of MRI images in early diagnosis of AD, PlosOne,
vol. 6(10), p. 25446 (2011)
13. Ye, D.H., Pohl, K.M., Davatzikos, C.: Semi-supervised pattern classification: application to
structural MRI of AD. Pattern Recognition in NeuroImaging (PRNI), 2011 International
Workshop on. IEEE, pp. 1–4 (2011)
14. Ewers, M., Walsh, C., Trojanowski, J. Q., Shaw, L. M., Petersen, R. C., Jack Jr., C. R.,
Feldman, H. H., Bokde, A. L. W., Alexander, G. E., Scheltens, P., Vellas, B., Dubois, B.,
Weiner, M., Hampel, H.: Prediction of conversion from mild cognitive impairment to AD
dementia based upon biomarkers and neuropsychological test performance, Neurobiology
of Aging, vol. 33(7), 1203-1214 (2012)
15. Casanova, R., Hsu, F. C., Sink, K. M., Rapp, S. R., Williamson, J. D., Resnick, S. M.,
Espeland, M. A.: AD Risk Assessment Using Large-Scale Machine Learning Methods,
PlosOne, vol. 8(11) (2013)
