e-mail: sidra.minhas@ceme.nust.edu.pk
1 Introduction
Alzheimer’s Disease (AD) is the most common form of dementia diagnosed in elderly
persons. AD results from neurodegeneration, which leads to cognitive failure and other
behavioral deficiencies in the daily activities of patients. From 2000 to 2010, the number
of deaths due to AD increased by 68%, whereas deaths caused by other diseases decreased
(Alzheimer’s Association, 2014). Hence, there is a pressing need to develop systems for
early diagnosis of AD. For effective treatment, it is necessary to identify the disease
process at an early stage so that its progress can be halted or slowed down. Mild
Cognitive Impairment (MCI) is an early stage of deteriorating memory and intellectual
functionality. Though patients with MCI do not necessarily worsen, they are at high
risk of progressing to AD. To observe and monitor people affected by or at risk of
being affected by AD, a longitudinal study named the Alzheimer’s Disease Neuroimaging
Initiative (ADNI) [2] was launched in the USA in 2003 by the National Institute on Aging
(NIA), in association with the National Institute of Biomedical Imaging and
Bioengineering (NIBIB), the Food and Drug Administration (FDA) and several
pharmaceutical companies.
MCI is a heterogeneous group consisting of instances which revert to normal
cognition, stay stable as MCI (MCIs) or progress to AD (MCIp). ADNI data, which
consist of serial neuropsychological measures (NM), images (MRI, PET scans),
biochemical findings and genetic factors, have been used in numerous machine learning
setups to identify MCIp from within MCI subjects. Most studies have used multimodal
baseline data as input for learning disease outcome prediction [3, 4]. Moradi et al. [5]
used a semi-supervised machine learning framework with baseline imaging and NM
data. Matilla et al. [4] developed a statistical Disease State Index method to estimate
progression from healthy to diseased on the basis of various data sources. A good summary
of multimodal performance for AD prediction can be found in [5]. However, it is now
recognized that, for effective staging of the disease, it is useful to employ the
information contained in time-sampled longitudinal data [6]. A recent implementation by
Runtti et al. [7] used longitudinal heterogeneous multimodal predictors to differentiate
between MCIp and MCIs. In view of such research findings, [8, 9] have proposed using
both cognitive tests and biomarkers for AD staging.
Longitudinal studies such as ADNI face the problem of missing data, which may be
significant for slowly progressing diseases such as AD. Missing data in these
initiatives arise when patients miss one or more follow-up visits or drop out of the
study completely. Consequently, the sample size is reduced, which may bias classifier
performance. Lo et al. [10] suggest that missing data in ADNI are not Missing
Completely at Random and hence may contribute to further understanding of AD
progression. Generally, machine learning packages resort to immediately dropping
instances with missing values. Our hypothesis is that by imputing missing data values
instead of immediately dropping the affected instances, and by effectively utilizing
longitudinal features, important information differentiating MCIp from MCIs will be
retained and superior predictive performance will be delivered.
A number of studies utilize multimodal (mostly baseline) biomarkers for AD
conversion prediction in a machine learning framework [3, 4, 5, 6, 7]. The present
paper, however, employs unimodal longitudinal data for disease prediction. We select
only the NM data modality because it is cheap and non-invasive, as opposed to the more
expensive and complicated methods of imaging and biospecimen sampling. We
incorporate longitudinal data in our machine learning setup and attempt to enhance their
prognostic power by effectively handling missing values. Using our proposed approach,
we predict which MCI patients will progress to AD within 36 months of the baseline
visit and which will retain a stable diagnosis. Through extensive experimentation, we
demonstrate the efficacy of longitudinal data while accounting for missing values. The
paper is organized in four sections. Section 2 describes the materials and proposed
method. Experimental results are discussed in Section 3, followed by the conclusion in
Section 4.
feature’s relevance in MCIp vs. MCIs segregation (*). The impact of sequentially re-
ducing the number of features on grouping performance is also studied.
2.1 Data
ADNI1 focused on MCI subjects, for whom follow-ups were conducted every six months
after the baseline visit for the first two years. Afterwards, follow-ups were conducted
annually. Over a period of 3 years, 200 of the MCI patients were reported to have
progressed to AD (MCIp), whereas 100 remained stable at MCI (MCIs). The remaining 100
patients had an unstable diagnosis. Group-wise patient IDs used in this study were
provided by [5]. Neuropsychological test results of the MCI patients were downloaded
from the ADNI website. The sixteen neuropsychological biomarkers used in this study are
the responses to the tests listed in Table 1. For each feature, approximately 30% of the
patients had missed at least one follow-up visit. Note that the features Geriatric
Depression Scale (GDS) and Immediate Recall Total Score (LIMM) are considered 100%
missing because they were recorded only annually; their 6th and 18th month readings
were missing.
2.2 Missing Data Computation
Each of the j features was arranged in an i × v matrix representing i training
instances, each with v follow-up visits. The following techniques were used to complete
the data:
Case Deletion (CD):
Using this technique, instances which have one or more missing values are removed
from the training set. As a result, only those cases for which all follow-up readings of
all features were recorded are retained for training.
Case Ignorance (CI):
In this case, the missing values were ignored and all semi-known data was included in
the training set.
Group Mean Replacement (GMR):
The unknown feature values were replaced with the mean of known values at that time
point.
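As an illustration of Case Deletion and Group Mean Replacement, the following minimal sketch operates on a single feature arranged as an i × v matrix, with missing readings encoded as NaN. The function names and the toy scores are hypothetical, and the mean is taken over all training instances at each visit; computing it separately per diagnostic group would be a variation not shown here.

```python
import numpy as np

def case_deletion(X):
    """Keep only instances (rows) for which every follow-up reading is known."""
    complete = ~np.isnan(X).any(axis=1)
    return X[complete]

def group_mean_replacement(X):
    """Replace each missing value with the mean of the known values at that visit."""
    visit_means = np.nanmean(X, axis=0)          # one mean per visit (column)
    return np.where(np.isnan(X), visit_means, X)

# Hypothetical scores for one feature: 4 patients (rows) x 3 visits (columns).
X = np.array([
    [28.0, 27.0, 26.0],
    [25.0, np.nan, 23.0],
    [30.0, 29.0, np.nan],
    [24.0, 22.0, 20.0],
])

X_cd = case_deletion(X)            # two complete cases remain
X_gmr = group_mean_replacement(X)  # all four cases remain, gaps filled
```

Case deletion halves this toy sample, whereas mean replacement keeps every instance, which is the trade-off the study sets out to examine.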
Table 1. NM used and percentage of missing data
Feature Number  Test Performed                             Abbreviation  Missing data (%)
1               AD Assessment Scale – 11 items             ADAS_11       30.71
2               AD Assessment Scale – 13 items             ADAS_13       35.58
3               Rey Auditory Verbal Test – Trial 1         AVTOT1        30.34
4               Rey Auditory Verbal Test – Trial 2         AVTOT2        30.34
5               Rey Auditory Verbal Test – Trial 3         AVTOT3        30.34
6               Rey Auditory Verbal Test – Trial 4         AVTOT4        30.71
7               Rey Auditory Verbal Test – Trial 5         AVTOT5        30.71
8               Clock Drawing Test                         CLOCKSCOR     30.34
9               Clock Copy Test                            COPYSCOR      31.09
10              Functional Assessment Questionnaire        FAQ           31.09
11              Geriatric Depression Scale                 GDS           100
12              Immediate Recall Total Score               LIMM          100
13              Mini Mental State Examination              MMSE          29.59
14              Neuropsychiatric Inventory Questionnaire   NPIQ          29.21
15              Trail Making Test A                        TRAA          29.59
16              Trail Making Test B                        TRAB          37.45
SM1_ji and SM2_ji correspond, respectively, to the arithmetic mean and the median of
the jth feature of the ith instance over the v follow-up visits. The final data matrix is
of size i × j. The temporal factor of conversion is outside the scope of this study.
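The two summary measures can be sketched as follows, assuming the j per-feature i × v matrices are stacked into one array of shape (j, i, v). The `summarize` helper and the toy values are hypothetical. NaN-aware reductions are used so that the same code also covers the case where missing values are ignored; on fully imputed data they coincide with the ordinary mean and median.

```python
import numpy as np

def summarize(features):
    """Collapse the visit axis of a (j, i, v) array into two (i, j) matrices.

    Returns SM1 (arithmetic mean) and SM2 (median) over the v visits,
    skipping any readings that are still NaN.
    """
    sm1 = np.nanmean(features, axis=2).T     # shape (i, j)
    sm2 = np.nanmedian(features, axis=2).T   # shape (i, j)
    return sm1, sm2

# Hypothetical data: j = 2 features, i = 2 instances, v = 3 visits.
features = np.array([
    [[28.0, 27.0, 26.0], [25.0, np.nan, 23.0]],   # feature 1
    [[10.0, 12.0, 17.0], [8.0, 9.0, 13.0]],       # feature 2
])

sm1, sm2 = summarize(features)
```

Each row of `sm1` or `sm2` is then one training instance described by j summarized features, matching the i × j data matrix described above.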
A standard SVM with a Radial Basis Function kernel and the soft-margin parameter set
to 1 is used for training and prediction. In order to study the impact of complete and
incomplete data on prediction performance, a total of eleven experiments are designed.
The first experiment is restricted to using only the baseline feature readings for
training. In the remaining experiments, partially known feature sets are completed using
the missing-data computation techniques and are coupled with the two summary measures.
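A classifier of this kind could be set up as in the sketch below, which uses scikit-learn's `SVC`. Reading "soft margin set to 1" as the penalty parameter `C=1.0` is an assumption, as is leaving the kernel width at the library default, since neither is specified here. The data are synthetic stand-ins for summarized features, not ADNI measurements.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for an i x j matrix of summarized features:
# 40 "stable" and 40 "progressive" instances with 3 features each.
X_stable = rng.normal(loc=0.0, scale=1.0, size=(40, 3))
X_prog = rng.normal(loc=4.0, scale=1.0, size=(40, 3))
X = np.vstack([X_stable, X_prog])
y = np.array([0] * 40 + [1] * 40)   # 0 = MCIs, 1 = MCIp (labels are illustrative)

# RBF-kernel SVM; C=1.0 is our reading of "soft margin set to 1" (assumption).
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

train_accuracy = clf.score(X, y)
scores = clf.decision_function(X)   # continuous scores, usable for ROC analysis
```

The continuous output of `decision_function` is what an ROC curve would be built from; in practice the scores would come from held-out data rather than the training set used here.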
Finally, for performance optimization and removal of redundant features, feature
selection is performed. Each summarized feature is ranked according to a class
separability criterion. Classifier-independent feature evaluation is performed using the
Information Gain measure, which quantifies the reduction in entropy caused by
partitioning the data using that attribute. The higher the information gain, the higher
the rank. The same selection is applied to both the training and test sets. The
lowest-ranked features are deleted one by one. The final feature set consists of the
features resulting in the maximum Area Under the Curve (AUC) of the Receiver Operating
Characteristic (ROC) curve.
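The ranking step can be sketched as below. Information gain is defined for discrete partitions, so this sketch first splits each continuous feature into equal-frequency bins (two by default); that discretization, the helper names and the toy data are all assumptions made for illustration. The AUC evaluation of each candidate subset is not shown.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels, n_bins=2):
    """Entropy reduction obtained by partitioning labels on a binned feature."""
    # Interior quantiles act as bin edges (equal-frequency binning, an assumption).
    edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(feature, edges)
    conditional = 0.0
    for b in np.unique(bins):
        mask = bins == b
        conditional += mask.mean() * entropy(labels[mask])
    return entropy(labels) - conditional

def rank_features(X, y, n_bins=2):
    """Return feature indices sorted by decreasing information gain, plus the gains."""
    gains = np.array([information_gain(X[:, k], y, n_bins) for k in range(X.shape[1])])
    return np.argsort(-gains), gains

# Toy data: column 0 is uninformative, column 1 separates the classes.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([
    [1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 4.0, 8.0],
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
])

order, gains = rank_features(X, y)

# Candidate subsets obtained by dropping the lowest-ranked feature one at a time.
subsets = [order[:k] for k in range(len(order), 0, -1)]
```

Each entry of `subsets` would then be used to train and evaluate the classifier, keeping the subset with the highest AUC.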
Fig. 2 shows the ROC curves for selected experiments. Fig. 2(a) shows the ROC
curves of the competing techniques, compared in terms of AUC, whereas Fig. 2(b) shows
the curves resulting from feature selection under the best-performing technique
(LOCF-SM1).
Fig. 2. ROC Curves, (a) competing techniques, (b) feature selection under LOCF-SM1 scheme
A brief accuracy comparison between the proposed and previously published
techniques applied to unimodal ADNI data is given in Table 4. Varying data modalities,
validation methods and follow-up times make direct comparison difficult. In terms of
all variables, the most relevant comparison of our work is with Ewer et al. [14] and
Casanova et al. [15]. A significant improvement is observed when complete, longitudinal
data are used. It can also be observed that, with effective handling of missing
longitudinal data, NM alone are competitive with studies using more advanced and
expensive modalities such as imaging (MRI).