
Accepted Manuscript

Title: Looking for Alzheimer's disease morphometric signatures using machine learning techniques

Authors: Patricio Andres Donnelly-Kehoe, Guido Orlando Pascariello, Juan Carlos Gomez, for the Alzheimer's Disease Neuroimaging Initiative

PII: S0165-0270(17)30401-6
DOI: https://doi.org/10.1016/j.jneumeth.2017.11.013
Reference: NSM 7898

To appear in: Journal of Neuroscience Methods

Received date: 31-7-2017
Revised date: 17-11-2017
Accepted date: 19-11-2017

Please cite this article as: Patricio Andres Donnelly-Kehoe, Guido Orlando Pascariello, Juan Carlos Gomez, for the Alzheimer's Disease Neuroimaging Initiative, Looking for Alzheimer's disease morphometric signatures using machine learning techniques, Journal of Neuroscience Methods (2017), https://doi.org/10.1016/j.jneumeth.2017.11.013

This is a PDF file of an unedited manuscript that has been accepted for publication.
As a service to our customers we are providing this early version of the manuscript.
The manuscript will undergo copyediting, typesetting, and review of the resulting proof
before it is published in its final form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers that
apply to the journal pertain.
Looking for Alzheimer’s Disease morphometric signatures using machine
learning techniques

Donnelly-Kehoe, Patricio Andres I,II,a, Pascariello, Guido Orlando I,a, Gomez, Juan Carlos a, for the Alzheimer's Disease Neuroimaging Initiative∗

a Multimedia Signal Processing Group - Neuroimage Division, French-Argentine International Center for Information and Systems Sciences (CIFASIS) - National Scientific and Technical Research Council (CONICET), 27 de Febrero 210 bis, Rosario, Argentina
Abstract

Background: We present our results in the International challenge for automated prediction of MCI from MRI data. We evaluate the performance of MRI-based neuromorphometric features (nMF) in the classification of Healthy Controls (HC), Mild Cognitive Impairment (MCI), converter MCI (cMCI) and Alzheimer's Disease (AD) patients.
New methods: We propose to segregate participants into three groups according to the Mini-Mental State Examination score (MMSEs), searching for the main nMF in each group. Then we use them to develop a Multi Classifier System (MCS). We compare the MCS against a single-classifier scheme using both MMSEs+nMF and nMF only. We repeat this comparison using three state-of-the-art classification algorithms.
Results: The MCS showed the best performance in both Accuracy and Area Under the Receiver Operating Characteristic Curve (AUC) in comparison with single classifiers. The multi-class AUCs for the MCS classification on the Test Dataset were 0.83 for HC, 0.76 for cMCI, 0.65 for MCI and 0.95 for AD. Furthermore, the MCS's optimum accuracy on Neurodegenerative Disease (ND) detection (AD+cMCI vs MCI+HC) was 81.0% (AUC = 0.88), while the single classifiers reached 71.3% (AUC = 0.86) and 63.1% (AUC = 0.79) for MMSEs+nMF and nMF only, respectively.
Comparison with existing methods: The proposed MCS showed a better performance than feeding all nMF into a single state-of-the-art classifier.
Conclusions: These findings suggest that using cognitive scoring, e.g. the MMSEs, in the design of a Multi Classifier System improves performance by allowing a better selection of MRI-based features.

Key words: Neuroscience, Machine learning, Alzheimer's disease, Classification, Mild Cognitive Impairment, Morphometric analysis, Structural MRI

2010 MSC: 50-200, 70-800

Highlights
• We aimed to detect Alzheimer's Disease related disorders independently of MRI setup.
• We used T1 MRI-derived morphometric features and demographic information.
• We developed a method based on cognitive profile segregation.
• We found group-specific patterns of anatomical changes.
• The best classifier exhibited an optimal detection of degenerative disorders.

I Both authors contributed equally to this work.
II Corresponding author. Tel.: +54 9341 2164799;
∗ Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
Email addresses: patricio.donnelly@gmail.com (Donnelly-Kehoe, Patricio Andres)

Preprint submitted to Journal of Neuroscience Methods, November 17, 2017
1. Introduction

Alzheimer's Disease (AD) is a neurodegenerative disease that develops gradually and is characterized by a decline in memory and other cognitive functions, as well as behavioral changes (Albert et al., 2011). Moreover, according to death certificates, in 2013 AD was the sixth leading cause of death in the United States (Alzheimer's Association, 2015). The diagnostic criteria for AD propose three stages, consisting of preclinical AD, mild cognitive impairment (MCI) due to AD, and dementia due to AD. MCI is an abnormal decline in cognitive functions that, for the subject's age and educational level, does not meet the criteria for AD (Gauthier et al., 2006). Approximately 15% of adults older than 65 years suffer from MCI, and of these more than half progress to AD within 5 years (Farlow, 2009). In this context, and given that fewer than half of the people with AD or another dementia have been properly diagnosed (Alzheimer's Association, 2015), developing simple and economical methods to detect AD and converter MCI (cMCI) is a research field with great influence on public health (Alzheimer's Association, 2015).
Magnetic Resonance Imaging (MRI) combined with different post-processing techniques has proven to be a robust method to obtain brain morphometrics across subjects and across MRI machines. In particular, FreeSurfer has shown to be a powerful tool to reliably characterize brain volumes, areas, cortical thickness and curvature through an automated pipeline that requires no user interaction (Dale et al., 1999; Fischl and Dale, 2000; Fischl et al., 2001, 2002, 1999a,b, 2004b). Furthermore, numerous studies have demonstrated its repeatability and robustness across different acquisition parameters and clinical conditions (Fischl et al., 2004a; Han et al., 2006; Jovicich et al., 2006; Reuter et al., 2010).
In order to successfully apply machine learning methods to a problem, three main components should be addressed: 1) a reliable and robust feature extraction method, 2) data availability, and 3) algorithms specially developed and tuned for the application. In the assessment of brain structure, FreeSurfer can be used as a reliable and robust feature extraction method, and data availability is constantly being improved by the growth of open databases that include different clinical conditions, such as the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (Weiner et al., 2015), NeuroVault.org (Gorgolewski et al., 2015), openfMRI.org (Poldrack and Gorgolewski, 2015), AddNeuroMed (Simmons et al., 2011) and others (Marcus et al., 2007; Wyman et al., 2013; Hodge et al., 2016). The search for suitable machine learning implementations to use quantitative neuroimaging data in clinical assessment is a field of increasing development. In AD detection alone, a vast number of works analyze different approaches, including grey matter morphology at the voxel level combined with neuropsychological assessment (Moradi et al., 2015), surface-based morphometry with ensemble classification (Iftikhar and Idris, 2016), multiple methods of hippocampal segmentation and linear discrimination (Platero and Tobar, 2016), and similarity detection with random forest classifiers (Gray et al., 2013), among others (Ota et al., 2015; Schouten et al., 2016; Hojjati et al., 2017).
This article was written in the context of the International challenge for automated prediction of MCI from MRI data, hosted at https://inclass.kaggle.com/c/mci-prediction and organized by Alessia Sarica and collaborators from the IBFM, National Research Council, Catanzaro, Italy, in association with the Alzheimer's Disease Neuroimaging Initiative. The competition aim was to obtain reliable T1 MRI-derived biomarkers that could distinguish between four categories: subjects with stable AD, individuals with MCI who converted to AD (cMCI), individuals with MCI who did not convert to AD (MCI), and healthy controls (HC). In this work we present our conclusions, analyzing which are the main features and testing different classification algorithms with morphometric information for a reliable detection.

2. Materials and Methods

2.1. Data Preprocessing


2.1.1. Competition data
Data Selection. The main goal of the competition was to evaluate how well classification algorithms could learn from MRI data from different sites, protocols, scanners and technologies. As described on the competition's web site, MRIs were selected from the ADNI database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial MRI, positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment and early Alzheimer's disease.
Competition organizers randomly and automatically selected subjects with a static seed by using the data analytics platform Konstanz Information Miner (KNIME) (Sarica et al., 2014). Subjects from ADNI were selected by filtering text files downloaded from the website. In particular, they used the file containing the conversion of diagnosis (DXSUM_PDXCONV_ADNIALL.csv) to first choose HC, AD patients and Mild Cognitive Impairment patients who did not convert their diagnosis in the follow-up. With the same approach, they selected those MCI patients who converted to AD.
The second step was to select the visit ID of each subject at baseline and to obtain demographic and clinical parameters at that timepoint, i.e. Age, Gender and Mini-Mental State Examination score (MMSEs) (Folstein et al., 1983).
This dataset was subsetted by diagnosis in order to obtain a balanced number of subjects (100) for each of the four classes (HC, MCI, cMCI and AD). The last step was to obtain the subjects' MRI scan ID at baseline from the file MPRAGEMETA.csv. In particular, they selected the first MPRAGE sequence (no repetition), acquired at 3 Tesla.
Data preprocessing. The data were preprocessed by the competition organizers and consisted of demographic (Age, Gender and MMSEs) and morphometric data of 400 subjects equally divided into the four categories previously described (HC, MCI, cMCI and AD). This set was then split into training (240 subjects, unique ID TRAIN_XXX) and test (160 subjects, unique ID TEST_XXX) sets with a static seed.
The competition organizers verified with an ANOVA that no statistically significant differences existed among classes in age, and with a Chi-squared test in gender, in both sets. On the other hand, an ANOVA on MMSEs showed that there are significant differences, and t-tests revealed that they exist between HC and AD, HC and MCI, HC and cMCI, AD and MCI, and AD and cMCI. No differences exist between converter and non-converter MCI. The test set was inflated with 340 dummy subjects, for a total of 500 rows, and was split in half into public and private test sets.
T1 MRIs were processed with FreeSurfer (v5.3) using the standard pipeline (recon-all -hippo-subfields) on GNU/Linux Ubuntu 14.04 with 16 CPUs and 16 GB RAM. The organizers used the KNIME plugin K-Surfer (Sarica et al., 2014) to extract the numerical data produced by FreeSurfer into a table format. This table was then augmented with demographic data.
2.1.2. Checking and fixing the raw data
The feature space generated by the competition organizers consisted of 430 values for each subject, including cortical, subcortical and hippocampal subfield measurements. Demographic data consisting of Age, Gender and MMSEs were included too. The organizers also provided a test dataset including the same features but without the diagnosis label.
FreeSurfer's result files had previously been read with a parser package (K-Surfer); the information was structured as tables and saved in CSV format. We received the dataset as a text file with the encoding mentioned above.
A first visual inspection of the dataset (consisting of feature-vs-age scatter plots) showed a considerable number of outliers. Further examination revealed a bias in the data, likely due to an encoding programming error. A thorough analysis of outliers, using a k-nearest neighbors analysis to predict the most plausible values according to related features in the dataset, revealed that outliers in most of the features were due to a bad parsing of the decimal point character, so they were "virtually multiplied" by a factor of 1,000. Hence, values higher than a threshold were divided by 1,000, in accordance with the scale of each measurement.
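The scale repair described above can be sketched as follows. This is a minimal illustration, not the authors' exact script: the feature name and the plausibility threshold are hypothetical values chosen for the example.

```python
import pandas as pd

def fix_parsing_scale(df, max_plausible):
    """Divide implausibly large values by 1,000 to undo a lost decimal point.

    `max_plausible` maps each feature name to the maximum plausible value
    for that measurement (illustrative thresholds, set per feature scale).
    """
    fixed = df.copy()
    for col, threshold in max_plausible.items():
        too_big = fixed[col] > threshold
        fixed.loc[too_big, col] = fixed.loc[too_big, col] / 1000.0
    return fixed

# Toy example: a hippocampal volume (mm^3) with one mis-parsed row.
df = pd.DataFrame({"Left-Hippocampus": [3500.0, 3700000.0, 4100.0]})
fixed = fix_parsing_scale(df, {"Left-Hippocampus": 10000.0})
```

Only values above the per-feature threshold are rescaled, so correctly parsed rows are left untouched.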

2.1.3. Anatomic Normalization
To make measurements comparable between subjects, an anatomical normalization was performed. For each subject in the dataset, all volume quantifications were divided by the corresponding estimated Total Intracranial Volume (eTIV), and area quantifications were divided by the total area of the same hemisphere (both also provided in the dataset). Neither thickness nor curvature quantifications needed to be anatomically normalized.
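A sketch of this normalization is shown below. The column-name conventions ('_volume'/'_area' suffixes, 'EstimatedTotalIntraCranialVol', 'lh_WhiteSurfArea', 'rh_WhiteSurfArea') are assumptions for illustration, not necessarily the competition table's exact names.

```python
import pandas as pd

def anatomical_normalization(df):
    """Divide volumes by eTIV and areas by same-hemisphere total surface area.

    Column names here are illustrative assumptions about the table layout.
    """
    out = df.copy()
    # Volumes -> fraction of estimated Total Intracranial Volume.
    vol_cols = [c for c in out.columns if c.endswith("_volume")]
    out[vol_cols] = out[vol_cols].div(out["EstimatedTotalIntraCranialVol"], axis=0)
    # Areas -> fraction of the same hemisphere's total surface area.
    for hemi in ("lh", "rh"):
        area_cols = [c for c in out.columns
                     if c.startswith(hemi + "_") and c.endswith("_area")]
        if area_cols:
            out[area_cols] = out[area_cols].div(out[hemi + "_WhiteSurfArea"], axis=0)
    return out
```

Thickness and curvature columns carry no matching suffix and thus pass through unchanged, mirroring the text.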

2.1.4. Enhanced features (expert knowledge)
The main change visually appreciable in MRI due to neurodegenerative pathologies is the shrinkage of neural tissue. This is particularly reflected in the growth of the ventricular cavity volume and the reduction of gray matter volumes, primarily in the hippocampus. A recent study analyzed data from 1074 subjects (AD = 295, MCI = 444 and controls = 335) from both United States and European populations and showed that AD was characterized by the same pattern of atrophy independently of the population, the most atrophic regions being the hippocampal structures and surrounding grey structures, and the most enlarged structure being the inferior lateral ventricle (Westman et al., 2011, 2012).
Based on these findings, we decided to generate a set of new features, dividing each hippocampal subfield volume by the same hemisphere's inferior lateral ventricle volume. This new set of features was added to the dataset with the prefix e-, an abbreviation of enhanced.
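The e- feature construction can be sketched as below. The ventricle column names follow FreeSurfer's aseg labels; the subfield column names passed in are placeholders for whatever the dataset actually uses.

```python
import pandas as pd

def add_enhanced_features(df, subfields_by_hemi):
    """Add 'e-' features: each hippocampal subfield volume divided by the
    same hemisphere's inferior lateral ventricle volume.

    `subfields_by_hemi` maps 'lh'/'rh' to lists of subfield column names
    (the names used here are illustrative, not the organizers' exact ones).
    """
    out = df.copy()
    ventricle = {"lh": "Left-Inf-Lat-Vent", "rh": "Right-Inf-Lat-Vent"}
    for hemi, cols in subfields_by_hemi.items():
        for col in cols:
            out["e-" + col] = out[col] / out[ventricle[hemi]]
    return out
```

Because atrophy shrinks the subfields while enlarging the ventricle, this ratio amplifies the degeneration signal in a single feature.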

2.1.5. Numerical Normalization
Most machine learning techniques generate a model from the relationships between features in the data. Depending on how each technique works, it can be sensitive to feature scaling. To avoid this influence, each feature in the data was normalized to zero mean and unit variance.
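A minimal sketch of this z-scoring step with scikit-learn's StandardScaler (toy data, not the competition features); fitting on the training set only, and reusing the fitted scaler on test data, avoids leaking test statistics into the normalization:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 200.0],
                    [2.0, 400.0],
                    [3.0, 600.0]])
scaler = StandardScaler().fit(X_train)   # learns per-feature mean/variance
X_train_z = scaler.transform(X_train)    # zero-mean, unit-variance columns
```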

2.2. Assessing Cognitive Bias
As the MMSEs is the gold standard for a quick assessment of cognitive status (Tombaugh and McIntyre, 1992), we needed to know whether the MMSEs induces a bias on classification. We performed two Progressive Feature Elimination (PFE) analyses –using MRI-based features with and without the MMSEs–, in which we optimized the accuracy of a Random Forest Classifier (RFC) while ranging the number of features from all down to a single one, in accordance with their importance ranking. An RFC quantifies a feature's importance by how much the average Gini impurity index decreases in the forest due to its use as a node in a tree (Breiman, 2001), so we used this score to progressively eliminate features, removing the feature with the lowest importance at each iteration. In this procedure we used 80% of the training dataset, split into 10 folds, to train 10 RFCs and select the best one in a cross-validation process. We then used the feature importances of the selected classifier in the PFE analysis, in which we trained a new RFC at each iteration on the selected classifier's training fold and tested on the remaining 20% of the data (validation set).
These analyses were performed with the RFC implemented in the scikit-learn Python package (Zhu et al., 2009), with an optimal number of trees (1000), determined by an optimization procedure using iterative cross-validation, and the recommended number of features in each split (√P, where P is the number of features in the training set).
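The PFE loop described above can be sketched as follows. This is a simplified illustration on synthetic binary data (fewer trees and no 10-fold model selection), not the paper's exact pipeline: at each step a fresh RFC is trained, its validation accuracy recorded, and its least important feature dropped.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=240, n_features=30, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

features = list(range(X.shape[1]))
accuracies = {}
while features:
    rfc = RandomForestClassifier(
        n_estimators=100,       # the paper used 1000 trees
        max_features="sqrt",    # the recommended sqrt(P) features per split
        random_state=0,
    ).fit(X_tr[:, features], y_tr)
    accuracies[len(features)] = rfc.score(X_val[:, features], y_val)
    # Drop the least important feature before the next iteration.
    features.pop(int(np.argmin(rfc.feature_importances_)))
```

The resulting accuracy-vs-number-of-features curve is what the paper inspects to choose the optimal feature count.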
p

2.3. Cognitive segregation


ce

We used a Decision Tree with Gini Impurity criterion (from scikit-learn python’s package (Zhu et al.,
2009)) to set the best values of MMSEs to split the training dataset. Two MMSEs threshold were taken
135 which split the data into three groups as Table 1 summarizes.
Ac

Congnitive Group MMSEs range Mean Age Age Std HC MCI cMCI AD Total

Full MMSEs 19 - 30 73.06 7.02 60 60 60 60 240

Low (L) MMSEs 19 - 23.5 75.06 8.31 0 0 0 29 29


Middle (M) MMSEs 23.5 - 26.5 74.09 6.82 2 8 27 31 68
High (H) MMSEs 26.5 - 30 72.16 6.73 58 52 33 0 143
Table 1: Cognitive Groups on Training Dataset
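The threshold-finding step can be sketched with a shallow decision tree whose internal-node thresholds are the learned MMSE cut points. Scores and labels below are synthetic (constructed so the true cuts are 23.5 and 26.5), not the ADNI data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# 240 synthetic MMSE scores covering the 19-30 range, plus group labels
# built from the two known cut points.
mmse = np.tile(np.arange(19, 31), 20).astype(float).reshape(-1, 1)
groups = (mmse.ravel() > 23.5).astype(int) + (mmse.ravel() > 26.5).astype(int)

tree = DecisionTreeClassifier(criterion="gini", max_depth=2).fit(mmse, groups)
# Internal-node thresholds are the learned MMSE cut points
# (leaf nodes carry the sentinel value -2).
cuts = sorted(t for t in tree.tree_.threshold if t != -2)
```

With integer scores, scikit-learn places each split at the midpoint of adjacent values, so the tree recovers 23.5 and 26.5 exactly.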

2.4. Feature Selection Analysis
We performed two PFE analyses as in Section 2.2, but this time on the Middle MMSEs (M-MMSEs) and High MMSEs (H-MMSEs) groups separately, and we used a Leave-One-Out Cross-Validation (LOOCV) strategy due to the small number of subjects in each group. In each iteration, one subject was selected for validation and an RFC (1000 trees) was trained on the rest. As a result, we obtained the estimated accuracy at different numbers of features and the importance distribution at each LOOCV iteration.

2.5. Multi Classifier System
Once the numbers of features for the M-MMSEs and H-MMSEs groups were selected, we designed a multi classifier system (MCS) composed of three sub-classifiers, as shown in Figure 1. The sub-classifiers C#1 and C#2 were trained on the M-MMSEs and H-MMSEs groups respectively. The C#0 sub-classifier uses the same number of features as C#1 and was trained on the union of the Low MMSEs (L-MMSEs) and M-MMSEs groups, relabeling the results as AD or NOT AD according to the classification. This extra classifier has the purpose of supporting the AD classification of all patients whose MMSEs is lower than 23.5.
Additionally, each sub-classifier was composed of two internal classifiers. The first one is always an RFC, which serves to automatically select the first K features, where K is the optimal number of features for each specific classification problem (determined in Section 2.4). Then a second classifier was set using only the K selected features. This last one could use any of the existing classification algorithms.

Figure 1: Multi Classifier System. It was implemented using three sub-classifiers which are trained on different subsets of the data depending on the MMSEs.
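The MCS structure can be sketched as below. The two-stage sub-classifier (an RFC that ranks features and keeps the top K, followed by a second classifier on those K features) and the MMSE-based routing are our reading of Section 2.5; the data, K values and final estimators here are synthetic assumptions, not the paper's fitted models.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

def make_subclassifier(k, final_estimator):
    """RFC-based selection of the top-K features, then a second classifier."""
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0),
        max_features=k,
        threshold=-np.inf,   # keep exactly the K most important features
    )
    return Pipeline([("select", selector), ("clf", final_estimator)])

def mcs_predict(x, mmse, c0, c1, c2):
    """Route one subject to a sub-classifier according to its MMSE score."""
    if mmse < 23.5:
        return c0.predict(x)[0]   # C#0: AD vs NOT-AD support classifier
    if mmse < 26.5:
        return c1.predict(x)[0]   # C#1: trained on the M-MMSEs group
    return c2.predict(x)[0]       # C#2: trained on the H-MMSEs group

# Toy data standing in for the three cognitive groups.
X, y = make_classification(n_samples=120, n_features=20, random_state=0)
c0 = make_subclassifier(5, SVC(kernel="rbf")).fit(X, y)
c1 = make_subclassifier(5, SVC(kernel="rbf")).fit(X, y)
c2 = make_subclassifier(10, SVC(kernel="rbf")).fit(X, y)
label = mcs_predict(X[:1], 25.0, c0, c1, c2)
```

The second-stage estimator is interchangeable, matching the paper's use of RFC, SVM and Ada-Boost variants within the same architecture.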

2.6. Assessing different machine learning techniques
We assessed three different classification algorithms: Random Forest Classifier (Breiman, 2001; Breiman et al., 1984), Support Vector Machine (SVM) Classifier (Vapnik, 2013) with a radial basis function kernel (Shawe-Taylor and Cristianini, 2004; Scholkopf and Smola, 2001) and Ada-Boost (AB) Classifier (Freund et al., 1996; Schapire and Singer, 1998; Freund et al., 1999), all implemented in the scikit-learn Python package (Zhu et al., 2009). We implemented three architectures for each:
1. Mixing all features, i.e. demographics, MMSEs and MRI-based features. We obtained the RF#1, SVM#1 and AB#1 classifiers.
2. All features but the MMSEs. We obtained the RF#2, SVM#2 and AB#2 classifiers.
3. The Multi Classifier System explained in Section 2.5. We obtained the RF#3, SVM#3 and AB#3 classifiers.

All the parameters of each classifier were previously optimized using individual cross-validation analyses. Each test was repeated 300 times, selecting a training set of a randomly chosen 80% of the dataset and validating on the remaining 20%. The automatic feature selection process was performed independently in each of the 300 iterations, using only the training set.
Additionally, once the competition was over, the organizers shared the full list of test subjects with diagnoses, indicating whether each one was real or dummy data. We then completed the test by making 300 iterations, training with the whole training dataset and testing on the whole test dataset.
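The repeated 80/20 evaluation protocol can be sketched as follows, on synthetic 4-class data and with 20 repetitions instead of the paper's 300 to keep the example fast:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=240, n_features=30, n_classes=4,
                           n_informative=10, random_state=0)
scores = []
for it in range(20):
    # Fresh random 80/20 split at every repetition.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.2, random_state=it)
    clf = RandomForestClassifier(n_estimators=100, random_state=it)
    scores.append(clf.fit(X_tr, y_tr).score(X_val, y_val))

mean_acc, std_acc = np.mean(scores), np.std(scores)
```

Reporting the mean and spread over many random splits, rather than a single split, is what makes the validation accuracies in Table 3 meaningful estimates.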

2.7. Receiver operating characteristic curve
Using the class probabilities returned by the classifiers, we generated two Receiver Operating Characteristic (ROC) analyses as described in (Bradley, 1997; Fawcett, 2006). The first one was a multi-class ROC analysis, generated by considering the probability returned for each class independently and grouping the other classes as one. In the second ROC analysis we assessed the classifier's capacity to determine whether or not the participant has a Neurodegenerative Disease (ND), i.e. whether the patient suffers from cMCI or AD, without making any distinction between them. Both analyses were made on the training and testing datasets separately.
Using the information provided by the ROC curves, we computed the Optimum Test Accuracy. We define this metric as the test accuracy corresponding to the ROC point which maximizes Youden's J statistic (Youden, 1950) on the validation set.
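Selecting the operating point that maximizes Youden's J = sensitivity + specificity - 1 can be sketched as below, with illustrative scores rather than the paper's classifier outputs:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy binary labels and classifier scores (not the paper's data).
y_true  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.35, 0.8, 0.4, 0.6, 0.7, 0.9])

fpr, tpr, thr = roc_curve(y_true, y_score)
j = tpr - fpr                    # Youden's J at each candidate threshold
best_threshold = thr[np.argmax(j)]
```

The accuracy measured at `best_threshold` on held-out data corresponds to what the paper calls the Optimum Test Accuracy.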

3. Results

3.1. Cognitive bias
As expected, the MMSEs was by far the most important feature in the classification (Figure 2). Classification accuracy (on validation) without the MMSEs started near 29% for one feature, i.e. 4% more than the random accuracy, and then slowly increased to a maximum accuracy of 43% at an optimal number of features of 100. In contrast, classification including the MMSEs showed a starting accuracy (on validation) with only one feature (the MMSEs) of 46%, and then accuracy increased to a maximum of 56% at an optimal number of features of 70. When the MMSEs was used in the classification, it reached an importance three times larger than the morphometric features, and its removal did not change the importance of the remaining features. This analysis allowed us to quantify and analyze the influence and utility of the MMSEs on the final classification and also revealed that the importance of the morphological features (in order and magnitude) was conserved in both cases.


Figure 2: Cognitive Bias Analysis. The influence on the accuracy for different numbers of features (right side) and on the importance of features (left side) is shown for classification with and without the MMSEs as part of the feature set.
3.2. Important morphometric features in the assessment of AD
As the previous Section shows, morphological features by themselves did not reach a good accuracy. The maximum accuracy achieved using them did not even reach the one obtained using the MMSEs alone. To overcome this pitfall, we implemented a cognitive-profile-dependent analysis as described in Sections 2.3 and 2.4, aiming to obtain a better classification using only morphological features.
Figure 3 shows a visual summary of the main findings about the importance of features in the M-MMSEs group. Interestingly, the most important features were related to the volume and regularity of the brain cortex. The bilateral rostral anterior cingulate cortex proved to play a crucial role in the differentiation in this group. Standard deviation and curvature in frontal and temporal cortical regions also proved to be important features.
In the same way, Figure 4 shows the importance of features for the H-MMSEs group, where it can be noted that, in contrast to the M-MMSEs group, the most decisive features are hippocampal subfields or nearby regions. It should also be noticed that the enhanced features from Section 2.1.4 were highly ranked.

Figure 3: Feature Analysis in the M-MMSEs group. The left subplot summarizes the median importance in the LOOCV analysis; the upper right shows the position distribution of each feature according to the order shown in the left subplot, with the red horizontal line marking the optimal number of features (K) for the classification (accuracy ≈ 45%); the bottom right depicts the number of times each variable is included in the selected K features, showing the stability of each feature in the final classification.

3.3. Classifiers comparison
Table 2 shows the cognitive segregation on the testing dataset, in which the data is broken down by type, i.e. real or dummy. The distribution into classes of the real test data, split into three groups using the MMSEs thresholds selected on the training dataset, is similar to the distribution shown in Table 1. Dummy cases, however, are almost evenly distributed over the four classes.
The comparison between different classifiers, changing both architecture and classification algorithm, is shown in Figure 5 and summarized in Table 3. It is interesting to see that, although almost 10% in accuracy (on validation) was lost in all types of algorithms due to the MMSEs removal, in the MCS the accuracy was slightly higher. In all cases the accuracy on the validation set was a good estimate of the accuracy on real test data. Finally, it should be noticed that the competition feedback on the test accuracy (real + dummy) was
Figure 4: Feature Analysis in the H-MMSEs group. The left subplot summarizes the median importance in the LOOCV analysis; the upper right shows the position distribution of each feature according to the order shown in the left subplot, with the red horizontal line marking the optimal number of features (K) for the classification (accuracy ≈ 56%); the bottom right depicts the number of times each variable is included in the selected K features, showing the stability of each feature in the final classification.
Cognitive Group    MMSEs range   Mean Age      Age Std     HC     MCI    cMCI   AD     Total
Full MMSEs         18.1 - 32.1   73.03/72.41   7.16/7.83   40/77  40/89  40/84  40/90  160/340
Low (L) MMSEs      19 - 23.5     74.39/71.45   8.24/8.84   0/13   1/14   0/9    25/13  26/49
Middle (M) MMSEs   23.5 - 26.5   73.15/72.78   7.52/8.00   1/12   9/19   12/20  15/20  37/71
High (H) MMSEs     26.5 - 30     72.62/72.51   6.73/7.56   39/52  30/56  28/55  0/57   97/220

Table 2: Cognitive Groups on Testing Dataset (Real/Dummy)

highly biased by the dummy data.
With respect to the confusion between classes, Figure 6 shows the confusion matrices built using the results from 300 iterations of training and testing with the Random Forest Classifier in the three architectures described in Section 2.6 (RF#1, RF#2, RF#3), together with the confusion matrix for the best classifier selected within the competition (RF#3 architecture). Although the RF#1 architecture has a good overall performance, its confusions are almost randomly distributed between degenerative and non-degenerative states. When the MMSEs information is taken out of the decision, accuracy decreases mainly through the loss of distinction between AD and cMCI (as shown in the RF#2 confusion matrix). Finally, with RF#3 the accuracy is recovered and the pattern of confusions becomes more polarized. For example, the false negatives for AD patients become concentrated in cMCI, which is a neurodegenerative condition. Furthermore, with RF#3 the confusions for HC subjects become more concentrated in MCI, which is a non-degenerative state.

3.4. ROC curves
In Table 3 we show the AUC for the ROC curves obtained as described in Section 2.7. Figure 7 shows multi-class ROC curves generated using the RF#3 classifier with the best validation accuracy over the 300

Figure 5: Accuracy comparison for each classifier. The title of each subplot indicates the classifier used, and the number references the architecture used. Number #1 is the classifier using all features (including the MMSEs), #2 are the classifiers that did not use the MMSEs, and #3 corresponds to the classifier architecture described in Section 2.5. From left to right, the first box indicates the accuracies obtained in validation, while the second one is the score estimated by training with all the training data and testing with the whole test group (real + dummy data), after the competition closed. Finally, the third and fourth boxes represent the same results but discriminating scores on real and dummy data respectively.

iterations (Section 2.6), which had an accuracy of 54.3% on the real test data. A good match between validation and testing can be observed for HC, cMCI and AD, while MCI appears to be overestimated. It is interesting to highlight that AD detection had the best AUC score, while the detection of MCI had the worst performance.
The top of Figure 8 shows validation and test ROC curves for the detection of ND, i.e. without differentiating cMCI from AD, or HC from MCI, for the best RF#1, RF#2 and RF#3 classifiers (as in Figure 7). The bottom of Figure 8 shows the test accuracies at each point of the ROC curves, with the Optimum Test Accuracy highlighted. These Optimum Test Accuracies for ND classification, of 71.3% for RF#1, 63.1% for RF#2 and 81.0% for RF#3, are also summarized in Table 3.
We also compared Specificity (1 - false positive rate) and Sensitivity (true positive rate) at the point of optimal accuracy. RF#3 exhibited a Specificity of 87% and a Sensitivity of 76.0%, while RF#1 had a Specificity of 90.9% and a Sensitivity of only 55.3%. The classification using only MRI-based features (RF#2) showed a Specificity of 90% and a Sensitivity of 41.5%.
Performance Metric                       RF#1          RF#2          RF#3          SVM#1         SVM#2         SVM#3         AB#1          AB#2          AB#3

Mean accuracy on Validation Set (std)    0.495 (0.06)  0.394 (0.06)  0.512 (0.06)  0.474 (0.06)  0.407 (0.06)  0.533 (0.06)  0.481 (0.07)  0.330 (0.06)  0.443 (0.07)
Accuracy on Full Test Set                0.352         0.312         0.332         0.282         0.294         0.330         0.326         0.266         0.350
Accuracy on Real Test Set                0.531         0.369         0.531         0.500         0.413         0.531         0.506         0.294         0.500
Accuracy on Dummy Test Set               0.268         0.285         0.238         0.179         0.238         0.235         0.241         0.253         0.279

Mean HC AUC on Validation Set (std)      0.812 (0.06)  0.791 (0.06)  0.806 (0.05)  0.818 (0.06)  0.798 (0.06)  0.824 (0.06)  0.640 (0.07)  0.590 (0.08)  0.624 (0.08)
HC AUC on Real Test Set                  0.786         0.721         0.830         0.774         0.742         0.824         0.671         0.508         0.758

Mean MCI AUC on Validation Set (std)     0.667 (0.08)  0.636 (0.08)  0.704 (0.07)  0.669 (0.08)  0.655 (0.07)  0.717 (0.07)  0.564 (0.08)  0.514 (0.07)  0.557 (0.08)
MCI AUC on Real Test Set                 0.643         0.603         0.629         0.659         0.658         0.644         0.583         0.471         0.558

Mean cMCI AUC on Validation Set (std)    0.708 (0.08)  0.649 (0.08)  0.728 (0.08)  0.692 (0.08)  0.646 (0.08)  0.773 (0.07)  0.626 (0.08)  0.526 (0.08)  0.612 (0.08)
cMCI AUC on Real Test Set                0.767         0.651         0.750         0.783         0.666         0.761         0.617         0.567         0.625

Mean AD AUC on Validation Set (std)      0.893 (0.05)  0.787 (0.06)  0.922 (0.04)  0.865 (0.05)  0.795 (0.06)  0.947 (0.02)  0.796 (0.07)  0.594 (0.08)  0.729 (0.08)
AD AUC on Real Test Set                  0.913         0.755         0.945         0.877         0.788         0.960         0.813         0.571         0.725

Mean ND AUC on Validation Set (std)      0.887 (0.05)  0.858 (0.05)  0.897 (0.05)  0.887 (0.05)  0.869 (0.05)  0.913 (0.04)  0.788 (0.06)  0.672 (0.07)  0.765 (0.06)
ND AUC on Real Test Set                  0.860         0.794         0.880         0.858         0.829         0.883         0.744         0.656         0.700

Optimum Test Accuracy on ND              0.713         0.631         0.810         0.755         0.728         0.720         0.720         0.562         0.662

Table 3: Performance comparison for each classifier and architecture.

4. Discussion and conclusion

In this paper we presented an in-depth analysis of the use of machine learning techniques in the detection of AD and related states using only MRI structural information, regardless of the MRI protocol or MRI machine used. In the context of an international machine learning competition, we searched for the best possible classification, assessing the most useful features, classifier algorithms and architectures, confusion patterns and ROC curve profiles.
Since the MMSEs was the only cognitive metric present in the dataset, cognitive data was limited. Although the limitations and potentialities of the MMSEs have already been widely studied (Tombaugh and McIntyre, 1992; Lancu and Olmer, 2006), in this paper we showed the impact of discarding this score from the data, which induces a loss in accuracy of approximately 10%. Despite this, the importance of the main morphometric features with or without the MMSEs was conserved when using all the subjects as one single group.

To eliminate the MMSEs from the classification process without losing its information, we decided to segregate the training data according to the participants' cognitive profile. This approach allowed us to recover the accuracy lost by cutting off the MMSEs and to find distinctive main features for each group.
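The segregation step can be sketched as a simple mask over MMSE scores, with one classifier then trained per group. The cut-off value and group names below are hypothetical stand-ins; the actual grouping is defined in Section 2.5 and is not reproduced here.

```python
# Hedged sketch: split subjects into cognitive-profile groups by MMSE
# score so the score itself never enters the feature vector. The cut-off
# (27) is hypothetical, not the paper's actual criterion.
import numpy as np

def split_by_mmse(mmse, cut=27):
    """Boolean masks for an M-MMSEs-like and an H-MMSEs-like group."""
    mmse = np.asarray(mmse)
    high = mmse >= cut          # H-MMSEs-like group
    return ~high, high          # (medium/low group, high group)

mmse_scores = np.array([21, 23, 25, 26, 28, 30, 29, 24])
m_group, h_group = split_by_mmse(mmse_scores)
# Each mask selects the subjects used to train that group's classifier.
print(m_group.sum(), h_group.sum())
```

Each mask would then index into the morphometric feature matrix, yielding one training set (and one classifier) per cognitive profile.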

This segregation allowed us to build the MCS presented in Section 2.5, with which we explored which brain regions and measurements were useful to detect the different classes. As mentioned above, the M-MMSEs group was classified mostly using cortical regions, more specifically frontal and temporal regions. Top-ranked features were bilateral rostral anterior cingulate cortex volume, left lateral orbito-frontal thickness, left parsorbitalis average mean curvature (AMC), right transverse temporal cortex volume, right parahippocampal AMC, left fusiform AMC, and others listed in Figure 3. Interestingly, average mean curvature and mean standard deviation in frontal and temporal cortical regions seem to be good predictors of AD progression in advanced stages of the disease, i.e. in the transition from cMCI to AD. Few non-cortical brain regions were present in the first 30 features for the M-MMSEs group; the ones included were bilateral cerebellar white matter volume, CSF (as an indication of global atrophy) and only two hippocampal subfields (in contrast to the H-MMSEs group).
Because previous AD studies using FreeSurfer's morphometry had mainly reported changes in volume and cortical thickness (Westman et al., 2011), we did not expect cortical thickness standard deviation and mean curvature to be important features. Nevertheless, they proved to be informative about the late stages of neurodegenerative disease progression. We think that in late stages of AD, or in the change from cMCI to AD, the regularity of the cortical thickness is lost, and therefore cortical regions specifically affected by the neurodegenerative process change their mean curvature and thickness standard deviation. This finding reveals that these metrics of cortical regions should not be discarded arbitrarily before applying machine learning techniques, as they may contribute relevant information about neuropathological processes.
Important features in the H-MMSEs group were fewer and in agreement with those previously described in the literature (Klöppel et al., 2008; Gray et al., 2013; Platero and Tobar, 2016). Additionally, in this work we proved the utility of FreeSurfer's hippocampal subfields in AD/MCI detection. The most important features in H-MMSEs were hippocampal subfields, enhanced features and hippocampal-nearby cortical regions (bilateral entorhinal cortex). As previously described by Westman and collaborators, the expansion of the inferior lateral ventricles was found to be a good feature for the detection of AD-related progression (Westman et al., 2011, 2012), and therefore the most enhanced features had a predominant role in the detection of neurodegenerative disease for the H-MMSEs group.

Figure 6: Percentage confusion matrices for different classification architectures using RFC. The top left shows the confusion matrix for the best submission within the competition, using the RF#3 classifier. The other confusion matrices were built using the results from 300 iterations of train and test with different architectures, i.e. RF#1 (top right), RF#2 (bottom left) and RF#3 (bottom right). Colors encode the percentage accuracy.

Figure 7: Multi-class ROC curves for HC (orange), MCI (pink), cMCI (blue) and AD (green) generated using the best RF#3 classifier out of 300 (Section 2.6). Dashed lines show curves obtained in validation, while solid lines show curves obtained using the real test dataset. In the legend each curve is associated with its AUC.

Figure 8: ROC curves and Test Accuracy for ND classification for RF#1, RF#2 and RF#3 (from left to right). In the upper plots, solid lines show curves obtained from test data, while dashed lines represent results obtained in validation. Each curve is associated with its AUC. At the bottom, test accuracy is depicted for each point on the ROC curves. Dots highlight the Optimum Test accuracy.


Accuracies from nine different classifiers were compared using validation and test scores. As mentioned before, omitting the MMSEs from the features provoked a significant loss in accuracy, but by using the MCS this loss was recovered or even turned into an improvement (Figure 5 and Table 3). Although the accuracies of RF#1 and RF#3 were found to be similar, when the confusion matrices are analyzed (Figure 6) it is possible to note that the mistakes made by RF#1 are more uniformly distributed between degenerative and non-degenerative states, while in RF#3 mistakes are more clustered within these two states. This shows that a more morphologically driven characterization of the disease can identify degenerative processes more accurately.
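Percentage confusion matrices such as those in Figure 6 can be built by row-normalizing a standard confusion matrix; a minimal sketch with scikit-learn, using illustrative labels and predictions rather than the study's outputs:

```python
# Sketch of a row-normalized ("percentage") confusion matrix, as used in
# Figure 6. Labels and predictions below are illustrative stand-ins.
import numpy as np
from sklearn.metrics import confusion_matrix

classes = ["HC", "MCI", "cMCI", "AD"]
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 1, 2])
y_pred = np.array([0, 0, 1, 2, 2, 1, 3, 3, 1, 3])

cm = confusion_matrix(y_true, y_pred, labels=list(range(len(classes))))
# Normalize each row so entries are percentages of that true class.
cm_pct = 100.0 * cm / cm.sum(axis=1, keepdims=True)

for name, row in zip(classes, cm_pct):
    print(name, np.round(row, 1))
```

Each row then sums to 100%, so off-diagonal mass directly shows where a true class is being misassigned.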
Regarding the comparison between classification algorithms, Random Forest Classifiers and Support Vector Machines exhibited a similar level of accuracy, while AdaBoost algorithms showed an increased error. Given that Random Forest Classifiers require simpler tuning (fewer parameters), they were chosen as the best algorithm.
The multi-class ROC curve analysis showed better performance in AD detection and in the identification of control subjects. The detection of MCI patients showed the worst performance of all. ROC curves and confusion matrices together showed that MCI and cMCI are difficult conditions to detect accurately by a single measurement. Previous works by other groups found that this could not be solved by using multimodal information (Gray et al., 2013), but that it can be improved by using longitudinal information (Hinrichs et al., 2011; Zhang et al., 2012).
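Per-class ROC curves for a four-class problem, like those in Figure 7, follow the usual one-vs-rest construction. A minimal sketch with scikit-learn on synthetic data (not the challenge dataset, and with an off-the-shelf random forest rather than the paper's tuned classifiers):

```python
# One-vs-rest ROC/AUC analysis for a 4-class problem (HC, MCI, cMCI, AD),
# in the spirit of Figure 7. Data and classifier are synthetic stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)                 # shape (n_samples, 4)
y_bin = label_binarize(y_te, classes=[0, 1, 2, 3])

aucs = {}
for k, name in enumerate(["HC", "MCI", "cMCI", "AD"]):
    fpr, tpr, _ = roc_curve(y_bin[:, k], scores[:, k])  # class k vs. rest
    aucs[name] = auc(fpr, tpr)
print(aucs)
```

Each class is scored against all others pooled together, which is why a single four-class classifier yields four separate curves and AUCs.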
As for the ND capability of each architecture, the binary ROC curves from Figure 8 confirmed RF#3 to be better than RF#1 and RF#2. From the clinical point of view these findings represent a major improvement, given that not only is the Optimum Test accuracy of classifier RF#3 ≈10% greater than that of RF#1 (81.0% vs. 71.3%), but there was also an important difference in Sensitivity (76% vs. 55%). Classification using only MRI-based features showed the worst performance, with an accuracy of 63.1% (only 13.1% above random classification) and a Sensitivity of 41.5%.
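One common way to locate an accuracy-optimal operating point on a binary ROC curve, as marked by the dots in Figure 8, is to sweep the score thresholds and keep the one maximizing accuracy. A sketch on synthetic ND scores (the paper's exact criterion is not specified in this excerpt and may differ, e.g. Youden's index):

```python
# Sketch: find the accuracy-maximizing point on a binary ND
# (neurodegenerative vs. non-degenerative) ROC curve by sweeping the
# thresholds returned by roc_curve. Scores and labels are synthetic.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=300)            # 1 = ND (cMCI or AD)
scores = np.clip(y_true * 0.3 + rng.normal(0.4, 0.25, 300), 0.0, 1.0)

fpr, tpr, thresholds = roc_curve(y_true, scores)
# Accuracy at each ROC point, i.e. at each candidate threshold.
accs = [np.mean((scores >= t) == (y_true == 1)) for t in thresholds]
best = int(np.argmax(accs))
print(f"optimum accuracy {accs[best]:.3f} at threshold {thresholds[best]:.3f} "
      f"(sensitivity {tpr[best]:.3f}, specificity {1 - fpr[best]:.3f})")
```

Plotting `accs` against the ROC points reproduces the lower panels of Figure 8, with the dot at the index returned by `argmax`.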
As previously reported in the work of Gray and collaborators, random forest classifiers have seen little usage in the context of brain imaging (Gray et al., 2013). In the present work, RFC allowed us to build a data-driven procedure where feature selection for the classifiers was not made by an expert but in a completely automated way. This processing design makes the classification system adaptive: as more training data is added to the process, the classifier can be re-trained, updating its knowledge about the main features.
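This automated, expert-free feature ranking can be sketched with a random forest's impurity-based importances; feature names and data below are illustrative, not the study's morphometric features:

```python
# Sketch of data-driven feature ranking with a random forest: retraining
# on an enlarged dataset automatically refreshes the ranking, with no
# expert-curated feature list. Data and names are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)
names = [f"feature_{i}" for i in range(X.shape[1])]

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
ranking = sorted(zip(names, clf.feature_importances_),
                 key=lambda t: t[1], reverse=True)

# Keep, e.g., the top 5 features for the downstream classifier.
for name, imp in ranking[:5]:
    print(f"{name}: {imp:.3f}")
```

The importances sum to one across features, so the ranking is directly comparable between retrainings as new subjects are added.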

A direct comparison of accuracy and confusion profiles with existing works could not be performed due to the inclusion of different subjects, MRI machines and modalities, as well as the use of different methods for feature extraction and cross-validation. We also found that most articles in the field classify between two classes at a time, i.e. AD vs MCI, AD vs cMCI, MCI vs cMCI, while we classified between four classes. Notably, no usage of confusion matrices was found in previous works (Klöppel et al., 2008; Cuingnet et al., 2011; Davatzikos et al., 2011; Gray et al., 2013; Ota et al., 2015; Iftikhar and Idris, 2016). For these reasons, we believe that the International challenge for automated prediction of MCI from MRI data represented a major achievement towards obtaining robust measures of comparison between different machine learning approaches in the detection of AD from MRI structural data.
Nevertheless, we think a few remarks could improve findings in this context. In the first place, the data was corrupted because of the aforementioned parser problem; this corruption took a considerable amount of analysis time before it could be rectified. Secondly, the inclusion of dummy data in the test score calculated while the competition was taking place made it very difficult to have a performance metric of the progress in accuracy and ranking position. We believe that an accuracy estimated using a randomly chosen subset of real data would have been a more useful score.
In conclusion, we compared different automatic classification methods to assist in the early diagnosis of Alzheimer's disease using a dataset provided by the International challenge for automated prediction of MCI from MRI data, varying both the classification algorithms and the implementation architecture. As our aim was not to analyze binary classification but to consider the outcome of discriminating between all possible health/disease states related to AD, we analyzed a multi-class classification using four classes (HC, MCI, cMCI and AD). We found that a cognitive-profile-dependent MCS developed using random forest classifiers obtained the best results, especially in detecting AD and HC patients, and also in detecting whether a neurodegenerative process was involved.

Conflict of interest

We hereby declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgements

We would like to thank Dr. Pablo Granitto for his help with machine learning topics. This study was made possible thanks to funding from Argentina's National Scientific and Technical Research Council (CONICET). Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI; Principal Investigator: Michael Weiner; NIH grant U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering (NIBIB), and through generous contributions from the following: Pfizer Inc., Wyeth Research, Bristol-Myers Squibb, Eli Lilly and Company, GlaxoSmithKline, Merck & Co. Inc., AstraZeneca AB, Novartis Pharmaceuticals Corporation, Alzheimer's Association, Eisai Global Clinical Development, Elan Corporation plc, Forest Laboratories, and the Institute for the Study of Aging, with participation from the U.S. Food and Drug Administration. Industry partnerships are coordinated through the Foundation for the National Institutes of Health. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory of Neuro Imaging at the University of California.

References

Albert, M. S., DeKosky, S. T., Dickson, D., Dubois, B., Feldman, H. H., Fox, N. C., Gamst, A., Holtzman, D. M., Jagust, W. J., Petersen, R. C., Snyder, P. J., Carrillo, M. C., Thies, B., Phelps, C. H., May 2011. The diagnosis of mild cognitive impairment due to Alzheimer's disease: Recommendations from the National Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease. Alzheimer's & Dementia 7 (3), 270–279.
URL http://dx.doi.org/10.1016/j.jalz.2011.03.008
Alzheimer's Association, Mar. 2015. 2015 Alzheimer's disease facts and figures. Alzheimer's & Dementia: The Journal of the Alzheimer's Association 11 (3), 332–384.
URL http://view.ncbi.nlm.nih.gov/pubmed/25984581
Bradley, A. P., 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30 (7), 1145–1159.
Breiman, L., 2001. Random forests. Machine Learning 45 (1), 5–32.
Breiman, L., Friedman, J., Stone, C. J., Olshen, R. A., 1984. Classification and regression trees. CRC Press.
Cuingnet, R., Gerardin, E., Tessieras, J., Auzias, G., Lehéricy, S., Habert, M.-O., Chupin, M., Benali, H., Colliot, O., Initiative, A. D. N., et al., 2011. Automatic classification of patients with Alzheimer's disease from structural MRI: a comparison of ten methods using the ADNI database. NeuroImage 56 (2), 766–781.
Dale, A. M., Fischl, B., Sereno, M. I., 1999. Cortical surface-based analysis: I. Segmentation and surface reconstruction. NeuroImage 9 (2), 179–194.
Davatzikos, C., Bhatt, P., Shaw, L. M., Batmanghelich, K. N., Trojanowski, J. Q., 2011. Prediction of MCI to AD conversion, via MRI, CSF biomarkers, and pattern classification. Neurobiology of Aging 32 (12), 2322.e19.
Farlow, M. R., 2009. Treatment of mild cognitive impairment (MCI). Current Alzheimer Research 6 (4), 362–367.
URL http://dx.doi.org/10.2174/156720509788929282
Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recognition Letters 27 (8), 861–874.
Fischl, B., Dale, A. M., 2000. Measuring the thickness of the human cerebral cortex from magnetic resonance images. Proceedings of the National Academy of Sciences 97 (20), 11050–11055.
Fischl, B., Liu, A., Dale, A. M., 2001. Automated manifold surgery: constructing geometrically accurate and topologically correct models of the human cerebral cortex. IEEE Transactions on Medical Imaging 20 (1), 70–80.
Fischl, B., Salat, D. H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., Van Der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., et al., 2002. Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron 33 (3), 341–355.
Fischl, B., Salat, D. H., van der Kouwe, A. J., Makris, N., Ségonne, F., Quinn, B. T., Dale, A. M., 2004a. Sequence-independent segmentation of magnetic resonance images. NeuroImage 23, S69–S84.
Fischl, B., Sereno, M. I., Dale, A. M., 1999a. Cortical surface-based analysis: II. Inflation, flattening, and a surface-based coordinate system. NeuroImage 9 (2), 195–207.
Fischl, B., Sereno, M. I., Tootell, R. B., Dale, A. M., et al., 1999b. High-resolution intersubject averaging and a coordinate system for the cortical surface. Human Brain Mapping 8 (4), 272–284.
Fischl, B., van der Kouwe, A., Destrieux, C., Halgren, E., Ségonne, F., Salat, D. H., Busa, E., Seidman, L. J., Goldstein, J., Kennedy, D., et al., 2004b. Automatically parcellating the human cerebral cortex. Cerebral Cortex 14 (1), 11–22.
Folstein, M. F., Robins, L. N., Helzer, J. E., 1983. The Mini-Mental State Examination. Archives of General Psychiatry 40 (7), 812.
Freund, Y., Schapire, R., Abe, N., 1999. A short introduction to boosting. Journal-Japanese Society for Artificial Intelligence 14 (771–780), 1612.
Freund, Y., Schapire, R. E., et al., 1996. Experiments with a new boosting algorithm. In: ICML. Vol. 96. pp. 148–156.
Gauthier, S., Reisberg, B., Zaudig, M., Petersen, R. C., Ritchie, K., Broich, K., Belleville, S., Brodaty, H., Bennett, D., Chertkow, H., Cummings, J. L., de Leon, M., Feldman, H., Ganguli, M., Hampel, H., Scheltens, P., Tierney, M. C., Whitehouse, P., Winblad, B., Apr. 2006. Mild cognitive impairment. The Lancet 367 (9518), 1262–1270.
URL http://dx.doi.org/10.1016/s0140-6736(06)68542-5
Gorgolewski, K. J., Varoquaux, G., Rivera, G., Schwarz, Y., Ghosh, S. S., Maumet, C., Sochat, V. V., Nichols, T. E., Poldrack, R. A., Poline, J.-B., et al., 2015. NeuroVault.org: a web-based repository for collecting and sharing unthresholded statistical maps of the human brain. Frontiers in Neuroinformatics 9.
Gray, K. R., Aljabar, P., Heckemann, R. A., Hammers, A., Rueckert, D., Initiative, A. D. N., et al., 2013. Random forest-based similarity measures for multi-modal classification of Alzheimer's disease. NeuroImage 65, 167–175.
Han, X., Jovicich, J., Salat, D., van der Kouwe, A., Quinn, B., Czanner, S., Busa, E., Pacheco, J., Albert, M., Killiany, R., et al., 2006. Reliability of MRI-derived measurements of human cerebral cortical thickness: the effects of field strength, scanner upgrade and manufacturer. NeuroImage 32 (1), 180–194.
Hinrichs, C., Singh, V., Xu, G., Johnson, S. C., Initiative, A. D. N., et al., 2011. Predictive markers for AD in a multi-modality framework: an analysis of MCI progression in the ADNI population. NeuroImage 55 (2), 574–589.
Hodge, M. R., Horton, W., Brown, T., Herrick, R., Olsen, T., Hileman, M. E., McKay, M., Archie, K. A., Cler, E., Harms, M. P., et al., 2016. ConnectomeDB - sharing human brain connectivity data. NeuroImage 124, 1102–1107.
Hojjati, S. H., Ebrahimzadeh, A., Khazaee, A., Babajani-Feremi, A., Initiative, A. D. N., et al., 2017. Predicting conversion from MCI to AD using resting-state fMRI, graph theoretical approach and SVM. Journal of Neuroscience Methods 282, 69–80.
Iftikhar, M. A., Idris, A., 2016. An ensemble classification approach for automated diagnosis of Alzheimer's disease and mild cognitive impairment. In: Open Source Systems & Technologies (ICOSST), 2016 International Conference on. IEEE, pp. 78–83.
Jovicich, J., Czanner, S., Greve, D., Haley, E., van der Kouwe, A., Gollub, R., Kennedy, D., Schmitt, F., Brown, G., MacFall, J., et al., 2006. Reliability in multi-site structural MRI studies: effects of gradient non-linearity correction on phantom and human data. NeuroImage 30 (2), 436–443.
Klöppel, S., Stonnington, C. M., Chu, C., Draganski, B., Scahill, R. I., Rohrer, J. D., Fox, N. C., Jack Jr, C. R., Ashburner, J., Frackowiak, R. S., 2008. Automatic classification of MR scans in Alzheimer's disease. Brain 131 (3), 681–689.
Lancu, I., Olmer, A., 2006. The Mini-Mental State Examination – an up-to-date review. Harefuah 145 (9), 687–690.
Marcus, D. S., Wang, T. H., Parker, J., Csernansky, J. G., Morris, J. C., Buckner, R. L., 2007. Open Access Series of Imaging Studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. Journal of Cognitive Neuroscience 19 (9), 1498–1507.
Moradi, E., Pepe, A., Gaser, C., Huttunen, H., Tohka, J., Initiative, A. D. N., et al., 2015. Machine learning framework for early MRI-based Alzheimer's conversion prediction in MCI subjects. NeuroImage 104, 398–412.
Ota, K., Oishi, N., Ito, K., Fukuyama, H., Group, S.-J. S., Initiative, A. D. N., et al., 2015. Effects of imaging modalities, brain atlases and feature selection on prediction of Alzheimer's disease. Journal of Neuroscience Methods 256, 168–183.
Platero, C., Tobar, M. C., 2016. A fast approach for hippocampal segmentation from T1-MRI for predicting progression in Alzheimer's disease from elderly controls. Journal of Neuroscience Methods 270, 61–75.
Poldrack, R. A., Gorgolewski, K. J., 2015. OpenfMRI: open sharing of task fMRI data. NeuroImage.
Reuter, M., Rosas, H. D., Fischl, B., 2010. Highly accurate inverse consistent registration: a robust approach. NeuroImage 53 (4), 1181–1196.
Sarica, A., Di Fatta, G., Cannataro, M., 2014. K-surfer: A KNIME extension for the management and analysis of human brain MRI FreeSurfer/FSL data. In: International Conference on Brain Informatics and Health. Springer, pp. 481–492.
Schapire, R. E., Singer, Y., 1998. Improved boosting algorithms using confidence-rated predictions. In: Proceedings of the Eleventh Annual Conference on Computational Learning Theory. ACM, pp. 80–91.
Scholkopf, B., Smola, A. J., 2001. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press.
Schouten, T. M., Koini, M., de Vos, F., Seiler, S., van der Grond, J., Lechner, A., Hafkemeijer, A., Möller, C., Schmidt, R., de Rooij, M., et al., 2016. Combining anatomical, diffusion, and resting state functional magnetic resonance imaging for individual classification of mild and moderate Alzheimer's disease. NeuroImage: Clinical 11, 46–51.
Shawe-Taylor, J., Cristianini, N., 2004. Kernel methods for pattern analysis. Cambridge University Press, Cambridge, UK.
Simmons, A., Westman, E., Muehlboeck, S., Mecocci, P., Vellas, B., Tsolaki, M., Kloszewska, I., Wahlund, L.-O., Soininen, H., Lovestone, S., et al., 2011. The AddNeuroMed framework for multi-centre MRI assessment of Alzheimer's disease: experience from the first 24 months. International Journal of Geriatric Psychiatry 26 (1), 75–82.
Tombaugh, T. N., McIntyre, N. J., 1992. The Mini-Mental State Examination: a comprehensive review. Journal of the American Geriatrics Society 40 (9), 922–935.
Vapnik, V., 2013. The nature of statistical learning theory. Springer Science & Business Media.
Weiner, M. W., Veitch, D. P., Aisen, P. S., Beckett, L. A., Cairns, N. J., Cedarbaum, J., Green, R. C., Harvey, D., Jack, C. R., Jagust, W., et al., 2015. 2014 update of the Alzheimer's Disease Neuroimaging Initiative: a review of papers published since its inception. Alzheimer's & Dementia 11 (6), e1–e120.
Westman, E., Muehlboeck, J.-S., Simmons, A., 2012. Combining MRI and CSF measures for classification of Alzheimer's disease and prediction of mild cognitive impairment conversion. NeuroImage 62 (1), 229–238.
Westman, E., Simmons, A., Muehlboeck, J.-S., Mecocci, P., Vellas, B., Tsolaki, M., Kloszewska, I., Soininen, H., Weiner, M. W., Lovestone, S., et al., 2011. AddNeuroMed and ADNI: similar patterns of Alzheimer's atrophy and automated MRI classification accuracy in Europe and North America. NeuroImage 58 (3), 818–828.
Wyman, B. T., Harvey, D. J., Crawford, K., Bernstein, M. A., Carmichael, O., Cole, P. E., Crane, P. K., DeCarli, C., Fox, N. C., Gunter, J. L., et al., 2013. Standardization of analysis sets for reporting results from ADNI MRI data. Alzheimer's & Dementia 9 (3), 332–337.
Youden, W. J., 1950. Index for rating diagnostic tests. Cancer 3 (1), 32–35.
Zhang, D., Shen, D., Initiative, A. D. N., et al., 2012. Predicting future clinical changes of MCI patients using longitudinal and multimodal biomarkers. PLoS ONE 7 (3), e33182.
Zhu, J., Zou, H., Rosset, S., Hastie, T., 2009. Multi-class AdaBoost. Statistics and Its Interface 2 (3), 349–360.
