
Accepted Manuscript

Title: Performance analysis of different classification algorithms using different feature selection methods on Parkinson’s disease detection

Author: Ozkan Cigdem Hasan Demirel

PII: S0165-0270(18)30252-8
DOI: https://doi.org/doi:10.1016/j.jneumeth.2018.08.017
Reference: NSM 8089

To appear in: Journal of Neuroscience Methods

Received date: 20-6-2018
Revised date: 4-8-2018
Accepted date: 14-8-2018

Please cite this article as: Ozkan Cigdem, Hasan Demirel, Performance analysis
of different classification algorithms using different feature selection methods on
Parkinson’s disease detection, Journal of Neuroscience Methods (2018),
https://doi.org/10.1016/j.jneumeth.2018.08.017

This is a PDF file of an unedited manuscript that has been accepted for publication.
As a service to our customers we are providing this early version of the manuscript.
The manuscript will undergo copyediting, typesetting, and review of the resulting proof
before it is published in its final form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers that
apply to the journal pertain.
Performance Analysis of Different Classification Algorithms Using Different Feature
Selection Methods on Parkinson’s Disease Detection

Ozkan Cigdem^{a,∗}, Hasan Demirel^{a}

^{a} Department of Electrical and Electronics Engineering, Eastern Mediterranean University, Gazimagusa, Mersin 10, Turkey

Abstract

Background

In the diagnosis of neurodegenerative diseases, three-dimensional magnetic resonance imaging (3D-MRI) has been heavily researched. Parkinson’s disease (PD) is one of the most common neurodegenerative disorders.

New Method
The performances of five different classification approaches are evaluated, each using five different attribute rankings followed by a feature selection (FS) method with an adaptive Fisher stopping criterion. To improve the performance of PD detection, two fusion techniques are used: a source fusion technique, which combines the gray matter (GM) and white matter (WM) tissue maps, and a decision fusion technique, which combines by majority voting the outputs of all classifiers using the correlation-based feature selection (CFS) method.

Results
Among the five FS methods, CFS provides the highest results for all five classification algorithms, and the support vector machine (SVM) provides the best classification performance for all five FS methods. Classification accuracies of 77.50% and 81.25% are obtained for the GM and WM tissues, respectively. However, the fusion of the GM and WM datasets improves the classification accuracy of the proposed methodology up to 95.00%.


Comparison with Existing Methods


An f-contrast is used to generate the 3D masks for the GM and WM datasets, and a source fusion technique combining the GM and WM datasets is applied. Several classification algorithms are run with several FS methods, and a decision fusion technique is used.

Conclusions

Combining the 3D masked GM and WM tissue maps and fusing the outputs of multiple classifiers with the CFS method gives a classification accuracy of 95.00%.
Keywords: Parkinson’s disease, structural MRI, DARTEL, feature selection, source fusion, decision fusion.

∗ Corresponding author. Tel: +90 392 1301; fax: +90 392 630 1648.
Email address: ozkancigdem@ieee.org (Ozkan Cigdem)

Preprint submitted to Elsevier, August 4, 2018

1. Introduction

Neurodegenerative disorders are characterized by the progressive deterioration of the brain neurons [1, 2]. Parkinson’s disease (PD) and Alzheimer’s disease (AD) are the two most common neurodegenerative diseases. In neurodegenerative disease classification, detecting the disorder from only one MRI scan requires a model generated from a large collection of patient and healthy control (HC) datasets [3, 4]. In AD, the atrophies in the brain are clearly visible in structural MRI (sMRI) and might be sufficient to decide the level of the disease, whereas in PD the atrophies might not be sufficient. Hence, in addition to sMRI data, the clinical examinations and the medical histories of the patients are required [5, 6]. To improve the accuracy and robustness of clinical tests for PD identification, computer-aided detection (CAD) has been increasingly used in neurodegenerative disease detection [7, 8, 9]. Various neuroimaging methods have been used in the literature, such as sMRI [10], functional MRI (fMRI) [11], positron emission tomography (PET) [12], and single photon emission computed tomography (SPECT) [13]. Among these neuroimaging techniques, MRI has been widely used in PD detection due to its high spatial resolution of neuroanatomy, high availability, good contrast, and the absence of any requirement for pharmaceutical injections [3, 1]. In sMRI, atrophies and physical differences are examined among different tissue types, while in fMRI, hemodynamic responses of the brain to neural activity are investigated [11]. In this paper, sMRI data are used for the classification of PD and HC. Differentiated voxels, which are considered as volumes of interest (VOIs), are obtained by using the voxel-based morphometry (VBM) method, which discriminates PD from HC by analyzing group-wise comparisons of cross-sectional sMRI scans [10]. In order to improve inter-group registration and provide more precise and accurate localization of the structural differences in sMRI data, the Diffeomorphic Anatomic Registration Through Exponentiated Lie algebra (DARTEL) algorithm is used in combination with VBM [14].

After preprocessing of the data, statistical analysis is carried out to extract VOIs through a voxel-by-voxel comparison of the brains of PD and HC subjects [15]. In order to detect tissue abnormalities optimally, the statistical analysis requires parameters such as the covariate and the contrast [16]. In the literature on PD detection, the 3D masks of the gray matter (GM) and white matter (WM) tissue maps are mostly obtained with a t-contrast [10, 17, 18]. While the t-contrast compares the means of two groups, the f-contrast compares the variances of the two groups. Therefore, the t-contrast is directional (the mean of group one is either larger or smaller than that of group two), whereas the f-contrast checks for differences between the groups in either direction. In this paper, total intracranial volume (TIV) is used as a covariate and the f-contrast is used for model building. The statistical analysis yields different numbers of clusters containing different amounts of VOIs. The obtained clusters are used to generate a 3D mask [19]. Masking the segmented, warped, and smoothed data decreases the amount of raw data significantly. Hence, only the differentiated voxels clustered in VOIs are taken into account. In neurodegenerative disorder diagnosis, mostly the alterations in the GM tissue map obtained with VBM have been considered [10, 17, 7]. However, in [20, 21], the alterations in the WM tissue map obtained with VBM have been taken into account. In this paper, the data of 40 PD and 40 HC subjects from the Parkinson’s Progression Markers Initiative (PPMI) datasets (www.ppmi-info.org/data) are preprocessed for both the GM and WM modalities, and the 3D masks are obtained for GM and WM separately. The obtained 3D masks are applied to the GM and WM tissues and only the differentiated voxels are extracted. Since morphological differences occur in both the GM and WM brain volumes of PD patients and HCs, concatenating the two volumes, instead of taking the individual contribution of each, provides improved classification performance in the diagnosis of PD. In order to analyze the effects of the GM and WM modalities together, a single vector is obtained by concatenating the 3D masked GM and WM datasets. By combining the GM and WM modalities, a source fusion technique, the high dimensions of the combined GM and WM data are reduced to the lower VOI level. Even though the feature extraction decreases the amount of raw data, the number of features still exceeds the number of observations, so an automatic feature selection (FS) method might be used to prune off the irrelevant, redundant, and noisy information from the data.

Feature selection is an important task in machine learning, artificial intelligence, computer vision, and data mining. It aims to reduce the dimensionality of the data and the noise in data sets. The purpose of FS is to remove redundant features from the feature sets and to keep the relevant ones. Therefore, a suitable FS method mostly provides a better performance in learning and classification processes, since the chance of over-fitting increases with the number of features. Considering the evaluation strategy of the FS, two categories are mostly used: filters [22] and wrappers [23]. In wrapper methods, a built-in classifier is required to evaluate the possible feature subsets. In filter methods, however, correlations between features are evaluated using a criterion indicative of the ability to separate the classes [24]. In this paper, five different filter-based feature ranking approaches, including both supervised and unsupervised ones, are investigated for the preprocessed, 3D masked, and extracted GM and WM data sets. These approaches are Relief-F [25], Laplacian score (LS) [26], unsupervised feature selection for multi-cluster data (MCFS) [27], unsupervised discriminative feature selection (UDFS) [28], and correlation-based feature selection (CFS) [22]. After ranking all the features, the optimum number of top-ranked features is determined by using a Fisher Criterion (FC) stopping method, which maximizes the class separation between PD and HC [29]. The purpose of this method is to select the optimal number of top-ranked features adaptively, based on the training data in each fold, instead of identifying a fixed number of features. It also aims to determine a discriminative feature subset with high performance by using the training data in each fold of the classification algorithm [30]. In order to classify the data with the selected features, five different classification algorithms are studied: the k nearest neighbor classifier (kNN) [31], naive Bayes (NB) [32], ensemble-subspace discriminant (ESD) [33], ensemble-bagged trees (EBT) [33], and support vector machines (SVM) [34]. The performances of all five feature ranking methods with all five classification algorithms are compared. According to the CFS method, a good feature subset contains features which are not correlated with each other but are correlated with the class labels. Hence, evaluating features as subsets performs better than evaluating them individually [22]. The experimental results show that among all five feature ranking approaches with the Fisher stopping criterion, CFS has the highest performance for all five classification algorithms, since it ranks feature subsets rather than individual features. In order to improve the overall PD classification performance, a decision fusion technique among the outputs of all five classifiers for the
CFS ranking with Fisher stopping criterion FS approach is proposed. The proposed decision fusion approach is performed for the GM, the WM, and the combined GM and WM datasets. The obtained results indicate that fusing the outputs of the five classifiers outperforms each classifier individually. Furthermore, the combination of the GM and WM datasets, a source fusion technique, significantly increases the classification performance compared with using the GM and the WM datasets separately. By using the source and decision fusion techniques, a higher classification result is achieved for PD detection.

The novelty of this paper is to introduce an automatic approach by comparing the performances of five classification algorithms for five feature ranking approaches, using an automatic FC stopping method, on PD detection. To the best of the authors’ knowledge, it is the first study to compare the performances of several FS approaches with several classification algorithms while using both the source and decision fusion techniques on PD detection. The experimental results indicate that, for the combination of the GM and WM datasets, using the CFS ranking with the FC stopping method and fusing the outputs of the five classifiers gives the highest accuracy, sensitivity, and specificity scores of 95.00%, 90.00%, and 100%, respectively.

The remainder of this paper is arranged as follows: Section 2 provides the statistics of the data, namely the materials used in the work, and Section 3 describes the methodology used to design an automatic CAD tool. In Section 4, experimental results for the proposed design are introduced. In Section 5, the discussion is given, and in Section 6, the conclusion is drawn.

2. Materials

2.1. MRI Acquisition

The data used in this research are obtained from the PPMI data set (www.ppmi-info.org/data). The protocol included T1-weighted MRI images based on a scanner by Siemens with acquisition plane=sagittal, acquisition type=3D, coil=Body, flip angle=9.0 degrees, matrix X/Y/Z=240.0/256/176 pixel, mfg model=TrioTim, pixel spacing X/Y=1.0/1.0 mm, pulse sequence=GR/IR, slice thickness=1 mm, and TE/TI/TR=2.98/900/2300 ms.

2.2. Subjects

The data of 50 PD patients and 50 HCs are downloaded, yet among these 100 persons, the MR data of 10 HCs and 10 PD patients were excluded due to a mismatched X/Y/Z matrix size and hence failure of the segmentation method. For the present study, 40 PD patients (mean age ± standard deviation = 60.37 ± 8.63 years, range: 40.0-75.2 years, gender: 19M-21F) and 40 HCs (mean age ± standard deviation = 60.09 ± 10.35 years, range: 32.5-78.9 years, gender: 27M-13F) are used.

3. Methodology of the CAD system

In this section, the methodology of designing an automatic CAD system for PD detection is given. First, the 3D sMRI data are preprocessed by using the VBM plus DARTEL approach. Second, the 3D masks are generated using the TIV as a covariate and the f-contrast for model building, and the GM and WM modalities are combined. Third, five different feature ranking methods are used to rank the features and an FC is used to select the number of top-ranked features. Five different classification approaches are applied to the selected features and, finally, a decision fusion technique combining the binary results of all the classifiers by majority voting is used. The framework is given in Fig. 1.

3.1. MRI Data Pre-processing and Statistical Analysis

The 3D T1-weighted sMRI images are preprocessed with the Statistical Parametric Mapping (SPM12) package (Wellcome Trust Centre for Neuroimaging, London, UK; available at: http://www.fil.ion.ucl.ac.uk/spm) and its extension, the Computational Anatomy Toolbox (CAT12) (http://www.neuro.uni-jena.de/cat/), implemented in Matlab 2017b. CAT12 is the newer version of VBM8 and it covers diverse morphometric methods such as VBM, surface-based morphometry, deformation-based morphometry, and region- or label-based morphometry. CAT12 is more robust and accurate than VBM8 for detecting small morphological abnormalities [35].

The 3D high resolution image data are downloaded from the PPMI database in DICOM format, which includes the 176 MR images of a subject. A DICOM image, one MR image of a subject’s head, has X/Y: 240/256 pixels. The scanner captures 176 MR images of a subject, from the left ear to the right ear. Hence, the data of a subject downloaded from the PPMI database consist of 176 DICOM images with a size of 240x256 pixels. In order to have a 3D image, all 176 MR images are combined: the 176 DICOM format images are converted into one 3D NIFTI format image by using the SPM12 software package. Finally, a 3D matrix of X/Y/Z=240.0/256/176 is obtained. After converting the images, first of all, the anterior commissure (AC) points of all subjects are co-registered to the central point of the space so that every image has the same central location. The VBM technique is an automated method to analyze tissue volumes between different subject groups. It takes into account the whole brain structure by comparing it voxel-by-voxel and discriminates the degenerated tissue concentrations by referencing the brain of HC as a template. Then, by using SPM12 together with the CAT12 toolbox, the data are segmented into six tissue probability maps according to the existing templates for each of the six modalities, which are GM, WM, CSF, skull, scalp, and air cavities. In this study, only the GM and WM tissues are examined. After GM and WM tissues are obtained, they
are normalized with the DARTEL approach, since it has more precise inter-subject alignment of MRI images than VBM [36]. Additionally, it gives precise and accurate localization of structural deformation on the MRI images. In order to create the deformation fields of each data set, instead of creating a template for the studied data, the existing DARTEL template created from 555 healthy controls is used. By using DARTEL normalization, all data are registered to the standard Montreal Neurological Institute (MNI) space, which includes both affine transformation and nonlinear deformation. Later on, the segmented, normalized images are modulated to preserve the total amount of tissue, corrected for individual differences in brain size, and finally spatially smoothed with the default 8 mm full-width-half-maximum (FWHM) Gaussian kernel set in the SPM12 software package. All other parameters of SPM12 are kept at their defaults. The framework is given in Fig. 1. Finally, the general linear model (GLM) is described for the segmented, DARTEL-warped, modulated, and smoothed images by using statistical analysis for the GM and WM modalities individually. The voxel-wise two-sample t-test is used to set up the model of the 3D masks of GM and WM individually. Configuring the design matrix that describes the GLM requires the data specification and other parameters of the statistical analysis (http://www.neuro.uni-jena.de/cat/). The model of the data is designed with TIV as a covariate. The design parameters are estimated and the inference on these estimated parameters is handled by using the SPM12 Results section. This is done in order to find the volume changes of GM and WM tissues between PD and HC. In order to ensure that the analysis focuses on the tissue type of interest, an absolute threshold value of 0.2 is used. The analysis is run with a threshold of uncorrected p < 0.001 and no extent threshold (in voxels). These experiments are done with the f-contrast for GM as well as WM.

After pre-processing the data and statistical analysis, the feature extraction, feature ranking followed by ranked FS, and classification steps are performed as seen in Fig. 1.

3.2. Identification of Affected Brain Regions in PD by Using VBM

The anatomical brain regions affected by PD are identified by using Neuromorphometrics labels (http://Neuromorphometrics.com/). The most important ROIs detected using different combinations of covariates and contrasts belong to the regions of left superior frontal gyrus, right middle temporal, left anterior cingulate gyrus, right anterior insula, right angular gyrus, left middle temporal gyrus, left inferior temporal, and right putamen. All of the obtained regions are reported in different papers in the literature [10, 14, 18, 37, 38, 39].

3.3. Feature Extraction

The generated 3D masks for GM and WM tissues are multiplied with the DARTEL normalized, modulated, and smoothed GM and WM tissues, respectively. Therefore, only the voxels clustered in the VOIs are extracted from the whole GM and WM tissue maps. The 3D mask of the GM modality is given in Fig. 2a and that of the WM modality in Fig. 2b. The dimensions of the extracted GM and WM data are reduced significantly through feature extraction. In this study, GM and WM VOIs are concatenated to investigate the effects of both modalities. The pipeline of the procedure is given in Fig. 1.

3.4. Feature Ranking

In order to alleviate the effect of the curse of dimensionality, improve the performance of the designed model, reduce the learning time, and enhance data understanding, an FS method is required. There are mainly three FS techniques, namely filters, wrappers, and embedded methods. In filter methods, irrelevant features are filtered out independently of the learning algorithm. The classifier is ignored and only intrinsic properties of the data are examined. First, the features are considered individually and ranked according to their importance level. Then a feature subset is obtained from the ranked ones. In wrapper methods, classifiers are used to score a given subset of features. Hence, the wrapper approach needs to evaluate a specific learning algorithm to determine which features are selected. In embedded methods, the selection process is injected into the learning of the classifier [24]. With wrapper and embedded methods, the feature subsets are first sampled and evaluated, and then kept as the final outputs [24]. The wrapper and embedded methods tend to find features that are closer to optimal for the specific learning algorithm, hence their computational cost is high. The filter methods, however, are fast and easy to implement. In this paper, filter feature ranking methods, including both supervised and unsupervised ones, are studied for the detection of PD. The features from the 3D masked data are ranked according to their degrees of relevance and importance. The FS method, learning type, and computational complexity of the five feature ranking approaches used are given in Table 1.

Table 1: The class (Cl.) and the computational (Com.) complexity of the five filter-based feature selection approaches. T: number of samples; n: number of initial features; C: number of classes; d: number of selected features; i: number of iterations; s: supervised learning; u: unsupervised learning.

FS          Cl.   Com. Complexity
Relief-F    s     $O(iTnC)$
Laplacian   u     N/A
MCFS        u     $O(nT^2 + Cd^3 + TCd^2 + n \log n)$
UDFS        u     N/A
CFS         u     $O(\frac{n^2}{2} T)$

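The filter idea described in Section 3.4 (score every feature from the data alone, then sort) can be illustrated with a minimal Python sketch. The scoring rule used here, the absolute Pearson correlation between a feature and the class label, is only a simple stand-in chosen for brevity; it is not one of the five ranking methods of Table 1, and the function name `rank_features_by_correlation` is hypothetical.

```python
import numpy as np

def rank_features_by_correlation(X, y):
    """Rank features by |Pearson correlation| with the class label.

    X: (n_samples, n_features) array, y: (n_samples,) array of labels.
    Returns (order, scores): feature indices from most to least relevant
    and the per-feature scores. Stand-in filter, not a method from Table 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # center every feature
    yc = y - y.mean()                # center the labels
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.where(denom > 0, (Xc * yc[:, None]).sum(axis=0) / denom, 0.0)
    scores = np.abs(r)               # constant features receive score 0
    order = np.argsort(-scores, kind="stable")
    return order, scores

# Toy example: feature 0 copies the label, features 1 and 2 carry no signal.
X_demo = np.array([[0, 5, 1], [0, 5, 2], [1, 5, 1], [1, 5, 2]])
y_demo = np.array([0, 0, 1, 1])
order_demo, scores_demo = rank_features_by_correlation(X_demo, y_demo)
```

Because the classifier never enters the computation, the ranking can be computed once per training fold and reused by all five classifiers, which is the practical advantage of filters noted above.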
[Figure 1: flowchart. Preprocessing: DICOM, convert to NIFTI, set origin to anterior commissure, segmentation into GM and WM images, then for each tissue DARTEL normalization, modulation, and smoothing. Statistical analysis (per tissue): GLM parameter estimation with total intracranial volume (TIV) as covariate and f-contrast, results, 3D mask. Feature extraction: smwp1* x GM mask and smwp2* x WM mask give the GM and WM images, which are combined (GM+WM). Feature selection: feature ranking (Relief-F, Laplacian Score, MCFS, UDFS, CFS) plus Fisher Criterion. Classification: k Nearest Neighbor, Naive Bayes, Ensemble Bagged Trees, Ensemble Subspace Discriminant, Support Vector Machines, followed by decision fusion into Parkinson's Disease or Healthy Control.]

smwp1* = DARTEL warped, modulated, smoothed GM tissue. smwp2* = DARTEL warped, modulated, smoothed WM tissue.
Figure 1: The framework of VBM plus DARTEL processing pipeline and classifying PD apart from HC.
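The feature-extraction and source-fusion steps of Fig. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the volumes and masks are taken to be NumPy arrays of identical shape, boolean indexing stands in for the mask multiplication followed by extraction of the non-zero VOI voxels, and the function names `extract_voi_features` and `fuse_sources` are hypothetical.

```python
import numpy as np

def extract_voi_features(volume, mask):
    """Return the voxels of a 3D tissue map that lie inside a binary 3D mask."""
    volume = np.asarray(volume)
    mask = np.asarray(mask, dtype=bool)
    if volume.shape != mask.shape:
        raise ValueError("volume and mask must have the same shape")
    return volume[mask]              # 1D vector of VOI voxels

def fuse_sources(gm_volume, gm_mask, wm_volume, wm_mask):
    """Source fusion: concatenate masked GM and WM voxels into one vector."""
    return np.concatenate([
        extract_voi_features(gm_volume, gm_mask),
        extract_voi_features(wm_volume, wm_mask),
    ])

# Toy 2x2x2 volumes standing in for one subject's smoothed GM and WM maps.
gm = np.arange(8).reshape(2, 2, 2)
wm = 10 * gm
gm_mask = np.zeros((2, 2, 2), dtype=int)
gm_mask[0, 0, 1] = 1
gm_mask[1, 1, 0] = 1
wm_mask = np.zeros((2, 2, 2), dtype=int)
wm_mask[0, 0, 0] = 1
fused = fuse_sources(gm, gm_mask, wm, wm_mask)
```

Applying the same two masks to every subject gives feature vectors of equal length, so the fused vectors can be stacked row by row into the data matrix passed to feature ranking.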

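The decision fusion block at the bottom of Fig. 1 combines the binary outputs of the classifiers by majority voting. A minimal Python sketch is given below, assuming the labels are coded as 1 for PD and 0 for HC (this coding and the function name `majority_vote` are assumptions made for illustration).

```python
import numpy as np

def majority_vote(predictions):
    """Fuse binary predictions by majority voting.

    predictions: (n_classifiers, n_samples) array of 0/1 labels.
    Returns a (n_samples,) array holding the label chosen by most classifiers.
    """
    predictions = np.asarray(predictions)
    votes_for_one = predictions.sum(axis=0)
    # A sample is labeled 1 when strictly more than half of the classifiers say 1.
    return (2 * votes_for_one > predictions.shape[0]).astype(int)

# Five classifiers (rows) voting on three subjects (columns).
preds = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
])
fused_labels = majority_vote(preds)
```

With five classifiers the number of votes is odd, so ties cannot occur for a two-class problem.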
[Figure 2 panels: sagittal, axial, and coronal views. (a) GM; (b) WM.]
Figure 2: The 3D masks of GM and WM tissues with TIV as a covariate and f-contrast.

3.4.1. Relief-F

Relief-F is an iterative, randomized, and supervised approach in which the quality of the features is estimated according to how well their values separate close data samples. Discrimination is not done among redundant features, hence using few data might decrease the performance of the algorithm [40]. In the Relief-F algorithm, the weight of a feature vector is identified according to the feature relevance. The feature weights are obtained by solving a convex optimization problem. However, these feature weights may fluctuate with the instances and due to the uncertainty in sampling frequency. Therefore, the Relief-F algorithm is unstable and might reduce the expected accuracies [24].

3.4.2. LS: Laplacian Score

LS is an unsupervised method in which the significance of a feature is evaluated based on how strongly it preserves the locality of the data manifold structure. A nearest neighbor graph is built to model the local geometric structure. The LS method searches for the features that preserve the existing structure of the graph.

3.4.3. MCFS: Unsupervised Feature Selection for Multi-Cluster Data

MCFS is an unsupervised filter method which selects the relevant features such that the multi-cluster structure of the data is preserved as well as possible [27]. The MCFS method is inspired by manifold learning and by L1-regularized models for subset selection.

3.4.4. UDFS: Unsupervised Discriminative Feature Selection

In UDFS, the most discriminative feature subsets are ranked in batch mode, namely by taking into account the structure of the manifold [28]. It exploits local discriminative information and feature correlations simultaneously [41].

3.4.5. CFS: Correlation-Based Feature Selection

CFS proposes a metric to assess a feature subset. The main purpose of CFS is to reduce the feature-to-feature correlation ($r_{ff}$) and increase the feature-to-class correlation ($r_{fo}$). In other words, according to CFS, good feature sets include features which are highly correlated with the class but uncorrelated with each other. In Eq. 1, a higher ratio represents a better subset.

$$\frac{r_{fo}}{r_{ff}} \tag{1}$$

3.5. Feature Selection Based on Fisher Criterion Methods

The aim of FS is to find the feature subset of a certain size which leads to the largest possible generalization or the minimal risk [22]. In order to determine the optimal feature subset, i.e. the number of top discriminative features, an automatic approach based on the FC, $J(w)$, given in Eq. 2 is used [30].

$$J(w) = \frac{w^T S_B w}{w^T S_W w} \tag{2}$$

where $S_B$ and $S_W$ denote the between-class and within-class scatter matrices, respectively. The between-class and within-class scatter matrices for the classes $y_1$ and $y_2$ are defined as follows:

$$S_B = (\mu_{y_1} - \mu_{y_2})(\mu_{y_1} - \mu_{y_2})^T$$
$$S_W = \sum_{x_i \in y_1} (x_i - \mu_{y_1})(x_i - \mu_{y_1})^T + \sum_{x_i \in y_2} (x_i - \mu_{y_2})(x_i - \mu_{y_2})^T \tag{3}$$

where $w = S_W^{-1}(\mu_{y_1} - \mu_{y_2})$ and $\mu_{y_i}$ is the mean of the data in each class. The number of top discriminative features from the ranked datasets is selected adaptively in each set of training data instead of using a fixed number of features. The number of top-ranked features is increased iteratively and the respective FC value is calculated at each iteration. The iterations are performed until a maximum FC value is obtained, and the optimal number of top features is the number of top-ranked features maximizing the FC.

3.6. Classification Methods

In order to separate the PD patients from the HCs, classification algorithms need to be used. In the literature, many different classifiers are used to train a model. In this paper, five different classification methods are used for training the model, namely k nearest neighbor classifier, naive Bayes, ensemble-subspace discriminant, ensemble-bagged trees, and support vector machines. The experiments are performed by using the Matlab Classification Learner app, which trains models to classify data. The feature values for each measurement and the measurement class are taken as inputs by the app. After a machine learning algorithm is chosen among several supervised ones using various classifiers, the app automatically trains the model on the data. In this study, the parameters automatically determined by the Classification Learner app are used.
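As a small illustration of the CFS ratio of Eq. 1 (Section 3.4.5), the following Python sketch evaluates a candidate feature subset. It rests on assumptions that are not stated in the text: $r_{fo}$ is taken as the mean absolute Pearson correlation between the subset's features and the class label, $r_{ff}$ as the mean absolute pairwise correlation among the subset's features, and the function name `cfs_ratio` is hypothetical.

```python
import numpy as np
from itertools import combinations

def cfs_ratio(X, y, subset):
    """Ratio r_fo / r_ff of Eq. 1 for the features listed in `subset`.

    Assumption: both terms are means of absolute Pearson correlations.
    Requires at least two non-constant features in the subset.
    """
    Xs = np.asarray(X, dtype=float)[:, list(subset)]
    y = np.asarray(y, dtype=float)
    k = Xs.shape[1]
    if k < 2:
        raise ValueError("the subset needs at least two features")
    # Correlation matrix of the k features followed by the class label.
    corr = np.abs(np.corrcoef(np.column_stack([Xs, y]), rowvar=False))
    r_fo = corr[:k, k].mean()
    r_ff = np.mean([corr[a, b] for a, b in combinations(range(k), 2)])
    return r_fo / r_ff

# Toy data: column 0 equals the label, column 1 is nearly a copy of column 0,
# column 2 is weaker but less redundant with column 0.
X_toy = np.array([[0, 0, 0], [0, 0, 1], [1, 1, 1], [1, 2, 1]])
y_toy = np.array([0, 0, 1, 1])
redundant = cfs_ratio(X_toy, y_toy, [0, 1])
diverse = cfs_ratio(X_toy, y_toy, [0, 2])
```

In this toy case the less redundant pair obtains the larger ratio, which matches the stated preference of CFS for features that are correlated with the class but not with each other.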
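The adaptive stopping rule of Section 3.5 can be sketched as follows: the ranked features are added one at a time, $J(w)$ of Eqs. 2 and 3 is computed on the training data for each prefix, and the prefix length with the largest value is kept. This is a sketch under assumptions rather than the authors' implementation: the small `ridge` term added to $S_W$ before solving for $w$ is an assumption introduced for numerical stability when $S_W$ is close to singular, and the function names are hypothetical.

```python
import numpy as np

def fisher_criterion(X, y, ridge=1e-6):
    """J(w) of Eq. 2 for two classes, with w = S_W^{-1} (mu_1 - mu_2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    c1, c2 = np.unique(y)                      # exactly two class labels expected
    X1, X2 = X[y == c1], X[y == c2]
    d = X1.mean(axis=0) - X2.mean(axis=0)      # difference of class means
    A1, A2 = X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)
    S_W = A1.T @ A1 + A2.T @ A2                # within-class scatter (Eq. 3)
    S_B = np.outer(d, d)                       # between-class scatter (Eq. 3)
    # ridge is an assumption: it keeps the linear solve stable.
    w = np.linalg.solve(S_W + ridge * np.eye(S_W.shape[0]), d)
    return float(w @ S_B @ w) / float(w @ S_W @ w)

def select_num_features(X_ranked, y, ridge=1e-6):
    """Return (best_k, scores) over the prefixes of the ranked feature columns."""
    X_ranked = np.asarray(X_ranked, dtype=float)
    scores = [fisher_criterion(X_ranked[:, :k], y, ridge)
              for k in range(1, X_ranked.shape[1] + 1)]
    return int(np.argmax(scores)) + 1, scores

# One-feature check: classes {0, 2} and {4, 6} give S_W = 4 and S_B = 16.
j_demo = fisher_criterion(np.array([[0.0], [2.0], [4.0], [6.0]]),
                          np.array([0, 0, 1, 1]), ridge=0.0)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 5))
y_demo = np.array([0] * 10 + [1] * 10)
best_k, fc_scores = select_num_features(X_demo, y_demo)
```

In a cross-validated setting, `select_num_features` would be called on the training portion of each fold only, so that the number of retained features adapts to that fold as described above.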
[Figure 3: flowchart of the nested cross-validation. The data set is separated into K1 folds: a training set (K1-1 folds), on which feature selection is performed, and a test set (1 fold). The training data are separated into K2 folds: a training set (K2-1 folds) and a validation set (1 fold), used for parameter estimation. The classifier is trained with the optimized parameters and classifies the test set to give the classification results.]
Figure 3: The pipeline of the K-fold cross validation procedure [36].
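The nested structure of Fig. 3, an outer K1-fold loop for testing and an inner K2-fold loop for choosing the hyperparameters by grid search, can be sketched in plain NumPy. To stay short and self-contained, the sketch tunes the number of neighbors of a small kNN classifier instead of the SVM parameters C and γ, uses unstratified random folds, and leaves out the feature selection step performed in the outer loop; these simplifications and all function names are assumptions made for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    dist = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return np.array([np.bincount(v).argmax() for v in y_train[nearest]])

def kfold_indices(n, n_folds, rng):
    """Shuffle the sample indices and split them into n_folds folds."""
    return np.array_split(rng.permutation(n), n_folds)

def nested_cv_accuracy(X, y, param_grid, k1=10, k2=10, seed=0):
    """Outer k1-fold test loop with an inner k2-fold grid search."""
    rng = np.random.default_rng(seed)
    outer = kfold_indices(len(y), k1, rng)
    correct = 0
    for i, test_idx in enumerate(outer):
        train_idx = np.concatenate([f for j, f in enumerate(outer) if j != i])
        X_tr, y_tr = X[train_idx], y[train_idx]
        inner = kfold_indices(len(y_tr), k2, rng)
        best_param, best_acc = None, -1.0
        for param in param_grid:               # grid search on training data only
            accs = []
            for m, val_idx in enumerate(inner):
                fit_idx = np.concatenate([f for q, f in enumerate(inner) if q != m])
                pred = knn_predict(X_tr[fit_idx], y_tr[fit_idx], X_tr[val_idx], param)
                accs.append(np.mean(pred == y_tr[val_idx]))
            if np.mean(accs) > best_acc:
                best_param, best_acc = param, np.mean(accs)
        # Refit on the whole outer training set with the selected parameter.
        pred = knn_predict(X_tr, y_tr, X[test_idx], best_param)
        correct += int(np.sum(pred == y[test_idx]))
    return correct / len(y)

# Two well-separated synthetic classes of 20 samples each.
rng_demo = np.random.default_rng(1)
X_demo = np.vstack([rng_demo.normal(0, 1, (20, 2)), rng_demo.normal(10, 1, (20, 2))])
y_demo = np.array([0] * 20 + [1] * 20)
acc_demo = nested_cv_accuracy(X_demo, y_demo, param_grid=[1, 3, 5], k1=5, k2=4)
```

The test fold of the outer loop never influences the choice of the hyperparameter, which is what keeps the reported accuracy from being optimistically biased.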

In order to evaluate the performance of the classifier, a procedure of two CVs, an outer K1 CV and an inner K2 CV, is combined with a grid search [36]. This procedure consists of two nested loops. In the outer loop, the data are split into K1 folds. At each step, one fold is used as a test set and the remaining K1 − 1 folds are used for training and validation. In the inner loop, the training data (K1 − 1 folds) are further divided into K2 folds. For each combination of C and γ, the classifier is trained using the training data and its performance is assessed on the fold left out for validation by estimating the classification accuracy. One fold is left for validation and the remaining K2 − 1 folds for training, and this split is combined with the grid search to determine the optimal parameters. In the grid search, the values of C and γ vary logarithmically from $2^{-5}$ to $2^{20}$ and from $2^{-15}$ to $2^{15}$, respectively. The inner loop is repeated K2 times and the accuracy of the classifier is obtained across the K2 folds for every combination of C and γ. The optimal parameters are selected such that the average accuracy across the K2 folds is maximized. Then, the class labels of the test data left out in the outer loop are predicted using the selected optimal parameters. The above procedure is repeated K1 times, each time leaving out a different fold as test data, which is used to compute the classification accuracy [36, 39]. In this study, K1 = 10 and K2 = 10 are used as the numbers of outer and inner CV folds, respectively, and k is used to indicate the number of nearest neighbors in the kNN algorithm. The pipeline of the K-fold CV procedure is given in Fig. 3. As shown in Fig. 3, the FS using the ranked features is performed within the outer CV loop of the classification algorithms.

3.6.1. k Nearest Neighbor

In kNN classification, an object is classified by a majority vote of its neighbors, namely the object is simply labeled with the most common class label among its k nearest neighbors as measured by a distance function. Hence, the output of this classification algorithm is a class membership. In this paper, the Euclidean distance is used as the distance metric and squared inverse distance weighting is selected. The drawback of kNN is that equal weight is intrinsically assigned to each feature when the distances are compared. Hence, for datasets with irrelevant and noisy features, the accuracy of the classification might be insufficient [31].

3.6.2. Naive Bayes

NB is a classification algorithm which assumes the independence of the predictors. In other words, the NB classifier assumes that the effect of a feature in a given class is unrelated to the other existing features [32]. By doing so, the computational cost is reduced. Since there is no exhaustive iterative parameter estimation, an NB algorithm performs well, especially on large datasets. The main idea behind NB classification is to classify data by maximizing

$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)} \tag{4}$$

where $i$ is the class index, $X$ denotes the measurements, and $C$ denotes the classes. In this paper, the hyperparameters of NB are optimized to minimize the CV loss of the NB classifier by using the fitcnb function in Matlab (https://it.mathworks.com/help/stats/fitcnb.html#bvdksew-1).

3.6.3. Ensemble Subspace Discriminant

The error in machine learning mostly arises from noise, bias, and variance. Ensemble learning combines multiple learners in order to decrease the variance and improve the predictive performance of the machine learning models compared to a single classifier. A subspace
ensemble method is used to improve the accuracy of the discriminant analysis classifier, which assumes a Gaussian mixture model for data generation. It has the advantages of handling missing values (NaNs) and using less memory than ensembles with all predictors [33]. In this paper, the number of learners is set to 30 and the subspace dimension to 175.

3.6.4. Ensemble Bagged Trees

In the bagging meta-algorithm, also known as bootstrap aggregation, each model in the ensemble votes with equal weight and each model is built independently. To obtain the data subsets for training the base learners, a bagging method uses bootstrap sampling. Each model in the ensemble is trained on a randomly drawn subset of the training set. Decision trees might suffer from high variance, and averaging multiple estimates by using the bagging algorithm is a way of reducing the variance. As a result, a higher accuracy is achieved by the ensemble bagged trees method [24]. In this paper, the number of learners is set to 30 and the maximum number of splits to 79.

3.6.5. Support Vector Machines

An SVM classification algorithm is used in order to classify PD patients apart from HCs. The main idea behind the SVM is to search for the optimal class-separating hyperplane with the maximal margin [34]. In this study, an RBF-SVM classifier has been used.

3.7. Decision Fusion

In decision fusion, the prediction results of the five classifiers are concatenated into a single 80x5 matrix. Let $C_{KNN}$, $C_{NB}$, $C_{ESD}$, $C_{EBT}$, and $C_{SVM}$ be the binary prediction results of the KNN, NB, ESD, EBT, and SVM classifiers, respectively. The decision fusion matrix is then defined as

$$C_{DF} = [C_{KNN},\; C_{NB},\; C_{ESD},\; C_{EBT},\; C_{SVM}]. \qquad (5)$$

After generating the decision fusion matrix, which has a size of 80x5, the majority voting method is employed to reach the final classification decision. Each sample that receives a majority of the five votes (at least 3) is labeled as correctly classified. Since majority voting is easy to implement, performs well on real data, and has a very simple scheme, it is one of the most widely used combination methods in the literature [42, 30].

Table 2: Performances of the five classification methods using the five feature ranking methods, each followed by FC feature selection, for the combination of GM and WM datasets.

CM*    FSM*       ACC*    SEN*    SPE*
KNN    Relief-F   81.25   77.50   85.00
       LS         82.50   77.50   87.50
       MCFS       78.75   75.00   82.50
       UDFS       80.00   75.00   85.00
       CFS        85.00   77.50   92.50
NB     Relief-F   75.00   70.00   80.00
       LS         73.75   70.00   77.50
       MCFS       76.25   75.00   77.50
       UDFS       75.00   70.00   80.00
       CFS        87.50   82.50   92.50
ESD    Relief-F   76.25   80.00   72.50
       LS         73.75   75.00   72.50
       MCFS       66.25   72.50   60.00
       UDFS       70.00   72.50   67.50
       CFS        78.75   85.00   72.50
EBT    Relief-F   72.50   77.50   67.50
       LS         71.25   77.50   65.00
       MCFS       76.25   77.50   75.00
       UDFS       76.25   80.00   72.50
       CFS        83.75   80.00   87.50
SVM    Relief-F   86.25   82.50   90.00
       LS         83.75   77.50   90.00
       MCFS       83.75   80.00   87.50
       UDFS       86.25   82.50   90.00
       CFS        90.00   87.50   92.50
* CM: classification method, FSM: feature selection method, ACC: accuracy (%), SEN: sensitivity (%), SPE: specificity (%).

4. Experimental Results

The experimental results obtained from 3D T1-weighted sMRI data by using the SPM12 and CAT12 toolboxes with DARTEL analysis have been studied. The segmented, normalized, modulated, and smoothed data are used to generate a 3D mask by using TIV as a covariate, the f-contrast, and the combination of GM and WM tissue maps. The 3D masks are the VOIs which encapsulate the most discriminative voxels between PD and HC. The 3D mask data of GM and WM are multiplied with the original GM and WM datasets, respectively, and the 3D masked GM and WM data are obtained. In order to improve classification performance, the 3D masked GM and WM data are concatenated and a new fused GM+WM dataset is obtained. Since the number of features in the combined GM and WM dataset exceeds the number of samples, an FS approach needs to be applied to the data. In this paper, five filter-based feature ranking methods, including both supervised and unsupervised ones, are applied to the combination of the GM and WM datasets. An adaptive FC algorithm is used to select the optimum number of top discriminative features from the ranked vector in each training fold. For classification of the data with the selected features, five different methods are studied. Hence, the performances of all five classification methods using the outputs of all five feature ranking approaches as input are compared. The experimental results given in Table 2 indicate that, among all five feature ranking methods, the CFS ranking with the FC stopping method has the highest accuracy, sensitivity, and specificity scores. FS, in a simple form, assesses individual attributes and ranks them according
to their correlation with class labels. However, in [22] it is shown that a good feature subset contains features that are highly correlated with the class labels yet uncorrelated with each other. Therefore, evaluating features as a subset performs better than evaluating them individually. Since CFS assesses, and therefore ranks, feature subsets rather than individual features, it mostly outperforms the Relief-F approach, which does not identify redundant attributes [22]. The MCFS method, which selects features in batch mode, performs especially well when the number of selected features is small [27]. However, in this paper, the number of selected top-ranked features might be high for some training folds.

Regarding Table 2, among the five feature ranking methods with the FC stopping criterion, CFS shows better and more robust performance for all the classifiers. Therefore, this approach has been evaluated with the GM, WM, and combined GM and WM datasets as inputs. The experimental results given in Table 3 show that using the combination of GM and WM datasets outperforms using GM and WM individually for all the classifiers. Since the number of features is increased by the fusion of the GM and WM datasets, the variance is increased and the number of selected top-ranked features is also increased. Hence, the classification performances of all methods are improved. In Table 2, among all five classification methods, the SVM has the superior classification performance for all the feature ranking approaches. As seen from Table 3, after applying the decision fusion technique by using majority voting, the specificities and accuracies of the individually used GM and WM datasets are improved. Furthermore, applying this scheme to the combination of GM and WM datasets improves the whole classification performance. From the experiments, it is observed that the decision fusion technique improves the highest specificity scores from 75.00%, 87.50%, and 92.50% to 80.00%, 90.00%, and 100% for GM, WM, and the combination of GM and WM, respectively. Additionally, the proposed technique improves the highest accuracy scores from 75.00%, 80.00%, and 90.00% to 77.50%, 81.25%, and 95.00% for GM, WM, and the combination of GM and WM, respectively. Using the proposed methodology increases the sensitivity score from 87.50% to 90.00% for the combination of GM and WM datasets.

5. Discussion

In the literature, the performance of the CFS technique is compared to that of wrapper methods in many applications and, in general, the CFS method outperforms the wrapper methods for small datasets [22]. Additionally, the computational cost of the filter methods, including CFS, is lower than that of the wrapper methods. In this study, five different FS methods followed by an adaptive FC stopping approach have been used for choosing the optimal number of top-ranked features. Additionally, five different classification methods are studied by using five different FS techniques. In order to improve the classification performance, a decision fusion scheme which combines the outputs of the classifiers for the CFS method is used. In the literature, the GM or WM modalities from sMRI datasets are used individually in order to detect PD [5, 10, 1]. In this study, the performance of the proposed scheme is evaluated for the GM, WM, and the combination of the GM and WM modalities.

In [18], a decision model is used for FS. In this model, the accuracy of an SVM classifier starting from the features of the first cluster is calculated and then the features are added incrementally into the classifier until the highest accuracy is obtained. The highest classification accuracy of 88.33% is obtained for the combination of GM and WM datasets, which are modalities of self-acquired data from 30 PD and 30 HC subjects. However, there is no automatic FS scheme and the computational cost of this approach would be high for large datasets.

In [10], a projection based learning and meta-cognitive radial basis function network with recursive feature elimination (RFE) is used as a classifier scheme for the GM dataset. In 3D mask building, GM as a covariate and the t-contrast for model building are used. A classification accuracy of 87.21% is obtained. However, the wrapper FS method including RFE is computationally expensive, and the DARTEL normalization in the VBM technique, which improves inter-group registration and provides more precise and accurate localization of structural differences in sMRI data, is not used.

In [5], the preselected regions, especially the substantia nigra (SN), are drawn manually and classification is performed on these predefined ROIs. The t-test based ranking and mutual information based FS methods are taken into account. The GM, WM, and CSF modalities acquired from self-acquired data of 30 PD and 30 HC subjects are used separately and classification accuracies of 86.67%, 86.67%, and 83.33% are obtained, respectively. However, using only five manually drawn ROIs, which have operator-dependent boundaries, might miss other brain regions affected by PD.

In [7], principal component analysis followed by Fisher's discriminant ratio is used to reduce the dimensionality of the features and an SVM is used for classification. In order to compare the performance of the proposed method with the studies provided in Table 4, the methods used in these studies are applied to the 40 PD and 40 HC data obtained from the PPMI database and used in this paper. The detailed explanations of [7, 10, 5, 18] are provided in [39].

In [43], PD diagnosis based on the extreme learning machine (ELM) method along with genetic algorithm (GA) feature subset selection is studied. The VBM normalized, modulated, 10 mm FWHM smoothed, and segmented data are used for 3D mask generation. DARTEL normalization is not used. In statistical testing, a two-sample t-test to generate the model and the t-contrast to build the 3D GM mask are utilized. The GA is used for feature selection. After selecting the features, the performance of the ELM classifier is compared with that of the SVM. In each trial, 80% of the total data is randomly selected for training and 20% of it is used for testing. Since the aim of this study is to
Table 3: Performances of the five classification methods using CFS feature ranking and FC feature selection for the GM, WM, and combined GM and WM datasets.

                        GM                       WM                       GM+WM
Classification          ACC*    SEN*    SPE*     ACC*    SEN*    SPE*     ACC*    SEN*    SPE*
SVM                     68.75   70.00   67.50    76.25   70.00   82.50    90.00   87.50   92.50
KNN                     61.25   55.00   67.50    80.00   75.00   85.00    85.00   77.50   92.50
NB                      72.50   70.00   75.00    68.75   57.50   80.00    87.50   82.50   92.50
ESD                     68.75   75.00   62.50    61.25   52.50   70.00    78.75   85.00   72.50
EBT                     75.00   77.50   72.50    70.00   70.00   70.00    83.75   80.00   87.50
Fusion-ALL (Proposed)   77.50   75.00   80.00    81.25   72.50   90.00    95.00   90.00   100
* ACC: accuracy (%), SEN: sensitivity (%), SPE: specificity (%).
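The Fusion-ALL row of Table 3 is produced by majority voting over the binary predictions of the five classifiers (Section 3.7). The following is a minimal Python sketch of that step, not the authors' Matlab implementation. The function names `majority_vote` and `acc_sen_spe`, the label coding (1 = PD, 0 = HC), the rule that a sample is assigned to PD when at least three of the five classifiers vote PD, and the toy decision matrix are all assumptions made for illustration.

```python
import numpy as np


def majority_vote(decisions):
    """Fuse binary predictions; decisions has shape (n_samples, n_classifiers)."""
    decisions = np.asarray(decisions)
    n_classifiers = decisions.shape[1]
    votes_for_pd = decisions.sum(axis=1)
    # Strict majority: with five classifiers this means at least three votes.
    return (2 * votes_for_pd > n_classifiers).astype(int)


def acc_sen_spe(y_true, y_pred):
    """Return accuracy, sensitivity, specificity in percent (1 = PD, 0 = HC)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = 100.0 * (tp + tn) / y_true.size
    sen = 100.0 * tp / (tp + fn)
    spe = 100.0 * tn / (tn + fp)
    return acc, sen, spe


# Hypothetical toy decision matrix: 4 samples, columns KNN, NB, ESD, EBT, SVM.
c_df = np.array([
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
])
y_true = np.array([1, 1, 1, 0])
fused = majority_vote(c_df)
print(fused, acc_sen_spe(y_true, fused))
```

As a consistency check on the reported figures: with 40 PD and 40 HC subjects, a sensitivity of 90.00% corresponds to 36 correctly detected PD subjects and a specificity of 100% to 40 correctly detected HC subjects, giving an accuracy of 76/80 = 95.00%, in agreement with the GM+WM entry of the Fusion-ALL row.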
Table 4: Classification performance comparison with the state-of-the-art.

Research Work         HC/PD       Cov*-Cont*   Volume(s)   ACC*    SEN*    SPE*
Salvatore et al.[7]   28/28^a     -            SSNMV*      85.80   86.00   86.00
Babu et al.[10]       112/127^b   GM-T         GM          87.21   87.39   87.00
Rana et al.[5]        30/30^a     TIV-T        GM          86.67   90.00   83.33
                                               WM          86.67   86.67   86.67
Rana et al.[18]       30/30^a     TIV-T        GM          76.67   66.67   86.67
                                  TIV-T        WM          73.33   73.33   73.33
                                  TIV-T        GM+WM       88.33   90.00   86.67
Pahuja et al.[43]     30/30^b     -T           GM          90.94   92.45   97.30
Proposed Method       40/40^b     TIV-F        GM+WM       95.00   90.00   100
* Cov: Covariate, Cont: Contrast, T: t-contrast, F: f-contrast, SSNMV: Skull-stripped & normalized MRI volumes, ACC: accuracy (%), SEN: sensitivity (%), SPE: specificity (%).
a Self-acquired dataset, not publicly available.
b Publicly available PPMI dataset.

Table 5: Classification performances reported for the existing methods compared with those obtained by applying these methods to the 40 PD and 40 HC PPMI datasets used in this paper.

                      Existing Methods            Applying Existing Methods
Research Work         ACC*     SEN*     SPE*      ACC*    SEN*    SPE*
Salvatore et al.[7]   85.80    86.00    86.00     68.75   62.50   75.00
Babu et al.[10]       67.44†   75.65†   58.00†    62.50   57.00   68.00
Rana et al.[18]       88.33    90.00    86.67     78.75   85.00   72.50
Pahuja et al.[43]     81.15†   90.34†   91.56†    75.00   72.50   77.50
Proposed Method       -        -        -         95.00   90.00   100
* ACC: accuracy (%), SEN: sensitivity (%), SPE: specificity (%).
† SVM classification performances reported in [10] and [43].
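The accuracies in Tables 2, 3, and 5 are obtained with the nested K-fold CV of Section 3.6 (K1 = 10 outer folds, K2 = 10 inner folds, and an RBF-SVM tuned over C and γ). As a rough illustration only, the procedure can be sketched in Python with scikit-learn; this is not the authors' Matlab/LIBSVM code. The function name `nested_cv_accuracy`, the synthetic data, the grid values, and the `SelectKBest` step (a simple stand-in for the paper's feature ranking followed by Fisher-criterion stopping) are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def nested_cv_accuracy(X, y, k_outer=10, k_inner=10, n_selected=20, seed=0):
    """Mean outer-fold accuracy of an RBF-SVM tuned in an inner CV loop."""
    # Scaling and feature selection sit inside the pipeline, so they are
    # refitted on training data only and never see the held-out fold.
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=f_classif, k=n_selected)),
        ("svm", SVC(kernel="rbf")),
    ])
    # Hypothetical grid; the paper does not list its C and gamma values here.
    grid = {"svm__C": [0.1, 1.0, 10.0], "svm__gamma": [1e-3, 1e-2, 1e-1]}
    inner = StratifiedKFold(n_splits=k_inner, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=k_outer, shuffle=True, random_state=seed + 1)
    # Inner loop: pick the (C, gamma) pair with the best mean accuracy.
    search = GridSearchCV(pipe, grid, cv=inner, scoring="accuracy")
    # Outer loop: test the tuned model on each left-out fold.
    scores = cross_val_score(search, X, y, cv=outer, scoring="accuracy")
    return scores.mean()


# Synthetic stand-in for the 80-subject dataset (not real MRI features).
X, y = make_classification(n_samples=80, n_features=200, n_informative=10,
                           random_state=0)
print(f"nested CV accuracy: {nested_cv_accuracy(X, y):.3f}")
```

Wrapping the grid search inside `cross_val_score` keeps parameter tuning and feature selection away from the outer test fold, which is the point of the nested design. The stand-in selector keeps a fixed number of features, whereas the paper's adaptive criterion chooses that number per training fold.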

compare the performance of the proposed method with the one provided in [43], only the SVM classifier is applied to our dataset and a classification accuracy of 75.00% is obtained, as provided in Table 5. The GA is known to have unguided mutation: the mutation operator in the GA adds a randomly generated number to a parameter of an individual in the population. Therefore, it converges slowly, which makes it computationally expensive.

As seen from Table 5, the proposed method outperforms similar state-of-the-art studies in terms of classification accuracy.

6. Conclusion

Recently, the diagnosis of neurodegenerative diseases based on neuroimaging data has been extensively studied. In this paper, the classification of PD apart from HC has been investigated by using five different classification approaches and five different ranking methods, each followed by an adaptive Fisher criterion stopping method for feature selection. A decision fusion method using majority voting is employed in order to enhance the classification performance. The experiments are performed for the GM tissue, the WM tissue, and the fusion of GM and WM. The experiments indicate that the CFS approach provides the best results with all five classification methods. Additionally, the SVM provides the
highest classification performances for all five feature selection techniques. Therefore, compared with the individual performances of the classifiers, their fusion using the CFS approach improves the classification performance considerably. The classification accuracies obtained with the proposed methodology are 77.50% and 81.25% for the individual GM and WM tissues, respectively. However, the fusion of GM and WM modalities improves the classification accuracy of the proposed methodology up to 95.00%.

Acknowledgement

Funding sources: None

References

[1] E. Adeli-Mosabbeb, C.-Y. Wee, L. An, F. Shi, D. Shen, Joint feature-sample selection and robust classification for parkinson's disease diagnosis, in: Medical Computer Vision: Algorithms for Big Data, Springer International Publishing, 2016, pp. 127–136.
[2] C. Ornelas-Vences, L. P. Sanchez-Fernandez, L. A. Sanchez-Perez, A. Garza-Rodriguez, A. Villegas-Bastida, Fuzzy inference model evaluating turn for parkinsons disease patients, Computers in Biology and Medicine 89 (2017) 379–388. doi:10.1016/j.compbiomed.2017.08.026.
[3] S. Rathore, M. Habes, M. A. Iftikhar, A. Shacklett, C. Davatzikos, A review on neuroimaging-based classification studies and associated feature extraction methods for alzheimer's disease and its prodromal stages, NeuroImage 155 (2017) 530–548. doi:10.1016/j.neuroimage.2017.03.057. URL http://www.sciencedirect.com/science/article/pii/S1053811917302823
[4] M. R. Arbabshirani, S. Plis, J. Sui, V. D. Calhoun, Single subject prediction of brain disorders in neuroimaging: Promises and pitfalls, NeuroImage 145 (2017) 137–165, individual Subject Prediction. doi:10.1016/j.neuroimage.2016.02.079. URL http://www.sciencedirect.com/science/article/pii/S105381191600210X
[5] B. Rana, A. Juneja, M. Saxena, S. Gudwani, S. S. Kumaran, R. Agrawal, M. Behari, Regions-of-interest based automated diagnosis of parkinsons disease using t1-weighted mri, Expert Systems with Applications 42 (9) (2015) 4506–4516. doi:10.1016/j.eswa.2015.01.062. URL http://www.sciencedirect.com/science/article/pii/S0957417415000858
[6] D. Joshi, A. Khajuria, P. Joshi, An automatic non-invasive method for parkinson's disease classification, Computer Methods and Programs in Biomedicine 145 (2017) 135–145. doi:10.1016/j.cmpb.2017.04.007.
[7] C. Salvatore, A. Cerasa, I. Castiglioni, F. Gallivanone, A. Augimeri, M. Lopez, G. Arabia, M. Morelli, M. Gilardi, A. Quattrone, Machine learning on brain mri data for differential diagnosis of parkinson's disease and progressive supranuclear palsy, Journal of Neuroscience Methods 222 (2014) 230–237. doi:10.1016/j.jneumeth.2013.11.016. URL http://www.sciencedirect.com/science/article/pii/S0165027013003993
[8] R. Pizarro, X. Cheng, A. Barnett, H. Lemaitre, B. Verchinski, A. Goldman, E. Xiao, Q. Luo, K. Berman, J. Callicott, D. Weinberger, V. Mattay, Automated quality assessment of structural magnetic resonance brain images based on a supervised machine learning algorithm, Frontiers in Neuroinformatics 10. doi:10.3389/fninf.2016.00052. URL https://www.frontiersin.org/article/10.3389/fninf.2016.00052
[9] B. Heim, F. Krismer, R. De Marzi, K. Seppi, Magnetic resonance imaging for the diagnosis of parkinson's disease, Journal of Neural Transmission 124 (8) (2017) 915–964. doi:10.1007/s00702-017-1717-8. URL https://doi.org/10.1007/s00702-017-1717-8
[10] G. S. Babu, S. Suresh, B. S. Mahanand, A novel pbl-mcrbfn-rfe approach for identification of critical brain regions responsible for parkinsons disease, Expert Systems with Applications 41 (2) (2014) 478–488. doi:10.1016/j.eswa.2013.07.073. URL http://www.sciencedirect.com/science/article/pii/S0957417413005605
[11] R. Schwartz, K. Rothermich, S. A. Kotz, M. D. Pell, Unaltered emotional experience in parkinsons disease: Pupillometry and behavioral evidence, Journal of Clinical and Experimental Neuropsychology 40 (3) (2018) 303–316, pMID: 28669253. doi:10.1080/13803395.2017.1343802. URL https://doi.org/10.1080/13803395.2017.1343802
[12] A. Pilotto, E. Premi, S. Paola Caminiti, L. Presotto, R. Turrone, A. Alberici, B. Paghera, B. Borroni, A. Padovani, D. Perani, Single-subject spm fdg-pet patterns predict risk of dementia progression in parkinson disease, Neurology.
[13] J. Pasquini, R. Ceravolo, Z. Qamhawi, J.-Y. Lee, G. Deuschl, D. J. Brooks, U. Bonuccelli, N. Pavese, Progression of tremor in early stages of parkinsons disease: a clinical and neuroimaging study, Brain 141 (3) (2018) 811–821. doi:10.1093/brain/awx376. URL http://dx.doi.org/10.1093/brain/awx376
[14] C.-H. Lin, C.-M. Chen, M.-K. Lu, C.-H. Tsai, J. C. Chiou, J.-R. Liao, J.-R. Duann, Vbm reveals brain volume differences between parkinsons disease and essential tremor patients 7 (2013) 247. doi:10.3389/fnhum.2013.00247. URL https://www.frontiersin.org/article/10.3389/fnhum.2013.00247
[15] I. Beheshti, N. Maikusa, H. Matsuda, H. Demirel, G. Anbarjafari, Histogram-based feature extraction from individual gray matter similarity-matrix for alzheimer's disease classification, Journal of Alzheimer's disease: JAD 55 4 (2017) 1571–1582.
[16] U. Ellfolk, J. Joutsa, J. O Rinne, R. Parkkola, P. Jokinen, M. Karrasch, Brain volumetric correlates of memory in early parkinson's disease 3.
[17] S.-H. Lee, J. S. Lim, Parkinsons disease classification using gait characteristics and wavelet-based feature extraction, Expert Systems with Applications 39 (8) (2012) 7338–7344. doi:10.1016/j.eswa.2012.01.084. URL http://www.sciencedirect.com/science/article/pii/S0957417412000978
[18] B. Rana, A. Juneja, M. Saxena, S. Gudwani, S. S. Kumaran, R. K. Agrawal, M. Behari, Voxel-based morphometry and minimum redundancy maximum relevance method for classification of parkinson's disease and controls from t1-weighted mri, in: Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP '16, 2016, pp. 22:1–22:6.
[19] I. Beheshti, H. Demirel, F. Farokhian, C. Yang, H. Matsuda, Structural mri-based detection of alzheimer's disease using feature ranking and classification error, Computer Methods and Programs in Biomedicine 137 (2016) 177–193. doi:10.1016/j.cmpb.2016.09.019. URL http://www.sciencedirect.com/science/article/pii/S0169260716301055
[20] L. Christopher, Y. Koshimori, A. E. Lang, M. Criaud, A. P. Strafella, Uncovering the role of the insula in non-motor symptoms of parkinsons disease, Brain 137 (8) (2014) 2143–2154.
[21] B. Rana, A. Juneja, M. Saxena, S. Gudwani, S. Senthil Kumaran, R. Agrawal, M. Behari, Relevant 3d local binary pattern based features from fused feature descriptor for differential diagnosis of parkinsons disease using structural mri 34 (2017) 134–143. doi:10.1016/j.bspc.2017.01.007. URL http://www.sciencedirect.com/science/article/pii/S1746809417300058
[22] M. A. Hall, Correlation-based feature selection for machine learning, Tech. rep. (1999).
[23] L. Zhang, G. Sun, J. Guo, Feature selection for pattern classification problems, in: Proceedings of the Fourth International Conference on Computer and Information Technology, CIT '04, IEEE Computer Society, Washington, DC, USA, 2004, pp. 233–237. URL http://dl.acm.org/citation.cfm?id=1025116.1025310
[24] G. Roffo, Ranking to learn and learning to rank: On the role of ranking in pattern recognition applications, CoRR abs/1706.05933. arXiv:1706.05933. URL http://arxiv.org/abs/1706.05933
[25] K. Kira, L. A. Rendell, The feature selection problem: Traditional methods and a new algorithm, in: Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI'92, AAAI Press, 1992, pp. 129–134. URL http://dl.acm.org/citation.cfm?id=1867135.1867155
[26] X. He, D. Cai, P. Niyogi, Laplacian score for feature selection, in: Proceedings of the 18th International Conference on Neural Information Processing Systems, NIPS'05, MIT Press, Cambridge, MA, USA, 2005, pp. 507–514. URL http://dl.acm.org/citation.cfm?id=2976248.2976312
[27] D. Cai, C. Zhang, X. He, Unsupervised feature selection for multi-cluster data, in: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '10, ACM, New York, NY, USA, 2010, pp. 333–342. doi:10.1145/1835804.1835848. URL http://doi.acm.org/10.1145/1835804.1835848
[28] Y. Yang, H. T. Shen, Z. Ma, Z. Huang, X. Zhou, L2,1-norm regularized discriminative feature selection for unsupervised learning, in: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Two, IJCAI'11, AAAI Press, 2011, pp. 1589–1594. doi:10.5591/978-1-57735-516-8/IJCAI11-267. URL http://dx.doi.org/10.5591/978-1-57735-516-8/IJCAI11-267
[29] C. Bouveyron, C. Brunet, Theoretical and practical considerations on the convergence properties of the fisher-em algorithm, Journal of Multivariate Analysis 109 (2012) 29–41. doi:10.1016/j.jmva.2012.02.012. URL http://www.sciencedirect.com/science/article/pii/S0047259X1200053X
[30] I. Beheshti, H. Demirel, Feature-ranking-based alzheimers disease classification from structural mri, Magnetic Resonance Imaging 34 (3) (2016) 252–263. doi:10.1016/j.mri.2015.11.009. URL http://www.sciencedirect.com/science/article/pii/S0730725X15002945
[31] H. D. Tagare, C. DeLorenzo, S. Chelikani, L. Saperstein, R. K. Fulbright, Voxel-based logistic analysis of ppmi control and parkinson's disease datscans, NeuroImage 152 (2017) 299–311. doi:10.1016/j.neuroimage.2017.02.067. URL http://www.sciencedirect.com/science/article/pii/S1053811917301787
[32] D. Jain, V. Singh, Feature selection and classification systems for chronic disease prediction: A review, Egyptian Informatics Journal. doi:10.1016/j.eij.2018.03.002. URL http://www.sciencedirect.com/science/article/pii/S1110866517300294
[33] G. Roffo, Report: Feature selection techniques for classification, CoRR abs/1607.01327. arXiv:1607.01327. URL http://arxiv.org/abs/1607.01327
[34] C.-C. Chang, C.-J. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2 (2011) 27:1–27:27, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[35] F. Farokhian, I. Beheshti, D. Sone, H. Matsuda, Comparing cat12 and vbm8 for detecting brain morphological abnormalities in temporal lobe epilepsy, in: Front. Neurol., 2017.
[36] I. Beheshti, H. Demirel, Probability distribution function-based classification of structural mri for the detection of alzheimers disease, Computers in Biology and Medicine 64 (2015) 208–216. doi:10.1016/j.compbiomed.2015.07.006. URL http://www.sciencedirect.com/science/article/pii/S0010482515002486
[37] X. Jia, P. Liang, Y. Li, L. Shi, D. Wang, K. Li, Longitudinal study of gray matter changes in parkinson disease, AJNR. American journal of neuroradiology 36 12 (2015) 2219–26.
[38] E. Adeli, G. Wu, B. Saghafi, L. An, F. Shi, D. Shen, Kernel-based joint feature selection and max-margin classification for early diagnosis of parkinsons disease, Sci. Rep. 7, 41069, 2017. doi:10.1038/srep41069. URL https://www.nature.com/articles/srep41069.pdf
[39] Effects of different covariates and contrasts on classification of parkinson's disease using structural mri, Computers in Biology and Medicine 99 (2018) 173–181. doi:10.1016/j.compbiomed.2018.05.006. URL http://www.sciencedirect.com/science/article/pii/S0010482518301136
[40] R. J. Urbanowicz, M. Meeker, W. L. Cava, R. S. Olson, J. H. Moore, Relief-based feature selection: Introduction and review, CoRR abs/1711.08421. arXiv:1711.08421. URL http://arxiv.org/abs/1711.08421
[41] L. Du, Y.-D. Shen, Unsupervised feature selection with adaptive structure learning, in: KDD, 2015.
[42] A. Narasimhamurthy, Theoretical bounds of majority voting performance for a binary classification problem, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (12) (2005) 1988–1995. doi:10.1109/TPAMI.2005.249.
[43] G. Pahuja, T. N. Nagabhushan, A novel ga-elm approach for parkinson's disease detection using brain structural t1-weighted mri data, in: 2016 Second International Conference on Cognitive Computing and Information Processing (CCIP), 2016, pp. 1–6. doi:10.1109/CCIP.2016.7802848.
Highlights

• The f-contrast is used for obtaining the 3D masks of gray matter and white matter.
• Five different classification and five different feature selection methods are used.
• An adaptive Fisher criterion is used to select the number of top ranked features.
• In PD detection, a source fusion technique enhances the classification performance.
• In PD detection, a decision fusion technique improves the classification performance.