Abstract—The objective of this paper is to automatically diagnose the mental disorder schizophrenia using multimodal features extracted from Magnetic Resonance Imaging (MRI) brain scans. The aim is to achieve the highest possible binary classification accuracy, and thus the best possible prediction of the schizophrenia diagnosis. The importance of feature selection, in combination with fine-tuning the parameters of Machine Learning classifiers, is explained. Various supervised Machine Learning classifiers were employed and compared with one another and then with existing systems. The proposed solution achieved an AUC score of 0.9473 and an accuracy of 0.9412, as opposed to the best existing system's AUC score of 0.928.

Index Terms—MRI Scan, Support Vector Machine (SVM), Boruta Feature Selection, Machine Learning Classifier, Schizophrenia, Functional Network Connectivity (FNC), Source Based Morphometry (SBM).

I. INTRODUCTION

Schizophrenia is a chronic and severe mental disorder affecting the thinking, feeling and behaviour of a person [1]. About 1% of the world's population suffers from schizophrenia. Patients suffering from schizophrenia may feel that they have completely lost touch with reality. Schizophrenia is a rare mental disorder, but if a patient is diagnosed with it, the symptoms can be very disabling. Mostly, the symptoms start between the ages of 16 and 30. These symptoms fall into three categories, namely positive (hallucinations, delusions, thought disorder), negative (no or reduced facial expressions, negligible feeling of pleasure, reduced speaking) and cognitive (problems in focusing and memorizing) [2]. The causes of this mental disorder are unknown, which makes schizophrenia very difficult to treat. The disorder's symptoms and the lack of standard biologically-based clinical tests further increase this difficulty.

This article proposes a novel approach to accurately classify between patients having schizophrenia and healthy persons. To make the prediction accurate, both structural connectivity and functional connectivity have to be considered, as the two measures combined provide a much more comprehensive description [1]. Structural connectivity refers to how the neurons are physically connected with respect to synaptic strength, while functional connectivity is more of a statistical concept, as it captures deviations from statistical independence between distributed neuronal units, often spatially remote, in various brain regions. It is observed that in schizophrenia patients the brain connectivity is lower, and also differs from that of a healthy brain. White matter is associated with structural connectivity, whereas grey matter is associated with functional connectivity. In a typical schizophrenia patient there is a smaller amount of white matter, and there are many complex alterations in functional connectivity related to the grey matter. In the past two decades there have been considerable advancements in neuroimaging techniques for studying structural and functional brain connectivity. This has helped to obtain in-depth information about various mental disorders such as schizophrenia. A vast amount of work has been done on this kind of problem. The most popular approaches [3]–[5] with the highest accuracy to date used a Gaussian Process classifier, feature trimming with random forest followed by training a Support Vector Machine (Gaussian kernel), and Distance Weighted Discrimination, respectively.

The highlight of this study is tackling the high-dimensional data with a suitable feature selection technique. A comparative analysis is also performed between the results obtained without and with feature selection. The paper begins with section II, in which a survey of the existing approaches and methods that achieved considerable accuracy is conducted. Section III describes the proposed methodology, aimed at better performance in terms of accuracy compared with existing approaches. Finally, the results obtained in this study are compared with existing techniques in section IV.

II. LITERATURE SURVEY

There are various existing systems [6]–[9] that aimed to solve the problem. A literature survey is conducted for the top three approaches, in order of their performance. The best performing model [3] used a Gaussian process (GP) classifier, in which the GP prior over the latent function is mapped to a class probability by the probit transformation, given by the following equation,

p(y_i = 1 | x_i) = Φ(y_i f(x_i)) = ∫_{−∞}^{y_i f(x_i)} N(z | 0, 1) dz    (1)
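As a hedged illustration of equation (1), the probit link Φ can be evaluated with the standard-normal CDF. The sketch below is not the authors' implementation; the latent value f(x) is an arbitrary stand-in, and labels are taken in {−1, +1} as the product y_i f(x_i) suggests:

```python
import math

def probit(t: float) -> float:
    """Standard-normal CDF: the integral of N(z|0,1) from -inf to t."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def class_probability(y: int, f_x: float) -> float:
    """p(y = 1 | x) for y in {-1, +1} via the probit transformation of eq. (1)."""
    return probit(y * f_x)

# A latent value of 0 sits on the decision boundary: probability 0.5.
print(class_probability(+1, 0.0))  # 0.5
# Large positive latent values push the probability towards 1.
print(class_probability(+1, 3.0))
```

The probit link plays the same squashing role as the logistic sigmoid, but its Gaussian form is what makes the Laplace-approximation training mentioned below tractable for GP classification.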
Φ(·) is the Gaussian CDF (Cumulative Distribution Function). The observations are assumed to be drawn from a Bernoulli distribution with probability of success p(y_i = 1 | x_i). The aim is to capture both the linear structure and the short-scale non-linearities in the latent space. Hence, the covariance function is set to a linear combination of three separate covariance functions, given by the following equation,

k(x, x′) = k_const.(x, x′) + k_linear(x, x′) + k_Matérn^{ν=5/2}(x, x′)    (2)

where the individual covariance functions are given as,

k_const.(x, x′) = α_1

k_linear(x, x′) = α_2 x^T x′

k_Matérn^{ν=5/2}(x, x′) = α_3 (1 + √5 r/α_4 + 5r²/(3α_4²)) exp(−√5 r/α_4)

where r stands for ‖x − x′‖. The covariance function is represented as the linear combination of those three separate covariance functions. α = {α_1, α_2, α_3, α_4} are the hyperparameters. To train the model, the Laplace approximation scheme was used until convergence. For latent sampling and for the hyperparameters, Elliptical Slice Sampling and the Surrogate Slice Sampler were used, respectively. This approach achieved an AUC score of 0.928, making it the best performing model to date.

The approach [4] based on feature trimming, proposed by A. V. Lebedev, consisted of two stages. The first stage comprised feature selection, where the training data is appended with a noisy feature set of normally distributed values. This merged feature set is then passed as input to the random forest algorithm. The most discriminating features were selected by keeping only those features whose relevance scored above the noisy ones, based on the Gini coefficient. The features extracted in the feature selection step were input to an SVM classifier with a Gaussian kernel. The entire process can be summarized as follows:
(i) A random vector is introduced into the feature list.
(ii) Then, the feature importance score of each feature is calculated, based on the mean decrease of the 'Gini index' (output by the Random Forest classifier algorithm).
(iii) The attributes scoring a higher importance score than the dummy variables are selected.
(iv) The final dataset obtained as the output of step (iii) is then used to train a Support Vector Machine with the Gaussian kernel.
This approach scored an AUC of 0.923, making it the second best performing model.

The third approach [5] was based on a method from the HDLSS (High-Dimensional Low Sample Size) class. HDLSS methods are optimized to deal with a small number of data samples of high dimensionality, which suits this problem statement, as the data is high dimensional. Distance Weighted Discrimination (DWD) was adopted, which is quite similar to SVM. DWD has two important features, namely:
• Explicitly accounting for the distance between each sample and the margin.
• Penalizing the samples which are clustered very close to the solution margin. This penalty prevents the model from overfitting.
The trained model achieved an AUC score of 0.91282.

III. SCHIZOPHRENIA DIAGNOSIS PREDICTION

The proposed methodology for schizophrenia diagnosis prediction consists of the following steps:
(i) Data Acquisition/Data Collection.
(ii) Data Pre-processing.
(iii) Feature Selection.
(iv) Classification Model Training.
(v) Results Evaluation.
Each step plays an important role in the final performance in terms of prediction accuracy.

A. Data Acquisition

Data collection, or Data Acquisition, is an important step in neuroscience, as a slight error during collection may cause inaccurate predictions once the predictive model is built. The data collection procedure [10] was carried out at the Mind Research Network, funded by a Center of Biomedical Research Excellence grant. The dataset used in this study consists of two types of information, namely Functional Network Connectivity (FNC, [11]) and Source Based Morphometry (SBM, [12]). The FNC data is collected by Functional Magnetic Resonance Imaging (fMRI) scans and can be considered a functional modality attribute, representing connection patterns or synchronization between a particular subject's different cortical regions. On the other hand, the SBM data is collected with the help of structural MRI scans, representing the grey matter concentration in different cortical regions of the subject's brain.

The data consists of a total of 86 rows, where each row represents the data extracted from a particular subject. The FNC data is 378-dimensional and the SBM data is 32-dimensional, so this study is dealing with quite high-dimensional data. For the prediction to be accurate, both types of data have to be considered; hence the final data is 410-dimensional. The data can be represented as D = {(x_i, y_i)}_{i=1}^{n}, where n is the total number of samples or instances, each instance x_i ∈ R^410, and y_i ∈ {0, 1} is the target attribute. The data is randomly split into 60% training data and the remaining 40% test data using stratified sampling.

B. Data Pre-processing

The data from both sources (FNC and SBM) is combined, which results in the final data being 410-dimensional. Due to the large variance and the presence of outliers in the values of many features of the combined FNC and SBM dataset, there is a need to normalize the data. Hence, the first step
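The stratified 60/40 train/test split described in the data-acquisition step can be sketched as below. This is a minimal illustration only: the seed, the toy data, and the per-class shuffling are assumptions, not the authors' actual code.

```python
import random
from collections import defaultdict

def stratified_split(X, y, train_frac=0.6, seed=42):
    """Split (X, y) so each class keeps roughly the same proportion
    of samples in the training and test sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label, indices in by_class.items():
        rng.shuffle(indices)                    # randomize within each class
        cut = round(train_frac * len(indices))  # 60% of this class to training
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])

# Toy stand-in for the 86-sample, 410-feature dataset.
X = [[float(i)] * 3 for i in range(10)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
X_tr, y_tr, X_te, y_te = stratified_split(X, y)
```

Stratification matters here because with only 86 samples a plain random split could easily leave one class under-represented in the 40% test set.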
2018 4th International Conference on Computer and Information Sciences (ICCOINS)
If the entropy is 1, then it represents the maximum amount of disorder in the sample. Entropy is given by the following equation,

Entropy(S) = −Σ_{i=1}^{c} p_i log_2(p_i)    (4)

Next, the algorithm has to decide which feature to split on. The algorithm finds the information gain, which is the resulting homogeneity obtained when a feature is split, given by the following equation,

InfoGain(F) = Entropy(S_1) − Entropy(S_2)    (5)

After the initial split, many partitions of the data are created; hence the entropy across all the partitions must be found, which is given as,

Entropy(S) = Σ_{i=1}^{n} w_i Entropy(P_i)    (6)

C5.0 achieved its highest accuracy of 91.18% and AUC score of 0.91 when AdaBoost is applied and the ntrial (number of trials) parameter is set to 90.

3) Random Forest: Random forests [19] are ensemble-based machine learning algorithms, combining two concepts: bagging and random feature selection. The model creates an ensemble of trees and combines their predictions by a vote. Random forests consider only a small portion of the full feature set at a time, which allows them to perform well on huge datasets. Both continuous and discrete target values can be predicted by random forests. The algorithm of the Random Forest is as follows:
1) For b = 1, ..., B:
   a) Sampling with replacement is performed over all n training samples, yielding X_b, Y_b.
   b) A regression tree or a decision tree f_b is trained on X_b, Y_b.
2) After training the model, a prediction for an unseen example x′ can be made as follows,

f̂(x′) = (1/B) Σ_{b=1}^{B} f̂_b(x′)

Two important parameters have been considered while implementing random forests: the number of trees (ntree) and the number of features the algorithm selects at each iteration. Random Forests perform better when the ntree parameter is increased, up to a certain limit. The Random Forest achieved a considerable accuracy of 91.18% and an AUC score of 0.921.

4) K-Nearest Neighbour (K-NN): KNN makes it easy to model complex relationships between the predictors and the target. No assumptions are made about the underlying data distribution. KNN uses a simple Euclidean distance formula. Finding the value of k (the number of neighbours considered) is the main challenge: the impact of variance can be reduced by increasing the value of k, but this increases the bias considerably. Hence a suitable value of k between the two extremes has to be taken. K-NN achieved an accuracy of 79.41% and an AUC score of 0.81578.

5) Linear Discriminant Analysis: LDA [20] is primarily used as a dimensionality reduction technique and shares some similar properties with PCA (Principal Component Analysis). The major difference between PCA and LDA is that LDA models the difference between the classes of the data, whereas PCA does not. LDA performed quite poorly on this application, scoring an accuracy of 79.41% and an AUC score of 0.7947.

6) Gaussian Process Classifier: Gaussian Processes (GPs) [21] aim to extend multivariate Gaussian distributions to infinite dimensionality. In formal terms, a GP generates data located over a domain in such a way that any finite subset of it follows a multivariate Gaussian distribution. It is a non-parametric approach which finds a distribution over the possible functions f(x) that are consistent with the observed data. The GP classifier was the best classifier among the existing systems. In this paper, four kernels of the GP classifier are implemented, namely laplacedot, vanilladot, polydot and rbfdot, which achieved AUCs of 0.924, 0.8807018, 0.8807 and 0.8421, respectively. The laplacedot kernel performed well with an accuracy of 92%, while the rbfdot kernel performed comparatively poorly with an accuracy of 82.35%.

7) Naïve Bayes: Naïve Bayes [22] performs well with categorical target attributes. It assumes that all predictors are independent and equally important, which does not hold in most real-world scenarios. Nevertheless, the algorithm often performs well even when these assumptions are violated, and this holds true even in extreme circumstances where strong dependencies are found among the features. Naïve Bayes achieved an accuracy and AUC score of 85.29% and 0.8543, respectively.

IV. RESULTS AND DISCUSSION

In this paper, 7 popular state-of-the-art machine learning algorithms, along with different kernels, have been implemented, and the results with respect to different evaluation metrics are depicted. Table I and Table II depict the comparative analysis between the performance measures of the implemented algorithms before and after feature selection, respectively. With the help of these two tables and figure 4, it can be deduced that feature selection plays a vital role in achieving high performance in terms of accuracy. To be precise, after feature selection the SVM with RBF (Radial Basis Function) kernel topped the chart with 94.12% accuracy and an AUC of 0.9473, followed by the Gaussian Process classifier with a laplacedot kernel, having 92% accuracy and an AUC of 0.924. Just behind it came the Random Forest algorithm, with an accuracy of 91.18% and an AUC of 0.924 when the number of trees is set to 150 (ntree=150). The Support Vector Machine with sigmoid kernel achieved the lowest accuracy and AUC, of 70.59% and 0.7157, respectively. Figure 2 presents the comparative analysis of all implemented algorithms with respect to various evaluation parameters. Table III compares the AUC score of this paper with existing systems. It clearly shows that the approach of this paper outperforms the existing systems by quite a good margin.
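The entropy and information-gain computations of equations (4)–(6) can be sketched as below. This is a minimal illustration over lists of per-class counts; the function and variable names are assumptions, not the C5.0 implementation itself.

```python
import math

def entropy(counts):
    """Entropy(S) = -sum_i p_i * log2(p_i) over class counts (eq. 4)."""
    total = sum(counts)
    ent = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            ent -= p * math.log2(p)
    return ent

def weighted_entropy(partitions):
    """Entropy across partitions, weighted by partition size (eq. 6)."""
    total = sum(sum(p) for p in partitions)
    return sum((sum(p) / total) * entropy(p) for p in partitions)

def info_gain(parent_counts, partitions):
    """InfoGain(F): entropy before the split minus the weighted
    entropy after it (eqs. 5 and 6 combined)."""
    return entropy(parent_counts) - weighted_entropy(partitions)

# A 50/50 sample has maximum disorder (entropy 1); a split that
# separates the classes perfectly recovers all of it.
print(entropy([5, 5]))                      # 1.0
print(info_gain([5, 5], [[5, 0], [0, 5]]))  # 1.0
```
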
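The shadow-feature idea behind the feature-selection stage, as in steps (i)–(iv) of the Lebedev approach in section II, can be sketched as below. This is a hedged illustration only: the importance measure is a toy class-mean-difference proxy, not the Random-Forest Gini importance the paper describes, and the feature names are invented.

```python
import random

def gini_proxy_importance(column, labels):
    """Toy importance: absolute difference of class means (a stand-in
    for the mean decrease in Gini a Random Forest would output)."""
    pos = [v for v, y in zip(column, labels) if y == 1]
    neg = [v for v, y in zip(column, labels) if y == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

def shadow_feature_selection(features, labels, seed=0):
    """Step (i): inject a random shadow feature; steps (ii)-(iii): keep
    only real features whose importance beats the shadow's score."""
    rng = random.Random(seed)
    shadow = [rng.gauss(0.0, 0.1) for _ in labels]
    threshold = gini_proxy_importance(shadow, labels)
    return [name for name, col in features.items()
            if gini_proxy_importance(col, labels) > threshold]

labels = [0] * 20 + [1] * 20
features = {
    "informative": [float(y) for y in labels],  # perfectly tracks the label
    "noise": [0.0] * 40,                        # carries no signal at all
}
selected = shadow_feature_selection(features, labels)  # keeps "informative"
```

The design point is the one the survey highlights: a feature must prove it is more informative than pure noise before it is allowed into the classifier, which is what makes this family of methods (including Boruta) effective on 410-dimensional data with only 86 samples.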
TABLE I
PERFORMANCE OF MACHINE LEARNING ALGORITHMS BEFORE FEATURE SELECTION (HIGHEST TO LOWEST)

TABLE II
PERFORMANCE OF MACHINE LEARNING ALGORITHMS AFTER FEATURE SELECTION (HIGHEST TO LOWEST)
TABLE III
COMPARATIVE ANALYSIS OF AUC SCORE WITH EXISTING SYSTEMS

Algorithm                  AUC
SVM - Radial Basis         0.9473
Solin, Arno et al.         0.928
Lebedev, Alexander V.      0.923
Koncevičius, Karolis       0.91282
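For reference, the AUC figures compared above can be computed from raw classifier scores with the rank-based (Mann–Whitney) formulation: the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. The sketch below uses toy scores, not the study's data.

```python
def auc_score(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly,
    with ties counting half (the Mann-Whitney U formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Three of four positive/negative pairs are ranked correctly.
print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Unlike plain accuracy, this metric is insensitive to the decision threshold, which is why the table above can compare systems that used different classifiers and cut-offs.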