
2018 4th International Conference on Computer and Information Sciences (ICCOINS)

Optimization of Schizophrenia Diagnosis Prediction using Machine Learning Techniques

Anant V. Nimkar and Divesh R. Kubal
Department of Computer Engineering, Sardar Patel Institute of Technology, Mumbai, India 400 058.
anant nimkar@spit.ac.in, diveshkubal@spit.ac.in

Abstract— The objective of this paper is to automatically diagnose the mental disorder schizophrenia using multimodal features extracted from Magnetic Resonance Imaging (MRI) brain scans. The aim is to achieve the highest possible binary classification accuracy and, with it, the best possible prediction of the schizophrenia diagnosis. The importance of feature selection, in combination with fine-tuning the parameters of Machine Learning classifiers, is explained. Various supervised Machine Learning classifiers were employed and compared with one another and then with existing systems. The proposed solution achieved an AUC score of 0.9473 and an accuracy of 0.9412, as opposed to the best existing system's AUC score of 0.928.

Index Terms—MRI Scan, Support Vector Machine (SVM), Boruta Feature Selection, Machine Learning Classifier, Schizophrenia, Functional Network Connectivity (FNC), Source Based Morphometry (SBM).

I. INTRODUCTION

Schizophrenia is a chronic and severe mental disorder affecting the thinking, feeling and behavior of a person [1]. About 1% of the world's population suffers from schizophrenia. Patients suffering from schizophrenia may feel that they have completely lost touch with reality. Schizophrenia is a rare mental disorder, but when a patient is diagnosed with it the symptoms can be very disabling. Mostly, the symptoms start between the ages of 16 and 30. These symptoms fall into three categories, namely positive (hallucinations, delusions, thought disorder), negative (no or reduced facial expressions, negligible feeling of pleasure, reduced speaking) and cognitive (problems in focusing and memorizing) [2]. The causes of this mental disorder are unknown, which makes schizophrenia very difficult to treat. The variability of the symptoms and the lack of standard biologically-based clinical tests further increase this difficulty.

This article proposes a novel approach to accurately classify between patients having schizophrenia and healthy persons. To make the prediction accurate, both structural connectivity and functional connectivity have to be considered, as the two measures combined provide a much more comprehensive description [1]. Structural connectivity refers to how neurons are physically connected in terms of synaptic strength, while functional connectivity is more of a statistical concept: it captures deviations from statistical independence between distributed neuronal units which are often spatially remote, in various brain regions. It is observed that in schizophrenia patients the brain connectivity is both lower and different compared to healthy brain connectivity. White matter is associated with structural connectivity, whereas grey matter is associated with functional connectivity. In a typical schizophrenia patient there is a reduced amount of white matter and there are many complex alterations in the functional connectivity related to the grey matter. In the past two decades there have been considerable advancements in neuroimaging techniques for studying structural and functional brain connectivity, which has helped to obtain in-depth information about mental disorders such as schizophrenia. A vast amount of work has been done on this kind of problem. The most popular approaches [3]–[5], with the highest accuracy to date, used a Gaussian Process classifier, feature trimming with a random forest followed by training a Support Vector Machine (Gaussian kernel), and Distance Weighted Discrimination, respectively.

The highlight of this study is the handling of the high-dimensional data with a suitable feature selection technique. A comparative analysis is also performed between the results obtained without feature selection and with feature selection. The paper starts with Section II, which surveys the existing approaches and methods that achieved considerable accuracy. Section III describes the proposed methodology, aimed at better performance in terms of accuracy compared with the existing approaches. Finally, the results obtained in this study are compared with the existing techniques in Section IV.

II. LITERATURE SURVEY

There are various existing systems [6]–[9] that aimed to solve the problem. A literature survey is conducted for the top three approaches, in order of their performance.

The best performing model [3] used a Gaussian Process (GP) classifier. The GP was combined with a prior distribution and scaled by the probit transformation, given by the following equation:

p(y_i = 1 \mid x_i) = \Phi(y_i f(x_i)) = \int_{-\infty}^{y_i f(x_i)} \mathcal{N}(z \mid 0, 1)\, dz \qquad (1)

Φ(·) is the Gaussian CDF (Cumulative Distribution Function). The observations are assumed to be drawn from a Bernoulli distribution with probability of success p(y_i = 1 | x_i). The aim is to capture both the linear structure and the short length-scale non-linearities in the latent space. Hence, the covariance function is set to a linear combination of three separate covariance functions, given by the following equation:

k(x, x') = k_{\mathrm{const.}}(x, x') + k_{\mathrm{linear}}(x, x') + k^{\nu=5/2}_{\mathrm{Mat\acute{e}rn}}(x, x') \qquad (2)

where the individual covariance functions are given as

k_{\mathrm{const.}}(x, x') = \alpha_1

k_{\mathrm{linear}}(x, x') = \alpha_2 \, x^{T} x'

k^{\nu=5/2}_{\mathrm{Mat\acute{e}rn}}(x, x') = \alpha_3 \left(1 + \frac{\sqrt{5}\,r}{\alpha_4} + \frac{5r^{2}}{3\alpha_4^{2}}\right) \exp\!\left(-\frac{\sqrt{5}\,r}{\alpha_4}\right)

where r stands for ||x − x'||. The covariance function is thus the linear combination of these three separate covariance functions, and α = {α_1, α_2, α_3, α_4} are the hyperparameters. To train the model, the Laplace approximation scheme was iterated until convergence. For sampling the latent values and the hyperparameters, Elliptical Slice Sampling and the Surrogate Slice Sampler were used, respectively. This approach achieved an AUC score of 0.928, making it the best performing model to date.
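The authors of [3] do not publish an implementation, but the composite covariance of Eq. (2) can be sketched with scikit-learn kernel objects. The snippet below is only a hypothetical illustration: the α values are placeholders rather than the fitted hyperparameters, and scikit-learn's GP classifier uses a Laplace approximation internally, which is broadly in the spirit of [3] but not the authors' MCMC-based pipeline.

# Sketch only: constant + linear + Matern(nu=5/2) covariance, as in Eq. (2).
# alpha1..alpha4 are illustrative placeholders, not the values fitted in [3].
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import ConstantKernel, DotProduct, Matern

alpha1, alpha2, alpha3, alpha4 = 1.0, 1.0, 1.0, 1.0
kernel = (ConstantKernel(constant_value=alpha1)          # k_const
          + alpha2 * DotProduct(sigma_0=0.0)             # k_linear = alpha2 * x^T x'
          + alpha3 * Matern(length_scale=alpha4, nu=2.5)) # k_Matern, nu = 5/2

gp = GaussianProcessClassifier(kernel=kernel, random_state=0)
# gp.fit(X_train, y_train); gp.predict_proba(X_test) would then give p(y=1|x).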
The approach [4] based on feature trimming, proposed by A. V. Lebedev, consists of two stages. The first stage comprises feature selection, where the training data is appended with a noisy feature set of normally distributed values. This merged feature set is then passed as input to a random forest algorithm. The important, most discriminating features are selected by keeping only those features whose relevance scores exceed those of the noisy ones, based on the Gini coefficient. The features retained by the feature selection step are input to an SVM classifier with a Gaussian kernel. The entire process can be summarized as follows:

(i) A random vector is introduced in the feature list.
(ii) The feature importance score of each feature is calculated, based on the mean decrease of the 'Gini index' (output by the Random Forest classifier).
(iii) The attributes scoring a higher importance than the dummy variables are selected.
(iv) The final dataset obtained as output of step (iii) is used to train the Support Vector Machine with the Gaussian kernel.

This approach scored an AUC of 0.923, making it the second best performing model.

The third approach [5] was based on a method from the HDLSS (High-Dimensional Low Sample Size) class. HDLSS methods are optimized to deal with a small number of samples of high dimensionality, which suits this problem statement, as the data is high dimensional. Distance Weighted Discrimination (DWD), which is quite similar to SVM, was adopted. DWD has two important features, namely:

• It explicitly accounts for the distance between each sample and the margin.
• It penalizes samples which are clustered very close to the solution margin; this penalty prevents the model from overfitting.

The trained model achieved an AUC score of 0.91282.

III. SCHIZOPHRENIA DIAGNOSIS PREDICTION

The proposed methodology for schizophrenia diagnosis prediction consists of the following steps:

(i) Data Acquisition/Data Collection.
(ii) Data Pre-processing.
(iii) Feature Selection.
(iv) Classification Model Training.
(v) Results Evaluation.

Each step plays an important role in the final performance in terms of prediction accuracy.

A. Data Acquisition

Data collection, or data acquisition, can be considered an important step in neuroscience, as a slight error in collection may cause inaccurate predictions once the predictive model is built. The data collection procedure [10] was carried out at the Mind Research Network, funded by a Center of Biomedical Research Excellence grant. The dataset used in this study consists of two types of information, namely Functional Network Connectivity (FNC, [11]) and Source Based Morphometry (SBM, [12]). The FNC data is collected from functional Magnetic Resonance Imaging (fMRI) scans and can be considered a functional-modality attribute representing connectivity patterns, or synchronization, between a particular subject's different cortical regions. The SBM data, on the other hand, is collected with the help of structural MRI scans and represents the grey matter concentration in different cortical regions of the subject's brain.

The data consists of a total of 86 rows, where each row represents the data extracted from one subject. The FNC data is 378-dimensional and the SBM data is 32-dimensional, so this study is dealing with fairly high-dimensional data. For the prediction to be accurate, both types of data have to be considered; hence the final data is 410-dimensional. The data can be represented as D = {(x_i, y_i)}_{i=1}^{n}, where n is the total number of samples (instances), each instance x_i ∈ R^{410}, and y_i ∈ {0, 1} is the target attribute. The data is randomly split into 60% training data and the remaining 40% test data using stratified sampling.
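A minimal sketch of the stratified 60/40 split described above is given below. The arrays X and y are placeholders standing in for the real 86 × 410 combined FNC+SBM matrix and the diagnosis labels, which are not distributed with this paper.

# Sketch of the 60/40 stratified split; X and y are illustrative placeholders.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(86, 410)           # stands in for the combined FNC+SBM features
y = np.random.randint(0, 2, size=86)  # stands in for the diagnosis labels (0/1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.40, stratify=y, random_state=42)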
B. Data Pre-processing

The data from both sources (FNC and SBM) is combined, which results in the final data being 410-dimensional. Due to the large variance and the presence of outliers in the values of many features of the combined FNC and SBM dataset, the data needs to be normalized.


Fig. 1. Flowchart: Schizophrenia Diagnosis Prediction

Hence, the first step before building the final model is normalization. The normalization step converts the data values into the range between 0 and 1, thereby tackling the problem of outliers and large variance. The final normalized combined data (FNC and SBM) is then passed to the feature selection step.
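The paper does not name the tool used for this step; one common way to obtain the 0–1 scaling described above is min–max scaling fitted on the training portion only, as in the following sketch (variable names continue from the data-splitting sketch above).

# Sketch of the 0-1 normalization step, fitted on training data only so that
# test-set statistics do not leak into the model.
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
X_train_norm = scaler.fit_transform(X_train)
X_test_norm = scaler.transform(X_test)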
C. Feature Selection

Feature selection [13] is considered one of the most important steps in solving this problem; after introducing it, the final model performed much better. The objective is to minimize the total number of predictors, or to remove redundant predictors without losing the information required for prediction, so that the model trains faster and gives higher accuracy.

In this application the Boruta algorithm [14] is used, an ensemble, decision-tree-based approach to feature selection. The Boruta algorithm can be considered a wrapper around the random forest algorithm. An important characteristic is that there is no need to fine-tune its parameters, and the algorithm provides numerical estimates of the importance of each feature. The algorithm can be summarized in the following steps:

(i) The feature set is extended by adding copies of all the variables.
(ii) These added (shadow) attributes are shuffled so that their correlations with the response are removed.
(iii) A Random Forest algorithm is run on this extended system, and Z scores are computed.
(iv) The maximum Z score among the shadow attributes (MZSA) is found, and a hit is assigned to each attribute which scored better than MZSA.
(v) MZSA plays an important role, as a two-sided test of equality with MZSA is performed for each attribute whose importance is still undetermined.
(vi) The attributes scoring significantly lower than MZSA are deemed 'unimportant' and are permanently removed from the feature list.
(vii) The attributes scoring significantly higher than MZSA are deemed 'important' and are retained in the feature list.
(viii) Finally, all the shadow attributes are removed.
(ix) Steps (i) to (viii) are repeated until all the variables have an assigned importance, or the algorithm reaches the specified limit of random forest runs.

The Boruta algorithm uses the Z-score as its importance measure. The computational runtime complexity in the average and worst case is O(N·P), where N is the number of objects and P is the number of attributes given as input to Boruta (in our case, P is 410). Hence, the algorithm takes some time to converge to a result, depending on the processing speed and memory of the machine, but the wait is worthwhile, as the outcome is a statistically significant selection of relevant features. The FSelector [15] and PCA (Principal Component Analysis) [16] feature selection methods were also tried, but they proved ineffective for this dataset and Boruta emerged victorious.
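The paper used the R Boruta package [14]; the sketch below approximates the same procedure with the community BorutaPy port in Python, so it should be read as an illustration of the steps above rather than the authors' code. Parameter values are illustrative, and the normalized matrices come from the earlier sketches.

# Sketch of Boruta-style feature selection with BorutaPy (the paper used the
# R Boruta package [14]); settings are illustrative, not the paper's.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

rf = RandomForestClassifier(n_jobs=-1, max_depth=5, random_state=42)
selector = BorutaPy(rf, n_estimators='auto', max_iter=100, random_state=42)
selector.fit(np.asarray(X_train_norm), np.asarray(y_train))

X_train_sel = selector.transform(np.asarray(X_train_norm))
X_test_sel = selector.transform(np.asarray(X_test_norm))
print("selected features:", selector.support_.sum())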


D. Classification Model Training

The features selected in the feature selection step are used to build the supervised Machine Learning models. Various algorithms with different kernels were implemented, such as Support Vector Machine (SVM), Gaussian Process Classifier, C5.0 Decision Tree, Logistic Regression, Naïve Bayes, Random Forest and Linear Discriminant Analysis (LDA).

1) Support Vector Machine: An SVM can be thought of as a surface that defines the boundary between the plotted data points. The goal of the SVM [17] is to construct a maximum-margin separating hyperplane which results in fairly homogeneous partitions of the data on either side. The task can be expressed as the constrained optimization

\min \tfrac{1}{2}\|\vec{w}\|^{2} \quad \text{subject to} \quad y_i(\vec{w}\cdot\vec{x}_i - b) \ge 1, \ \forall \vec{x}_i \qquad (3)

In this paper, linear, radial basis, polynomial and sigmoid kernels are used. The accuracies obtained with the radial basis, polynomial, linear and sigmoid kernels are 0.9412, 0.8824, 0.8824 and 0.7059 respectively. The radial basis kernel performed best, with an AUC score of 0.9473, while the sigmoid kernel performed poorly, with an AUC score of 0.7157 (see Table II).
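A minimal sketch of the four-kernel comparison is given below. Hyperparameters are left at library defaults rather than the fine-tuned values used in the paper, and the selected feature matrices come from the Boruta sketch above.

# Sketch of the four SVM kernels compared above; defaults only, not the
# paper's fine-tuned settings.
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

for kernel in ("rbf", "poly", "linear", "sigmoid"):
    clf = SVC(kernel=kernel, probability=True, random_state=42)
    clf.fit(X_train_sel, y_train)
    scores = clf.predict_proba(X_test_sel)[:, 1]
    print(kernel, "AUC = %.4f" % roc_auc_score(y_test, scores))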
2) C5.0 Decision Tree: C5.0 [18] is a very popular implementation of the decision tree algorithm and an improvement over C4.5. The algorithm is easy to deploy and easy to interpret, and it can be used on small as well as large datasets. Only the important attributes, which are selected automatically, are used. Purity is measured with entropy.


A sample is completely homogeneous if its entropy is 0, while an entropy of 1 represents the maximum amount of disorder in the sample. Entropy is given by the following equation:

Entropy(S) = -\sum_{i=1}^{c} p_i \log_2(p_i) \qquad (4)

Next, the algorithm has to decide which feature to split on. For this it computes the information gain, the reduction in entropy obtained when a feature is split, given by the following equation:

InfoGain(F) = Entropy(S_1) - Entropy(S_2) \qquad (5)

After a split the data is divided into several partitions, so the entropy across all the partitions has to be computed as a weighted sum:

Entropy(S) = \sum_{i=1}^{n} w_i \, Entropy(P_i) \qquad (6)

C5.0 achieved its highest accuracy of 91.18% and an AUC score of 0.91 when adaptive boosting is applied and the ntrials (number of trials) parameter is set to 90.
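Eqs. (4)–(6) can be made concrete with a short worked sketch; C5.0 itself handles these computations internally, so the functions below are purely illustrative.

# Worked sketch of Eqs. (4)-(6): entropy of a label array and the weighted
# entropy of the partitions produced by a split.
import numpy as np

def entropy(labels):
    # Entropy(S) = -sum_i p_i * log2(p_i), Eq. (4)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def info_gain(parent_labels, partitions):
    # InfoGain = Entropy before split - weighted entropy after split, Eqs. (5)-(6)
    n = sum(len(part) for part in partitions)
    weighted = sum(len(part) / n * entropy(part) for part in partitions)
    return entropy(parent_labels) - weighted

y_parent = np.array([0, 0, 1, 1, 1, 0])
print(entropy(y_parent))                                                # 1.0, maximal disorder
print(info_gain(y_parent, [np.array([0, 0, 0]), np.array([1, 1, 1])]))  # 1.0, a pure split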
3) Random Forest: Random forests [19] are ensemble-based machine learning algorithms combining two concepts: bagging and random feature selection. After building an ensemble of trees, the model combines their predictions by voting. Random forests consider only a small portion of the full feature set at each split, which allows them to perform well on huge datasets, and they can predict both continuous and discrete target values. The Random Forest algorithm is as follows:

1) For b = 1, ..., B:
   a) Sample, with replacement, n training examples from the training set, giving X_b, Y_b.
   b) Train a regression or decision tree f_b on X_b, Y_b.
2) After training, predictions for an unseen example x' are obtained by averaging the individual trees:

\hat{f} = \frac{1}{B} \sum_{b=1}^{B} \hat{f}_b(x')

Two important parameters were considered while implementing random forests: the number of trees (ntree) and the number of features the algorithm selects during each iteration. Random forests perform better when the ntree parameter is increased up to a certain limit. The Random Forest achieved a considerable accuracy and AUC score of 91.18% and 0.921 respectively.
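A sketch of the reported configuration (ntree = 150) is shown below; all other settings are scikit-learn defaults and need not match the paper's, and the feature matrices come from the earlier sketches.

# Sketch of the random forest with ntree = 150 as reported above.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rf_clf = RandomForestClassifier(n_estimators=150, random_state=42, n_jobs=-1)
rf_clf.fit(X_train_sel, y_train)
rf_scores = rf_clf.predict_proba(X_test_sel)[:, 1]
print("Random Forest AUC = %.3f" % roc_auc_score(y_test, rf_scores))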
4) K-Nearest Neighbour (K-NN): Modeling complex relationships between the predictors and the target is made easy with KNN, and no assumptions are made about the underlying data distribution. KNN uses a simple Euclidean distance formula. Finding a good value of k (the number of neighbours) is the main challenge: we can reduce the impact of variance by increasing the value of k, but this increases the bias considerably.
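One way to pick a k between the two extremes, sketched below, is a small cross-validated grid search; the candidate grid is illustrative and the paper does not state how k was chosen.

# Sketch of choosing k by cross-validation to balance bias and variance.
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

knn = GridSearchCV(KNeighborsClassifier(),
                   param_grid={"n_neighbors": [3, 5, 7, 9, 11]},
                   scoring="roc_auc", cv=5)
knn.fit(X_train_sel, y_train)
print("best k:", knn.best_params_["n_neighbors"])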
Hence a suitable value of k between the two extremes has to be chosen. K-NN achieved an accuracy of 79.41% and an AUC score of 0.81578.

5) Linear Discriminant Analysis: LDA [20] is primarily used as a dimensionality reduction technique and shares some properties with PCA (Principal Component Analysis). The major difference between PCA and LDA is that LDA models the difference between the classes of the data, whereas PCA does not. LDA performed rather poorly on this application, scoring an accuracy of 79.41% and an AUC score of 0.7947.

6) Gaussian Process Classifier: Gaussian Processes (GPs) [21] extend multivariate Gaussian distributions to infinite dimensionality. In formal terms, a GP generates data over a domain in such a way that every finite subset of it follows a multivariate Gaussian distribution. It is a non-parametric approach which finds a distribution over the possible functions f(x) that are consistent with the observed data. The GP classifier was the best classifier among the existing systems. In this paper, four kernels of the GP classifier are implemented, namely laplacedot, vanilladot, polydot and rbfdot, which achieved AUC scores of 0.924, 0.8807018, 0.8807 and 0.8421 respectively. The laplacedot kernel performed well, with an accuracy of 92%, while the rbfdot kernel performed comparatively poorly, with an accuracy of 82.35%.
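The kernel names above come from the R kernlab package; a rough Python analogue is sketched below, where Matern(nu=0.5) plays the role of a Laplacian ('laplacedot') kernel, DotProduct that of a linear ('vanilladot') kernel and RBF that of 'rbfdot'. This is an approximation, not the implementation used in the paper.

# Sketch of GP classifiers with kernels loosely analogous to the kernlab ones.
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF, DotProduct, Matern

kernels = {"laplacian": Matern(nu=0.5), "linear": DotProduct(), "rbf": RBF()}
for name, k in kernels.items():
    gpc = GaussianProcessClassifier(kernel=k, random_state=42)
    gpc.fit(X_train_sel, y_train)
    print(name, "accuracy = %.3f" % gpc.score(X_test_sel, y_test))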
7) Naïve Bayes: Naïve Bayes [22] performs well on categorical target attributes. It makes the assumptions that all predictors are independent and equally important, which do not hold in most real-world scenarios. Nevertheless, the algorithm often performs well even when these assumptions are violated, including in extreme circumstances where strong dependencies are found among the features. Naïve Bayes achieved an accuracy and AUC score of 85.29% and 0.8543 respectively.

IV. RESULTS AND DISCUSSION

In this paper, 7 popular state-of-the-art machine learning algorithms, with different kernels, have been implemented, and the results with respect to different evaluation metrics are presented. Tables I and II depict the comparative analysis of the performance measures of the algorithms before and after feature selection respectively. From these two tables and Figure 4 it can be deduced that feature selection plays a vital role in achieving high accuracy. After feature selection, the SVM with the RBF (Radial Basis Function) kernel topped the chart, with 94.12% accuracy and an AUC of 0.9473, followed by the Gaussian Process classifier with the laplacedot kernel, with 92% accuracy and an AUC of 0.924. Just behind it came the Random Forest algorithm, with an accuracy of 91.18% and an AUC of 0.921 when the number of trees is set to 150 (ntree=150). The Support Vector Machine with the sigmoid kernel achieved the lowest accuracy and AUC, 70.59% and 0.7157 respectively. Figure 2 presents a comparative analysis of all the implemented algorithms with respect to the various evaluation parameters. Table III compares the AUC score of this paper with the existing systems; it clearly shows that the approach of this paper outperforms the existing systems by quite a good margin.


TABLE I
PERFORMANCE OF MACHINE LEARNING ALGORITHMS BEFORE FEATURE SELECTION (HIGHEST TO LOWEST)

Algorithms Accuracy AUC kappa Sensitivity Specificity p-value


SVM - Linear 0.8529 0.8473 0.6996 0.8 0.8947 0.0002725
Logistic Regression 0.8529 0.8403 0.6953 0.7333 0.9474 2.73E-04
SVM - Sigmoid 0.8235 0.828 0.6471 0.8667 0.7895 0.01104
K-Nearest Neighbor 0.8235 0.8210526 0.6421 0.8 0.8421 1
Gaussian Classifier - polydot 0.8235 0.814 0.637 0.7333 0.8947 1.10E-03
Gaussian Classifier - rbfdot 0.7941 0.8017 0.5911 0.8667 0.7368 0.003731
Naive Bayes 0.8529 0.8 0.6996 0.8 0.8947 0.0002725
SVM - Radial Basis 0.7941 0.7947 0.5854 0.8 0.7895 0.003731
Gaussian Classifier - vanilladot 0.7941 0.7907 0.5735 0.6667 0.8947 3.73E-03
SVM - Polynomial 0.7059 0.73684 0.4426 1 0.4737 0.004427
Linear Discriminant Analysis 0.7353 0.728 0.4594 0.667 0.7895 0.02669
Random Forest (ntree = 150) 0.7353 0.728 0.4594 0.6667 0.7895 2.67E-02
C5.0 (Number of Trials = 90) 0.6471 0.6421 0.2842 0.6 0.6842 1.95E-01
Gaussian Classifier - laplacedot 0.4412 0.5 0 1 0 3.64E-05

TABLE II
PERFORMANCE OF MACHINE LEARNING ALGORITHMS AFTER FEATURE SELECTION (HIGHEST TO LOWEST)

Algorithms Accuracy AUC kappa Sensitivity Specificity p-value


SVM - Radial Basis 0.9412 0.9473 0.8824 1 0.8947 0.4795
Gaussian Classifier - laplacedot 0.92 0.924 0.85 1 0.864 0.4795
Random Forest (ntree = 150) 0.9118 0.921 0.8247 1 0.8421 8.49E-06
Logistic Regression 0.9118 0.914 0.8223 0.9333 0.8947 8.49E-06
C5.0 (Number of Trials = 90) 0.9118 0.91 0.8247 1 0.8421 8.49E-06
SVM - Polynomial 0.8824 0.8947 0.7679 1 0.7895 0.1336
SVM - Linear 0.8824 0.8877 0.7647 0.9333 0.8421 0.6171
Gaussian Classifier - vanilladot 0.8824 0.8807018 0.7614 0.8667 0.8947 5.45E-05
Gaussian Classifier - polydot 0.8824 0.8807 0.7614 0.8667 0.8947 5.45E-05
Naive Bayes 0.8529 0.8543 0.7038 0.8667 0.8421 0.000272
Gaussian Classifier - rbfdot 0.8235 0.8421 0.6566 1 0.6842 0.04122
K-Nearest Neighbor 0.7941 0.8157895 0.602 1 0.6316 0.003731
Linear Discriminant Analysis 0.7941 0.7947 0.5854 0.8 0.7895 0.003731
SVM - Sigmoid 0.7059 0.7157895 0.4198 0.8 0.6316 0.3428

TABLE III
COMPARATIVE ANALYSIS OF AUC SCORE WITH EXISTING SYSTEMS

Algorithm AUC
SVM - Radial Basis (this paper) 0.9473
Solin, Arno et al. [3] 0.928
Lebedev, Alexander V. [4] 0.923
Koncevičius, Karolis [5] 0.91282
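The metrics reported in Tables I and II can be reproduced for any fitted model with standard library calls, as in the following sketch; clf stands for any of the trained classifiers from the earlier sketches.

# Sketch of computing the table metrics for one fitted classifier 'clf'.
from sklearn.metrics import (accuracy_score, roc_auc_score, cohen_kappa_score,
                             recall_score, confusion_matrix)

y_pred = clf.predict(X_test_sel)
y_prob = clf.predict_proba(X_test_sel)[:, 1]
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

print("Accuracy   :", accuracy_score(y_test, y_pred))
print("AUC        :", roc_auc_score(y_test, y_prob))
print("Kappa      :", cohen_kappa_score(y_test, y_pred))
print("Sensitivity:", recall_score(y_test, y_pred))  # TP / (TP + FN)
print("Specificity:", tn / (tn + fp))                # TN / (TN + FP)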

Fig. 2. Statistics after Feature Selection

The implementation is characterized by two phases: with feature selection and without feature selection. Using feature selection, the machine learning algorithms achieved significantly better accuracy. The paper focuses on feature selection and on fine-tuning the parameters of each algorithm, which in turn helped to increase accuracy. Figure 3 depicts the three best performing algorithms, in order of accuracy and AUC score.

The comparison with the other existing approaches is also depicted in Table III; the approach presented in this paper outperforms all the top existing systems. To compare this paper's approach with the existing systems, the AUC performance metric is considered. AUC (the area under the ROC curve, which plots the true positive rate against the false positive rate) is used for the comparative analysis because it deals efficiently with situations where the sample distribution is very skewed, and it avoids overfitting to a single class. Accuracy is straightforward and deals with hard zeros and ones, but many classifiers output a probability value rather than a 0 or a 1.


In such cases it is intuitive to take 0.5 as the threshold, but this is not the right choice when, for example, a classifier outputs 0.6 for negative samples and 0.9 for positive samples. Here AUC plays an important role by considering all possible thresholds.

Fig. 3. Machine Learning algorithms with highest accuracy and AUC

Fig. 4. AUC score before and after Feature Selection
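The threshold argument above can be illustrated with a tiny synthetic example: when a classifier's scores are shifted but still rank all positives above all negatives, accuracy at a fixed 0.5 cut-off is poor while AUC, which sweeps every threshold, is perfect. The numbers below are invented purely for illustration.

# Synthetic illustration of AUC versus accuracy at a fixed 0.5 threshold.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = np.array([0, 0, 0, 1, 1, 1])
y_prob = np.array([0.60, 0.62, 0.65, 0.90, 0.92, 0.95])  # classifier scores

print("accuracy @0.5:", accuracy_score(y_true, y_prob >= 0.5))  # 0.5
print("AUC          :", roc_auc_score(y_true, y_prob))          # 1.0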
V. CONCLUSION

This paper focused on increasing the accuracy and efficiency of machine learning algorithms. Merely applying algorithms to a dataset, without analyzing it and extracting the important features, does not give the best accuracy. The paper walks through the different steps involved, such as data cleaning, data pre-processing, and training, testing and using the model. To get the best out of an algorithm, it presents techniques for extracting features, reducing the dimensionality of the original dataset, fine-tuning the parameters, and so on. Finally, comparing the accuracy of a blindly applied algorithm against the process followed in this paper shows that all the steps of the methodology are important, especially feature selection for this problem statement, and that they all contribute to getting the best out of the algorithms. This entire system flow resulted in a more accurate diagnosis prediction of schizophrenia.
REFERENCES

[1] P. Skudlarski, K. Jagannathan, K. Anderson, M. C. Stevens, V. D. Calhoun, B. A. Skudlarska, and G. Pearlson, "Brain connectivity is not only lower but different in schizophrenia: a combined anatomical and functional approach," Biological Psychiatry, vol. 68, no. 1, pp. 61–69, 2010.
[2] S. R. Kay, A. Fiszbein, and L. A. Opfer, "The positive and negative syndrome scale (PANSS) for schizophrenia," Schizophrenia Bulletin, vol. 13, no. 2, p. 261, 1987.
[3] A. Solin and S. Särkkä, "The 10th annual MLSP competition: First place," in Machine Learning for Signal Processing (MLSP), 2014 IEEE International Workshop on. IEEE, 2014, pp. 1–3.
[4] A. V. Lebedev, "The 10th annual MLSP competition: Second place," in Machine Learning for Signal Processing (MLSP), 2014 IEEE International Workshop on. IEEE, 2014, pp. 1–4.
[5] K. Koncevičius, "The 10th annual MLSP competition: Third place," in Machine Learning for Signal Processing (MLSP), 2014 IEEE International Workshop on. IEEE, 2014, pp. 1–2.
[6] A. F. Rodrigues, M. Barros, and P. Furtado, "Squizofrenia: Classification and correlation from MRI," in Biomedical & Health Informatics (BHI), 2017 IEEE EMBS International Conference on. IEEE, 2017, pp. 381–384.
[7] H. Zhang, L. Zeng, W. Wu, and C. Zhang, "How good are machine learning clouds for binary classification with good features?" arXiv preprint arXiv:1707.09562, 2017.
[8] M. C. Axelsen, N. Bak, and L. K. Hansen, "Testing multimodal integration hypotheses with application to schizophrenia data," in Pattern Recognition in NeuroImaging (PRNI), 2015 International Workshop on. IEEE, 2015, pp. 37–40.
[9] C. Aine, H. Bockholt, J. Bustillo, J. Cañive, A. Caprihan, C. Gasparovic, F. Hanlon, J. Houck, R. Jung, J. Lauriello et al., "Multimodal neuroimaging in schizophrenia: Description and dissemination," Neuroinformatics, vol. 15, no. 4, pp. 343–364, 2017.
[10] M. S. Çetin, F. Christensen, C. C. Abbott, J. M. Stephen, A. R. Mayer, J. M. Cañive, J. R. Bustillo, G. D. Pearlson, and V. D. Calhoun, "Thalamus and posterior temporal lobe show greater inter-network connectivity at rest and across sensory paradigms in schizophrenia," NeuroImage, vol. 97, pp. 117–126, 2014.
[11] E. A. Allen, E. Damaraju, S. M. Plis, E. B. Erhardt, T. Eichele, and V. D. Calhoun, "Tracking whole-brain connectivity dynamics in the resting state," Cerebral Cortex, vol. 24, no. 3, pp. 663–676, 2014.
[12] J. M. Segall, E. A. Allen, R. E. Jung, E. B. Erhardt, S. K. Arja, K. Kiehl, and V. D. Calhoun, "Correspondence between structure and function in the human brain at rest," Frontiers in Neuroinformatics, vol. 6, 2012.
[13] Z. Wei, "Comparison of feature selection method of clustering model using machine learning," Journal of Huaqiao University (Natural Science), vol. 1, p. 021, 2017.
[14] M. B. Kursa, W. R. Rudnicki et al., "Feature selection with the Boruta package," Journal of Statistical Software, vol. 36, no. 11, pp. 1–13, 2010.
[15] T. Cheng, Y. Wang, and S. H. Bryant, "FSelector: a Ruby gem for feature selection," Bioinformatics, vol. 28, no. 21, pp. 2851–2852, 2012.
[16] H. Abdi and L. J. Williams, "Principal component analysis," Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 4, pp. 433–459, 2010.
[17] J. Zhong, W. T. Peter, and D. Wang, "Novel Bayesian inference on optimal parameters of support vector machines and its application to industrial survey data classification," Neurocomputing, vol. 211, pp. 159–171, 2016.
[18] R. Pandya and J. Pandya, "C5.0 algorithm to improved decision tree with feature selection and reduced error pruning," International Journal of Computer Applications, vol. 117, no. 16, 2015.
[19] G. Biau, "Analysis of a random forests model," Journal of Machine Learning Research, vol. 13, pp. 1063–1095, 2012.
[20] A. J. Izenman, "Linear discriminant analysis," in Modern Multivariate Statistical Techniques. Springer, 2013, pp. 237–280.
[21] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press, 2006, vol. 1.
[22] A. McCallum, K. Nigam et al., "A comparison of event models for naive Bayes text classification," in AAAI-98 Workshop on Learning for Text Categorization, vol. 752. Madison, WI, 1998, pp. 41–48.
