2 authors:
All content following this page was uploaded by Hosein Kazazi on 14 July 2019.
Hosein Kazazi 4
Abstract
Autism spectrum disorder (ASD) includes different neurodevelopmental disorders characterized by deficits in social communication, and restricted, repetitive patterns of behavior, interests or activities. Because early diagnosis is important for effective therapeutic intervention, several strategies have been employed for detection of the disorder. The artificial neural network (ANN), a type of machine learning method, is a common strategy. In the current study, we extracted genomic data for 487 ASD patients and 455 healthy individuals. All individuals were genotyped for certain single-nucleotide polymorphisms within the retinoic acid-related orphan receptor alpha (RORA), gamma-aminobutyric acid type A receptor beta3 subunit (GABRB3), synaptosomal-associated protein 25 (SNAP25) and metabotropic glutamate receptor 7 (GRM7) genes. Subsequently, we used the “Keras” package to create and train the ANN model. For cross-validation, samples were divided into ten folds. In the training process, initially, the first fold was preserved for validation and the other folds were used to train the model. The validation fold was then used to evaluate model performance. The k-fold cross-validation method was used to ensure model generalizability and to prevent overfitting. Local interpretable model-agnostic explanations (LIME) were applied to explain model predictions at the data sample level. The output of the loss function was evaluated during training for each fold of the k-fold cross-validation. The loss fell below 0.6 after 200 epochs (except in two cases). The accuracy, sensitivity and specificity of our model were 73.67%, 82.75% and 63.95%, respectively. The area under the curve (AUC) was 80.59%. Consequently, in the current study, we propose an ANN-based method for differentiating ASD status from healthy status with adequate power.
Introduction
[...] type of machine learning approach which has been successful in pattern recognition, and has been used previously in the context of ASD. Grossi et al. assessed the prevalence of potential pregnancy-related risk factors for ASD in the mothers of 45 ASD children and 68 normal children. Based on the obtained data, they constructed specialized ANNs which could differentiate ASD patients from healthy subjects with greater than 80% accuracy (Grossi et al. 2016). More recently, Bi et al. extracted imaging data for 50 ASD patients from the Autism Brain Imaging Data Exchange (ABIDE) database. Using data for 42 normal individuals, they identified the random Elman neural network (NN) cluster as the best base classifier. The authors proposed the constructed NN as a new tool for improved classification performance in ASD diagnosis (Bi et al. 2018).

Single-nucleotide polymorphisms (SNPs) within several genes have been associated with risk of ASD in different populations. In Iranian patients, we recently assessed associations between ASD and variants within retinoic acid-related orphan receptor alpha (RORA) (Sayad et al. 2017), gamma-aminobutyric acid type A receptor beta3 subunit (GABRB3) (Noroozi et al. 2018), synaptosomal-associated protein 25 (SNAP25) (Safari et al. 2017a, b) and glutamate receptor, metabotropic 7 (GRM7) genes (Noroozi et al. 2016).

The rs4774388 SNP within RORA has been associated with ASD in an Iranian population (Sayad et al. 2017). The rs11639084 SNP of this gene has not been associated with ASD in any population, but has been linked with other neurological disorders such as bipolar disorder in a Taiwanese cohort (Lai et al. 2015). The rs4906902 SNP is located in the promoter region of GABRB3, and its association with ASD has been identified in different populations including Iranian (Noroozi et al. 2018) and Taiwanese cohorts (Chen et al. 2014). This SNP may alter the promoter activity of GABRB3 by affecting the transcription factor binding motifs (Tanaka et al. 2012). The SNAP25 rs3746544 and rs1051312 SNPs are located in the regulatory 3′-untranslated region, and the latter has been associated with ASD risk in an Iranian population (Safari et al. 2017a, b). These SNPs confer risk of attention deficit hyperactivity disorder based on a meta-analysis of data in different
populations (Ye et al. 2016). The rs6782011/rs779867 haplotypes of GRM7 have been associated with ASD risk in an Iranian population (Noroozi et al. 2016), and the rs16976358 SNP of RIT2 has also been associated with ASD risk in an Iranian population. A certain haplotype including the rs16976358/rs4130047 SNPs of this gene carries increased risk of ASD in this population (Hamedani et al. 2017). The CACNA1C SNPs (rs4765905, rs4765913 and rs1006737) have been associated with psychiatric disorders in diverse populations (Bhat et al. 2012). The associations between the FOXP3 SNPs rs3761548 and rs2232365 and ASD have been assessed in an Iranian population. This lineage-specific factor of regulatory T cells is involved in the process of ASD development (Safari et al. 2017a, b).

In the current study, we have developed a method based on ANN construction to predict ASD status in individuals based on the SNP genotypes in the above-mentioned genes.

Methods

Data Processing

Genotyping data of 15 SNPs within RORA (rs11639084 and rs4774388), GABRB3 (rs4906902 and rs20317), SNAP25 (rs3746544 and rs1051312), GRM7 (rs6782011 and rs779867), RIT2 (rs4130047 and rs16976358), CACNA1C (rs4765905, rs4765913 and rs1006737) and FOXP3 (rs3761548 and rs2232365) genes from 487 ASD patients and 455 healthy individuals were included in the model.

[...] were separated and then sorted separately based on ID. Samples with at least one NaN value were excluded. Case and control samples of all sheets were then merged separately based on ID. Finally, case and control samples were stored in [...]

[...]tion and plots, and the “scikit-learn” package was used for [...]

Table 1

            Training samples                        Validation samples
Cross-fold  Size  Ratio of case  Ratio of control   Size  Cases (n)  Controls (n)  Ratio of case  Ratio of control
0           847   0.525384       0.474616           95    42         53            0.442105       0.557895
1           847   0.512397       0.487603           95    53         42            0.557895       0.442105
2           848   0.521226       0.478774           94    45         49            0.478723       0.521277
3           848   0.518868       0.481132           94    47         47            0.5            0.5
4           848   0.521226       0.478774           94    45         49            0.478723       0.521277
5           848   0.511792       0.488208           94    53         41            0.56383        0.43617
6           848   0.510613       0.489387           94    54         40            0.574468       0.425532
7           848   0.522406       0.477594           94    44         50            0.468085       0.531915
8           848   0.511792       0.488208           94    53         41            0.56383        0.43617
9           848   0.514151       0.485849           94    51         43            0.542553       0.457447
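The fold sizes in Table 1 (two validation folds of 95 samples and eight of 94) are what a plain 10-fold split of 942 samples produces. The sketch below reproduces those sizes and reports the case/control composition of each fold, together with a simple check that no sample appears in both partitions. It is only an illustration: the label vector is synthetic, and the use of `KFold` with shuffling and this particular seed are assumptions, so the per-fold case counts will not match Table 1.

```python
import numpy as np
from sklearn.model_selection import KFold

N_CASES, N_CONTROLS = 487, 455  # cohort sizes reported in the text

# Synthetic label vector for illustration only: 1 = ASD case, 0 = control.
labels = np.concatenate([np.ones(N_CASES, dtype=int),
                         np.zeros(N_CONTROLS, dtype=int)])

# Assumption: shuffled, unstratified 10-fold split; the seed is arbitrary.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)

fold_summary = []
for fold, (train_idx, val_idx) in enumerate(kfold.split(labels)):
    # Leakage check: training and validation indices must be disjoint.
    if np.intersect1d(train_idx, val_idx).size:
        raise ValueError(f"fold {fold}: training and validation samples overlap")
    fold_summary.append({
        "fold": fold,
        "train_size": len(train_idx),
        "train_case_ratio": float(labels[train_idx].mean()),
        "val_size": len(val_idx),
        "val_cases": int(labels[val_idx].sum()),
        "val_case_ratio": float(labels[val_idx].mean()),
    })

for row in fold_summary:
    print(row)
```

If a more even case/control balance per fold were wanted, `StratifiedKFold` from the same module could be swapped in; the ratios in Table 1 vary from fold to fold, which suggests, but does not prove, that the split was not stratified.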
In the training process, initially, the first fold was preserved for validation and the other folds were used to train the model. The validation fold was then used to evaluate model performance. Next, the second fold was preserved as the validation fold, and training was done using the other folds. This process was repeated for each fold. This method (k-fold cross-validation) was used to ensure model generalizability and to prevent overfitting. We also assessed the data leakage between training samples and validation samples in each cross-fold.

ANN Model

Input series were fed into an embedding layer as ordinal categories. The input and output dimensions of the embedding layer were 3 and 1, respectively. The data were then flattened and fed into a dense hidden layer, whose width and depth can in principle be chosen freely. Here, this layer had 32 neurons with the SELU (scaled exponential linear unit) activation function (Fig. 1). The final layer of the ANN was the model output (a single neuron with a sigmoid activation function for binary classification). We also used dropout (5% of neurons) and L2 regularization (0.01) on the hidden layer to improve model generalizability and prevent overfitting. Figure 2 shows the ANN structure.

Model Training

Training was set to run for 500 epochs with a batch size of 32 samples. To prevent overfitting, an early stopping method was used: training was stopped if the validation loss did not improve for 50 epochs. In addition, to improve model performance, the learning rate was halved whenever the loss on the validation fold did not improve for 20 epochs. We used binary cross-entropy as the loss function for model training.

Local Interpretable Model-Agnostic Explanations

We used local interpretable model-agnostic explanations (LIME) to explain model predictions at the data sample level. LIME is an algorithm that can explain the predictions of any classifier or regressor by approximating it locally with an interpretable model. The method probes the model by perturbing the input of data samples and observing how the predictions change. The output of LIME is a list of explanations, indicating the contribution of each feature to the prediction of a data sample. This offers local interpretability, and it also permits one to determine which feature alterations will have the most influence on the prediction (Ribeiro et al. 2016). In this case, we selected five sub-modules of samples and then used the LIME method to tweak the feature values and observe the resulting impact on the output, which reflected the contribution of each feature to the prediction of each cluster.

Results

Patient Characteristics

The data set was obtained from 487 ASD patients (406 male, 81 female) with a mean age of 10.0 ± 3.6 years and 455 healthy individuals (379 male, 76 female) with a mean age of 10.0 ± 0.53 years.
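Returning to the Methods, the network described under ANN Model and the schedule described under Model Training could be written in Keras roughly as follows. This is a minimal sketch and not the authors' code: the integer genotype coding (0, 1, 2), the Adam optimizer, and the choice to apply the L2 penalty to the layer weights are assumptions, since the text names only the layer sizes, activations, dropout rate, regularization strength, loss function and patience values. The helper names `build_model`, `make_callbacks` and `train_one_fold` are hypothetical.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

N_SNPS = 15  # number of genotyped SNPs reported under Data Processing


def build_model(n_snps: int = N_SNPS) -> keras.Model:
    """Embedding -> flatten -> 32-unit SELU layer -> single sigmoid output."""
    # Assumption: each genotype is coded as an integer category 0, 1 or 2.
    inputs = keras.Input(shape=(n_snps,), dtype="int32")
    # Embedding with 3 input categories and 1 output dimension, as in the text.
    x = layers.Embedding(input_dim=3, output_dim=1)(inputs)
    x = layers.Flatten()(x)
    # Assumption: the L2 penalty (0.01) acts on the hidden-layer weights.
    x = layers.Dense(32, activation="selu",
                     kernel_regularizer=regularizers.l2(0.01))(x)
    x = layers.Dropout(0.05)(x)  # drop 5% of the hidden activations
    outputs = layers.Dense(1, activation="sigmoid")(x)
    model = keras.Model(inputs, outputs)
    # Assumption: Adam optimizer; the source specifies only the loss.
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


def make_callbacks() -> list:
    """Early stopping (50 epochs) and learning-rate halving (20 epochs)."""
    return [
        keras.callbacks.EarlyStopping(monitor="val_loss", patience=50),
        keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                          factor=0.5, patience=20),
    ]


def train_one_fold(x_train, y_train, x_val, y_val) -> keras.callbacks.History:
    """Train a fresh model on one cross-validation fold."""
    model = build_model(x_train.shape[1])
    return model.fit(x_train, y_train,
                     validation_data=(x_val, y_val),
                     epochs=500, batch_size=32,
                     callbacks=make_callbacks(), verbose=0)
```

Building a new model inside `train_one_fold` keeps the folds independent, so weights learned on one fold cannot carry over into the next.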
Fig. 5 Local explanations for five clusters resulting from LIME application. Each plot shows the contribution of the various polymorphisms to the predicted ASD status. In each cluster, green features indicate polymorphisms that push the prediction towards ASD, and red features indicate polymorphisms that push the prediction away from ASD
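The perturb-and-observe idea behind LIME explanations can be illustrated with a small NumPy-only sketch. It is not the `lime` package and not the authors' pipeline: it perturbs the genotypes of one sample, weights the perturbed samples by their similarity to the original, and fits a weighted linear surrogate whose coefficients play the role of the per-feature contributions. The black-box function, the genotype coding (0, 1, 2), the kernel width and the sample vector are all hypothetical.

```python
import numpy as np


def lime_style_explanation(predict_fn, x, n_samples=2000, n_levels=3,
                           kernel_width=0.75, seed=0):
    """Fit a local weighted linear surrogate around one sample x."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[0]
    # Perturb: keep each genotype with probability 0.5, else redraw it.
    keep = rng.random((n_samples, n_features)) < 0.5
    random_genotypes = rng.integers(0, n_levels, size=(n_samples, n_features))
    perturbed = np.where(keep, x, random_genotypes)
    # Interpretable representation: 1 if a feature equals its original value.
    same = (perturbed == x).astype(float)
    # Proximity weights: samples that changed fewer features count more.
    distance = 1.0 - same.mean(axis=1)
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)
    # Query the black box, then solve weighted least squares with intercept.
    y = predict_fn(perturbed)
    design = np.column_stack([np.ones(n_samples), same])
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w,
                               rcond=None)
    return coef[1:]  # one contribution per feature (intercept dropped)


def toy_black_box(genotypes):
    """Hypothetical classifier: feature 0 raises risk, feature 3 lowers it."""
    score = 2.0 * genotypes[:, 0] - 2.0 * genotypes[:, 3]
    return 1.0 / (1.0 + np.exp(-score))


sample = np.array([2, 1, 0, 2, 1])  # hypothetical genotype vector
contributions = lime_style_explanation(toy_black_box, sample)
for index, value in enumerate(contributions):
    print(f"feature {index}: {value:+.3f}")
```

A positive coefficient corresponds to a green bar in a plot of this kind (the observed genotype pushes the predicted probability up), and a negative coefficient to a red bar. The coefficients describe the model's behaviour near this one sample, not a causal effect of the polymorphism.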
J Mol Neurosci
Hamedani SY, Gharesouran J, Noroozi R, Sayad A, Omrani MD, Mir A, Afjeh SSA, Toghi M, Manoochehrabadi S, Ghafouri-Fard S, Taheri M (2017) Ras-like without CAAX 2 (RIT2): a susceptibility gene for autism spectrum disorder. Metab Brain Dis 32(3):751–755. https://doi.org/10.1007/s11011-017-9969-4
Heinsfeld AS, Franco AR, Craddock RC, Buchweitz A, Meneguzzi F (2018) Identification of autism spectrum disorder using deep learning and the ABIDE dataset. Neuroimage Clin 17:16–23. https://doi.org/10.1016/j.nicl.2017.08.017
Iidaka T (2015) Resting state functional magnetic resonance imaging and neural network classified autism and control. Cortex 63:55–67. https://doi.org/10.1016/j.cortex.2014.08.011
Jordan MI, Mitchell TM (2015) Machine learning: trends, perspectives, and prospects. Science 349(6245):255–260
Lai YC, Kao CF, Lu ML, Chen HC, Chen PY, Chen CH, Shen WW, Wu JY, Lu RB, Kuo PH (2015) Investigation of associations between NR1D1, RORA and RORB genes and bipolar disorder. PLoS One 10(3):e0121245. https://doi.org/10.1371/journal.pone.0121245
Mohammad NS, Shruti PS, Bharathi V, Prasad CK, Hussain T, Alrokayan SA, Naik U, Devi ARR (2016) Clinical utility of folate pathway genetic polymorphisms in the diagnosis of autism spectrum disorders. Psychiatr Genet 26(6):281–286. https://doi.org/10.1097/Ypg.0000000000000152
Noroozi R, Taheri M, Movafagh A, Mirfakhraie R, Solgi G, Sayad A, Mazdeh M, Darvish H (2016) Glutamate receptor, metabotropic 7 (GRM7) gene variations and susceptibility to autism: a case–control study. Autism Res 9(11):1161–1168
Noroozi R, Taheri M, Movafagh A, Ghafouri-Fard S, Sayad A, Mirfakhraie R, Ayatollahi SA, Inoko H, Noroozi H, Do AA (2018) Association analysis of the GABRB3 promoter variant and susceptibility to autism spectrum disorder. Basal Ganglia 11:4–7
Ribeiro MT, Singh S, Guestrin C (2016) Why should I trust you?: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining
Safari MR, Omrani MD, Noroozi R, Sayad A, Sarrafzadeh S, Komaki A, Manjili FA, Mazdeh M, Ghaleiha A, Taheri M (2017a) Synaptosome-associated protein 25 (SNAP25) gene association analysis revealed risk variants for ASD, in Iranian population. J Mol Neurosci 61(3):305
Safari MR, Ghafouri-Fard S, Noroozi R, Sayad A, Omrani MD, Komaki A, Eftekharian MM, Taheri M (2017b) FOXP3 gene variations and susceptibility to autism: a case-control study. Gene 596:119–122. https://doi.org/10.1016/j.gene.2016.10.019
Sayad A, Noroozi R, Omrani MD, Taheri M, Ghafouri-Fard S (2017) Retinoic acid-related orphan receptor alpha (RORA) variants are associated with autism spectrum disorder. Metab Brain Dis 32(5):1595–1601. https://doi.org/10.1007/s11011-017-0049-6
Tanaka M, Bailey JN, Bai D, Ishikawa-Brush Y, Delgado-Escueta AV, Olsen RW (2012) Effects on promoter activity of common SNPs in 5′ region of GABRB3 exon 1A. Epilepsia 53(8):1450–1456. https://doi.org/10.1111/j.1528-1167.2012.03572.x
Ye C, Hu Z, Wu E, Yang X, Buford UJ, Guo Z, Saveanu RV (2016) Two SNAP-25 genetic variants in the binding site of multiple microRNAs and susceptibility of ADHD: a meta-analysis. J Psychiatr Res 81:56–62. https://doi.org/10.1016/j.jpsychires.2016.06.007

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.