
Application of Single-Nucleotide Polymorphisms in the Diagnosis of Autism Spectrum Disorders: A Preliminary Study with Artificial Neural Networks

Article  in  Journal of Molecular Neuroscience · August 2019


DOI: 10.1007/s12031-019-01311-1



Soudeh Ghafouri-Fard 1 & Mohammad Taheri 2 & Mir Davood Omrani 1 & Amir Daaee 3 & Hossein Mohammad-Rahimi 4 & Hosein Kazazi 4

Received: 4 March 2019 / Accepted: 21 March 2019


© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract
Autism spectrum disorder (ASD) includes different neurodevelopmental disorders characterized by deficits in social communication, and restricted, repetitive patterns of behavior, interests or activities. Because early diagnosis is important for effective therapeutic intervention, several strategies have been employed for detection of the disorder. The artificial neural network (ANN), a type of machine learning method, is a common strategy. In the current study, we extracted genomic data for 487 ASD patients and 455 healthy individuals. All individuals were genotyped for selected single-nucleotide polymorphisms within the retinoic acid-related orphan receptor alpha (RORA), gamma-aminobutyric acid type A receptor beta3 subunit (GABRB3), synaptosomal-associated protein 25 (SNAP25) and metabotropic glutamate receptor 7 (GRM7) genes. Subsequently, we used the "Keras" package to create and train the ANN model. For cross-validation, samples were divided into ten folds. In the training process, the first fold was initially held out for validation and the other folds were used to train the model. The validation fold was then used to evaluate model performance. The k-fold cross-validation method was used to assess model generalizability and to guard against overfitting. Local interpretable model-agnostic explanations (LIME) were applied to explain model predictions at the data sample level. The output of the loss function was evaluated during training for each fold of the k-fold cross-validation. The loss fell below 0.6 after 200 epochs (except in two cases). The accuracy, sensitivity and specificity of our model were 73.67%, 82.75% and 63.95%, respectively. The area under the curve (AUC) was 80.59%. Consequently, in the current study, we propose an ANN-based method for differentiating ASD status from healthy status with adequate power.

Keywords Autism spectrum disorder · Artificial neural network · Single-nucleotide polymorphism

* Mohammad Taheri
  mohammad_823@yahoo.com
* Mir Davood Omrani
  davood_omrani@yahoo.co.uk

1 Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
2 Urogenital Stem Cell Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
3 School of Mechanical Engineering, Sharif University of Technology, Tehran, Iran
4 Dental School, Shahid Beheshti University of Medical Science, Tehran, Iran

Introduction

Autism spectrum disorder (ASD) comprises a variety of neurodevelopmental disorders characterized by deficits in social communication, and restricted, repetitive patterns of behavior, interests or activities [Diagnostic and Statistical Manual of Mental Disorders (DSM-5) 2013]. ASD can be detected as early as the first 2 years of life and is associated with significant complications over the lifetime. Timely diagnosis and appropriate evidence-based therapeutic modalities can meaningfully improve quality of life for both patients and caretakers (Elder et al. 2017). The reliance of conventional diagnostic approaches on clinical interviews and behavior observation can lead to inaccurate diagnosis (Bi et al. 2018). However, previous evidence indicates that the application of machine learning can enhance algorithm performance (Jordan and Mitchell 2015). The artificial neural network (ANN) is a type of machine learning approach which has been successful in pattern recognition, and has been used previously in the context of ASD. Grossi et al. assessed the prevalence of potential pregnancy-related risk factors for ASD in the mothers of 45 ASD children and 68 normal children. Based on the obtained data, they constructed specialized ANNs which could differentiate ASD patients from healthy subjects with greater than 80% accuracy (Grossi et al. 2016). More recently, Bi et al. extracted imaging data for 50 ASD patients from the Autism Brain Imaging Data Exchange (ABIDE) database. Using data for 42 normal individuals, they identified the random Elman neural network (NN) cluster as the best base classifier. The authors proposed the constructed NN as a new tool for improved classification performance in ASD diagnosis (Bi et al. 2018).

Fig. 1 SELU plotted for α ≈ 1.6732, λ ≈ 1.0507

Fig. 2 ANN structure

Single-nucleotide polymorphisms (SNPs) within several genes have been associated with risk of ASD in different populations. In Iranian patients, we recently assessed associations between ASD and variants within retinoic acid-related orphan receptor alpha (RORA) (Sayad et al. 2017), gamma-aminobutyric acid type A receptor beta3 subunit (GABRB3) (Noroozi et al. 2018), synaptosomal-associated protein 25 (SNAP25) (Safari et al. 2017a, b) and glutamate receptor, metabotropic 7 (GRM7) genes (Noroozi et al. 2016).

The rs4774388 SNP within RORA has been associated with ASD in an Iranian population (Sayad et al. 2017). The rs11639084 SNP of this gene has not been associated with ASD in any population, but has been linked with other neurological disorders such as bipolar disorder in a Taiwanese cohort (Lai et al. 2015). The rs4906902 SNP is located in the promoter region of GABRB3, and its association with ASD has been identified in different populations including Iranian (Noroozi et al. 2018) and Taiwanese cohorts (Chen et al. 2014). This SNP may alter the promoter activity of GABRB3 by affecting the transcription factor binding motifs (Tanaka et al. 2012). The SNAP25 SNPs rs3746544 and rs1051312 are located in the regulatory 3′-untranslated region, and the latter has been associated with ASD risk in an Iranian population (Safari et al. 2017a, b). These SNPs confer risk of attention deficit hyperactivity disorders based on a meta-analysis of data in different

populations (Ye et al. 2016). The rs6782011/rs779867 haplotypes of GRM7 have been associated with ASD risk in an Iranian population (Noroozi et al. 2016), and the rs16976358 SNP of RIT2 has also been associated with ASD risk in an Iranian population. A certain haplotype including the rs16976358/rs4130047 SNPs of this gene carries increased risk of ASD in this population (Hamedani et al. 2017). The CACNA1C SNPs (rs4765905, rs4765913 and rs1006737) have been associated with psychiatric disorders in diverse populations (Bhat et al. 2012). Associations between the FOXP3 SNPs rs3761548 and rs2232365 and ASD have been assessed in an Iranian population. This lineage-specific factor of regulatory T cells is involved in the process of ASD development (Safari et al. 2017a, b).

In the current study, we have developed a method based on ANN construction to predict ASD status in individuals based on the SNP genotypes in the above-mentioned genes.

Methods

Data Processing

Genotyping data of 15 SNPs within RORA (rs11639084 and rs4774388), GABRB3 (rs4906902 and rs20317), SNAP25 (rs3746544 and rs1051312), GRM7 (rs6782011 and rs779867), RIT2 (rs4130047 and rs16976358), CACNA1C (rs4765905, rs4765913 and rs1006737) and FOXP3 (rs3761548 and rs2232365) genes from 487 ASD patients and 455 healthy individuals were included in the model.

In each sheet of original data, case and control samples were separated and then sorted separately based on ID. Samples with at least one NaN value were excluded. Case and control samples of all sheets were then merged separately based on ID. Finally, case and control samples were stored in the ‘Autism-dataset.xlsx’. To simplify data access and indexing, feature codes were changed to F1-F28 and the case/control target values were binarized to 1 and 0, respectively, in the target column. Information about each column (feature) of the modified dataset is shown in the ‘data_def’ sheet in the ‘Autism-dataset.xlsx’.

A checksum procedure was carried out to ensure there were no misalignments in ID/feature in the cleanup procedure. In total, 455 control samples and 487 case samples were included in the study.

Table 1 Sample distribution in ten folds

Training samples
Cross-fold  Cross-fold size  Control samples (n)  Case samples (n)  Ratio of control samples  Ratio of case samples
0           847              402                  445               0.474616                  0.525384
1           847              413                  434               0.487603                  0.512397
2           848              406                  442               0.478774                  0.521226
3           848              408                  440               0.481132                  0.518868
4           848              406                  442               0.478774                  0.521226
5           848              414                  434               0.488208                  0.511792
6           848              415                  433               0.489387                  0.510613
7           848              405                  443               0.477594                  0.522406
8           848              414                  434               0.488208                  0.511792
9           848              412                  436               0.485849                  0.514151

Validation samples
Cross-fold  Cross-fold size  Control samples (n)  Case samples (n)  Ratio of control samples  Ratio of case samples
0           95               53                   42                0.557895                  0.442105
1           95               42                   53                0.442105                  0.557895
2           94               49                   45                0.521277                  0.478723
3           94               47                   47                0.5                       0.5
4           94               49                   45                0.521277                  0.478723
5           94               41                   53                0.43617                   0.56383
6           94               40                   54                0.425532                  0.574468
7           94               50                   44                0.531915                  0.468085
8           94               41                   53                0.43617                   0.56383
9           94               43                   51                0.457447                  0.542553

Deep Learning Model

We used the "Keras" package to create and train the ANN model. The "matplotlib" package was used for data visualization and plots, and the "scikit-learn" package was used for data preparation and pre-processing. For cross-validation, samples were divided into ten folds. Briefly, in the training
process, the first fold was initially held out for validation and the other folds were used to train the model. The validation fold was then used to evaluate model performance. Next, the second fold was held out as the validation fold, and training was done using the other folds. This process was repeated for each fold. This method (k-fold cross-validation) was used to assess model generalizability and to guard against overfitting. We also checked for data leakage between training samples and validation samples in each cross-fold.

ANN Model

Input series were fed into an embedding layer as ordinal categories. The input and output dimensions of the embedding layer were 3 and 1, respectively. The data were then flattened and fed into a dense layer, whose width and depth can in principle be chosen arbitrarily. Here, this layer had 32 neurons, and its activation function was SELU (scaled exponential linear unit) (Fig. 1).

The final layer of the ANN was the model output (a single neuron with a sigmoid activation function for binary classification). We also applied dropout (5% of neurons) and L2 regularization (0.01) to the hidden layer to improve model generalizability and prevent overfitting. Figure 2 shows the ANN structure.

Model Training

The maximum number of training epochs was set to 500. The training batch size was set to 32 samples. To prevent overfitting, an early stopping method was used: training was stopped if the validation loss did not improve for 50 epochs. In addition, to improve model performance, if the loss in the validation fold did not improve for 20 epochs, the learning rate was halved. We used binary cross-entropy as the loss function for model training.

Local Interpretable Model-Agnostic Explanations

We used local interpretable model-agnostic explanations (LIME) to explain model predictions at the data sample level. LIME is an algorithm that can explain the predictions of any classifier or regressor by approximating it locally with an interpretable model. This method probes the model by perturbing the input of data samples and observing how the predictions change. The output of LIME is a list of explanations, indicating the contribution of each feature to the prediction of a data sample. This offers local interpretability, and it also permits one to determine which feature alterations will have the most influence on the prediction (Ribeiro et al. 2016). In this case, we selected five subsets (clusters) of samples and then used the LIME method to tweak the feature values and observe the resulting impact on the output, which reflected the contribution of each feature to the prediction of each cluster.

Results

Patient Characteristics

The data set was obtained from 487 ASD patients (406 male, 81 female) with a mean age of 10.0 ± 3.6 years and 455 healthy individuals (379 male, 76 female) with a mean age of 10.0 ± 0.53 years.
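
To relate the cohort size above to the ten-fold layout described in the Methods, the following standard-library sketch splits 942 sample indices (487 cases plus 455 controls) into ten contiguous folds and checks that no index is shared between the training and validation parts of a fold. It is an illustration only, not the authors' code: the function name `kfold_indices` is hypothetical, and the sizing rule (the first `n % k` folds receive one extra sample) is an assumption that mirrors the documented behaviour of scikit-learn's `KFold`.

```python
from typing import Iterator, List, Tuple


def kfold_indices(n_samples: int, n_splits: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each fold.

    Assumption: the first ``n_samples % n_splits`` folds receive one extra
    sample, the same sizing rule used by scikit-learn's ``KFold``.
    """
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


N_CASES, N_CONTROLS = 487, 455  # cohort sizes reported in the text

folds = list(kfold_indices(N_CASES + N_CONTROLS, n_splits=10))
validation_sizes = [len(val) for _, val in folds]
training_sizes = [len(train) for train, _ in folds]

# Data-leakage check: no index may appear in both parts of a fold.
leak_free = all(set(train).isdisjoint(val) for train, val in folds)
```

Under this rule the validation folds contain 95, 95 and then 94 samples, and the corresponding training sets contain 847, 847 and then 848 samples. In practice the samples would normally be shuffled before splitting; whether and how that was done in the study is not stated.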

Fig. 3 Learning process plots for each cross-fold
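
The learning curves in Fig. 3 result from the architecture and training schedule given in the Methods. A minimal sketch of that configuration, assuming the Keras API bundled with TensorFlow, might look as follows. Several details are assumptions rather than facts from the text: the optimizer is not stated (`"adam"` is a placeholder), the function names `build_model` and `make_callbacks` are hypothetical, and placing the dropout layer directly after the hidden layer is one reading of "dropout on the hidden layer".

```python
# Hyperparameters stated in the Methods; the optimizer is NOT stated there.
HYPERPARAMS = {
    "embedding_input_dim": 3,    # input dimension of the embedding layer
    "embedding_output_dim": 1,   # output dimension of the embedding layer
    "hidden_units": 32,
    "dropout_rate": 0.05,        # dropout on 5% of neurons
    "l2_penalty": 0.01,
    "max_epochs": 500,
    "batch_size": 32,
    "early_stop_patience": 50,   # stop after 50 epochs without improvement
    "lr_patience": 20,           # halve the learning rate after 20 such epochs
    "lr_factor": 0.5,
    "optimizer": "adam",         # assumption, for illustration only
}


def build_model(n_features, hp=HYPERPARAMS):
    """Embedding -> Flatten -> Dense(SELU) -> Dropout -> sigmoid output."""
    from tensorflow import keras  # imported lazily; requires TensorFlow

    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Embedding(hp["embedding_input_dim"], hp["embedding_output_dim"]),
        keras.layers.Flatten(),
        keras.layers.Dense(
            hp["hidden_units"],
            activation="selu",
            kernel_regularizer=keras.regularizers.l2(hp["l2_penalty"]),
        ),
        keras.layers.Dropout(hp["dropout_rate"]),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=hp["optimizer"], loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


def make_callbacks(hp=HYPERPARAMS):
    """Early stopping and learning-rate halving, both driven by validation loss."""
    from tensorflow import keras

    return [
        keras.callbacks.EarlyStopping(monitor="val_loss",
                                      patience=hp["early_stop_patience"]),
        keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                          factor=hp["lr_factor"],
                                          patience=hp["lr_patience"]),
    ]
```

For one cross-validation fold, a call such as `build_model(n_features).fit(x_train, y_train, validation_data=(x_val, y_val), epochs=HYPERPARAMS["max_epochs"], batch_size=HYPERPARAMS["batch_size"], callbacks=make_callbacks())` would then produce the kind of per-epoch training and validation loss history plotted for each fold.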



Fig. 4 Confusion matrix and ROC curve
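
To show how a confusion matrix relates to summary metrics, the following sketch computes accuracy, sensitivity and specificity from four cell counts. The counts are an assumption: they are not read from Fig. 4 but back-calculated from the cohort sizes (487 cases, 455 controls) and the reported sensitivity and specificity, so they are only one set of counts consistent with the published percentages. The function name `summary_metrics` is hypothetical.

```python
def summary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # true-positive rate among cases
        "specificity": tn / (tn + fp),   # true-negative rate among controls
    }


# Assumed counts, back-calculated from the reported percentages.
TP, FN = 403, 84     # 487 ASD cases
TN, FP = 291, 164    # 455 controls

metrics = summary_metrics(TP, FN, TN, FP)
as_percent = {name: round(100 * value, 2) for name, value in metrics.items()}
```

With these assumed counts the sketch gives an accuracy of 73.67% and a sensitivity of 82.75%, matching the reported values, and a specificity of 63.96%, which agrees with the reported 63.95% to within rounding in the second decimal place.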

Fig. 5 Local explanations for five clusters resulting from LIME application. Each plot shows the contribution of the individual polymorphisms to the model's ASD prediction. In each cluster, green features indicate polymorphisms that push the prediction toward ASD, and red features indicate polymorphisms that push it away from ASD
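
The perturbation idea behind such explanations can be illustrated without the LIME library. The sketch below is a deliberately simplified one-feature-at-a-time sensitivity analysis, not the LIME algorithm itself, which samples many perturbed points around an instance and fits a weighted interpretable surrogate model to them. The scoring function `toy_predict`, its weights and the 0/1/2 genotype coding are all hypothetical and stand in for a trained classifier.

```python
import math
from typing import Callable, List, Sequence


def toy_predict(genotypes: Sequence[int]) -> float:
    """Hypothetical stand-in for a trained classifier's ASD probability."""
    weights = [0.8, -0.5, 0.0]          # made-up weights for three features
    score = sum(w * g for w, g in zip(weights, genotypes))
    return 1.0 / (1.0 + math.exp(-score))


def perturbation_effects(
    predict: Callable[[Sequence[int]], float],
    sample: Sequence[int],
    levels: Sequence[int] = (0, 1, 2),
) -> List[float]:
    """For each feature, return the baseline prediction minus the mean
    prediction obtained after replacing that feature with every other level.

    Positive values mean the observed value raises the predicted probability
    relative to the average of the alternatives; negative values lower it.
    """
    baseline = predict(sample)
    effects = []
    for i, observed in enumerate(sample):
        alternatives = []
        for level in levels:
            if level == observed:
                continue
            perturbed = list(sample)
            perturbed[i] = level
            alternatives.append(predict(perturbed))
        effects.append(baseline - sum(alternatives) / len(alternatives))
    return effects


effects = perturbation_effects(toy_predict, sample=[2, 1, 0])
```

In this toy setting the first feature has a positive effect because its made-up weight is positive and the sample carries the highest level, while the third feature, whose weight is zero, has no effect. A LIME explanation plays a similar role but derives its feature contributions from a locally fitted surrogate model.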

ANN Model

Table 1 shows the distribution of samples in each fold. The size of each fold was 94 or 95 samples.

The output of the loss function was evaluated during training for each fold of the k-fold cross-validation (Fig. 3). The loss was below 0.6 after 200 epochs (except in two cases).

We also plotted the ROC curve and confusion matrix (Fig. 4). The accuracy, sensitivity and specificity of our model were 73.67%, 82.75% and 63.95%, respectively. The area under the curve (AUC) was 80.59%.

Finally, we identified the most determinant features in each cluster in order (Fig. 5). Among the assessed SNPs, the CT and TT genotypes in rs6782011 and rs11639084 had the most protective effects against ASD. On the other hand, TT and CT genotypes within rs6782011 and rs11639084 were recognized as the most significant risk genotypes.

Discussion

This study provides a novel model for application of genomic data in the ASD field and represents one of the first efforts to construct a predictive model for assessment of ASD risk based on SNP genotypes that can be applied immediately after birth. ANN construction is an analytical approach that attempts to recognize natural processes and reproduce them by means of automated models. This method is a marked improvement over conventional analytical methods and enables the prediction of events with comprehensive recognition of the correlation between parameters (Grossi et al. 2016).

As the primary step, we included genotyping data of 15 SNPs within seven genes. Among the assessed SNPs, the CT and TT genotypes in rs6782011 and rs11639084 had the most protective effects against ASD. The suggested model could predict disease status with an AUC of approximately 80% and an accuracy of 73.67%. The accuracy of the proposed model is anticipated to be increased by the incorporation of further data from other SNPs in the training step.

The measured model performance in the current study was similar to that of Grossi et al., who established a specialized ANN for ASD diagnosis based on epidemiological factors. Their model could distinguish ASD patients and healthy individuals with 80.19% overall precision when the data set was pre-evaluated with the Training with Input Selection and Testing (TWIST) system choosing 16 out of 27 parameters (Grossi et al. 2016). Another attempt to distinguish ASD status based on SNP data was performed by Mohammad et al., who established an ANN model from the data for 138 ASD children and 138 healthy subjects using five SNPs in the folate metabolic pathway. Their model demonstrated 63.8% accuracy in predicting the risk of ASD (Mohammad et al. 2016). Therefore, the accuracy of our model is superior to that of their model.

Several other studies in ASD patients have constructed ANNs from imaging data. Iidaka developed an ANN using imaging data from 312 ASD patients and 328 normal individuals. Correlation matrices calculated from resting-state functional magnetic resonance imaging (rs-fMRI) time-series data were entered into a probabilistic neural network (PNN), which could discriminate disease status with almost 90% accuracy (Iidaka 2015). Guo et al. analyzed rs-fMRI data using deep neural networks (DNN) and reported classification accuracy of 86.36% (Guo et al. 2017). In a similar study, Heinsfeld et al. reported 70% accuracy using deep learning and the ABIDE data set (Heinsfeld et al. 2018). Finally, Bi et al. developed a novel method through incorporation of numerous NNs into a model. Their proposed model demonstrated improved feature extraction and sorting efficiency (Bi et al. 2018).

Construction of ANNs from genomic data has the advantage of consistency of results over the lifetime and non-subjectivity. Consequently, the proposed method in the current study is a cost-effective method for differentiation of ASD which can be applied from the time of birth. Future studies should attempt to identify other SNPs which could increase the accuracy of the proposed model.

Acknowledgments This study was financially and technically supported by Shahid Beheshti University of Medical Sciences.

Compliance with Ethical Standards

Conflict of Interest The authors declare that they have no conflict of interest.

References

Bhat S, Dao DT, Terrillion CE, Arad M, Smith RJ, Soldatov NM, Gould TD (2012) CACNA1C (Cav1.2) in the pathophysiology of psychiatric disease. Prog Neurobiol 99(1):1–14. https://doi.org/10.1016/j.pneurobio.2012.06.001
Bi X-a, Liu Y, Jiang Q, Shu Q, Sun Q, Dai J (2018) The diagnosis of autism spectrum disorder based on the random neural network cluster. Front Hum Neurosci 12:257
Chen CH, Huang CC, Cheng MC, Chiu YN, Tsai WC, Wu YY, Liu SK, Gau SS (2014) Genetic analysis of GABRB3 as a candidate gene of autism spectrum disorders. Mol Autism 5:36. https://doi.org/10.1186/2040-2392-5-36
Diagnostic and statistical manual of mental disorders (DSM-5®) (2013) American Psychiatric Pub
Elder JH, Kreider CM, Brasher SN, Ansell M (2017) Clinical impact of early diagnosis of autism on the prognosis and parent–child relationships. Psychol Res Behav Manag 10:283
Grossi E, Veggo F, Narzisi A, Compare A, Muratori F (2016) Pregnancy risk factors in autism: a pilot study with artificial neural networks. Pediatr Res 79(2):339–347. https://doi.org/10.1038/pr.2015.222
Guo X, Dominick KC, Minai AA, Li H, Erickson CA, Lu LJ (2017) Diagnosing autism spectrum disorder from brain resting-state functional connectivity patterns using a deep neural network with a novel feature selection method. Front Neurosci 11:460. https://doi.org/10.3389/fnins.2017.00460
Hamedani SY, Gharesouran J, Noroozi R, Sayad A, Omrani MD, Mir A, Afjeh SSA, Toghi M, Manoochehrabadi S, Ghafouri-Fard S, Taheri M (2017) Ras-like without CAAX 2 (RIT2): a susceptibility gene for autism spectrum disorder. Metab Brain Dis 32(3):751–755. https://doi.org/10.1007/s11011-017-9969-4
Heinsfeld AS, Franco AR, Craddock RC, Buchweitz A, Meneguzzi F (2018) Identification of autism spectrum disorder using deep learning and the ABIDE dataset. Neuroimage Clin 17:16–23. https://doi.org/10.1016/j.nicl.2017.08.017
Iidaka T (2015) Resting state functional magnetic resonance imaging and neural network classified autism and control. Cortex 63:55–67. https://doi.org/10.1016/j.cortex.2014.08.011
Jordan MI, Mitchell TM (2015) Machine learning: trends, perspectives, and prospects. Science 349(6245):255–260
Lai YC, Kao CF, Lu ML, Chen HC, Chen PY, Chen CH, Shen WW, Wu JY, Lu RB, Kuo PH (2015) Investigation of associations between NR1D1, RORA and RORB genes and bipolar disorder. PLoS One 10(3):e0121245. https://doi.org/10.1371/journal.pone.0121245
Mohammad NS, Shruti PS, Bharathi V, Prasad CK, Hussain T, Alrokayan SA, Naik U, Devi ARR (2016) Clinical utility of folate pathway genetic polymorphisms in the diagnosis of autism spectrum disorders. Psychiatr Genet 26(6):281–286. https://doi.org/10.1097/Ypg.0000000000000152
Noroozi R, Taheri M, Movafagh A, Mirfakhraie R, Solgi G, Sayad A, Mazdeh M, Darvish H (2016) Glutamate receptor, metabotropic 7 (GRM7) gene variations and susceptibility to autism: a case–control study. Autism Res 9(11):1161–1168
Noroozi R, Taheri M, Movafagh A, Ghafouri-Fard S, Sayad A, Mirfakhraie R, Ayatollahi SA, Inoko H, Noroozi H, Do AA (2018) Association analysis of the GABRB3 promoter variant and susceptibility to autism spectrum disorder. Basal Ganglia 11:4–7
Ribeiro MT, Singh S, Guestrin C (2016) Why should I trust you?: Explaining the predictions of any classifier. Paper presented at the Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining
Safari MR, Omrani MD, Noroozi R, Sayad A, Sarrafzadeh S, Komaki A, Manjili FA, Mazdeh M, Ghaleiha A, Taheri M (2017a) Synaptosome-associated protein 25 (SNAP25) Gene Association analysis revealed risk variants for ASD, in Iranian population. J Mol Neurosci 61(3):305
Safari MR, Ghafouri-Fard S, Noroozi R, Sayad A, Omrani MD, Komaki A, Eftekharian MM, Taheri M (2017b) FOXP3 gene variations and susceptibility to autism: a case-control study. Gene 596:119–122. https://doi.org/10.1016/j.gene.2016.10.019
Sayad A, Noroozi R, Omrani MD, Taheri M, Ghafouri-Fard S (2017) Retinoic acid-related orphan receptor alpha (RORA) variants are associated with autism spectrum disorder. Metab Brain Dis 32(5):1595–1601. https://doi.org/10.1007/s11011-017-0049-6
Tanaka M, Bailey JN, Bai D, Ishikawa-Brush Y, Delgado-Escueta AV, Olsen RW (2012) Effects on promoter activity of common SNPs in 5′ region of GABRB3 exon 1A. Epilepsia 53(8):1450–1456. https://doi.org/10.1111/j.1528-1167.2012.03572.x
Ye C, Hu Z, Wu E, Yang X, Buford UJ, Guo Z, Saveanu RV (2016) Two SNAP-25 genetic variants in the binding site of multiple microRNAs and susceptibility of ADHD: a meta-analysis. J Psychiatr Res 81:56–62. https://doi.org/10.1016/j.jpsychires.2016.06.007

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
