Uncertainty Modeling for Multicenter Autism Spectrum Disorder Classification Using Takagi–Sugeno–Kang Fuzzy Systems

IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, VOL. 14, NO. 2, JUNE 2022, p. 730

Zhongyi Hu, Senior Member, IEEE, Jun Wang, Chunxiang Zhang, Zhenzhen Luo, Xiaoqing Luo, Lei Xiao, and Jun Shi
Abstract—Resting-state functional magnetic resonance imaging (rs-fMRI) is a pivotal tool that can reveal brain dysfunction in the computer-aided diagnosis of autism spectrum disorder (ASD). However, the instability of data collection devices, complexity of pathogenesis, and ambiguity in the causes of the disease always introduce considerable uncertainty in identifying ASD using rs-fMRI. Due to the strong ability of Takagi–Sugeno–Kang fuzzy inference systems (TSK FISs) in handling the uncertainty of knowledge and expression, we build an ASD classification model based on TSK FISs and further propose a novel multicenter ASD classification method, FCG-MTGS-TSK. Specifically, the correlation information of multiple imaging centers is considered by introducing multitask group sparse learning, and the features across multiple imaging centers are thus jointly selected. An augmented Lagrange multiplier (ALM) method is further developed to find the optimal solution of the model. Compared with the other existing methods, the proposed method has the advantages of strong interpretability and high classification accuracy. The experimental results also identify the most discriminative functional connectivity in multicenter ASD classification.

Index Terms—Autism spectrum disorder (ASD), joint group sparsity, multicenter learning, resting-state functional magnetic resonance imaging (rs-fMRI), TSK fuzzy system.

Manuscript received January 18, 2021; revised April 3, 2021; accepted April 8, 2021. Date of publication April 15, 2021; date of current version June 10, 2022. This work was supported in part by the Major Project of Zhejiang Provincial Natural Science Foundation under Grant LD21F020001; in part by the Science and Technology Commission of Shanghai Municipality under Grant 20ZR1419900; in part by the National Natural Science Foundation of China under Grant U1809209 and Grant 61772237; in part by the Major Project of Wenzhou Natural Science Foundation under Grant ZY2019020; in part by the Key Project of Zhejiang Provincial Natural Science Foundation under Grant LZ20F020022; and in part by the Six Talent Climax Foundation of Jiangsu under Grant XYDXX-030. (Corresponding author: Jun Wang.)
Zhongyi Hu, Zhenzhen Luo, and Lei Xiao are with the Intelligent Information Systems Institute, Wenzhou University, Wenzhou 325035, China.
Jun Wang and Jun Shi are with the Key Laboratory of Specialty Fiber Optics and Optical Access Networks, Shanghai Institute for Advanced Communication and Data Science, School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China, and also with the Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing 210023, China (e-mail: wangjun_shu@shu.edu.cn).
Chunxiang Zhang and Xiaoqing Luo are with the School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China.
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCDS.2021.3073368.
Digital Object Identifier 10.1109/TCDS.2021.3073368

I. INTRODUCTION

AUTISM spectrum disorder (ASD) is a lifelong neurodevelopmental disorder accompanied by many complications. Its main symptoms include social handicap, speech communication disorders, and stereotypes of actions [1]–[3]. Since Dr. Leo Kanner published the first paper [4] on autism in children during the 1940s, ASD has become one of the most serious neurological development disorders in the world. According to statistics, the prevalence of ASD in the United States and China is 2.47% [5] and 1.19% [6], respectively, and this proportion is increasing year by year. Furthermore, two-thirds of ASD patients cannot live independently and need attentive care during their entire lifetime. In other words, autism is a public health problem that places a heavy economic burden on families and society.

Previous studies have shown that both behavioral and cognitive deficits of ASD are closely related to the potential dysfunction of brain regions [7]–[12]. Resting-state functional magnetic resonance imaging (rs-fMRI) is a pivotal tool that can reveal brain dysfunction based on the blood-oxygen-level-dependent (BOLD) signals when the subject is placed in the resting state [13]–[15]. Most of the rs-fMRI-based ASD classification methods are developed using functional connectivity (FC) as features, in which the Pearson correlation coefficient is calculated as FC by using the average BOLD signals of two brain regions in rs-fMRI [16], [17].

Thus far, it has been reported that rs-fMRI is effective in ASD classification. Heinsfeld et al. [18] identified ASD patients from an rs-fMRI data set based solely on the brain activation patterns. Sherkatghanad et al. [19] focused on the automated detection of ASD using a convolutional neural network, with the most common rs-fMRI data from the autism brain imaging data exchange (ABIDE) data set. However, most of these classification methods are limited to a single imaging center, and it is common to acquire rs-fMRI from multiple imaging centers in practice [20]–[23]. Besides, the multicenter data are collected by different scanners and parameters, and the model learned from one imaging center can hardly be directly applied to another. To this end, it is necessary to develop a novel multicenter ASD classification method by jointly using the multicenter imaging data.
2379-8920 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Fig. 1. Pipeline of ASD diagnosis using FCG-MTGS-TSK.
Another challenging problem in ASD classification using rs-fMRI is data uncertainty, which is always caused by the instability of data collection devices, complexity of pathogenesis, and ambiguity in identifying medical causes. Due to the strong ability of Takagi–Sugeno–Kang fuzzy inference systems (TSK FISs) in handling the uncertainty of knowledge and expression, it is effective to build an ASD classification model based on TSK FISs [24]–[26].

Specifically, the data uncertainty in the rs-fMRI is handled by the fuzzy sets in the IF part of fuzzy rules. Besides, the final outputs of TSK-FIS are combined with the firing strength that indicates how much each rule contributes to the prediction results.

In the literature, TSK-FISs have been fully investigated. For example, Luo et al. [27] proposed a SparseFIS method, which can considerably reduce the number of fuzzy rules by constraining the sparsity of the consequent parameter vector in the TSK fuzzy system. Juang and Chen [28] proposed a TSK fuzzy system based on support vector machine (SVM) in a principal component space for real-time object detection. Khater et al. [29] proposed a novel stable and fast nonlinear learning method called ATSKFC-RACL based on an adaptive TSK fuzzy controller. Although these methods have achieved satisfactory results in specific problems, they are not applicable to ASD classification because of the high dimensionality of the features extracted from neuroimaging data. Besides, they do not consider the correlation between multiple imaging centers and thus cannot handle the multitask issues in multicenter ASD classification.

In this study, we considered the multicenter ASD classification as a multitask learning problem and developed a novel feature-correlation-guided multitask group sparse TSK fuzzy system, in which the effective correlation information among features was incorporated to provide more prior knowledge for classification [30]. To combine the multicenter knowledge in the ASD classification, we imposed a novel group sparse regularization on the consequent parameters of all the fuzzy rules across multiple tasks. This new regularization term encouraged the correlated tasks to share the common knowledge in consequent parameters of fuzzy rules across multiple tasks. To validate our method, extensive experiments were conducted on the ABIDE database, and the results indicated the promising performance of the proposed method.

The pipeline of the proposed method includes two stages, i.e., feature extraction and FCG-MTGS-TSK classification. In the first stage of feature extraction, effective features were extracted from the multicenter rs-fMRI data, and they were then fed to the FCG-MTGS-TSK classifier. In the second stage of classification, these features were classified by the FCG-MTGS-TSK classifier, and the multicenter diagnosis results were finally obtained. Fig. 1 shows the entire pipeline of the method.

The remainder of this article is organized as follows. In Section II, we mainly introduce the source of the data sets, the data preprocessing, and the feature extraction of the imaging data. In Section III, we review the TSK fuzzy system and propose the feature-correlation-guided multitask group sparse learning method for multicenter ASD classification. In Section IV, we use a set of simulated data sets and three groups of medical data sets from different clustering centers to compare the classification performance of the proposed algorithm and analyze the corresponding results. Finally, we draw the conclusions in Section V.

II. MATERIALS

The data set used in this work was obtained from ABIDE [31]. We used the data of UM_2, OLIN, and LEUVEN_2 imaging centers, and the relevant information is shown in Table I.
TABLE I
INFORMATION ABOUT THE DATA OBTAINED FROM THREE IMAGING CENTERS AND USED IN THIS STUDY

A. Data Preprocessing

The rs-fMRI data of each subject were preprocessed using the data processing assistant for resting-state fMRI (DPARSF) tool, and the detailed steps are as follows.
1) Removal of the data of the first ten time points of the functional magnetic resonance image sequence.
2) Time-level and head motion correction.
3) Normalizing data into the Montreal Neurological Institute 152 (MNI152) standard space after the T1-weighted image segmentation.
4) Dividing the brain into 116 regions by using the automated anatomical labeling (AAL) template, each of which was resampled according to 3 mm × 3 mm × 3 mm voxels.
5) Spatial smoothing using a full-width-at-half-maximum (FWHM) Gaussian kernel.
6) Eliminating noise with the application of a band-pass filter (0.01–0.1 Hz).
7) Removal of the linear drift and interference variables.
8) Calculation of the average time series of each brain region.

Our experiments were conducted based on tenfold cross-validation, which is a common validation scheme adopted in many studies [32]–[38]. To derive the data set for training, validation, and testing, the subjects in each imaging center were first divided into ten groups: one for testing, one for validation, and others for training. The training, validation, and testing sets in each imaging center were then combined to generate the training, validation, and testing sets for the entire multicenter data set.

B. Feature Extraction

Feature extraction on the training data was performed as follows.
1) For each subject, the Pearson correlation coefficients between the brain regions were calculated and the FC matrix $\mathbf{M}_i$, $i = 1, 2, \ldots, N$, was obtained. Each element in $\mathbf{M}_i$ denotes the FC between two brain regions.
2) Due to the symmetry of $\mathbf{M}_i$, only the upper triangle was used, and the elements in the upper triangle of $\mathbf{M}_i$ were rearranged to a vector $\mathbf{a}_i$, $i = 1, 2, \ldots, N$.
3) Let $\mathbf{A}^{(\mathrm{trn})} = [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_N]^T$, in which each row represents a sample and each column represents a feature. In order to reduce the dimension of $\mathbf{A}^{(\mathrm{trn})}$, the correlations between features and labels were first evaluated with Pearson's correlation. Then, the $P$ features that have the largest correlation with labels were selected and the training data $\mathbf{X}^{(\mathrm{trn})} \in \mathbb{R}^{N \times P}$ were, therefore, obtained.

During the parameter optimization on the validation set and the performance evaluation on the testing set, we also completed the feature extraction of the data according to the above steps. However, when it came to step 3), we directly selected the corresponding features of column index $p_t$ instead of calculating Pearson's correlation. Finally, we could obtain the validation data set $\mathbf{X}^{(\mathrm{val})}$ and the testing data set $\mathbf{X}^{(\mathrm{tst})}$ successfully.

III. METHOD

A. Overview of TSK Fuzzy Systems

The TSK fuzzy system defines the rules in the form of "IF–THEN." Suppose that $\mathbf{x} = (x_1, x_2, \ldots, x_P)^T \in \mathbb{R}^P$ is an input feature vector, the $r$th fuzzy rule of the TSK fuzzy system can be expressed as follows:

$$\begin{aligned} &\text{IF } x_1 \text{ is } A_{r1} \wedge x_2 \text{ is } A_{r2} \wedge \cdots \wedge x_P \text{ is } A_{rP} \\ &\text{THEN } y = w_{r0} + w_{r1}x_1 + w_{r2}x_2 + \cdots + w_{rP}x_P \end{aligned} \tag{1}$$

where $A_{rp}$ is the fuzzy subset for the $r$th fuzzy rule corresponding to the $p$th element of the input variable $\mathbf{x}$, $r$ denotes the index of the fuzzy rule, $P$ denotes the number of the variables, and $\wedge$ denotes the fuzzy conjunction operator. In this study, we adopted the Gaussian membership function to express $A_{rp}(x_p)$

$$\mu_{A_{rp}}(x_p) = \exp\left(-\frac{(x_p - c_{rp})^2}{2\sigma_{rp}}\right) \tag{2}$$

where $c_{rp}$ and $\sigma_{rp}$ can be obtained by clustering or other partitioning methods. In general, we used the fuzzy c-means (FCM) clustering algorithm to partition the data set, and both $c_{rp}$ and $\sigma_{rp}$ could be calculated using

$$c_{rp} = \sum_{n=1}^{N} u_{rn} x_{np} \Big/ \sum_{n=1}^{N} u_{rn} \tag{3}$$

$$\sigma_{rp} = h \cdot \sum_{n=1}^{N} u_{rn} (x_{np} - c_{rp})^2 \Big/ \sum_{n=1}^{N} u_{rn} \tag{4}$$

where $h$ is a parameter adjusted by users, $\mathbf{x}_n = (x_{n1}, x_{n2}, \ldots, x_{nP})^T \in \mathbb{R}^P$ is the $n$th feature vector in FCM, and $u_{rn}$ denotes the membership of $\mathbf{x}_n$ belonging to the $r$th cluster.

Assume that we have determined the premise of the fuzzy rule. Given the input $\mathbf{x}$, the output of the TSK fuzzy system can be computed as the following procedure:

$$\hat{y} = \sum_{r=1}^{R} \phi_r(\mathbf{x})\, l_r(\mathbf{x}) \tag{5}$$

$$\phi_r(\mathbf{x}) = \mu_r(\mathbf{x}) \Big/ \sum_{k=1}^{R} \mu_k(\mathbf{x}) \tag{6}$$

$$\mu_r(\mathbf{x}) = \prod_{p=1}^{P} \mu_{A_{rp}}(x_p) \tag{7}$$

where $\mathbf{w}_r = (w_{r0}, w_{r1}, \ldots, w_{rP})^T \in \mathbb{R}^{P+1}$ and $l_r(\mathbf{x}) = w_{r0} + w_{r1}x_1 + w_{r2}x_2 + \cdots + w_{rP}x_P = [1, \mathbf{x}^T]\mathbf{w}_r$ represents
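the corresponding consequent part of the $r$th fuzzy rule, and $\phi_r(\mathbf{x})$ is the firing strength indicating how much the $r$th fuzzy rule contributes to the prediction results. Note that (6) and (7) are effective when $P$ is not very large. When the input data are high dimensional, we can select several features (attributes) with the largest variance and then compute the firing strength based on these features.

Let $D = \{(\mathbf{x}_i^T, y_i)^T \in \mathbb{R}^{P+1} \mid \mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{iP})^T, y_i \in \{0, 1\}, i = 1, 2, \ldots, N\}$ denote the original input-and-output data set, where $\mathbf{y} = (y_1, y_2, \ldots, y_N)^T \in \mathbb{R}^N$ indicates the true label of these samples. With the data set $D$, the output of the TSK fuzzy system can be computed as follows:

$$\hat{\mathbf{y}} = \sum_{r=1}^{R} \boldsymbol{\phi}_r \mathbf{w}_r = \boldsymbol{\Phi}\mathbf{w} \tag{8}$$

where $\boldsymbol{\Phi} = (\boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \ldots, \boldsymbol{\phi}_R) \in \mathbb{R}^{N \times R(P+1)}$, $\mathbf{w} = (\mathbf{w}_1^T, \mathbf{w}_2^T, \ldots, \mathbf{w}_R^T)^T \in \mathbb{R}^{R(P+1)}$, $\boldsymbol{\phi}_r = \mathrm{diag}(\phi_r(\mathbf{x}_1), \phi_r(\mathbf{x}_2), \ldots, \phi_r(\mathbf{x}_N))\mathbf{X}_e$, and $\mathbf{X}_e = (\mathbf{1}, (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N)^T) \in \mathbb{R}^{N \times (P+1)}$, i.e., a column of ones followed by the samples as rows.

B. Feature-Correlation-Guided Multitask Group Sparse Learning for Multicenter ASD Classification

To show how to use the correlation information between features in our model, we first considered single-task learning by using the framework of the TSK fuzzy system and then generalized to the multitask case.

Considering the single-task settings, we constructed the following linear model [39] to predict the output variable $\mathbf{y}_r$:

$$\mathbf{y}_r = \boldsymbol{\phi}_r \mathbf{w}_r + \boldsymbol{\varepsilon}_r \tag{9}$$

where $\mathbf{y}_r \in \mathbb{R}^N$ is the output for the $r$th fuzzy rule and $\boldsymbol{\varepsilon}_r = (\varepsilon_{1r}, \varepsilon_{2r}, \ldots, \varepsilon_{Nr})^T \in \mathbb{R}^N$ is the error vector satisfying $E(\varepsilon_{nr}) = 0$ and $\mathrm{Var}(\varepsilon_{nr}) = \delta_r^2$ for each $1 \le n \le N$. Suppose that the $r$th fuzzy subdictionary $\boldsymbol{\phi}_r$ is independent of its error vector $\boldsymbol{\varepsilon}_r$. Denote $\mathbf{m}_r = E(\boldsymbol{\phi}_r^T \mathbf{y}_r / N) = (m_{r0}, m_{r1}, \ldots, m_{rP})^T \in \mathbb{R}^{P+1}$ as the marginal correlation vector between all of the $(P+1)$ features and output $\mathbf{y}_r$. Using (9), we obtained the following:

$$\mathbf{m}_r = E\left(\boldsymbol{\phi}_r^T \mathbf{y}_r / N\right) = E\left(\boldsymbol{\phi}_r^T \boldsymbol{\phi}_r \mathbf{w}_r / N\right) + E\left(\boldsymbol{\phi}_r^T \boldsymbol{\varepsilon}_r / N\right) = \boldsymbol{\Sigma}_r \mathbf{w}_r \tag{10}$$

where $\boldsymbol{\Sigma}_r \in \mathbb{R}^{(P+1) \times (P+1)}$ is the covariance matrix of $\boldsymbol{\phi}_r$. Then, we obtained the following:

$$\mathbf{w}_r = \boldsymbol{\Sigma}_r^{-1} \mathbf{m}_r. \tag{11}$$

For the ease of description, we used $\boldsymbol{\Omega}_r$ to denote $\boldsymbol{\Sigma}_r^{-1}$. To make the algorithm more robust to noise, we used the graphical Lasso method [40] to construct $\tilde{\boldsymbol{\Omega}}_r \in \mathbb{R}^{(P+1) \times (P+1)}$ that recorded the significant partial correlations in $\boldsymbol{\Omega}_r$. Therefore, we obtained $\mathbf{w}_r \approx \tilde{\boldsymbol{\Omega}}_r \mathbf{m}_r$, which could be further expanded as follows:

$$\mathbf{w}_r \approx \begin{bmatrix} w_{r0} \\ w_{r1} \\ \vdots \\ w_{rP} \end{bmatrix} = \begin{bmatrix} \tilde{\omega}_{r(00)} & \tilde{\omega}_{r(01)} & \cdots & \tilde{\omega}_{r(0P)} \\ \tilde{\omega}_{r(10)} & \tilde{\omega}_{r(11)} & \cdots & \tilde{\omega}_{r(1P)} \\ \vdots & \vdots & \ddots & \vdots \\ \tilde{\omega}_{r(P0)} & \tilde{\omega}_{r(P1)} & \cdots & \tilde{\omega}_{r(PP)} \end{bmatrix} \begin{bmatrix} m_{r0} \\ m_{r1} \\ \vdots \\ m_{rP} \end{bmatrix} = \begin{bmatrix} \tilde{\omega}_{r(00)} m_{r0} + \tilde{\omega}_{r(01)} m_{r1} + \cdots + \tilde{\omega}_{r(0P)} m_{rP} \\ \tilde{\omega}_{r(10)} m_{r0} + \tilde{\omega}_{r(11)} m_{r1} + \cdots + \tilde{\omega}_{r(1P)} m_{rP} \\ \vdots \\ \tilde{\omega}_{r(P0)} m_{r0} + \tilde{\omega}_{r(P1)} m_{r1} + \cdots + \tilde{\omega}_{r(PP)} m_{rP} \end{bmatrix}. \tag{12}$$

We defined $\mathbf{v}_{rp} = (\tilde{\omega}_{r(0p)} m_{rp}, \tilde{\omega}_{r(1p)} m_{rp}, \ldots, \tilde{\omega}_{r(Pp)} m_{rp})^T$ as a latent vector that recorded the $p$th column in (12). Obviously, each column $\mathbf{v}_{rp}$ depended not only on the marginal correlation coefficient $m_{rp}$ but also on the correlation between all of the $(P+1)$ features $\tilde{\omega}_{r(ij)}$, $i, j = 0, 1, \ldots, P$. In other words, if $m_{rp} = 0$, $\mathbf{v}_{rp}$ would be zero irrespective of whether the $p$th feature was related to the other features or not; if $m_{rp} \ne 0$, $\mathbf{v}_{rp}$ always relied on the correlation between the $p$th feature and the other features.

In the multicenter ASD classification, we considered learning in an independent imaging center as a separate classification task. Thus, the above derivation could be easily extended to the multitask learning settings. We assumed that the subdictionary $\boldsymbol{\phi}_1^s, \boldsymbol{\phi}_2^s, \ldots, \boldsymbol{\phi}_R^s$ of all the $R$ rules in the $s$th ($s = 1, 2, \ldots, S$) imaging center predicted the output variable $\hat{\mathbf{y}}_r^s$ by using the linear model as in (9), and the consequent parameters in $\mathbf{w}_r^s$ were decomposed and constructed in the form of (10)–(12). Considering the intertask correlation, we rearranged all of the consequent parameters across multiple tasks as a $(P+1) \times (P+1) \times (R \cdot S)$ tensor; Fig. 2 shows the details. We, therefore, propose the following learning criterion for the feature-correlation-guided multitask group sparse TSK fuzzy system for multicenter ASD classification:

$$\begin{aligned} \min_{\mathbf{w}_r^s}\ & \sum_{s=1}^{S} \left\| \mathbf{y}^s - \sum_{r=1}^{R} \boldsymbol{\phi}_r^s \mathbf{w}_r^s \right\|_F^2 + \lambda \sum_{p=0}^{P} \left\| \left[ (\mathbf{v}_{1p}^1, \mathbf{v}_{2p}^1, \ldots, \mathbf{v}_{Rp}^1), \ldots, (\mathbf{v}_{1p}^S, \mathbf{v}_{2p}^S, \ldots, \mathbf{v}_{Rp}^S) \right] \right\|_F^2 \\ \text{s.t.}\ & \mathbf{w}_r^s = \sum_{p=0}^{P} \mathbf{v}_{rp}^s, \quad s = 1, 2, \ldots, S \end{aligned} \tag{13}$$

where $\mathbf{y}^s$ is the label vector in the $s$th task, $\boldsymbol{\phi}_r^s$ and $\mathbf{w}_r^s$ represent the subdictionaries and the consequent parameters of the $r$th rule in the $s$th task, respectively, and $\lambda > 0$ is a regularization coefficient. Moreover, as highlighted in Fig. 2, the matrix $[(\mathbf{v}_{1p}^1, \mathbf{v}_{2p}^1, \ldots, \mathbf{v}_{Rp}^1), \ldots, (\mathbf{v}_{1p}^S, \mathbf{v}_{2p}^S, \ldots, \mathbf{v}_{Rp}^S)] \in \mathbb{R}^{(P+1) \times (R \cdot S)}$ recorded all of the $p$th latent vectors across multiple tasks. When $\|[(\mathbf{v}_{1p}^1, \ldots, \mathbf{v}_{Rp}^1), \ldots, (\mathbf{v}_{1p}^S, \ldots, \mathbf{v}_{Rp}^S)]\|_F^2 = 0$, the $p$th joint group of $S$ tasks was excluded. When $\|[(\mathbf{v}_{1p}^1, \ldots, \mathbf{v}_{Rp}^1), \ldots, (\mathbf{v}_{1p}^S, \ldots, \mathbf{v}_{Rp}^S)]\|_F^2 \ne 0$, the elements in the group were not all 0s, and $\mathbf{v}_{rp}^s$ was jointly determined by $m_{rp}^s$ and the $p$th feature and its correlated features. All in all, minimizing the regularization term $\sum_{p=0}^{P} \|[(\mathbf{v}_{1p}^1, \ldots, \mathbf{v}_{Rp}^1), \ldots, (\mathbf{v}_{1p}^S, \ldots, \mathbf{v}_{Rp}^S)]\|_F^2$ jointly selected the consequent parameters in different fuzzy rules across different imaging centers.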
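The decomposition in (10)–(12) can be made concrete with a minimal NumPy sketch for one rule in one task. It is an illustration under stated assumptions, not the authors' implementation: a plain matrix inverse stands in for the graphical Lasso estimate, and the names `latent_vectors`, `group_sq_norms`, `phi_r`, and `y_r` are hypothetical.

```python
import numpy as np

def latent_vectors(phi_r, y_r):
    """Decompose the consequent vector of one rule into P+1 latent columns.

    phi_r : (N, P+1) subdictionary of the rule (hypothetical input).
    y_r   : (N,) output vector.
    Assumption: the precision matrix is the plain inverse of the sample
    second-moment matrix, standing in for the graphical Lasso estimate.
    """
    n = phi_r.shape[0]
    m = phi_r.T @ y_r / n          # sample version of m_r in (10)
    sigma = phi_r.T @ phi_r / n    # sample version of Sigma_r
    omega = np.linalg.inv(sigma)   # Omega_r = Sigma_r^{-1}, cf. (11)
    v = omega * m[np.newaxis, :]   # column p is omega[:, p] * m[p], cf. (12)
    return m, omega, v

def group_sq_norms(v_list):
    """Squared Frobenius norm of each joint group p in the regularizer of (13).

    v_list holds one (P+1, P+1) latent matrix per (task, rule) pair.
    """
    stacked = np.stack(v_list)                 # (R*S, P+1, P+1)
    return np.sum(stacked ** 2, axis=(0, 1))   # one value per column index p

rng = np.random.default_rng(0)
phi = rng.normal(size=(40, 4))
y = rng.normal(size=40)
m, omega, v = latent_vectors(phi, y)
w = v.sum(axis=1)                  # constraint in (13): w_r = sum_p v_rp
norms = group_sq_norms([v, v])     # two identical rules, for illustration only
```

With an unregularized inverse, summing the latent columns recovers the ordinary least-squares solution, which is a quick sanity check on the decomposition. A group whose squared norm is zero corresponds to a feature excluded jointly across rules and tasks.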
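Fig. 2. Consequent parameters of fuzzy rules in FCG-MTGS-TSK.

C. Optimization

We used an alternating optimization method to solve the problem in (13). By taking advantage of the augmented Lagrange multiplier (ALM) method [41], we transformed the objective of (13) into the following, where the penalty coefficient $\rho > 0$ is a very large number:

$$\begin{aligned} L_\rho(\mathbf{w}_r^s, \mathbf{v}_{rp}^s) ={}& \sum_{s=1}^{S} \left\| \mathbf{y}^s - \sum_{r=1}^{R} \boldsymbol{\phi}_r^s \mathbf{w}_r^s \right\|_2^2 \\ &+ \lambda \sum_{p=0}^{P} \left\| \left[ (\mathbf{v}_{1p}^1, \mathbf{v}_{2p}^1, \ldots, \mathbf{v}_{Rp}^1), \ldots, (\mathbf{v}_{1p}^S, \mathbf{v}_{2p}^S, \ldots, \mathbf{v}_{Rp}^S) \right] \right\|_F^2 \\ &+ \frac{\rho}{2} \sum_{s=1}^{S} \left\| \mathbf{w}_r^s - \sum_{p=0}^{P} \mathbf{v}_{rp}^s \right\|_2^2. \end{aligned} \tag{14}$$

For the $s$th task and fixing the values of $\mathbf{v}_{rp}^s$ and $\mathbf{w}_r^s$, we calculated the partial derivatives of the variables $\mathbf{w}_r^s$ and $\mathbf{v}_{rp}^s$, respectively, and set them to 0, i.e.,

$$\frac{\partial L}{\partial \mathbf{w}_r^s} = 2\left(-\boldsymbol{\phi}_r^s\right)^T \left( \mathbf{y}^s - \sum_{r=1}^{R} \boldsymbol{\phi}_r^s \mathbf{w}_r^s \right) + \rho \left( \mathbf{w}_r^s - \sum_{p=0}^{P} \mathbf{v}_{rp}^s \right) = 0 \tag{15}$$

$$\frac{\partial L}{\partial \mathbf{v}_{rp}^s} = 2\lambda \mathbf{v}_{rp}^s - \rho \left( \mathbf{w}_r^s - \sum_{p=0}^{P} \mathbf{v}_{rp}^s \right) = 0. \tag{16}$$

Therefore, with regard to all of the $S$ tasks, we computed $\mathbf{w}^1, \ldots, \mathbf{w}^S$ using (17)–(19) in each iteration and obtained the following output of $\mathbf{w}^1, \mathbf{w}^2, \ldots, \mathbf{w}^S$:

$$\mathbf{w}^1 = \left( 2\boldsymbol{\Phi}^{1T}\boldsymbol{\Phi}^1 + \rho\mathbf{I} \right)^{-1} \left( 2\boldsymbol{\Phi}^{1T}\mathbf{y}^1 + \rho\,\bar{\mathbf{v}}^1 \right) \tag{17}$$

$$\mathbf{w}^2 = \left( 2\boldsymbol{\Phi}^{2T}\boldsymbol{\Phi}^2 + \rho\mathbf{I} \right)^{-1} \left( 2\boldsymbol{\Phi}^{2T}\mathbf{y}^2 + \rho\,\bar{\mathbf{v}}^2 \right) \tag{18}$$

$$\vdots$$

$$\mathbf{w}^S = \left( 2\boldsymbol{\Phi}^{ST}\boldsymbol{\Phi}^S + \rho\mathbf{I} \right)^{-1} \left( 2\boldsymbol{\Phi}^{ST}\mathbf{y}^S + \rho\,\bar{\mathbf{v}}^S \right) \tag{19}$$

where $\boldsymbol{\Phi}^s = (\boldsymbol{\phi}_1^s, \boldsymbol{\phi}_2^s, \ldots, \boldsymbol{\phi}_R^s)$, $s = 1, \ldots, S$, and $\bar{\mathbf{v}}^s \in \mathbb{R}^{R(P+1)}$ stacks the sums $\sum_{p=0}^{P} \mathbf{v}_{rp}^s$ over $r = 1, \ldots, R$, as obtained by stacking (15) over all the rules.

Similar to the above steps, we executed (20)–(22) in the $k$th iteration and obtained the following output of $\mathbf{v}_{rp}^1, \mathbf{v}_{rp}^2, \ldots, \mathbf{v}_{rp}^S$:

$$\mathbf{v}_{rp}^1 = \frac{\rho \mathbf{w}_r^1 - \dfrac{\rho^2 (P+1) \mathbf{w}_r^1}{2\lambda + \rho(P+1)}}{2\lambda} \tag{20}$$

$$\mathbf{v}_{rp}^2 = \frac{\rho \mathbf{w}_r^2 - \dfrac{\rho^2 (P+1) \mathbf{w}_r^2}{2\lambda + \rho(P+1)}}{2\lambda} \tag{21}$$

$$\vdots$$

$$\mathbf{v}_{rp}^S = \frac{\rho \mathbf{w}_r^S - \dfrac{\rho^2 (P+1) \mathbf{w}_r^S}{2\lambda + \rho(P+1)}}{2\lambda}. \tag{22}$$

Eventually, the final $\mathbf{w}^1, \mathbf{w}^2, \ldots, \mathbf{w}^S$ was the approximate solution of problem (14). Considering that this was a binary classification problem, for the testing data set $\mathbf{X}^{(\mathrm{tst})}$, we let $\mathbf{y}^{(\mathrm{tst})} = \mathrm{sigmoid}(\boldsymbol{\Phi}^{(\mathrm{tst})}\mathbf{w})$ denote the prediction results, where the sigmoid function was defined with a threshold of 0.5 as follows:

$$f(x) = \frac{1}{1 + e^{-x}}. \tag{23}$$

The algorithm is summarized in Algorithm 1.

D. Computation Complexity

The complexity of the training procedure of FCG-MTGS-TSK is determined by both the FCM clustering algorithm for the antecedent parameters and the procedure of computing the consequent parameters in each task. The complexity of FCM is $O(NRPT)$, where $N$ is the total number of subjects, $R$ is the rule number, $P$ is the feature number, and $T$ is the iteration number in FCM. On the other hand, the complexity of computing the consequent parameters $\mathbf{w}^s$ in the $s$th task is $O(N_s^2 R(P+1) + N_s^3 + N_s R^2 (P+1)^2)$, where $N_s$ is the subject number in the $s$th task.

IV. EXPERIMENT

A. Experimental Settings

In this section, we will use the accuracy (ACC), sensitivity (SEN), specificity (SPE), area under the curve (AUC), and receiver operating characteristic (ROC) curves [42] to measure the classification performance of all the methods involved. These four metrics have been used in many machine learning works [43]–[45]. The definitions of ACC, SEN, and SPE are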
TABLE II
DETAILED PARAMETER SETTINGS IN ALL METHODS

Algorithm 1 Feature-Correlation-Guided Multitask Group Sparse Takagi–Sugeno–Kang Fuzzy Systems (FCG-MTGS-TSK)
Input: Training data sets $D_1, D_2, \ldots, D_S$; number of fuzzy rules $R$; regularization parameters $\lambda$, $\rho$
Output: $\mathbf{w}^1, \mathbf{w}^2, \ldots, \mathbf{w}^S$
Step 1: Extract the antecedent part of a fuzzy rule.
  Step 1.1: Partition the data sets $D_1, D_2, \ldots, D_S$ into $R$ clusters with the FCM clustering algorithm.
  Step 1.2: Compute $c_{rp}^s$ and $\sigma_{rp}^s$ in the Gaussian membership function of the $s$th imaging center by using Eqs. (3) and (4), $s = 1, \ldots, S$.
  Step 1.3: Determine the membership function $\mu_{A_{rp}}^s(x_p)$ of the $s$th imaging center by using Eq. (2).
  Step 1.4: Select 5 features (attributes) with the largest variance, and compute the firing strength of the rules based on these features using Eqs. (6) and (7).
  Step 1.5: Generate $\boldsymbol{\Phi}^s = (\boldsymbol{\phi}_1^s, \boldsymbol{\phi}_2^s, \ldots, \boldsymbol{\phi}_R^s)$.
Step 2: Joint group sparse learning of consequent parameters across multiple tasks.
  Step 2.1: Initialize $\mathbf{w}^1 = \mathbf{w}^2 = \cdots = \mathbf{w}^S = \mathbf{1}$ and $\mathbf{v}_{rp}^1, \mathbf{v}_{rp}^2, \ldots, \mathbf{v}_{rp}^S$ by using Eqs. (20)–(22), $p = 0, 1, \ldots, P$, $r = 1, 2, \ldots, R$.
  Step 2.2: Update $\mathbf{w}^1, \mathbf{w}^2, \ldots, \mathbf{w}^S$ by using Eqs. (17)–(19).
  Step 2.3: Update $\mathbf{v}_{rp}^1, \mathbf{v}_{rp}^2, \ldots, \mathbf{v}_{rp}^S$ by using Eqs. (20)–(22).
  Step 2.4: Break if the convergence conditions are satisfied.
Step 3: Return $\mathbf{w}^1, \mathbf{w}^2, \ldots, \mathbf{w}^S$.
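Step 2 of Algorithm 1 can be sketched in NumPy for a single task as follows. This is a simplified illustration, not the authors' code. It assumes the dictionary `Phi` has already been built in Step 1, uses a moderate `rho` rather than a very large one so that the loop converges quickly, and relies on the fact that the stationarity condition (16) has a right-hand side independent of $p$, so all $P+1$ latent vectors of a rule take the same value. The function name `alm_consequents` and its arguments are hypothetical.

```python
import numpy as np

def alm_consequents(Phi, y, n_groups, lam=1.0, rho=1.0, n_iter=500, tol=1e-10):
    """Alternate the closed-form w- and v-updates for a single task.

    Phi      : (N, D) dictionary with D = R * (P + 1) (hypothetical input).
    y        : (N,) label vector.
    n_groups : P + 1, the number of latent vectors per rule.
    Returns the stacked consequent vector w and the common latent value v.
    """
    d = Phi.shape[1]
    w = np.ones(d)                                # Step 2.1: w initialized to 1
    v = rho * w / (2.0 * lam + rho * n_groups)    # simplified form of (20)-(22)
    lhs = 2.0 * Phi.T @ Phi + rho * np.eye(d)
    rhs_fixed = 2.0 * Phi.T @ y
    for _ in range(n_iter):
        # Step 2.2: solve the w-stationarity condition, i.e. (15) stacked over rules.
        # The sum of the P+1 identical latent vectors equals n_groups * v.
        w_new = np.linalg.solve(lhs, rhs_fixed + rho * n_groups * v)
        # Step 2.3: v-stationarity condition (16).
        v = rho * w_new / (2.0 * lam + rho * n_groups)
        converged = np.linalg.norm(w_new - w) < tol   # Step 2.4
        w = w_new
        if converged:
            break
    return w, v

rng = np.random.default_rng(1)
Phi = rng.normal(size=(60, 8))   # e.g. R = 2 rules, P + 1 = 4
y = rng.normal(size=60)
w_hat, v_hat = alm_consequents(Phi, y, n_groups=4, lam=2.0, rho=1.0)
```

Substituting the v-update into the w-update shows that the fixed point of this simplified single-task loop coincides with a ridge-regression solution with penalty $\lambda\rho / (2\lambda + \rho(P+1))$, which gives a convenient check on the implementation.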
shown as (24)–(26), where TP, FN, FP, and TN denote the true positive, false negative, false positive, and the true negative, respectively

$$\mathrm{SEN} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \tag{24}$$

$$\mathrm{SPE} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}} \tag{25}$$

$$\mathrm{ACC} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{FN} + \mathrm{TN} + \mathrm{FP}}. \tag{26}$$

B. Summary of Competing Methods

We conducted comprehensive experiments to validate the effectiveness of the proposed FCG-MTGS-TSK. The ASD patients were labeled as "1" and normal controls as "0."

We used the SVM as the baseline, which is a popular method in the machine learning field [46]–[51]. It was trained and tested on a single imaging center.

We regarded the multicenter ASD diagnosis as a multitask classification problem. The mt-LASSO [52], mt-L21 [53], [54], and RMTFL [55] were used for comparison. They were all from the MALSAR 1.1 (multitask learning via structural regularization) [56] toolkit, which is a multitask learning software package based on structural regularization. Both mt-L2TSKFS and FCG-MTGS-TSK were the nonlinear classification methods based on the TSK fuzzy system. The parameter settings of all the above methods are shown in Table II.

We compared the proposed method with some deep learning methods, including deep belief network (DBN) and stacked autoencoder (SAE), both of which have been successfully applied in the diagnosis of brain diseases [57], [58].

In the training procedure of both mt-L2TSKFS [59] and FCG-MTGS-TSK, we combined the multicenter training data together and then extracted fuzzy rules from the combined data. This strategy made all the fuzzy rules share premise parameters, which were also used in validation and testing. For fair comparison, we combined the multicenter training data together and fed them into both DBN and SAE, respectively. Table III summarizes the details of the methods used in the experiments.

C. Classification Results and Analysis

Tables IV–VI show the classification results obtained by different methods on each imaging center in the experiments. Fig. 3 also plots the ROC curves.

The results in Tables IV–VI and Fig. 3 show that the proposed FCG-MTGS-TSK effectively improved the performance of multicenter ASD classification. Compared with conventional multitask classification methods mt-LASSO, mt-L21, and RMTFL, multitask fuzzy systems, including both mt-L2TSKFS and FCG-MTGS-TSK, had better results, which indicated that TSK fuzzy systems have better nonlinear approximation ability in handling uncertainty for ASD diagnosis. Comparing the performance of FCG-MTGS-TSK with
mt-L2TSKFS, one may observe that the average AUC results (i.e., 0.7334, 0.6704, and 0.7581) of FCG-MTGS-TSK on three imaging centers were better than those of mt-L2TSKFS (i.e., 0.7112, 0.6652, and 0.7669). We consider the reason lies in the fact that feature correlation provided more knowledge in multitask fuzzy modeling to enhance the performance of FCG-MTGS-TSK, especially when the training samples were inadequate.

TABLE III
SUMMARY OF THE METHODS FOR COMPARISON

TABLE IV
CLASSIFICATION RESULTS ON UM_2

TABLE V
CLASSIFICATION RESULTS ON OLIN

TABLE VI
CLASSIFICATION RESULTS ON LEUVEN_2

Fig. 3. ROC curves of different methods on UM_2, OLIN, and LEUVEN_2.

Fig. 4. Convergence curve of FCG-MTGS-TSK method.

Regarding the results of DBN and SAE, one may observe that both of them could not obtain satisfactory classification results as FCG-MTGS-TSK could. DBN even classified all the subjects into normal controls. This is because training both DBN and SAE required a large number of training samples. However, we could not provide adequate training samples for them.

Fig. 4 further plots the value of the objective function when FCG-MTGS-TSK was optimized. One may observe that FCG-MTGS-TSK converges in a limited number of iterations, which indicated that our method is efficient.

Both mt-L21 and FCG-MTGS-TSK introduced regularization terms, which led to group sparsity in multitask classification. However, FCG-MTGS-TSK was better than the
TABLE VII
TOP INTERREGIONAL FUNCTIONAL FEATURES SELECTED FROM RS-FMRI

Fig. 5. Top 30 discriminative rs-fMRI connections selected using the proposed method.

mt-L21 method in general. The reason was that, in addition to the unique advantages of the TSK fuzzy system for handling uncertainty issues, the specific regularization term in FCG-MTGS-TSK helped to improve the results. By integrating precision matrix in TSK fuzzy modeling, the FCG-MTGS-TSK guided the selection of common information across multiple imaging centers, and effectively improved the performance of multicenter ASD classification.
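The SEN, SPE, and ACC values discussed in this subsection follow the definitions in (24)–(26). A minimal sketch of how they can be computed from binary labels is given below; the function name `confusion_metrics` and the example labels are hypothetical and do not reproduce any result from the tables.

```python
import numpy as np

def confusion_metrics(y_true, y_pred):
    """Return (SEN, SPE, ACC) for binary labels, with ASD coded as 1."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))      # ASD predicted as ASD
    tn = int(np.sum(~y_true & ~y_pred))    # control predicted as control
    fp = int(np.sum(~y_true & y_pred))     # control predicted as ASD
    fn = int(np.sum(y_true & ~y_pred))     # ASD predicted as control
    sen = tp / (tp + fn) if (tp + fn) else float("nan")
    spe = tn / (tn + fp) if (tn + fp) else float("nan")
    acc = (tp + tn) / (tp + tn + fp + fn)
    return sen, spe, acc

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
mixed = confusion_metrics(y_true, [1, 0, 1, 0, 0, 1, 0, 1])
all_controls = confusion_metrics(y_true, [0] * 8)
```

The second call illustrates why a classifier that assigns every subject to the control class, as reported for DBN, is uninformative: its specificity is 1 but its sensitivity is 0.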
D. Discriminative Interregional Features

In order to identify the brain regions that were the most conducive to the diagnosis of ASD, the total weight of each FC was computed by adding the absolute values of its weights in all the rules across multiple tasks. The normalized weights were then obtained by dividing by the largest total weight. Thirty pairs of the most discriminative brain regions were selected based on the most important functional connectivities, which were determined by the normalized weights.
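The ranking described above can be sketched in a few lines of NumPy. The array layout (tasks × rules × features, with the bias term excluded) and the function name `rank_connections` are assumptions made for illustration only.

```python
import numpy as np

def rank_connections(weights, k=30):
    """Rank FC features by normalized total absolute consequent weight.

    weights : (S, R, P) array of consequent weights, bias excluded
              (hypothetical layout: tasks x rules x features).
    Returns the indices of the top-k features and their normalized weights.
    """
    total = np.abs(weights).sum(axis=(0, 1))   # sum over tasks and rules
    normalized = total / total.max()           # largest weight becomes 1
    order = np.argsort(-normalized, kind="stable")[:k]
    return order, normalized[order]

# Toy example with S = 2 tasks, R = 2 rules, and P = 3 features.
weights = np.array([[[0.5, -1.0, 0.0], [0.5, 1.0, 0.2]],
                    [[-1.0, 2.0, 0.1], [0.0, 0.0, 0.1]]])
top_idx, top_w = rank_connections(weights, k=2)
```

In the toy example the total absolute weights are 2.0, 4.0, and 0.4, so the second feature ranks first with a normalized weight of 1.0 and the first feature follows with 0.5.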
Table VII shows the 30 most discriminative functional connectivities, together with the normalized weights. The numbers
in the parentheses on the right indicate the index of the
functional regions in the AAL-Label template. Furthermore,
Fig. 5 illustrates the discriminative connections between the
brain regions, which are computed based on the multitask
FIS. Each circle in the figure denotes a brain region, and
their correlations are denoted by arcs. We used green and red
to distinguish between the left and the right brain regions,
respectively, while the others were plotted in yellow. The
intrahemisphere connections in both the left and the right
hemispheres were plotted in blue, while the interhemisphere
connections were plotted in black. In addition, the thickness of lines represents the weights in the diagnosis.

V. DISCUSSION

There have been many works that used machine learning methods to classify ASD. Representative methods included kernel methods [60], sparse learning [61], multitask learning [62], multiview learning [16], [63], and deep neural networks [64]. However, these works ignored the uncertainty issue in rs-fMRI. To this end, this article proposes a multicenter ASD classification method based on TSK FISs, in which uncertainty of rs-fMRI data in ASD classification can be described by the fuzzy sets in the fuzzy rules. Besides, the final outputs of the TSK-FIS are combined with the firing strength $\phi_r(\mathbf{x})$, which indicates how much each rule contributes to the prediction results.

The only work reported to classify ASD using TSK-FIS is by Wang et al. [25], in which a multioutput TSK fuzzy system (MO-TSK) was developed to extract discriminant features in ASD classification. Since it handled the uncertainty issue using fuzzy sets, the method is more robust than existing methods. However, it did not consider the correlation between multiple
imaging centers and thus cannot handle the multitask issues in multicenter ASD classification.

In the literature, there have been several multicenter ASD classification methods reported. One pioneer work for multicenter ASD classification is proposed by Nielsen et al. [20], in which the linear model was used. However, it did not consider the correlation between centers and thus obtained poor results on some imaging centers. To this end, Wang et al. [62] proposed M3CC for multicenter ASD classification; Huang et al. [61] proposed the self-weighted adaptive structure learning model to obtain informative features and used multitemplate ensemble strategy for multicenter ASD classification; and Wang et al. [65] proposed low-rank representation in multicenter ASD classification. All these methods considered the correlation between imaging centers, and the performance of multicenter ASD classification thus improved. However, these works ignored the uncertainty issue in ASD classification. To this end, this article handles the uncertainty in multicenter ASD classification using multitask TSK fuzzy systems, which opens a door in neuroimaging studies.

There are issues that require further investigation. For example, the proposed method generated the antecedent parts of the

[4] L. Kanner, “Autistic disturbances of affective contact,” Acta Paedopsychiatr, vol. 35, no. 4, pp. 100–136, 1968.
[5] G. Xu, L. Strathearn, B. Liu, and W. Bao, “Prevalence of Autism spectrum disorder among U.S. children and adolescents, 2014–2016,” J. Amer. Med. Assoc., vol. 319, no. 1, pp. 81–82, 2018.
[6] X. Sun et al., “Exploring the underdiagnosis and prevalence of Autism spectrum conditions in Beijing,” Autism Res., vol. 8, no. 3, pp. 250–260, 2015.
[7] M. D. Greicius, B. Krasnow, A. L. Reiss, and V. Menon, “Functional connectivity in the resting brain: A network analysis of the default mode hypothesis,” Proc. Nat. Acad. Sci. USA, vol. 100, no. 1, pp. 253–258, 2003.
[8] D. P. Kennedy and E. Courchesne, “The intrinsic functional organization of the brain is altered in Autism,” Neuroimage, vol. 39, no. 4, pp. 1877–1885, 2008.
[9] C. S. Monk et al., “Abnormalities of intrinsic functional connectivity in Autism spectrum disorders,” Neuroimage, vol. 47, no. 2, pp. 764–772, 2009.
[10] S.-J. Weng et al., “Alterations of resting state functional connectivity in the default network in adolescents with Autism spectrum disorders,” Brain Res., vol. 1313, pp. 202–214, Feb. 2010.
[11] J. M. Tyszka, D. P. Kennedy, L. K. Paul, and R. Adolphs, “Largely typical patterns of resting-state functional connectivity in high-functioning adults with Autism,” Cerebr. Cortex, vol. 24, no. 7, pp. 1894–1905, 2014.
[12] C.-Y. Wee et al., “Resting-state multi-spectrum functional connectivity networks for identification of MCI patients,” PLoS ONE, vol. 7, no. 5, 2012, Art. no. e37828.
[13] M. Burke and C. Bührle, “BOLD response during uncoupling of
fuzzy rules by performing FCM across all tasks. This strategy neuronal activity and CBF,” Neuroimage, vol. 32, no. 1, pp. 1–8, 2006.
makes the antecedent part share the common knowledge across [14] Y. Zhang et al., “Strength and similarity guided group-level brain
functional network construction for MCI diagnosis,” Pattern Recognit.,
multiple tasks, and however, ignores the individual characteris- vol. 88, pp. 421–430, Apr. 2019.
tics of each task. In our future work, we will further investigate [15] Y. Zhang, H. Zhang, X. Chen, S.-W. Lee, and D. Shen, “Hybrid high-
effective methods to generate more reasonable fuzzy parti- order functional connectivity networks using resting-state functional
MRI for mild cognitive impairment diagnosis,” Sci. Rep., vol. 7, no. 1,
tions, which makes the antecedent part of fuzzy rules both pp. 1–15, 2017.
share common knowledge across tasks and keep the individual [16] J. Wang, Q. Wang, H. Zhang, J. Chen, S. Wang, and D. Shen, “Sparse
characteristics of individual task. multiview task-centralized ensemble learning for ASD diagnosis based
on age-and sex-related functional connectivity patterns,” IEEE Trans.
Cybern., vol. 49, no. 8, pp. 3141–3154, Aug. 2019.
VI. C ONCLUSION [17] J. Wang et al., “Multi-class ASD classification based on functional
In this article, we proposed a novel feature-correlation- connectivity and functional correlation tensor via multi-source domain
adaptation and multi-view sparse representation,” IEEE Trans. Med.
guided multitask group sparse classifier under the framework Imag., vol. 39, no. 10, pp. 3137–3147, Oct. 2020.
of the TSK fuzzy system. The effective correlation among [18] A. S. Heinsfeld, A. R. Franco, R. C. Craddock, A. Buchweitz, and
features was considered in the learning of the consequent F. Meneguzzi, “Identification of Autism spectrum disorder using deep
learning and the ABIDE dataset,” NeuroImage Clin., vol. 17, pp. 16–23,
parameters. To combine the multicenter knowledge in the ASD Aug. 2018.
classification, we imposed a novel group sparse regulariza- [19] Z. Sherkatghanad et al., “Automated detection of Autism spectrum dis-
tion term on the consequent parameters of all fuzzy rules order using a convolutional neural network,” Front. Neurosci., vol. 13,
p. 1325, Jan. 2020.
across multiple tasks, which effectively encouraged these cor- [20] J. A. Nielsen et al., “Multisite functional connectivity MRI classification
related tasks to share the common consequent parameters. Our of Autism: ABIDE results,” Front. Human Neurosci., vol. 7, p. 599,
FCG-MTGS-TSK method has strong interpretability and high Sep. 2013.
[21] T. Iidaka, “Resting state functional magnetic resonance imaging and neu-
classification accuracy. The experimental results also indicated ral network classified Autism and control,” Cortex, vol. 63, pp. 55–67,
the effectiveness of selecting discriminative brain regions in Feb. 2015.
the assistant diagnosis of ASD. [22] M. Plitt, K. A. Barnes, and A. Martin, “Functional connectivity clas-
sification of Autism identifies highly predictive brain features but falls
short of biomarker standards,” NeuroImage Clin., vol. 7, pp. 359–366,
ACKNOWLEDGMENT Dec. 2015.
[23] H. Chen et al., “Multivariate classification of Autism spectrum disorder
The authors would like to thank the efforts and constructive using frequency-specific resting-state functional connectivity—A multi-
comments of respected editor and anonymous reviewers. center study,” Progr. Neuro Psychopharmacol. Biol. Psychiat., vol. 64,
pp. 1–9, Jan. 2016.
[24] T. Takagi and M. Sugeno, “Fuzzy identification of systems and its appli-
R EFERENCES cations to modeling and control,” IEEE Trans. Syst., Man, Cybern., Syst.,
[1] Diagnostic and Statistical Manual of Mental Disorders (DSM-5). vol. SMC-15, no. 1, pp. 116–132, Jan./Feb. 1985.
Philadelphia, PA, USA: Amer. Psychiatr., 2013. [25] J. Wang et al., “Interpretable feature learning using multi-output Takagi–
[2] R. P. Hobson, “The autistic child’s appraisal of expressions of emotion: Sugeno–Kang fuzzy system for multi-center ASD diagnosis,” in Proc.
A further study,” J. Child Psychol. Psychiat., vol. 27, no. 5, pp. 671–680, Int. Conf. Med. Image Comput. Comput. Assist. Intervent., 2019,
1986. Springer, pp. 790–798.
[3] S. B. Gaigg, “The interplay between emotion and cognition in Autism [26] J. Wang et al., “Multitask TSK fuzzy system modeling by jointly reduc-
spectrum disorder: Implications for developmental theory,” Front. ing rules and consequent parameters,” IEEE Trans. Syst., Man, Cybern.,
Integrative Neurosci., vol. 6, p. 113, Dec. 2012. Syst., early access, Aug. 19, 2019, doi: 10.1109/TSMC.2019.2930616.
[27] M. Luo, F. Sun, and H. Liu, “Hierarchical structured sparse representation for T–S fuzzy systems identification,” IEEE Trans. Fuzzy Syst., vol. 21, no. 6, pp. 1032–1043, Dec. 2013.
[28] C.-F. Juang and G.-C. Chen, “A T–S fuzzy system learned through a support vector machine in principal component space for real-time object detection,” IEEE Trans. Ind. Electron., vol. 59, no. 8, pp. 3309–3320, Aug. 2012.
[29] A. A. Khater, A. M. El-Nagar, M. El-Bardini, and N. M. El-Rabaie, “Improving the performance of a class of adaptive fuzzy controller based on stable and fast on-line learning algorithm,” Eur. J. Control, vol. 51, pp. 39–52, Jan. 2020.
[30] S. Yang, L. Yuan, Y.-C. Lai, X. Shen, P. Wonka, and J. Ye, “Feature grouping and selection over an undirected graph,” in Proc. 18th ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 2012, pp. 922–930.
[31] Autism Brain Imaging Data Exchange (ABIDE). Accessed: Jan. 1, 2018. [Online]. Available: http://preprocessed-connectomes-project.org/abide/download.html
[32] J. Xia et al., “Ultrasound-based differentiation of malignant and benign thyroid Nodules: An extreme learning machine approach,” Comput. Methods Progr. Biomed., vol. 147, pp. 37–49, Aug. 2017.
[33] C. Li et al., “Developing a new intelligent system for the diagnosis of tuberculous pleural effusion,” Comput. Methods Progr. Biomed., vol. 153, pp. 211–225, Jan. 2018.
[34] H. Huang, S. Zhou, J. Jiang, H. Chen, Y. Li, and C. Li, “A new fruit fly optimization algorithm enhanced support vector machine for diagnosis of breast cancer based on high-level features,” BMC Bioinformat., vol. 20, no. 8, pp. 1–14, 2019.
[35] H.-L. Chen, G. Wang, C. Ma, Z.-N. Cai, W.-B. Liu, and S.-J. Wang, “An efficient hybrid kernel extreme learning machine approach for early diagnosis of Parkinson’s disease,” Neurocomputing, vol. 184, pp. 131–144, Apr. 2016.
[36] M. Wang and H. Chen, “Chaotic multi-swarm whale optimizer boosted support vector machine for medical diagnosis,” Appl. Soft Comput., vol. 88, Mar. 2020, Art. no. 105946.
[37] M. Wang et al., “Toward an optimal kernel extreme learning machine using a chaotic moth-flame optimization strategy with applications in medical diagnoses,” Neurocomputing, vol. 267, pp. 69–84, Dec. 2017.
[38] L. Shen et al., “Evolving support vector machines using fruit fly optimization for medical data classification,” Knowl. Based Syst., vol. 96, pp. 61–75, Mar. 2016.
[39] G. Yu, Y. Liu, and D. Shen, “Graph-guided joint prediction of class label and clinical scores for the Alzheimer’s disease,” Brain Struct. Funct., vol. 221, no. 7, pp. 3787–3801, 2016.
[40] J. Friedman, T. Hastie, and R. Tibshirani, “Sparse inverse covariance estimation with the graphical lasso,” Biostatistics, vol. 9, no. 3, pp. 432–441, 2008.
[41] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods. New York, NY, USA: Academic, 2014.
[42] T. Fawcett, “An introduction to ROC analysis,” Pattern Recognit. Lett., vol. 27, no. 8, pp. 861–874, 2006.
[43] L. Hu, G. Hong, J. Ma, X. Wang, and H. Chen, “An efficient machine learning approach for diagnosis of paraquat-poisoned patients,” Comput. Biol. Med., vol. 59, pp. 116–124, Apr. 2015.
[44] X. Zhao et al., “Chaos enhanced grey wolf optimization wrapped ELM for diagnosis of paraquat-poisoned patients,” Comput. Biol. Chem., vol. 78, pp. 481–490, Feb. 2019.
[45] L. Hu et al., “A new machine-learning method to prognosticate paraquat poisoned patients by combining coagulation, liver, and kidney indices,” PLoS ONE, vol. 12, no. 10, 2017, Art. no. e0186427.
[46] S.-B. Lin, J. Zeng, and X. Chang, “Learning rates for classification with Gaussian kernels,” Neural Comput., vol. 29, no. 12, pp. 3353–3380, 2017.
[47] W. Song, J. Xiang, and Y. Zhong, “A simulation model based fault diagnosis method for bearings,” J. Intell. Fuzzy Syst., vol. 34, no. 6, pp. 3857–3867, 2018.
[48] D. Wang, H. Qiao, B. Zhang, and M. Wang, “Online support vector machine based on convex hull vertices selection,” IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 4, pp. 593–609, Apr. 2013.
[49] Q. Zhang, D. Wang, and Y. Wang, “Convergence of decomposition methods for support vector machines,” Neurocomputing, vol. 317, pp. 179–187, Nov. 2018.
[50] H. Chen et al., “An effective machine learning approach for prognosis of paraquat poisoning patients using blood routine indexes,” Basic Clin. Pharmacol. Toxicol., vol. 120, no. 1, pp. 86–96, 2017.
[51] Z. Cai, J. Gu, and H.-L. Chen, “A new hybrid intelligent framework for predicting Parkinson’s disease,” IEEE Access, vol. 5, pp. 17188–17200, 2017.
[52] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. Roy. Stat. Soc. B Methodol., vol. 58, no. 1, pp. 267–288, 1996.
[53] J. Liu and J. Ye, “Efficient L1/LQ norm regularization,” 2010. [Online]. Available: arXiv:1009.4766.
[54] A. Argyriou, T. Evgeniou, and M. Pontil, “Convex multi-task feature learning,” Mach. Learn., vol. 73, no. 3, pp. 243–272, 2008.
[55] P. Gong, J. Ye, and C. Zhang, “Robust multi-task feature learning,” in Proc. 18th ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 2012, pp. 895–903.
[56] J. Zhou, J. Chen, and J. Ye, Malsar: Multi-Task Learning Via Structural Regularization, Arizona State Univ., Tempe, AZ, USA, 2011.
[57] M. Akhavan Aghdam, A. Sharifi, and M. M. Pedram, “Combination of RS-fMRI and sMRI data to discriminate Autism spectrum disorders in young children using deep belief network,” J. Digit. Imag., vol. 31, no. 6, pp. 895–903, 2018.
[58] H. I. Suk, S. W. Lee, and D. Shen, “Latent feature representation with stacked auto-encoder for AD/MCI diagnosis,” Brain Struct. Funct., vol. 220, no. 2, pp. 841–59, 2015.
[59] Z. Deng, K. S. Choi, F.-L. Chung, and S. Wang, “Scalable TSK fuzzy modeling for very large datasets using minimal-enclosing-ball approximation,” IEEE Trans. Fuzzy Syst., vol. 19, no. 2, pp. 210–226, Apr. 2011.
[60] C. Y. Wee, L. Wang, F. Shi, P. T. Yap, and D. Shen, “Diagnosis of Autism spectrum disorders using regional and interregional morphological features,” Human Brain Map., vol. 35, no. 7, pp. 3414–30, 2014.
[61] F. Huang et al., “Self-weighted adaptive structure learning for ASD diagnosis via multi-template multi-center representation,” Med. Image Anal., vol. 63, Jul. 2020, Art. no. 101662.
[62] J. Wang et al., “Multi-task diagnosis for Autism spectrum disorders using multi-modality features: A multi-center study,” Human Brain Map., vol. 38, no. 6, pp. 3081–3097, 2017.
[63] H. Huang et al., “Enhancing the representation of functional connectivity networks by fusing multi-view information for Autism spectrum disorder diagnosis,” Human Brain Map., vol. 40, no. 3, pp. 833–854, 2019.
[64] L. Zhang, M. Wang, M. Liu, and D. Zhang, “A survey on deep learning for neuroimaging-based brain disorder analysis,” Front. Neurosci., vol. 14, p. 779, Oct. 2020.
[65] M. Wang, D. Zhang, J. Huang, D. Shen, and M. Liu, “Low-rank representation for multi-center Autism spectrum disorder identification,” Med. Image Comput. Comput. Assist. Intervent., vol. 11070, pp. 647–654, Sep. 2018.
