Epistasis Analysis Using An Improved Fuzzy
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TFUZZ.2019.2914629, IEEE
Transactions on Fuzzy Systems
TFS-2018-0762.R2 1
1063-6706 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
$$ n_{i1}^{*} = \frac{n_{i1}}{n_{++}}, \qquad n_{i0}^{*} = \frac{n_{i0}}{n_{++}} \tag{1} $$

where $n_{++}$ is the total number of cases and controls among all multifactor classes, and $n_{i1}$ and $n_{i0}$ are the cases and controls that satisfy the ith multifactor class, respectively. The FCM approach allows one datum to belong to two groups. The membership degree function of the FCM approach is employed to evaluate the H ($w_H$) group and the L ($w_L$) group. The degree of membership of $n_i^{*}$ in group $c_j$ can be formulated as follows [19]:

$$ w_{ij} = \frac{1}{\sum_{k \in \{h, l\}} \left( \dfrac{d(n_i^{*}, c_j)}{d(n_i^{*}, c_k)} \right)^{\frac{2}{m-1}}} \tag{2} $$

where $j \in \{h, l\}$. We define the H group as the matrix $c_h = \{1, 0\}$ and the L group as the matrix $c_l = \{0, 1\}$. A cross-entropy method is applied to evaluate distance between the ith multifactor class and the outcomes (cases and controls).

$$ d(n^{*}, c) = -\sum c \times \log(n^{*}) \tag{3} $$

In FCMEMDR, each sample can have simultaneous, partial membership in the H ($w_H$) and L ($w_L$) groups. Membership degree in the H ($w_H$) group and the L ($w_L$) groups can be obtained using Equations (1–3) for the ith multifactor class. The membership degrees for the H ($w_H$) group and the L ($w_L$) group can be formulated as (4) and (5). The fuzzy rule for the H ($w_H$) group and the L ($w_L$) group can then be formulated as (6) and (7).

$$
\begin{aligned}
w_{iH} &= \frac{1}{\left( \dfrac{-(1 \times \log(n_{i1}^{*}) + 0 \times \log(n_{i0}^{*}))}{-(1 \times \log(n_{i1}^{*}) + 0 \times \log(n_{i0}^{*}))} \right)^{\frac{2}{m-1}} + \left( \dfrac{-(1 \times \log(n_{i1}^{*}) + 0 \times \log(n_{i0}^{*}))}{-(0 \times \log(n_{i1}^{*}) + 1 \times \log(n_{i0}^{*}))} \right)^{\frac{2}{m-1}}} \\
&= \frac{1}{1 + \left( \dfrac{-\log(n_{i1}^{*})}{-\log(n_{i0}^{*})} \right)^{\frac{2}{m-1}}} = \left( 1 + \left( \frac{\log(n_{i1}^{*})}{\log(n_{i0}^{*})} \right)^{\frac{2}{m-1}} \right)^{-1} \\
&= \left( 1 + \left( \log_{n_{i0}^{*}} n_{i1}^{*} \right)^{\frac{2}{m-1}} \right)^{-1}
\end{aligned} \tag{4}
$$

$$
\begin{aligned}
w_{iL} &= \frac{1}{\left( \dfrac{-(0 \times \log(n_{i1}^{*}) + 1 \times \log(n_{i0}^{*}))}{-(1 \times \log(n_{i1}^{*}) + 0 \times \log(n_{i0}^{*}))} \right)^{\frac{2}{m-1}} + \left( \dfrac{-(0 \times \log(n_{i1}^{*}) + 1 \times \log(n_{i0}^{*}))}{-(0 \times \log(n_{i1}^{*}) + 1 \times \log(n_{i0}^{*}))} \right)^{\frac{2}{m-1}}} \\
&= \left( 1 + \left( \log_{n_{i1}^{*}} n_{i0}^{*} \right)^{\frac{2}{m-1}} \right)^{-1}
\end{aligned} \tag{5}
$$

$$
w_{iH}^{*} =
\begin{cases}
0, & \text{if } n_{i1}^{*} = 0 \\
\left( 1 + \left( \log_{n_{i0}^{*}} n_{i1}^{*} \right)^{\frac{2}{m-1}} \right)^{-1}, & \text{otherwise} \\
1, & \text{if } n_{i0}^{*} = 0
\end{cases} \tag{6}
$$

$$
w_{iL}^{*} =
\begin{cases}
0, & \text{if } n_{i0}^{*} = 0 \\
\left( 1 + \left( \log_{n_{i1}^{*}} n_{i0}^{*} \right)^{\frac{2}{m-1}} \right)^{-1}, & \text{otherwise} \\
1, & \text{if } n_{i1}^{*} = 0
\end{cases} \tag{7}
$$

3-3: The four cells' true positivity (TP), false positivity (FP), false negative (FN), and true negative (TN) are computed in a 2 × 2 contingency table. Membership degrees for the four fuzzy cells ($TP_f$, $FP_f$, $FN_f$, and $TN_f$) are determined based on the FCME approach (H ($w_H$) and L ($w_L$)) and frequencies of the ith multifactor class ($n_{i1}^{*}$ and $n_{i0}^{*}$). In all multifactor classes, $TP_f$ sums the H ($w_H$) of frequencies $n_{1}^{*}$ and $FN_f$ sums the L ($w_L$) of frequencies $n_{1}^{*}$. In a similar manner, the H ($w_H$) of frequencies $n_{0}^{*}$ and the L ($w_L$) of frequencies $n_{0}^{*}$ are added to $FP_f$ and $TN_f$, respectively. Membership degree is thus used in FCMEMDR to reduce the dimensionality of $3^m$ multifactor classes into 2 × 2 dimensionality. The four cells are formulated as follows:

$$
\begin{aligned}
TP_f &= \sum_i n_{i1}^{*} \times w_{iH}^{*} \\
FP_f &= \sum_i n_{i0}^{*} \times w_{iH}^{*} \\
FN_f &= \sum_i n_{i1}^{*} \times w_{iL}^{*} \\
TN_f &= \sum_i n_{i0}^{*} \times w_{iL}^{*}
\end{aligned} \tag{8}
$$

3-4: Estimation of models. The m-locus combination is measured based on the 2 × 2 contingency table.
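As an illustration only, the following Python sketch evaluates the fuzzy rules (6) and (7) and the four fuzzy cells of (8) for a list of multifactor classes. The function names, the default fuzzifier m = 2, and the example counts are assumptions made for this sketch; they are not taken from the authors' implementation.

```python
import math


def membership_degrees(n_i1, n_i0, n_total, m=2.0):
    """Return (w_H, w_L) for one multifactor class, following Eqs. (1), (6), (7).

    n_i1, n_i0: case and control counts in the class (hypothetical inputs).
    n_total:    total number of cases and controls over all classes (n_++).
    m:          fuzzifier; m = 2 is an assumed default, not a value from the paper.
    """
    p1 = n_i1 / n_total  # n_i1*, Eq. (1)
    p0 = n_i0 / n_total  # n_i0*, Eq. (1)
    exponent = 2.0 / (m - 1.0)

    def degree(own, other):
        # Boundary cases of the fuzzy rule: own frequency 0 gives 0,
        # opposite frequency 0 gives 1.
        if own == 0:
            return 0.0
        if other == 0:
            return 1.0
        ratio = math.log(own) / math.log(other)  # logarithm of `own` to base `other`
        return 1.0 / (1.0 + ratio ** exponent)

    return degree(p1, p0), degree(p0, p1)


def fuzzy_contingency_table(classes, m=2.0):
    """Accumulate (TP_f, FP_f, FN_f, TN_f) over all classes, following Eq. (8).

    classes: list of (cases, controls) count pairs, one per multifactor class.
    """
    n_total = sum(cases + controls for cases, controls in classes)
    tp = fp = fn = tn = 0.0
    for n_i1, n_i0 in classes:
        w_h, w_l = membership_degrees(n_i1, n_i0, n_total, m)
        p1, p0 = n_i1 / n_total, n_i0 / n_total
        tp += p1 * w_h
        fp += p0 * w_h
        fn += p1 * w_l
        tn += p0 * w_l
    return tp, fp, fn, tn


# Hypothetical example: two classes with mirrored case/control counts.
example_classes = [(30, 10), (10, 30)]
w_h, w_l = membership_degrees(30, 10, 80)
tp_f, fp_f, fn_f, tn_f = fuzzy_contingency_table(example_classes)
```

In this sketch a class with more cases than controls receives a membership degree above 0.5 in the H group, and the two degrees of a class sum to 1 whenever both frequencies are nonzero.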
Fig. 2. Comparison of six epistatic detection methods used on 60 epistatic models without marginal effects. The figure compares implementations
of MDR-CCR (C), MDR-LR (L), EFMDR-CCR (EC), EFMDR-LR (EL), FCMEMDR-CCR (FC), and FCMEMDR-LR (FL) in 60 epistasis
models without marginal effects. Each block represents models comprising an MAF and an h2, in which the upper and lower regions are detection
success rate and number of CVC = 5, respectively. Darker red (H) denotes superior implementation, and darker blue (L) denotes inferior
implementation in the given region. White indicates moderate implementation in the given region. The numbers refer to the epistasis models. For
each model, the detection success rate is calculated based on the proportion in 100 datasets in which specific instances of epistasis are identified.
Each dataset includes 1000 SNPs and 400 samples (200 cases and 200 controls).
1) CCRfuzzy for FCMEMDR-CCR: The CCRfuzzy is used to evaluate the proportion of correctly classified individuals based on an m-locus combination. The CCRfuzzy processes unbalanced datasets to determine accurate balance using the TPf and TNf proportions [24]. The CCRfuzzy is formulated as (9).

$$ CCR_{fuzzy} = 0.5 \left( \frac{TP_f}{TP_f + FN_f} + \frac{TN_f}{FP_f + TN_f} \right) \tag{9} $$

2) LRfuzzy for FCMEMDR-LR: LRfuzzy compares the maximum likelihood of an unrestricted model with that of a restricted model. In the 2 × 2 contingency table, the unrestricted model consists of the observed frequencies in the data, and the restricted model consists of the expected frequencies under the null hypothesis of no association [25]. The LRfuzzy is formulated as (10).

$$
\begin{aligned}
LR_{fuzzy} &= 2 \sum \text{Observed} \log \left[ \frac{\text{Observed}}{\text{Expected}} \right] \\
&= 2 \left[ TP_f \times \log \left( \frac{TP_f}{A^{*}} \right) + FP_f \times \log \left( \frac{FP_f}{B^{*}} \right) + FN_f \times \log \left( \frac{FN_f}{C^{*}} \right) + TN_f \times \log \left( \frac{TN_f}{D^{*}} \right) \right], \\
A^{*} &= \frac{(TP_f + FN_f)(TP_f + FP_f)}{TP_f + FP_f + FN_f + TN_f}
\end{aligned} \tag{10}
$$

3-5: Repeat for each combination. Steps 3-1 to 3-4 are repeated until evaluation of all m-locus combinations has been completed.

Step 4. Stratified random k-fold.
4-1: Models are sorted from low to high. The sets of all m-locus combinations are sorted according to their values of measures (i.e., CCRfuzzy and LRfuzzy).
4-2: The optimal model is recorded into CVC. The m-locus combination with maximum measures is identified as the optimal model in the CV dataset. This m-locus combination is then recorded into CVC.

Step 5. Saving of the optimal results and statistics from the best model.
5-1: Steps are repeated for each possible CV interval. Steps 2 to 4 are repeated until completion of all CV.
5-2: The best model is selected according to CVC. The m-locus combination with the highest CVC is regarded as the best model in this experiment.
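The two measures in (9) and (10) and the CVC-based selection of Steps 4-2 and 5-2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: only A* is defined in (10), so the expected values B*, C*, and D* are assumed here to follow the same row-total times column-total pattern; cells with an observed value of zero are skipped in the sum; and all function names and example values are hypothetical.

```python
import math
from collections import Counter


def ccr_fuzzy(tp, fp, fn, tn):
    """Balanced correct classification rate of Eq. (9)."""
    return 0.5 * (tp / (tp + fn) + tn / (fp + tn))


def lr_fuzzy(tp, fp, fn, tn):
    """Likelihood ratio of Eq. (10) on the fuzzy 2 x 2 table."""
    total = tp + fp + fn + tn
    observed_expected = [
        (tp, (tp + fn) * (tp + fp) / total),  # A*, as defined in Eq. (10)
        (fp, (fp + tn) * (tp + fp) / total),  # B*, assumed by analogy with A*
        (fn, (tp + fn) * (fn + tn) / total),  # C*, assumed by analogy with A*
        (tn, (fp + tn) * (fn + tn) / total),  # D*, assumed by analogy with A*
    ]
    # Zero observed cells contribute nothing (assumption: 0 * log 0 := 0).
    return 2.0 * sum(o * math.log(o / e) for o, e in observed_expected if o > 0)


def best_model_by_cvc(fold_winners):
    """Return the m-locus combination chosen most often across CV folds and its CVC."""
    counts = Counter(frozenset(winner) for winner in fold_winners)
    model, cvc = counts.most_common(1)[0]
    return tuple(sorted(model)), cvc


# Hypothetical fuzzy cells and per-fold winners (illustrative values only).
ccr_value = ccr_fuzzy(0.4, 0.1, 0.1, 0.4)
lr_value = lr_fuzzy(0.4, 0.1, 0.1, 0.4)
winners = [("rs1", "rs2")] * 4 + [("rs3", "rs4")]
best_model, best_cvc = best_model_by_cvc(winners)
```

A table with no association (all four cells equal) yields a CCRfuzzy of 0.5 and an LRfuzzy of 0 in this sketch, which matches the role of the restricted model as the null hypothesis.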
III. RESULTS
A. Comparison of MDR-CCR, MDR-LR, EFMDR-CCR,
EFMDR-LR, FCMEMDR-CCR, and FCMEMDR-LR across
epistatic models without marginal effects and with marginal
effects.
Epistatic model without marginal effects
A total of 60 epistatic models without marginal effects were used to simulate the datasets; the multilocus penetrances of the 60 epistatic models were obtained from [8]. GAMETES was used to generate datasets under the specified heritability (h2) and minor allele frequency (MAF) values. The h2 value controls phenotypic variation in disease models. In the 60 epistatic models in this study, h2 ranged from 0.025 to 0.2, and the MAFs were 0.2 and 0.4. Each dataset included a specific target (epistatic interaction) with random architectures [26] and other SNPs. The datasets were generated with MAFs selected uniformly within [0.05, 0.5]. The detection success rate was calculated by counting the number of datasets, out of 100, in which the algorithm identified the specific target.
The detection success rates of MDR-CCR (C), MDR-LR (L), EFMDR-CCR (EC), EFMDR-LR (EL), FCMEMDR-CCR (FC), and FCMEMDR-LR (FL) for the 60 epistatic models are presented in Fig. 2. For epistatic models 1 to 30 (h2 ≥ 0.2), all methods exhibited relatively strong abilities to accurately detect the specific targets in each dataset. For epistatic models 31 to 60 (h2 ≤ 0.1), the results revealed that the detection abilities of FCMEMDR-CCR and FCMEMDR-LR were the same, and the detection success rates of these two methods were higher than those of MDR-CCR, MDR-LR, EFMDR-CCR, and EFMDR-LR. A Wilcoxon signed-rank test was used to compare the performances of FCMEMDR-CCR and FCMEMDR-LR with those of the other algorithms on the 60 disease models. As denoted in Table I, the p values of FCMEMDR-CCR and FCMEMDR-LR were <.05, indicating the superior performance of these algorithms over the other algorithms. In terms of detection success rates at CVC = 5 (Fig. 2), the blue blocks indicate the performance of CVC ≤ 3. In epistatic models 41 to 45 and 51 to 55, CVC = 1 for all methods. The FCME approach was more stable than the other methods in epistatic models 31 to 60, indicating that this approach enhanced the ability of MDR to detect disease loci without marginal effects. In terms of computational time, FCMEMDR-CCR and FCMEMDR-LR required an average of 19.4 s to run a complete process in the 60 epistatic models, where each dataset included 1000 SNPs with 400 samples.

Epistatic model with marginal effects
For the simulation test with marginal effects, multilocus penetrances of the eight epistatic models were obtained from Namkung et al. (Models 1–6) [27] and Bush et al. (Models 7 and 8) [25]. For each epistatic model, we used GAMETES software [26] to simulate 100 datasets under the settings of that model, with the MAFs set uniformly within [0.05, 0.5]. Thus, in these 100 datasets, the detection success rates were determined by counting the targets (i.e., epistatic interactions) that had been successfully detected using each algorithm.
The detection success rates of MDR-CCR (C), MDR-LR (L), EFMDR-CCR (EC), EFMDR-LR (EL), FCMEMDR-CCR (FC), and FCMEMDR-LR (FL) for the eight models are presented in Fig. 3. Overall, the detection success rates of MDR-CCR and MDR-LR were enhanced when the FCME approach was used. FCMEMDR-CCR and FCMEMDR-LR outperformed the other algorithms in the eight models with marginal effects. For detection success rates at CVC = 5, the FCME approach outperformed the original method, indicating that the FCME approach can be used to improve the robustness of models with marginal effects. The Wilcoxon signed-rank test was used to compare the performances of FCMEMDR and the other algorithms in the eight disease models. FCMEMDR-CCR and FCMEMDR-LR exhibited superior detection success rates compared with those of the other algorithms (+, Table I) in application to the eight epistasis models with marginal effects (400 samples and 2000 samples). p < .05 indicated significant superiority of FCMEMDR-CCR and FCMEMDR-LR over the other methods. Our results suggest that the FCME approach enhanced MDR because it considers the uncertainty of H/L classification in disease loci with marginal effects. Regarding computational time, FCMEMDR-CCR and FCMEMDR-LR required an average of 36.2 s to run a complete process for each dataset in the eight epistatic models, where each dataset included 1000 SNPs with 400 samples. For trials with 2000 samples, FCMEMDR-CCR and FCMEMDR-LR required an average of 162 s to run a complete process for each dataset.

Fig. 3. Comparison of six epistasis detection methods used on eight epistatic models with marginal effects. Each model consists of an MAF and an h2. The upper and lower blocks are, respectively, detection success rate and the number of CVC = 5, and the left and right blocks, respectively, represent 400 samples (200 cases and 200 controls) and 2000 samples (1000 cases and 1000 controls). The darker red (H) denotes superior implementation and the darker blue (L) denotes poor implementation of the model. White indicates moderate implementation for a region. The numbers refer to the epistatic models.
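The evaluation described in this subsection can be sketched in Python as follows: a detection success rate over a set of simulated datasets, and a paired Wilcoxon signed-rank comparison of two methods across disease models. The text does not state which variant of the test was used, so the two-sided normal approximation with tie correction and without continuity correction is an assumption of this sketch, as are the function names and all numbers in the example.

```python
import math


def detection_success_rate(detected_pairs, target_pair):
    """Fraction of datasets in which the reported best model equals the target."""
    target = frozenset(target_pair)
    hits = sum(1 for pair in detected_pairs if frozenset(pair) == target)
    return hits / len(detected_pairs)


def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test; returns (W_plus, W_minus, two-sided p).

    Uses the normal approximation with tie correction (an assumed variant).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        run = j - i + 1
        tie_term += run ** 3 - run
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4.0
    variance = n * (n + 1) * (2 * n + 1) / 24.0 - tie_term / 48.0
    z = (w_plus - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w_plus, w_minus, p_value


# Hypothetical data: 100 datasets for one model, 87 of which recover the target.
detected = [("SNP1", "SNP2")] * 87 + [("SNP3", "SNP9")] * 13
rate = detection_success_rate(detected, ("SNP2", "SNP1"))

# Hypothetical success counts (out of 100) of two methods on eight models.
method_a = [90, 85, 70, 60, 95, 88, 77, 66]
method_b = [80, 80, 60, 65, 90, 70, 70, 60]
w_plus, w_minus, p_value = wilcoxon_signed_rank(method_a, method_b)
```

Integer success counts are used in the example so that tied absolute differences compare exactly; with floating-point rates, ties would need a tolerance.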
TABLE I. COMPARISON OF FCMEMDR WITH MDR-BASED, EFMDR-BASED, ANTEPISEEKER, BEAM, BOOST, SNPRULER, DECMDR, PBMDR, AND IMDR USING WILCOXON SIGNED-RANK TEST
[Table I. Panel A: FCMEMDR-CCR compared with nine algorithms. Panel B: FCMEMDR-LR compared with nine algorithms. Block i: 60 epistasis models without marginal effects; column headers include MDR-CCR, EFMDR-CCR, and AntEpiSeeker.]
with marginal effects and without marginal effects. The black bars represent the degree to which FCMEMDR-CCR and FCMEMDR-LR were superior to the algorithm under comparison (+), the degree to which they were inferior to the algorithm under comparison (−), and the degree to which they were equal to the compared algorithm. The absence of bars indicates a degree of zero.

Epistatic model without marginal effects
Sixty epistatic models without marginal effects were used to evaluate the performance of AntEpiSeeker, BEAM, SNPRuler, BOOST, DECMDR, PBMDR, IMDR, FCMEMDR-CCR, and FCMEMDR-LR. Overall, both FCMEMDR-CCR and FCMEMDR-LR exhibited superior performance to that of AntEpiSeeker, BEAM, SNPRuler, DECMDR, PBMDR, and IMDR. However, the performance of FCMEMDR-CCR and FCMEMDR-LR was inferior to that of AntEpiSeeker (model 52), SNPRuler (model 55), BOOST (models 45, 53–55, 60), DECMDR (model 45), and IMDR (model 36). The Wilcoxon signed-rank test indicated the significant superiority of FCMEMDR-CCR and FCMEMDR-LR to AntEpiSeeker, BEAM, SNPRuler, DECMDR, PBMDR, and IMDR; however, for the epistatic models without marginal effects, the superiority of FCMEMDR-CCR and FCMEMDR-LR to BOOST did not reach statistical significance.

Epistatic model with marginal effects
Eight epistatic models with marginal effects were used to evaluate the performance of AntEpiSeeker, BEAM, SNPRuler, BOOST, DECMDR, PBMDR, IMDR, FCMEMDR-CCR, and FCMEMDR-LR. For small samples (400 samples), FCMEMDR-CCR and FCMEMDR-LR outperformed AntEpiSeeker, BEAM, SNPRuler, PBMDR, and IMDR in the eight epistatic models with marginal effects. For large samples (2000 samples), FCMEMDR-CCR and FCMEMDR-LR exhibited superiority to AntEpiSeeker, BEAM, SNPRuler, and PBMDR in all epistatic models with marginal effects. Although FCMEMDR-CCR and FCMEMDR-LR performed equally to BOOST for most models, they were superior to BOOST for models 1 and 4. FCMEMDR-CCR and FCMEMDR-LR were superior to DECMDR and IMDR for models 7 and 8. The Wilcoxon signed-rank test indicated the significant superiority of FCMEMDR-CCR and FCMEMDR-LR to AntEpiSeeker, BEAM, SNPRuler, and PBMDR, whereas only a trend of superiority was evident in comparisons of FCMEMDR-CCR and FCMEMDR-LR with BOOST, DECMDR, and IMDR. Our results suggest that the FCME approach may enhance MDR because of its additional consideration of the uncertainty of H/L classification when detecting disease loci with marginal effects.

C. Real data experiment
Real data experiments were conducted to validate the ability of FCMEMDR to correctly determine disease-associated interactions. To accomplish this, we used two cases in our real data experiments. A real coronary artery disease (CAD) dataset was selected from the WTCCC, and a real chronic dialysis dataset was obtained from the Kaohsiung Chang Gung Memorial Hospital.
Fig. 4. Genotype counts for SNP pairs in CAD and membership degrees of the H and L groups in chromosome 3 in the WTCCC data. (A)
Genotype counts for SNP pairs in CAD and (B) the membership degrees of the H (μH) and L (μL) groups relative to the most common double
homozygous genotype in chromosome 3 in the WTCCC data. In each cell of (A), the values attached to the left-side bars are the numbers of
cases, and the values attached to the right-side bars are the numbers of controls. In each cell of (B), the values attached to the left-side bars are the
membership degrees of the H group, and the values attached to the right-side bars are membership degrees of the L group. Background colors
represent the degrees of the membership function. Darker red indicates H groups, and darker green indicates L groups. White indicates similar
membership degrees between the H and L groups.
Fig. 5. Genotype counts for SNP pairs in patients undergoing chronic dialysis, and membership degrees of the H and L groups in mitochondrial
DNA. (A) Genotype counts for SNP pairs in patients undergoing chronic dialysis and (B) the membership degrees of the H (μH) and L (μL) groups
relative to the most common homozygous genotype in mitochondrial DNA.
TABLE II. SUMMARY OF FCMEMDR-CCR AND FCMEMDR-LR RESULTS FOR CORONARY ARTERY DISEASE FROM WTCCC DATA

Location | SNP Groups (Gene) | CCR | CVC | SNP Groups (Gene) | LR | CVC
Chr. 1 | rs41399650 (UNKNOWN), rs17163057 (UNKNOWN) | 0.784 | 5 | rs2999538 (LOC105371442), rs41399650 (UNKNOWN) | 265.099 | 4
Chr. 2 | rs41509345 (NCKAP5), rs41453947 (UNKNOWN) | 0.783 | 5 | | 242.958 | 5
Chr. 3 | rs41367044 (GTF2E1), rs10866051 (LOC105376942) | 0.837 | 5 | | 387.414 | 3
Chr. 4 | rs41426946 (PPA2), rs41529544 (UNKNOWN) | 0.797 | 5 | | 270.066 | 5
Chr. 5 | rs41493746 (UNKNOWN), rs41421845 (LINC02107) | 0.674 | 5 | | 91.429 | 5
Chr. 6 | rs3006172 (WDR27), rs41489047 (ADGRB3) | 0.779 | 5 | | 240.729 | 5
Chr. 7 | rs41437948 (POU6F2), rs41468749 (GALNT17) | 0.639 | 5 | | 55.271 | 5
Chr. 8 | rs35120859 (UNKNOWN), rs17480050 (CSGALNACT1) | 0.722 | 4 | | 139.631 | 4
Chr. 9 | rs41354745 (KANK1), rs2891142 (SLC24A2) | 0.736 | 5 | | 161.290 | 5
Chr. 10 | rs41437948 (FAM107B), rs41468749 (TCERG1L) | 0.779 | 5 | rs41437948 (POU6F2), rs41468749 (GALNT17) | 267.022 | 5
Chr. 11 | rs41535846 (UNKNOWN), rs41518446 (MAML2) | 0.627 | 2 | | 47.620 | 2
Chr. 12 | rs16926425 (SOX5), rs7299571 (UNKNOWN) | 0.957 | 5 | | 712.602 | 5
Chr. 13 | rs7328649 (FAM155A), rs9540728 (PCDH9) | 0.805 | 5 | | 275.813 | 5
Chr. 14 | rs41324950 (LOC105370603), rs41453247 (UNKNOWN) | 0.783 | 5 | | 234.386 | 5
Chr. 15 | rs41418744 (UNKNOWN), rs41418548 (SHC4) | 0.664 | 3 | | 80.943 | 3
Chr. 16 | rs235633 (UNKNOWN), rs41483646 (UNKNOWN) | 0.752 | 5 | | 186.539 | 5
Chr. 17 | rs4969207 (DNAH17), rs3785579 (CACNG1) | 0.868 | 2 | rs3785579 (CACNG1), rs1870998 (UNKNOWN) | 494.358 | 2
Chr. 18 | rs4799934 (CELF4), rs3794931 (ZNF516) | 0.732 | 5 | | 191.401 | 5
Chr. 19 | rs375299 (UNKNOWN), rs41370444 (UNKNOWN) | 0.601 | 5 | | 31.169 | 5
Chr. 20 | rs2748666 (UNKNOWN), rs41405046 (UNKNOWN) | 0.872 | 5 | | 440.051 | 5
Chr. 21 | rs41378546 (CLDN14), rs41451052 (UNKNOWN) | 0.585 | 5 | | 23.811 | 5
Chr. 22 | rs41437948 (POU6F2), rs41468749 (GALNT17) | 0.634 | 4 | | 52.509 | 4
Chr. X | rs1419930 (UNKNOWN), rs41500547 (DMD) | 0.637 | 5 | | 56.305 | 5

Chr: Chromosome; CCR and LR values are based on membership degree of the FCME approach.
patients undergoing chronic dialysis and 704 normal individuals among the Taiwanese population [35]. When FCMEMDR-CCR and FCMEMDR-LR were applied for the optimal detection of epistasis between the SNPs located in explicit genes, the highest CCRfuzzy (0.617), LRfuzzy (2.209), and CVC (4) values were found for epistasis between C150T and G185A (both located in the D-loop gene) in mitochondrial DNA, with a high level of significance (p < 0.0001). A graphical representation of the interaction in the two-locus SNP combination is displayed in Fig. 5, which has the same figure legend as Fig. 4. The distinctions between cases and controls for the epistasis between C150T and G185A were identified in the genotype pairs of each cell (Fig. 5A). The green cell (TT, GG) was considered to belong to the low-risk group (Fig. 5B), whereas the red cell (CC, AA) was considered to belong to the high-risk group. The results revealed that FCMEMDR-CCR and FCMEMDR-LR successfully identified the epistasis. In terms of computational time, the FCMEMDR process required approximately 0.21 s.

IV. DISCUSSION
MDR is a robust, nonparametric method that detects nonlinear interactions among multiple, discrete genetic factors. MDR can be used to facilitate conversion of high-dimensional space into low-dimensional space, in turn enabling conversion of 3^m prediction rules into 2 × 2 contingency tables for evaluating
epistasis [11]. Regarding the limitations of MDR, the uncertainty of binary classification may result in the loss of critical information [36]. Binary classification uses the probabilities of case and control across multiple genotypes to identify H and L groups. In this study, a balanced dataset was assumed (i.e., the classification threshold was 1) and a two-SNP combination comprising nine multifactor genotypes was used. Under binary classification, a multifactor genotype with odds of 5.5 and a multifactor genotype with odds of 2.5 were both categorized into the H group. Therefore, significant differences between the two multifactor genotypes could not be distinguished using MDR. Many studies have focused on this limitation [18, 36]. Research and development of resources for improving MDR-based classification have increased; however, work in this field remains limited.
In this study, our results agree with those of Leem and Park [18], in which the epistasis detection ability of the fuzzy-based classification approach was superior to that of binary-based classification in MDR. The application of fuzzy-based classification enabled MDR to assign different membership functions to two multifactor genotypes, thereby enabling the binary membership on the set {0, 1} to be extended to the membership function on the interval [0, 1]. FCMEMDR addresses information uncertainty by using the FCME approach. In the FCME approach, FCM produces patterns that can belong to any of the cluster classes with a certain degree of fuzzy membership, thus increasing the degree of difference between similar distributions of m-locus combinations and ultimately resulting in the more accurate detection of significant epistasis. For the interval [0, 1], a cross-entropy method was used to evaluate the distance between the ith multifactor class and the outcomes (cases and controls). We compared the FCME and empirical fuzzy (EF) [18] approaches for evaluating an m-locus combination, thereby detailing the improvement of FCMEMDR. For illustrative purposes, a combination of two SNPs was considered: SNP A and SNP B (Fig. 6A). Differences between the measurements of membership degree for multifactor classes
are illustrated in Fig. 6B. The red line indicates the distance between absolute cases and the ith cell, and the green dotted line indicates the distance between absolute controls and the ith cell. A short distance suggests a high probability that the ith cell belongs to the associated class. In this study, results for the cells (AA, BB), (aa, BB), and (Aa, Bb) revealed that the FCME was superior in distinguishing cells associated with cases and controls. Thus, the membership degrees of the H and L groups could be influenced to improve the degree of distinction (Fig. 6C and 6D). A more accurate estimation of the difference between four cells in the 2 × 2 contingency table was achieved using FCME. Simulation experiments demonstrated that the detection success rate of FCMEMDR was higher than those of other fuzzy-based MDR methods.
This study revealed that accounting for ambiguity can improve the detection capabilities of MDR. Our results confirmed that FCMEMDR can be used to detect epistasis in simulated and real datasets. FCMEMDR retained the advantages of the MDR method. First, FCMEMDR used FCM measurements to determine potential instances of epistasis that could be used to enhance distinctions between pairs of multifactor genotypes. Second, FCMEMDR was used to graphically represent membership of the epistasis associated with the disease group. Reduction of the dimensionality of the multisite data enabled clear determination of whether multiple loci associated with a disease are more common in affected or unaffected individuals. Third, FCMEMDR did not require the selection of an optimal adjustment parameter value for the fuzzy set theory in practical data applications. In various simulation models, FCMEMDR exhibited higher power than fuzzy MDR-based methods.
FCMEMDR requires a total computation time of $k \times \binom{n}{m} \times s \times 3^{m}$ to determine the optimal m-locus combination, where k is the number of subsets, n is the number of SNPs, and s is the total number of samples. FCMEMDR exhibited satisfactory runtimes, and powerful computing methods such as parallel operations [37], GPU-based MDR [38], the greedy search strategy [39], and DE-based MDR [28] may be used to further improve the runtime of FCMEMDR.

V. CONCLUSIONS
In this work, we introduced a powerful FCMEMDR method for the detection of epistasis. The FCMEMDR method was formulated based on an FCME approach to address the uncertainty associated with MDR-based methods. Under the application of FCM, each cell derived from a multifactor genotype could assess its own membership, allowing MDR to detect the most biologically significant instances of epistasis. Performance evaluations based on simulations and real GWAS datasets confirmed that FCMEMDR satisfactorily detected epistasis.

VI. REFERENCES
[1] W. S. Bush and J. H. Moore, "Genome-wide association studies," PLoS Computational Biology, vol. 8, Art. no. e1002822, 2012.
[2] J. H. Moore, F. W. Asselbergs, and S. M. Williams, "Bioinformatics challenges for genome-wide association studies," Bioinformatics, vol. 26, pp. 445-455, 2010.
[3] E. E. Eichler, J. Flint, G. Gibson, A. Kong, S. M. Leal, J. H. Moore, et al., "Missing heritability and strategies for finding the underlying causes of complex disease," Nature Reviews Genetics, vol. 11, pp. 446-450, 2010.
[4] H. J. Cordell, "Detecting gene-gene interactions that underlie human diseases," Nature Reviews Genetics, vol. 10, Art. no. 392, 2009.
[5] T. F. Mackay and J. H. Moore, "Why epistasis is important for tackling complex human disease genetics," Genome Medicine, vol. 6, Art. no. 42, 2014.
[6] T. F. C. Mackay, "Epistasis and quantitative traits: using model organisms to study gene-gene interactions," Nature Reviews Genetics, vol. 15, pp. 22-33, 2014.
[7] Y. Zhang and J. S. Liu, "Bayesian inference of epistatic interactions in case-control studies," Nature Genetics, vol. 39, pp. 1167-1173, 2007.
[8] X. Wan, C. Yang, Q. Yang, H. Xue, N. L. S. Tang, and W. C. Yu, "Predictive rule inference for epistatic interaction detection in genome-wide association studies," Bioinformatics, vol. 26, pp. 30-37, 2010.
[9] X. Wan, C. Yang, Q. Yang, H. Xue, X. Fan, N. L. Tang, et al., "BOOST: A fast approach to detecting gene-gene interactions in genome-wide case-control studies," The American Journal of Human Genetics, vol. 87, pp. 325-340, 2010.
[10] Y. Wang, X. Liu, K. Robbins, and R. Rekaya, "AntEpiSeeker: detecting epistatic interactions for case-control studies using a two-stage ant colony optimization algorithm," BMC Research Notes, vol. 3, Art. no. 117, 2010.
[11] M. D. Ritchie, L. W. Hahn, N. Roodi, L. R. Bailey, W. D. Dupont, F. F. Parl, et al., "Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer," American Journal of Human Genetics, vol. 69, pp. 138-147, 2001.
[12] J. Gui, J. H. Moore, S. M. Williams, P. Andrews, H. L. Hillege, P. van der Harst, et al., "A simple and computationally efficient approach to multifactor dimensionality reduction analysis of gene-gene interactions for quantitative traits," PLoS One, vol. 8, Art. no. e66545, 2013.
[13] O. Y. Fu, H. W. Chang, Y. D. Lin, L. Y. Chuang, M. F. Hou, and C. H. Yang, "Breast cancer-associated high-order SNP-SNP interaction of CXCL12/CXCR4-related genes by an improved multifactor dimensionality reduction (MDR-ER)," Oncology Reports, vol. 36, pp. 1739-1747, 2016.
[14] L. Y. Chuang, H. Y. Lane, Y. D. Lin, M. T. Lin, C. H. Yang, and H. W. Chang, "Identification of SNP barcode biomarkers for genes associated with facial emotion perception using particle swarm optimization algorithm," Annals of General Psychiatry, vol. 13, Art. no. 15, 2014.
[15] D. Gola, J. M. M. John, K. van Steen, and I. R. Konig, "A roadmap to multifactor dimensionality reduction methods," Briefings in Bioinformatics, vol. 17, pp. 293-308, 2016.
[16] C. H. Yang, Y. D. Lin, and L. Y. Chuang, "Multiple-criteria decision analysis-based multifactor dimensionality reduction for detecting gene-gene interactions," IEEE Journal of Biomedical and Health Informatics, vol. 23, pp. 416-426, 2018.
[17] C. H. Yang, Y. D. Lin, and L. Y. Chuang, "Class balanced multifactor dimensionality reduction to detect gene-gene interactions," IEEE/ACM Transactions on Computational Biology and Bioinformatics, doi:10.1109/TCBB.2018.2858776, 2018.
[18] S. Leem and T. Park, "An empirical fuzzy multifactor dimensionality reduction method for detecting gene-gene interactions," BMC Genomics, vol. 18, Art. no. 115, 2017.
[19] J. C. Bezdek, C. Coray, R. Gunderson, and J. Watson, "Detection and characterization of cluster substructure I. Linear structure: Fuzzy c-lines," SIAM Journal on Applied Mathematics, vol. 40, pp. 339-357, 1981.
[20] J. Nayak, B. Naik, and H. Behera, "Fuzzy C-means (FCM) clustering algorithm: a decade review from 2000 to 2014," in Computational Intelligence in Data Mining, vol. 2. Springer, 2015, pp. 133-149.
[21] H.-Y. Jung, S. Leem, and T. Park, "Fuzzy set-based generalized multifactor dimensionality reduction analysis of gene-gene interactions," BMC Medical Genomics, vol. 11, Art. no. 32, 2018.