2022 2655 Moesm2 Esm

Supplementary figures for “LinDA: linear models for differential abundance analysis of
microbiome compositional data”

S1 Additional main comparisons of numerical studies
Fig. S1 and Fig. S2 show the performance of the proposed method LinDA with different
zero-handling approaches under settings S6C0 and S0C0, respectively. Fig. S3 depicts the
results of LinDA, CLR-OLS, and MaAsLin2 with different normalization approaches under
setting S0C0. Figs. S4–S10, S12, and S13–S14 show the results of settings S0C1, S0C2, S1C0,
S2C0, S4C0, S5C0, S6C0, S7C0, S8.1C0, and S8.2C0, respectively. Fig. S11 compares the
ANCOM-BC method with zero treatment disabled and enabled under setting S6C0.
Fig. S15 shows the results of setting S0C0 with stronger compositional effects.
S2 Additional results of real data applications

Figs. S16–S19 show the effect size plots and volcano plots for the four datasets (CDI, IBD,
RA, and SMOKE), respectively.
S3 Full comparisons of numerical studies

Figs. S20–S30 present the full results of all methods under different simulation settings.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: Adaptive, Pseudo-count, Imputation.]
Fig. S1: Performance of LinDA with different zero-handling approaches (S6C0: 10-fold
difference in library size, a binary covariate). Empirical false discovery rate (A) and true
positive rates (B) were averaged over 100 simulation runs. The dashed horizontal line (A)
indicates the target FDR level of 0.05. Note that the red and blue lines overlap because
the covariate and sequencing depth are significantly correlated.
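
As a reading aid for the two quantities plotted in these figures, the following minimal Python sketch shows one common way to compute an empirical false discovery rate and a true positive rate averaged over simulation runs. It is an illustration under stated assumptions, not the authors' simulation code: the function names, the toy taxon labels, and the convention that the false discovery proportion is 0 when nothing is rejected are all assumptions introduced here.

```python
from statistics import mean


def fdp_and_tpr(rejected, truly_differential):
    """Return (false discovery proportion, true positive rate) for one run.

    Both arguments are collections of taxon identifiers (hypothetical labels).
    """
    rejected = set(rejected)
    truth = set(truly_differential)
    true_pos = len(rejected & truth)
    # Assumed convention: the proportion is 0 when no taxon is rejected.
    fdp = (len(rejected) - true_pos) / max(len(rejected), 1)
    tpr = true_pos / max(len(truth), 1)
    return fdp, tpr


def summarize_runs(runs):
    """Average the per-run proportions; each run is a (rejected, truth) pair."""
    pairs = [fdp_and_tpr(rejected, truth) for rejected, truth in runs]
    return mean(p[0] for p in pairs), mean(p[1] for p in pairs)


# Two toy runs: one with a single false discovery, one with no rejections.
toy_runs = [
    ({"t1", "t2", "t3"}, {"t1", "t2", "t4"}),
    (set(), {"t1", "t2"}),
]
empirical_fdr, mean_tpr = summarize_runs(toy_runs)
```

Averaging the per-run proportions, rather than pooling all discoveries across runs, matches the usual definition of FDR as an expected proportion.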

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: Adaptive, Pseudo-count, Imputation.]
Fig. S2: Performance of LinDA with different zero-handling approaches (S0C0: log normal
abundance distribution, a binary covariate). Empirical false discovery rate (A) and true
positive rates (B) were averaged over 100 simulation runs. The dashed horizontal line (A)
indicates the target FDR level of 0.05.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, CLR-OLS, MaAsLin2-TSS, MaAsLin2-TMM, MaAsLin2-CSS, MaAsLin2-CLR.]
Fig. S3: Performance comparison between LinDA and MaAsLin2 (S0C0: log normal
abundance distribution, a binary covariate). Empirical false discovery rate (A) and true positive
rates (B) were averaged over 100 simulation runs. The dashed horizontal line (A) indicates
the target FDR level of 0.05.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, MaAsLin2, Spearman.]
Fig. S4: Performance comparison (S0C1: log normal abundance distribution, a continuous
covariate). Empirical false discovery rate (A) and true positive rates (B) were averaged
over 100 simulation runs. Error bars (A) represent the 95% CIs of the method LinDA and
the dashed horizontal line indicates the target FDR level of 0.05.
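
The captions do not say how the 95% CIs behind the error bars were obtained. One common choice, assumed here purely for illustration, is a normal-approximation interval around the mean of the per-run false discovery proportions. The sketch below shows that calculation. The function name, the z value of 1.96, and the toy input are assumptions, and the authors may have used a different interval.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_ci(values, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation interval.

    `values` holds one proportion per simulation run (at least two runs).
    The interval is mean +/- z * sample_sd / sqrt(number_of_runs).
    """
    m = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width


# Toy per-run false discovery proportions (made up for the example).
toy_fdps = [0.0, 0.1, 0.0, 0.1]
center, lower, upper = mean_with_ci(toy_fdps)
```

With 100 runs per setting, as stated in the caption, the half-width shrinks with the square root of the number of runs, so a method whose interval covers 0.05 is consistent with nominal FDR control under this approximation.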

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, MaAsLin2, Wilcoxon.]
Fig. S5: Performance comparison (S0C2: log normal abundance distribution, a binary
variable of interest and two confounders). Empirical false discovery rate (A) and true
positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the
95% CIs of the method LinDA and the dashed horizontal line indicates the target FDR
level of 0.05.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon.]
Fig. S6: Performance comparison (S1C0: zero inflated absolute abundances, a binary
covariate). Empirical false discovery rate (A) and true positive rates (B) were averaged over
100 simulation runs. Error bars (A) represent the 95% CIs of the method LinDA and the
dashed horizontal line indicates the target FDR level of 0.05.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon.]
Fig. S7: Performance comparison (S2C0: correlated absolute abundances, a binary covariate).
Empirical false discovery rate (A) and true positive rates (B) were averaged over

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon.]
Fig. S8: Performance comparison (S4C0: smaller m, a binary covariate). Empirical false
discovery rate (A) and true positive rates (B) were averaged over 1000 simulation runs.
Error bars (A) represent the 95% CIs of the method LinDA and the dashed horizontal line
indicates the target FDR level of 0.05.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 20 and n = 30; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon.]
Fig. S9: Performance comparison (S5C0: smaller n, a binary covariate). Empirical false
discovery rate (A) and true positive rates (B) were averaged over 100 simulation runs.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon.]
Fig. S10: Performance comparison (S6C0: 10-fold difference in library size, a binary
covariate). Empirical false discovery rate (A) and true positive rates (B) were averaged over

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: ANCOM-BC-1, ANCOM-BC-2.]
Fig. S11: Performance of ANCOM-BC with zero treatment disabled (ANCOM-BC-1) and enabled
(ANCOM-BC-2) (S6C0: 10-fold difference in library size, a binary covariate). Empirical
false discovery rate (A) and true positive rates (B) were averaged over 100 simulation
runs. The dashed horizontal line (A) indicates the target FDR level of 0.05.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon.]
Fig. S12: Performance comparison (S7C0: negative binomial abundance distribution, a
binary covariate). Empirical false discovery rate (A) and true positive rates (B) were
averaged over 100 simulation runs. Error bars (A) represent the 95% CIs of the method
LinDA and the dashed horizontal line indicates the target FDR level of 0.05.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA-LMM, LinDA-OLS, CLR-LMM, CLR-OLS, MaAsLin2.]
Fig. S13: Performance comparison (S8.1C0: pre-treatment and post-treatment comparison,
a binary covariate). Empirical false discovery rate (A) and true positive rates (B) were
averaged over 100 simulation runs. The dashed horizontal line (A) indicates the target
FDR level of 0.05.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA-LMM, LinDA-OLS, CLR-LMM, CLR-OLS, MaAsLin2.]
Fig. S14: Performance comparison (S8.2C0: replicate sampling, a binary covariate).
Empirical false discovery rate (A) and true positive rates (B) were averaged over 100 simulation
runs.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon.]
Fig. S15: Performance comparison (S0C0 with strong compositional effects). Empirical
false discovery rate (A) and true positive rates (B) were averaged over 100 simulation runs.

[Figure, comparison "Disease: Case v.s. DiarrhealControl": panel A, effect size plot with taxa on the y-axis against Log2FoldChange, showing Debiased and Non-debiased points; panel B, volcano plot of -Log10Padj against Log2FoldChange, with points grouped as padj>0.1 & lfc<=1, padj>0.1 & lfc>1, padj<=0.1 & lfc<=1, and padj<=0.1 & lfc>1.]
Fig. S16: Effect size plot (A) of differential taxa at FDR level of 0.1 and volcano plot (B) for
the CDI dataset. The “Debiased” points represent the bias-corrected regression coefficients,
and “Non-debiased” points represent the original (biased) regression coefficients. The error
bars represent the 95% CIs of the “Debiased” points. The taxa in black are detected by
LinDA, taxa in red are detected solely by LinDA, and the taxa in blue are missed by LinDA
but detected by one or more of the other methods (A).
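
The volcano plots group taxa by the four legend categories built from the adjusted p-value (padj) and the log2 fold change (lfc). The following minimal Python sketch shows how such a grouping could be computed. It is not taken from the authors' plotting code: the function names are hypothetical, and it assumes that "lfc" in the legend refers to the absolute log2 fold change, which the legend itself does not state.

```python
from math import log10


def volcano_category(padj, log2fc, alpha=0.1, lfc_cut=1.0):
    """Return the legend label for one taxon.

    Assumption: the legend's "lfc" is compared in absolute value, so strong
    decreases and strong increases fall into the same category.
    """
    p_part = f"padj<={alpha:g}" if padj <= alpha else f"padj>{alpha:g}"
    e_part = f"lfc>{lfc_cut:g}" if abs(log2fc) > lfc_cut else f"lfc<={lfc_cut:g}"
    return f"{p_part} & {e_part}"


def volcano_point(padj, log2fc):
    """Return the (x, y) plotting coordinates: (log2 fold change, -log10 padj)."""
    return log2fc, -log10(padj)


# Toy values, not read from the figure.
label = volcano_category(padj=0.004, log2fc=-2.5)
x, y = volcano_point(padj=0.004, log2fc=-2.5)
```

The default thresholds mirror the legend (0.1 for padj and 1 for lfc); both are parameters so that the same helper could be reused with other cutoffs.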

[Figure, comparison "Disease: Crohn's disease v.s. Healthy": panel A, effect size plot with taxa on the y-axis against Log2FoldChange, showing Debiased and Non-debiased points; panel B, volcano plot of -Log10Padj against Log2FoldChange, with points grouped as padj>0.1 & lfc<=1, padj>0.1 & lfc>1, padj<=0.1 & lfc<=1, and padj<=0.1 & lfc>1.]
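Fig. S17: Effect size plot (A) and volcano plot (B) for
the IBD dataset. The “Debiased” points represent the bias-corrected regression coefficients,
and “Non-debiased” points represent the original (biased) regression coefficients. The error
bars represent the 95% CIs of the “Debiased” points. The taxa in black are detected by
LinDA, taxa in red are detected solely by LinDA, and taxa in blue are missed by LinDA
but detected by two or more of the other methods (A).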

[Figure, comparison "Disease: HLT v.s. NORA": panel A, effect size plot with taxa on the y-axis against Log2FoldChange, showing Debiased and Non-debiased points; panel B, volcano plot of -Log10Padj against Log2FoldChange, with points grouped as padj>0.1 & lfc<=1, padj>0.1 & lfc>1, padj<=0.1 & lfc<=1, and padj<=0.1 & lfc>1.]
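Fig. S18: Effect size plot (A) and volcano plot (B) for
the RA dataset. The “Debiased” points represent the bias-corrected regression coefficients,
and “Non-debiased” points represent the original (biased) regression coefficients. The error
bars represent the 95% CIs of the “Debiased” points. The taxa in black are detected by
LinDA, taxa in red are detected solely by LinDA, and taxa in blue are missed by LinDA
but detected by two or more of the other methods (A).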

[Figure, comparison "Smoke: n v.s. y": panel A, effect size plot with taxa on the y-axis against Log2FoldChange, showing Debiased and Non-debiased points; panel B, volcano plot of -Log10Padj against Log2FoldChange, with points grouped as padj>0.1 & lfc<=1, padj>0.1 & lfc>1, padj<=0.1 & lfc<=1, and padj<=0.1 & lfc>1.]
Fig. S19: Effect size plot (A) of differential taxa detected by LinDA at FDR level of 0.1
and volcano plot (B) for the SMOKE dataset. The “Debiased” points represent the bias-
corrected regression coefficients, and “Non-debiased” points represent the original (biased)
regression coefficients. The error bars represent the 95% CIs of the “Debiased” points. The
taxa in black are detected by LinDA, taxa in red are detected by LinDA but missed by
MaAsLin2, and no taxa are detected by MaAsLin2 but missed by LinDA (A).

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]
Fig. S20: Full performance comparison (S0C0: log normal abundance distribution, a binary
covariate). Empirical false discovery rate (A) and true positive rates (B) were averaged
over 100 simulation runs. The dashed horizontal line (A) indicates the target FDR level of
0.05.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, MaAsLin2, Spearman, DESeq2, edgeR, metagenomeSeq.]
Fig. S21: Full performance comparison (S0C1: log normal abundance distribution, a
continuous covariate). Empirical false discovery rate (A) and true positive rates (B) were
averaged over 100 simulation runs. The dashed horizontal line (A) indicates the target
FDR level of 0.05.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]
Fig. S22: Full performance comparison (S0C2: log normal abundance distribution, a binary
variable of interest and two confounders). Empirical false discovery rate (A) and true
positive rates (B) were averaged over 100 simulation runs.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]
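Fig. S23: Full performance comparison (S1C0: zero inflated absolute abundances, a binary
covariate). Empirical false discovery rate (A) and true positive rates (B) were averaged
over 100 simulation runs. The dashed horizontal line (A) indicates the target FDR level of
0.05.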

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]
Fig. S24: Full performance comparison (S2C0: correlated absolute abundances, a binary
0.05.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]
Fig. S25: Full performance comparison (S3C0: gamma abundance distribution, a binary
0.05.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]
Fig. S26: Full performance comparison (S4C0: smaller m, a binary covariate). Empirical
false discovery rate (A) and true positive rates (B) were averaged over 1000 simulation
runs.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 20 and n = 30; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]
Fig. S27: Full performance comparison (S5C0: smaller n, a binary covariate). Empirical
false discovery rate (A) and true positive rates (B) were averaged over 100 simulation runs.
The dashed horizontal line (A) indicates the target FDR level of 0.05.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]
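Fig. S28: Full performance comparison (S6C0: 10-fold difference in library size, a binary
covariate). Empirical false discovery rate (A) and true positive rates (B) were averaged
over 100 simulation runs. The dashed horizontal line (A) indicates the target FDR level of
0.05.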

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]
Fig. S29: Full performance comparison (S7C0: negative binomial abundance distribution,
a binary covariate). Empirical false discovery rate (A) and true positive rates (B) were
averaged over 100 simulation runs. The dashed horizontal line (A) indicates the target
FDR level of 0.05.

[Figure: panel A, empirical false discovery rate; panel B, true positive rate; x-axis, signal strength (2, 4, 6); columns n = 50 and n = 200; rows sparse signal and dense signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]
Fig. S30: Full performance comparison (S0C0 with strong compositional effects). Empirical
false discovery rate (A) and true positive rates (B) were averaged over 100 simulation runs.
The dashed horizontal line (A) indicates the target FDR level of 0.05.
