Supplementary figures for “LinDA: linear models for differential abundance analysis of
microbiome compositional data”

S1 Additional main comparisons of numerical studies
Fig. S1 and Fig. S2 compare the proposed method LinDA with different zero-handling
approaches under settings S6C0 and S0C0. Fig. S3 depicts the results of LinDA, CLR-OLS
and MaAsLin2 with different normalization approaches under setting S0C0. Fig. S4–S10 and
S12–S14 show the results of settings S0C1, S0C2, S1C0, S2C0, S4C0, S5C0, S6C0,
S7C0, S8.1C0, and S8.2C0, respectively. Fig. S11 compares ANCOM-BC with zero treatment
disabled and enabled under setting S6C0.
Fig. S15 shows the results of setting S0C0 with stronger compositional effects.

S2 Additional results of real data applications


Fig. S16–S19 show the effect size plots and volcano plots for the four datasets (CDI, IBD,
RA, and SMOKE) respectively.

S3 Full comparisons of numerical studies


Fig. S20–S30 present the full results of all methods under different simulation settings.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: Adaptive, Pseudo-count, Imputation.]

Fig. S1: Performance of LinDA with different zero-handling approaches (S6C0: 10-fold
difference in library size, a binary covariate). Empirical false discovery rate (A) and true
positive rates (B) were averaged over 100 simulation runs. The dashed horizontal line (A)
indicates the target FDR level of 0.05. Note that the red and blue lines overlap because
the covariate and sequencing depth are significantly correlated.
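
The captions in this supplement report an empirical false discovery rate and a true positive rate averaged over simulation runs. A minimal Python sketch of that bookkeeping follows. The function names and the toy input are hypothetical, and the convention that the false discovery proportion is 0 when nothing is rejected is an assumption rather than something stated in the source.

```python
from statistics import mean

def fdp_and_tpr(rejected, truly_differential):
    """Return (false discovery proportion, true positive rate) for one run.

    rejected: set of taxa declared differential by a method
    truly_differential: set of taxa that are differential by construction
    """
    false_discoveries = len(rejected - truly_differential)
    true_discoveries = len(rejected & truly_differential)
    # Assumed convention: the proportion is 0 when no taxon is rejected.
    fdp = false_discoveries / max(len(rejected), 1)
    # Fraction of truly differential taxa that were detected.
    tpr = true_discoveries / max(len(truly_differential), 1)
    return fdp, tpr

def average_over_runs(runs):
    """runs: list of (rejected, truly_differential) pairs, one per run."""
    results = [fdp_and_tpr(r, t) for r, t in runs]
    empirical_fdr = mean(fdp for fdp, _ in results)
    mean_tpr = mean(tpr for _, tpr in results)
    return empirical_fdr, mean_tpr

# Three hypothetical runs (illustrative only, not data from the paper).
toy_runs = [
    ({1, 2, 3, 4}, {1, 2, 3, 5}),  # 1 false discovery out of 4; 3 of 4 signals found
    (set(), {1, 2}),               # no rejections
    ({7, 8}, {7, 8}),              # every rejection correct, every signal found
]
empirical_fdr, mean_tpr = average_over_runs(toy_runs)
```

The averaged false discovery proportion is what the dashed line at 0.05 is compared against in the panels labelled A.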

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: Adaptive, Pseudo-count, Imputation.]

Fig. S2: Performance of LinDA with different zero-handling approaches (S0C0: log normal
abundance distribution, a binary covariate). Empirical false discovery rate (A) and true
positive rates (B) were averaged over 100 simulation runs. The dashed horizontal line (A)
indicates the target FDR level of 0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, CLR-OLS, MaAsLin2-TSS, MaAsLin2-TMM, MaAsLin2-CSS, MaAsLin2-CLR.]

Fig. S3: Performance comparison between LinDA and MaAsLin2 (S0C0: log normal abundance
distribution, a binary covariate). Empirical false discovery rate (A) and true positive
rates (B) were averaged over 100 simulation runs. The dashed horizontal line (A) indicates
the target FDR level of 0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, MaAsLin2, Spearman.]

Fig. S4: Performance comparison (S0C1: log normal abundance distribution, a continuous
covariate). Empirical false discovery rate (A) and true positive rates (B) were averaged
over 100 simulation runs. Error bars (A) represent the 95% CIs of the method LinDA and
the dashed horizontal line indicates the target FDR level of 0.05.
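
The error bars in panel A are described as 95% CIs, but the captions do not say how the intervals were computed. One common choice is a normal approximation over the per-run false discovery proportions, sketched below. That choice, the function name, and the toy values are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def mean_with_ci95(values):
    """Return (mean, lower, upper) using mean +/- 1.96 * standard error.

    Assumption: a normal approximation is adequate; the source does not
    state which interval construction was used.
    """
    m = mean(values)
    se = stdev(values) / sqrt(len(values))  # standard error of the mean
    half_width = 1.96 * se
    return m, m - half_width, m + half_width

# Hypothetical per-run false discovery proportions (not data from the paper).
per_run_fdp = [0.00, 0.10, 0.05, 0.00, 0.10]
m, lo, hi = mean_with_ci95(per_run_fdp)

# An interval that covers the target level is consistent with FDR control.
target_fdr = 0.05
covers_target = lo <= target_fdr <= hi
```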

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, MaAsLin2, Wilcoxon.]

Fig. S5: Performance comparison (S0C2: log normal abundance distribution, a binary
variable of interest and two confounders). Empirical false discovery rate (A) and true
positive rates (B) were averaged over 100 simulation runs. Error bars (A) represent the
95% CIs of the method LinDA and the dashed horizontal line indicates the target FDR
level of 0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon.]

Fig. S6: Performance comparison (S1C0: zero inflated absolute abundances, a binary covariate).
Empirical false discovery rate (A) and true positive rates (B) were averaged over
100 simulation runs. Error bars (A) represent the 95% CIs of the method LinDA and the
dashed horizontal line indicates the target FDR level of 0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon.]

Fig. S7: Performance comparison (S2C0: correlated absolute abundances, a binary covariate).
Empirical false discovery rate (A) and true positive rates (B) were averaged over
100 simulation runs. Error bars (A) represent the 95% CIs of the method LinDA and the
dashed horizontal line indicates the target FDR level of 0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon.]

Fig. S8: Performance comparison (S4C0: smaller m, a binary covariate). Empirical false
discovery rate (A) and true positive rates (B) were averaged over 1000 simulation runs.
Error bars (A) represent the 95% CIs of the method LinDA and the dashed horizontal line
indicates the target FDR level of 0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 20 / n = 30 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon.]

Fig. S9: Performance comparison (S5C0: smaller n, a binary covariate). Empirical false
discovery rate (A) and true positive rates (B) were averaged over 100 simulation runs.
Error bars (A) represent the 95% CIs of the method LinDA and the dashed horizontal line
indicates the target FDR level of 0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon.]

Fig. S10: Performance comparison (S6C0: 10-fold difference in library size, a binary covariate).
Empirical false discovery rate (A) and true positive rates (B) were averaged over
100 simulation runs. Error bars (A) represent the 95% CIs of the method LinDA and the
dashed horizontal line indicates the target FDR level of 0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: ANCOM-BC-1, ANCOM-BC-2.]

Fig. S11: Performance of ANCOM-BC with zero treatment disabled (ANCOM-BC-1) and enabled
(ANCOM-BC-2) (S6C0: 10-fold difference in library size, a binary covariate). Empirical
false discovery rate (A) and true positive rates (B) were averaged over 100 simulation
runs. The dashed horizontal line (A) indicates the target FDR level of 0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon.]

Fig. S12: Performance comparison (S7C0: negative binomial abundance distribution, a
binary covariate). Empirical false discovery rate (A) and true positive rates (B) were
averaged over 100 simulation runs. Error bars (A) represent the 95% CIs of the method
LinDA and the dashed horizontal line indicates the target FDR level of 0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA-LMM, LinDA-OLS, CLR-LMM, CLR-OLS, MaAsLin2.]

Fig. S13: Performance comparison (S8.1C0: pre-treatment and post-treatment comparison,
a binary covariate). Empirical false discovery rate (A) and true positive rates (B) were
averaged over 100 simulation runs. The dashed horizontal line (A) indicates the target
FDR level of 0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA-LMM, LinDA-OLS, CLR-LMM, CLR-OLS, MaAsLin2.]

Fig. S14: Performance comparison (S8.2C0: replicate sampling, a binary covariate). Empirical
false discovery rate (A) and true positive rates (B) were averaged over 100 simulation
runs. The dashed horizontal line (A) indicates the target FDR level of 0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon.]

Fig. S15: Performance comparison (S0C0 with strong compositional effects). Empirical
false discovery rate (A) and true positive rates (B) were averaged over 100 simulation runs.
Error bars (A) represent the 95% CIs of the method LinDA and the dashed horizontal line
indicates the target FDR level of 0.05.

[Figure: two panels titled "Disease: Case v.s. DiarrhealControl". Effect size plot: Taxa (OTU labels) against Log2FoldChange, with Debiased and Non-debiased points. Volcano plot: -Log10Padj against Log2FoldChange, with points grouped as padj>0.1 & lfc<=1, padj>0.1 & lfc>1, padj<=0.1 & lfc<=1, padj<=0.1 & lfc>1.]

Fig. S16: Effect size plot (A) of differential taxa at FDR level of 0.1 and volcano plot (B) for
the CDI dataset. The “Debiased” points represent the bias-corrected regression coefficients,
and “Non-debiased” points represent the original (biased) regression coefficients. The error
bars represent the 95% CIs of the “Debiased” points. The taxa in black are detected by
LinDA, taxa in red are detected solely by LinDA, and the taxa in blue are missed by LinDA
but detected by one or more of the other methods (A).
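
The volcano plot legend groups taxa by an adjusted p-value cutoff of 0.1 and a log2 fold change cutoff of 1. The sketch below shows one way such labels could be assigned. It assumes that "lfc" in the legend means the absolute log2 fold change, which the source does not state, and the function name and toy values are hypothetical.

```python
def volcano_category(padj, lfc, padj_cutoff=0.1, lfc_cutoff=1.0):
    """Return the legend label for one taxon.

    Assumption: the fold-change cutoff applies to |lfc|, so strong
    decreases and strong increases fall in the same group.
    """
    if padj <= padj_cutoff:
        significance = f"padj<={padj_cutoff:g}"
    else:
        significance = f"padj>{padj_cutoff:g}"
    if abs(lfc) > lfc_cutoff:
        magnitude = f"lfc>{lfc_cutoff:g}"
    else:
        magnitude = f"lfc<={lfc_cutoff:g}"
    return f"{significance} & {magnitude}"

# Hypothetical (padj, log2 fold change) pairs, not values read from the figure.
toy_taxa = {
    "taxon1": (1e-15, -4.2),
    "taxon2": (0.5, 0.3),
    "taxon3": (0.04, 0.6),
    "taxon4": (0.3, 1.5),
}
categories = {name: volcano_category(padj, lfc)
              for name, (padj, lfc) in toy_taxa.items()}
```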

[Figure: two panels titled "Disease: Crohn's disease v.s. Healthy". Effect size plot: Taxa (numeric OTU IDs) against Log2FoldChange, with Debiased and Non-debiased points. Volcano plot: -Log10Padj against Log2FoldChange, with points grouped as padj>0.1 & lfc<=1, padj>0.1 & lfc>1, padj<=0.1 & lfc<=1, padj<=0.1 & lfc>1.]

Fig. S17: Effect size plot (A) of differential taxa at FDR level of 0.1 and volcano plot (B) for
the IBD dataset. The “Debiased” points represent the bias-corrected regression coefficients,
and “Non-debiased” points represent the original (biased) regression coefficients. The error
bars represent the 95% CIs of the “Debiased” points. The taxa in black are detected by
LinDA, taxa in red are detected solely by LinDA, and taxa in blue are missed by LinDA
but detected by two or more of the other methods (A).
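
The color coding described in this caption can be read as a rule over sets of detected taxa. The sketch below is one interpretation, not the authors' code: it assumes "black" means detected by LinDA and by at least one other method, and it exposes the number of other methods required for "blue" as a parameter because this caption says two or more while the Fig. S16 caption says one or more. All names and detections in it are hypothetical.

```python
def taxon_color(taxon, linda_hits, other_hits, min_other_methods=2):
    """Return "black", "red", "blue", or None (taxon not highlighted).

    linda_hits: set of taxa detected by LinDA
    other_hits: dict mapping a competing method name to its detected taxa
    min_other_methods: how many other methods must detect a taxon that
        LinDA missed before it is colored blue
    """
    n_other = sum(taxon in hits for hits in other_hits.values())
    if taxon in linda_hits:
        # Assumption: black = LinDA plus at least one other method.
        return "red" if n_other == 0 else "black"
    return "blue" if n_other >= min_other_methods else None

# Hypothetical detections (taxon and method names are made up).
linda_hits = {"taxonA", "taxonB"}
other_hits = {
    "method1": {"taxonA", "taxonC"},
    "method2": {"taxonC", "taxonD"},
}
colors = {t: taxon_color(t, linda_hits, other_hits, min_other_methods=2)
          for t in ["taxonA", "taxonB", "taxonC", "taxonD"]}
```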

[Figure: two panels titled "Disease: HLT v.s. NORA". Effect size plot: Taxa (OTU labels) against Log2FoldChange, with Debiased and Non-debiased points. Volcano plot: -Log10Padj against Log2FoldChange, with points grouped as padj>0.1 & lfc<=1, padj>0.1 & lfc>1, padj<=0.1 & lfc<=1, padj<=0.1 & lfc>1.]

Fig. S18: Effect size plot (A) of differential taxa at FDR level of 0.1 and volcano plot (B) for
the RA dataset. The “Debiased” points represent the bias-corrected regression coefficients,
and “Non-debiased” points represent the original (biased) regression coefficients. The error
bars represent the 95% CIs of the “Debiased” points. The taxa in black are detected by
LinDA, taxa in red are detected solely by LinDA, and taxa in blue are missed by LinDA
but detected by two or more of the other methods (A).

[Figure: two panels titled "Smoke: n v.s. y". Effect size plot: Taxa (numeric OTU IDs) against Log2FoldChange, with Debiased and Non-debiased points. Volcano plot: -Log10Padj against Log2FoldChange, with points grouped as padj>0.1 & lfc<=1, padj>0.1 & lfc>1, padj<=0.1 & lfc<=1, padj<=0.1 & lfc>1.]

Fig. S19: Effect size plot (A) of differential taxa detected by LinDA at FDR level of 0.1
and volcano plot (B) for the SMOKE dataset. The “Debiased” points represent the
bias-corrected regression coefficients, and “Non-debiased” points represent the original (biased)
regression coefficients. The error bars represent the 95% CIs of the “Debiased” points. The
taxa in black are detected by LinDA, taxa in red are detected by LinDA but missed by
MaAsLin2, and no taxa are detected by MaAsLin2 but missed by LinDA (A).

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]

Fig. S20: Full performance comparison (S0C0: log normal abundance distribution, a binary
covariate). Empirical false discovery rate (A) and true positive rates (B) were averaged
over 100 simulation runs. The dashed horizontal line (A) indicates the target FDR level of
0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, MaAsLin2, Spearman, DESeq2, edgeR, metagenomeSeq.]

Fig. S21: Full performance comparison (S0C1: log normal abundance distribution, a continuous
covariate). Empirical false discovery rate (A) and true positive rates (B) were
averaged over 100 simulation runs. The dashed horizontal line (A) indicates the target
FDR level of 0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]

Fig. S22: Full performance comparison (S0C2: log normal abundance distribution, a binary
variable of interest and two confounders). Empirical false discovery rate (A) and true
positive rates (B) were averaged over 100 simulation runs. The dashed horizontal line (A)
indicates the target FDR level of 0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]

Fig. S23: Full performance comparison (S1C0: zero inflated absolute abundances, a binary
covariate). Empirical false discovery rate (A) and true positive rates (B) were averaged
over 100 simulation runs. The dashed horizontal line (A) indicates the target FDR level of
0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]

Fig. S24: Full performance comparison (S2C0: correlated absolute abundances, a binary
covariate). Empirical false discovery rate (A) and true positive rates (B) were averaged
over 100 simulation runs. The dashed horizontal line (A) indicates the target FDR level of
0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]

Fig. S25: Full performance comparison (S3C0: gamma abundance distribution, a binary
covariate). Empirical false discovery rate (A) and true positive rates (B) were averaged
over 100 simulation runs. The dashed horizontal line (A) indicates the target FDR level of
0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]

Fig. S26: Full performance comparison (S4C0: smaller m, a binary covariate). Empirical
false discovery rate (A) and true positive rates (B) were averaged over 1000 simulation
runs. The dashed horizontal line (A) indicates the target FDR level of 0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 20 / n = 30 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]

Fig. S27: Full performance comparison (S5C0: smaller n, a binary covariate). Empirical
false discovery rate (A) and true positive rates (B) were averaged over 100 simulation runs.
The dashed horizontal line (A) indicates the target FDR level of 0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]

Fig. S28: Full performance comparison (S6C0: 10-fold difference in library size, a binary
covariate). Empirical false discovery rate (A) and true positive rates (B) were averaged
over 100 simulation runs. The dashed horizontal line (A) indicates the target FDR level of
0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]

Fig. S29: Full performance comparison (S7C0: negative binomial abundance distribution,
a binary covariate). Empirical false discovery rate (A) and true positive rates (B) were
averaged over 100 simulation runs. The dashed horizontal line (A) indicates the target
FDR level of 0.05.

[Figure: panel A, Empirical False Discovery Rate; panel B, True Positive Rate. Both are plotted against Signal Strength (2, 4, 6), faceted by n = 50 / n = 200 and Sparse Signal / Dense Signal. Methods: LinDA, ANCOM-BC, ALDEx2, metagenomeSeq2, MaAsLin2, Wilcoxon, DESeq2, edgeR, metagenomeSeq.]

Fig. S30: Full performance comparison (S0C0 with strong compositional effects). Empirical
false discovery rate (A) and true positive rates (B) were averaged over 100 simulation runs.
The dashed horizontal line (A) indicates the target FDR level of 0.05.
