
Journal of Clinical Epidemiology 146 (2022) 86–96

REVIEW
Diagnostic test accuracy network meta-analysis methods: A scoping
review and empirical assessment
Areti Angeliki Veroniki a,b,1,∗, Sofia Tsokani c,1, Ridhi Agarwal d, Eirini Pagkalidou e,
Gerta Rücker f, Dimitris Mavridis c,g, Yemisi Takwoingi d,h
a Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, 209 Victoria Street, East Building, Toronto, Ontario M5B 1T8, Canada
b Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
c Department of Primary Education, School of Education, University of Ioannina, Ioannina, Greece
d Test Evaluation Research Group, Institute of Applied Health Research, University of Birmingham, Birmingham, UK
e Department of Hygiene, Social-Preventive Medicine and Medical Statistics, Medical School, Aristotle University of Thessaloniki, Thessaloniki, Greece
f Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg, Stefan-Meier-Strasse 26, 79104 Freiburg, Germany
g Paris Descartes University, Sorbonne Paris Cité, Faculté de Médecine, Paris, France
h NIHR Birmingham Biomedical Research Centre, University Hospitals Birmingham NHS Foundation Trust and University of Birmingham, Birmingham, UK
Accepted 9 February 2022; Available online 16 February 2022

Abstract
Objectives: To (a) identify methodological and application papers reporting a model developed specifically for diagnostic test
accuracy network meta-analysis (DTA-NMA) or a hierarchical meta-regression method for comparing at least three index tests; (b)
review and summarize the characteristics of the methods and the application papers; and (c) compare DTA-NMA and hierarchical
meta-regression methods empirically.
Study Design and Setting: We performed a scoping review and searched major databases until March 3rd, 2021. We assessed the
characteristics of the identified methods, conducted a descriptive analysis of characteristics of the application articles, and applied the
DTA-NMA and meta-regression methods to the available data.
Results: We included 49 articles (plus one companion report), of which nine were methodological (describing 11 DTA-NMA
methods) and 40 were application papers (data available for 32 DTA-NMAs). Our results showed a steep increase in recent years in
DTA-NMA publications. DTA-NMA models may lead to different results. Although sensitivity estimates were comparable between
meta-regression and DTA-NMA models, specificity estimates were higher in meta-regression.
Conclusion: The choice of a DTA-NMA model will depend on the available data, including the use of different thresholds for test
positivity, different study designs, and software familiarity. Selection between the methods may impact on the NMA results, especially
for specificity. © 2022 Elsevier Inc. All rights reserved.

Keywords: Network meta-analysis; Diagnostic test; Accuracy; Indirect comparison; Sensitivity; Specificity

Declaration of interests: The authors declare that they have no competing interests.
Contributors: AAV, DM, and YT conceived and designed the study. AAV coordinated the review, screened citations, abstracted data, conducted
analysis, interpreted results, and wrote a draft manuscript. ST screened citations, abstracted data, conducted analysis, interpreted results, and edited
the manuscript. RA screened citations, abstracted data, and edited the manuscript. EP screened citations, and edited the manuscript. GR, DM, and YT
provided input into the design, interpreted results, and edited the manuscript. All authors read and approved the final manuscript.
Funding: This review received no funding. ST and DM were funded from the European Union’s Horizon 2020 [No. 754936]. GR was funded by the
German Research Foundation (DFG), grant number RU 1747/1-2. YT is funded by a UK National Institute for Health Research (NIHR) Postdoctoral
Fellowship, and is supported by the NIHR Birmingham Biomedical Research Centre. The views expressed are those of the authors and not necessarily
those of the NHS, NIHR, or the Department of Health and Social Care.
1 Joint first authors: Areti Angeliki Veroniki, MSc, PhD; Sofia Tsokani, MSc.
∗ Corresponding author at: Areti Angeliki Veroniki, MSc, PhD, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria Street, East Building, Toronto, Ontario, M5B 1T8, Tel.: +1 416-864-6060, Fax: +1 416-864-5805.
E-mail address: areti-angeliki.veroniki@unityhealth.to (A.A. Veroniki).

https://doi.org/10.1016/j.jclinepi.2022.02.001
0895-4356/© 2022 Elsevier Inc. All rights reserved.
A.A. Veroniki et al. / Journal of Clinical Epidemiology 146 (2022) 86–96 87

What is new?

• Diagnostic test accuracy (DTA) studies can include different subsets of index tests and the reference standard used may be imperfect. A DTA network meta-analysis (NMA) borrows strength from indirect evidence, which can improve statistical accuracy and reduce bias when comparing test accuracy.
• To date, 11 models developed specifically for DTA-NMA and two hierarchical meta-regression methods have been suggested for the comparative meta-analysis of at least three index tests in a single model. All DTA-NMA models but one were developed in a Bayesian setting. A potential advantage over the popular meta-regression approach is that several DTA-NMA models account for within-study correlation between tests when the primary studies included in the meta-analysis use a within-participant design (participants receive all tests) to compare test accuracy.
• Comparative meta-analysis using a hierarchical meta-regression model tended to be published in journals with a higher impact factor, which is mainly driven by the dominance of Cochrane DTA reviews in the dataset. None of the DTA-NMA models were applied in the Cochrane DTA reviews.
• On average, different DTA-NMA methods gave similar results in sensitivity, but the bivariate meta-regression approach provided higher estimates of specificity. The Nyaga beta-binomial model estimated on average lower between-study heterogeneities for sensitivity and specificity compared to the other DTA-NMA models.
• Selection among the DTA-NMA models depends on the data to be analysed, such as the inclusion of different test thresholds, reference standards, and study designs, as well as software familiarity.

1. Introduction

Most diagnostic test accuracy (DTA) meta-analyses assess the accuracy of one index test at a time, but in practice, there may be many tests available for a target condition. For decision making, interest is in the comparative accuracy of the index tests, that is, how does the accuracy of a new test compare to that of existing test(s) or current practice. Several comparative meta-analysis methods have been introduced in recent years to compare the accuracy of multiple index tests [1–11].

Network meta-analysis (NMA), a statistical approach for combining direct and indirect evidence across a network of studies, has been used extensively to assess the comparative efficacy or safety of multiple interventions to inform health care decision-making [12,13]. For interventions, NMA can be performed using several approaches, including meta-regression when there are no multi-arm trials in the network [14]. The extension of NMA to comparisons of test accuracy (DTA-NMA) is less established. DTA-NMA simultaneously models the accuracy of at least three index tests and (1) results in diagnostic accuracy estimates with increased precision [15]; (2) allows drawing inference on the difference in accuracy between tests that previously have not been compared head-to-head; and (3) ranks the performance of multiple tests according to their diagnostic accuracy, at varying test thresholds, if applicable, using all available evidence [16].

NMA methods for comparing interventions cannot be applied directly to DTA studies for three main reasons: (1) DTA studies primarily estimate two quantities (eg, sensitivity and specificity) in contrast to intervention studies that estimate a single effect size; (2) a comparative DTA meta-analysis may include studies of any design (ie, single-test, paired-test and multiple-test studies of three or more index tests); and (3) comparative DTA studies frequently evaluate the index tests in the same study participants (ie, within-participant or paired designs), whereas intervention studies often compare independent participant groups in a study. Comparative accuracy studies may use a randomized design to produce independent groups but such designs are rare [17]. Ideally, when comparing multiple tests, if studies used a within-participant design, the joint classification tables between pairs (ie, 2 × 4 table) or all index tests are required for each study [18]. Such tables provide information on discordant and concordant index test results for those with and those without the target condition.

In this paper our focus was three-fold. First, through a scoping review we aimed to identify methodological and application papers of models developed for DTA-NMA and hierarchical meta-regression methods for comparing the accuracy of at least three index tests. Second, we aimed to review and summarize the characteristics of the DTA-NMA and regression methods and application papers, including the network, methods, and reporting characteristics. Third, where possible, we compared DTA-NMA and bivariate meta-regression methods empirically using available data from the application papers.

2. Methods

We performed a scoping review based on the Joanna Briggs Institute methods manual [19], and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) extension for scoping reviews [20]. Our methods are detailed in our protocol [11]. Deviations from the protocol are described in Appendix A. We briefly present our methods below, but more details can be found in Appendix B.

We searched PubMed, Web of Science, and Scopus until March 3rd, 2021 (Appendix C), and included published or
unpublished methodological and application papers reporting a DTA-NMA or hierarchical meta-regression method for comparing the accuracy of at least three index tests in a single model. The commonly used bivariate and hierarchical summary receiver operating characteristic (HSROC) meta-regression approaches, which involve adding test type as a categorical variable, were used as the control method [21,22]. We used the abstrackr tool for the screening process [23], and extracted data regarding the general characteristics of the articles, that is, first author name and country, year of publication, journal, journal’s impact factor, funding source, and discipline according to its content. We categorised articles as a methodological or an application paper. For each methodological paper, we also extracted the DTA-NMA method, setting applied, type of model, type of data required, properties of the method, and software and/or available code. For each application paper, we also extracted existence of a protocol, target condition, reference standard and type of index tests, accuracy measure used, approaches to reporting summary estimates, reporting guideline used, design of primary studies in the DTA-NMA or meta-regression, and tool for assessing methodological quality. Also, we extracted the data to replicate the DTA-NMA, where available.

We conducted a descriptive analysis of the characteristics of the included papers, and compared the characteristics of test comparisons performed using DTA-NMA models with those performed using the standard HSROC or bivariate meta-regression model [21,22]. We performed an empirical assessment of DTA-NMA methods using test comparisons with available data. We also applied the bivariate meta-regression model with test-type as a covariate [21]. We used boxplots and gosh plots to compare estimation of sensitivity, specificity, their corresponding uncertainty, and the variance components, including between-study and between-test heterogeneity, across models.

3. Results

We included 49 articles plus one companion report in this review: nine methodological papers (describing 11 different DTA-NMA methods) and 40 application papers of DTA-NMAs or meta-regression (Fig. 1; Appendix D). For simplicity we refer to test comparisons performed using DTA-NMA models or meta-regression approaches as DTA-NMAs henceforth.

Fig. 1. Flowchart for article selection.
Abbreviations: DTA, diagnostic test accuracy; NMA, network meta-analysis; MA, meta-analysis.

Table 1. Properties of 13 diagnostic test accuracy (DTA) comparative meta-analysis methods

Method | Format of data tables required^a | Arm-based model | Can model imperfect reference standards | Can model multiple thresholds | Type of studies that can be modelled | Bayesian setting | Accounts for correlation between tests | Models more than two index tests | Software
Bivariate meta-regression [21] | 2×2 | Yes | No | No | Any | No | No | Yes | R (CopulaDTA [24], lme4 [25], mada [26], meta4diag [27], Metatron [28], Mvmeta [29]), Stata (meqrlogit [30])
HSROC meta-regression [22] | 2×2 | Yes | No | Yes^d | Any | No | No | Yes | OpenBUGS/WinBUGS [31], R (NMADiagT [32])
Trikalinos 2014 [5] | Joint classification | Yes | No | No | Single-/Paired-test | Yes | Yes | No^b | R (rjags [33])
Menten-Lesaffre 2015 [4] | 2×2 | No | Yes^c | No | Paired-/Multiple-test | Yes | No | Yes | OpenBUGS/WinBUGS [31]
Dimou 2016 [3] | Joint classification | Yes | No | No | Single-/Paired-test | No | Yes | No^b | Stata (mvmeta [34])
Cheng 2016 [Model A] [8] | Joint classification | Yes | No | No | Any | Yes | No | Yes | R (R2jags [35])
Cheng 2016 [Model B] [8] | Joint classification | Yes | No | Yes^d | Any | Yes | No | Yes | R (R2jags [35])
Cheng 2016 [Model C] [8] | Joint classification | Yes | No | No | Any | Yes | Yes | Yes | R (R2jags [35])
Nyaga (ANOVA) 2018 [2] | 2×2 | Yes | No | No | Any | Yes | Yes | Yes | Stan (rstan [36],[37] in R)
Nyaga (beta-binomial) 2018 [38] | 2×2 | Yes | No | No | Any | Yes | Yes | Yes | Stan (rstan [36],[37] in R)
Ma 2018 [9]^e | Joint classification | Yes | Yes | No | Any | Yes | Yes | Yes | OpenBUGS/WinBUGS [31], R (NMADiagT [45])
Owen 2018 [39] | 2×2 | Yes | No | Yes | Any | Yes | Yes | Yes | OpenBUGS/WinBUGS [31]
Lian 2019 [40] | Joint classification | Yes | Yes | Yes^d | Any | Yes | Yes | Yes | Stan (rstan [36],[37] in R), R (NMADiagT [45])

Abbreviations: DOR, diagnostic odds ratio; SE, standard error; DTA, diagnostic test accuracy; NMA, network meta-analysis; HSROC, hierarchical summary receiver operating characteristic.
a 2 × 2 data includes the number of true positives, true negatives, false positives and false negatives.
b The model can be extended to evaluate more than two index tests.
c Only two of the models suggested account for imperfect reference standards.
d Accounts for the threshold effect, but does not provide different estimates at the various thresholds.
e The method was also presented in a PhD thesis, which we considered as a companion report [41].

3.1. Description of the DTA-NMA and hierarchical meta-regression methods

Table 1 summarizes the key properties of the 11 DTA-NMA methods (from nine methodological articles plus one

companion report) identified by the search and included, as well as two hierarchical meta-regression methods. See Appendix D for the seven methods papers excluded along with reasons for exclusion. Common properties across all models include that they ‘borrow strength’ across studies by simultaneously analysing multiple comparative DTA studies, and account for between-study correlations between sensitivity and specificity induced through threshold effects. Below we present a short description of the approaches and their properties, but more details can be found in Appendix E.

The simplest and most commonly used approach to model multiple tests within a single analysis is the meta-regression model with test type used as a covariate [21,22]. In 2014, Trikalinos et al. [5] suggested an extension of the bivariate meta-analysis to model studies comparing two index tests on the same participants accounting for within-study correlation between tests. The model can be extended for the comparison of more than two tests. In 2015, Menten and Lesaffre [4] introduced a contrast-based model allowing for the comparison of the accuracy of two diagnostic tests using both direct and indirect comparisons through a common index test and using paired- and/or multiple-test studies. Both perfect and imperfect reference standards can be included in this DTA-NMA. In 2016, Dimou et al. [3] proposed a DTA-NMA, in which logit sensitivity and logit specificity could be modelled in a frequentist setting. The same year, Cheng [8] proposed three DTA-NMA models: the first was an extension of the bivariate model, the second was a multivariate extension of the HSROC model, and the third used the beta-binomial marginal distributions to model the observed number of individuals with true and false positive rates of a test. In 2018, Nyaga et al. suggested two DTA-NMA models. The first model was based on repeated measurements, where each individual is being measured across multiple tests within each study, and hence can be treated as a repeated measurement [2]. The second model was based on a bivariate beta distribution and copula densities to describe the dependency between sensitivity and specificity [1]. In the same year, Ma et al. [9] published a DTA-NMA to model studies of any design with or without a reference standard, and Owen et al. proposed a DTA-NMA to incorporate different tests at multiple thresholds per study [8]. In 2019, Lian et al. suggested an extension of the single-test HSROC approach to compare multiple tests and incorporate studies of any design with or without a reference standard [40].

3.2. DTA network database: application DTA-NMAs

Below we describe the general, network geometry, and methodological characteristics of the 40 DTA-NMAs, separately. In Appendices F and G we also describe the DTA-NMA reporting characteristics.

3.2.1. General characteristics

Table 2 presents a description of the general and epidemiological characteristics of the 40 application networks (see also Appendix H for coding target conditions with International Classification of Diseases-11). Most of the included DTA-NMAs were published in general medical journals (33, 83%), of which the most common was the Cochrane Database of Systematic Reviews (27, 82%). Nearly four in five DTA-NMAs were publicly funded. Single-test studies were included in 97% of the DTA-NMAs that used a meta-regression model and in 45% of the DTA-NMAs with a DTA-NMA model. Most of the first authors of the DTA-NMA papers (16, 40%) had an affiliation in the UK, followed by China (9 DTA-NMAs, 23%; Appendix I). Overall, there is a steep increase in recent years in DTA-NMA publications, where the majority perform a bivariate/HSROC meta-regression model (Fig. 2).

3.2.2. Network geometry

The graphical representation of a network of primary DTA studies differs from the usual network plots encountered in a NMA of interventions, which cannot involve single-arm studies or studies disconnected from the network. This is mainly due to the different types of studies (eg, single-test, paired-test, and multiple-test studies) that can be included in a DTA-NMA.

Figure 3 presents an example of a full-shaped network diagram of seven primary DTA studies comparing four index tests. In particular, the network includes two single-test (ie, assessing C and D tests separately), four paired-test (ie, comparing A vs. D, B vs. C, B vs. D, and C vs. D tests), and one triple-test (ie, comparing A vs. B vs. C tests) studies. The reference standard is not depicted in the network as a separate node, since its diagnostic performance is not being assessed; the reference standard merely verifies the presence/absence of the target condition. The network plots of the 32 DTA-NMAs with available data (ie, the 2 × 2 table) are presented in Appendix J. Of the 32 networks, where a network plot could be drawn, nine (28%) networks had disconnected test comparisons.

The median number of studies per network was 32 (IQR 17 to 58), and the median number of tests was five (IQR 3 to 7). Most DTA-NMAs were full-shaped (27, 68%), whereas a quarter of the networks did not contain a loop of tests (Table 2). Networks are categorized as open-loop or closed-loop shaped networks, where a loop refers to a cycle formed by studies directly comparing the underlying index tests [10,16]. Multiple-test studies were included in 26 DTA-NMAs (65%). The median number of test comparisons in a network was five (IQR 3 to 14), and the median number of participants in a network was 10,166 (IQR 1,810 to 18,172). Eight DTA-NMAs (20%) included studies with multiple thresholds in their network.
Table 2. Epidemiological and descriptive statistics of the application network meta-analyses

Characteristic | Bivariate/HSROC meta-regression model | DTA-NMA model^b | Total
Target condition (ICD-11 Coding^a) | | |
Total papers | 29 (73%) | 11 (27%) | 40 (100%)
Certain infectious or parasitic diseases (01) | 3 (9%) | 0 (0%) | 3 (8%)
Developmental anomalies (20) | 6 (21%) | 0 (0%) | 6 (14%)
Diseases of the circulatory system (11) | 2 (7%) | 0 (0%) | 2 (5%)
Diseases of the digestive system (13) | 4 (14%) | 1 (9%) | 5 (12%)
Diseases of the genitourinary system (16) | 1 (3%) | 1 (9%) | 2 (5%)
Diseases of the musculoskeletal system or connective tissue (15) | 1 (3%) | 1 (9%) | 2 (5%)
Diseases of the nervous system (08) | 1 (3%) | 1 (9%) | 2 (5%)
Diseases of the respiratory system (12) | 1 (3%) | 0 (0%) | 1 (2%)
Diseases of the visual system (09) | 1 (3%) | 0 (0%) | 1 (2%)
Infectious Agents (X) | 2 (7%) | 2 (19%) | 4 (10%)
Injury, poisoning or certain other consequences of external causes (22) | 1 (3%) | 0 (0%) | 1 (2%)
Mental, behavioural or neurodevelopmental disorders (06) | 2 (7%) | 0 (0%) | 2 (5%)
Neoplasms (02) | 2 (7%) | 5 (45%) | 7 (18%)
Pregnancy, childbirth or the puerperium (18) | 2 (7%) | 0 (0%) | 2 (5%)
Symptoms, signs or clinical findings, not elsewhere classified (21) | 1 (3%) | 0 (0%) | 1 (2%)
Type of index tests | | |
Imaging (no, %) | 13 (45%) | 5 (45%) | 18 (45%)
Physical examination (no, %) | 2 (7%) | 0 (0%) | 2 (5%)
Clinical examination (no, %) | 11 (38%) | 6 (55%) | 17 (43%)
Imaging, Clinical examination | 2 (7%) | 0 (0%) | 2 (5%)
Imaging, Physical examination | 1 (3%) | 0 (0%) | 1 (2%)
Journal of publication | | |
General Cochrane medical journals (no, %) | 27 (94%) | 0 (0%) | 27 (69%)
General Non-Cochrane medical journals | 1 (3%) | 5 (45%) | 6 (14%)
Specialist medical journals (no, %) | 1 (3%) | 6 (55%) | 7 (17%)
Funding source^c | | |
Publicly sponsored (no, %) | 25 (86%) | 6 (55%) | 31 (78%)
Funding not reported (no, %) | 2 (7%) | 3 (27%) | 5 (12%)
Non-sponsored (no, %) | 2 (7%) | 2 (18%) | 4 (10%)
Network characteristics | | |
No. studies in a network (median, IQR) | 34 [15–63] | 31 [18–40] | 32 [17–58]
No. index tests in a network (median, IQR)^d | 4 [3–7] | 6 [5–7] | 5 [3–7]
No. total participants in a network (median, IQR)^e | 10,166 [1,891–47,477] | 6,809 [1,922–14,982] | 10,166 [1,810–18,172]
No. participants with target condition in a network (median, IQR)^f | 1,760 [689–4,115] | 1,941 [1,784–3,736] | 1,777 [732–4,469]
No. loops in a network | | |
Unclear/Not reported | 0 (0%) | 3 (27%) | 3 (7%)
0 loops | 7 (24%) | 2 (19%) | 9 (22%)
1 loop | 8 (28%) | 3 (27%) | 11 (28%)
2 loops | 2 (7%) | 0 (0%) | 2 (5%)
3 loops | 1 (3%) | 0 (0%) | 1 (3%)
>3 loops | 11 (38%) | 3 (27%) | 14 (35%)
No. test comparisons in a network (median, IQR)^g | 5 [3–14] | 4 [3–12] | 5 [3–14]
Max index tests in a study (median, IQR) | 3 [2–5] | 3 [2–4] | 3 [2–5]
Shape of network | | |
Full-shaped (no, %) | 21 (72%) | 6 (55%) | 27 (68%)
Open-shaped (no, %) | 8 (28%) | 2 (18%) | 10 (25%)
Unclear/Not reported (no, %) | 0 (0%) | 3 (27%) | 3 (7%)
Inclusion of different test thresholds | | |
Yes (no, %) | 7 (24%) | 1 (9%) | 8 (20%)
No (no, %) | 22 (76%) | 10 (91%) | 32 (80%)
Types of studies in the network | | |
Single-test studies (no, %) | 28 (97%) | 5 (45%) | 33 (83%)
Paired-test studies (no, %) | 26 (90%) | 7 (64%) | 33 (83%)
Multiple-test studies (no, %) | 20 (69%) | 6 (55%) | 26 (65%)
Unclear/Not reported | 0 (0%) | 4 (36%) | 4 (10%)

Abbreviations: DTA, diagnostic test accuracy; NMA, network meta-analysis; HSROC, hierarchical summary receiver operating characteristic.
a Target conditions were classified according to ICD-11 [International Classification of Diseases - 11] for Mortality and Morbidity Statistics (Version: 05/2021) (https://icd.who.int/browse11/l-m/en, see target conditions of each category in Appendix H).
b DTA-NMA models also include NMA of interventions methodology applied to DTA data. In such a case, review authors compared the diagnostic accuracy of multiple tests through the combination of direct and indirect evidence for each test comparison in a single NMA model.
c Funding source was categorized according to both the systematic review and author funding.
d Reported in 29 DTA-NMAs using a meta-regression model and in 11 DTA-NMAs using a DTA-NMA model.
e Reported in 29 DTA-NMAs using a meta-regression model and in 10 DTA-NMAs using a DTA-NMA model.
f Reported in 29 DTA-NMAs using a meta-regression model and in 3 DTA-NMAs using a DTA-NMA model.
g Reported in 36 DTA-NMAs.

Fig. 2. Distribution of DTA-NMAs per year of publication and article type (Application DTA-NMA, Application bivariate/HSROC meta-regression,
Methodological article).
Abbreviations: DTA, diagnostic test accuracy; NMA, network meta-analysis; HSROC, hierarchical summary receiver operating characteristic.

Fig. 3. Network plot example of four tests and seven primary studies. Each vertex represents a different test, and each edge corresponds to at
least one study comparing two tests. Tests in closed circles represent single-test studies. Dashed lines represent paired-test studies, whereas solid
lines represent triple-test studies.
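
The geometry of the Fig. 3 example can be checked with a short sketch. It encodes the seven studies as sets of index tests (two single-test studies of C and D, four paired-test studies, one triple-test study), derives the edges, and counts connected components and independent loops. The names `studies` and `network_summary` are hypothetical, and counting loops as the cycle rank (edges minus tests plus components) is an assumption made for illustration, not necessarily the counting rule used in this review.

```python
from itertools import combinations

# Index tests evaluated in each of the seven primary studies of the Fig. 3 example.
studies = [
    {"C"}, {"D"},                                    # two single-test studies
    {"A", "D"}, {"B", "C"}, {"B", "D"}, {"C", "D"},  # four paired-test studies
    {"A", "B", "C"},                                 # one triple-test study
]

def network_summary(studies):
    """Summarise a DTA network given the set of index tests in each study."""
    tests = sorted(set().union(*studies))

    # An edge exists when at least one study compares the two tests directly.
    # Single-test studies contribute a node but no edge.
    edges = set()
    for study in studies:
        edges.update(combinations(sorted(study), 2))

    # Union-find to count connected components (disconnected sub-networks).
    parent = {t: t for t in tests}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in edges:
        parent[find(a)] = find(b)
    components = len({find(t) for t in tests})

    return {
        "n_tests": len(tests),
        "n_edges": len(edges),
        "components": components,
        # Assumption: loops counted as the cycle rank of the graph.
        "independent_loops": len(edges) - len(tests) + components,
        "single_test": sum(len(s) == 1 for s in studies),
        "paired_test": sum(len(s) == 2 for s in studies),
        "multiple_test": sum(len(s) >= 3 for s in studies),
    }

summary = network_summary(studies)
print(summary)
```

A network with more than one component has disconnected test comparisons, and a network with zero independent loops would be open-shaped under this counting rule.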

3.2.3. Methodological characteristics DTA-NMAs, 32 provided data for analysis of multiple tests
Thirty-one DTA-NMAs (78%) were performed in a (ie 2 × 2 table for each study in the network; see Ap-
frequentist setting, 33 (83%) used an arm-based ap- pendices C and J for a list of networks and their shape,
proach, two (5%) accounted for correlation between tests respectively). We were unable to apply the seven methods
from the same participants (Appendix K). In particular, that require joint classification tables because only 2 × 2
29 (73%) DTA-NMAs used a bivariate/HSROC meta- tables were available for the 32 networks. We applied the
regression model, five (12%) used a NMA model de- Nyaga ANOVA [2], Nyaga beta-binomial [1], and bivari-
veloped for comparing interventions, one (2%) used the ate meta-regression [21,22] models to the 32 networks, but
Menten-Lesaffre approach [4], two (5%) used the Nyaga- the Menten-Lesaffre [4] model to 29 networks, because
ANOVA model [1], and the NMA approach in three (8%) one network was informed by single-test studies only, and
papers was unclear. two networks were informed by one multi-test study after
The type of data used from each primary study was the exclusion of single-test studies. Eight of the included
the 2 × 2 table in 34 (85%) DTA-NMAs. Thirty-nine networks were informed by studies that reported data at
(98%) DTA-NMAs assessed the methodological quality multiple thresholds per study, and we applied the Owen
of the included studies. Non-Cochrane NMAs using method using all available thresholds from a study [42].
the QUADAS or QUADAS-2 tool were not necessarily When fitting the Menten-Lesaffre, Nyaga ANOVA, Nyaga
published in higher impact factor journals (Appendix L). beta-binomial, and meta-regression models, if a study re-
DTA-NMAs using a meta-regression model tended to be ported multiple thresholds, we selected the threshold with
published in journals with a higher impact factor, which is the highest frequency in the entire dataset. In total, we in-
mainly driven by the dominance of Cochrane DTA reviews cluded 196 tests informing estimation of sensitivity, speci-
in the dataset (impact factor for Cochrane Database of ficity, between-test and between-study heterogeneity, and
Systematic Reviews in 2019: 7.89; Appendix M). None of test ranking (see Appendix F, Appendices N–S).
the DTA-NMA models have been applied in the included
Cochrane DTA reviews.
3.3.1. Summary sensitivity and specificity
The sensitivity estimates obtained from the Menten-
Lesaffre model were on average in agreement with the
3.3. Empirical assessment of the DTA-NMA methods
meta-regression model (Fig. 4; Appendix N), but speci-
Of the 11 DTA-NMA methods, four were DTA-NMA ficities tended to be smaller: mean ratio of sensitivities
approaches modelling the 2 × 2 table for each index test 0.97 (95% CI: 0.94–1.00) and mean ratio of specificities
and assessing at least three tests [1,2,4,42], while the re- 0.96 (95% CI: 0.94–0.98). Overall, the Nyaga ANOVA
maining seven models required joint classification tables. model had good agreement with the sensitivity estimates
The four models (Menten-Lesaffre [4], Nyaga ANOVA [2], obtained from the meta-regression model, but the latter
Nyaga beta-binomial [1] and Owen [42]) are Bayesian hi- estimated higher specificity estimates: mean ratio of sensi-
erarchical DTA-NMA approaches. Of the 40 application tivities 1.04 (95% CI: 0.98–1.11) and mean ratio of speci-
94 A.A. Veroniki et al. / Journal of Clinical Epidemiology 146 (2022) 86–96

Fig. 4. Boxplots of sensitivities and specificities and standard errors of logit sensitivity and specificity as estimated in the meta-regression and the
DTA-NMA models.
∗ The Nyaga ANOVA [2], Nyaga beta-binomial [1], and bivariate meta-regression [21,22] models were applied to 32 networks, while the Menten-

Lesaffre [4] model was applied to 29 networks; this is because one network was informed by single-test studies only, and two networks were
informed by one multi-test study after the exclusion of single-test studies.
Abbreviations: DTA, diagnostic test accuracy; NMA, network meta-analysis.

ficities 0.95 (95% CI: 0.94–0.96). On average, we observed good agreement between the Nyaga beta-binomial model and meta-regression on sensitivity. However, the meta-regression model consistently estimated higher specificities. The average agreement between these two models was: mean ratio of sensitivities 1.02 (95% CI: 0.97–1.07) and mean ratio of specificities 0.97 (95% CI: 0.96–0.97). The Owen model suggested that test results may differ depending on the threshold used when compared with the Nyaga ANOVA model (Appendix O).

4. Discussion

In this scoping review we found 11 models developed specifically for DTA-NMA and two hierarchical meta-regression approaches for the comparison of three or more index tests. The methods have different properties and limitations. All but one of the models were developed in a Bayesian setting. The majority of the methods require joint classification tables of the tests per study and account for correlation between tests. One method was a contrast-based approach allowing for modelling paired- and/or multiple-test studies, but not single-test studies, which may lead to a reduced and less informed evidence network [4]. Our review shows that there is no uniformly best approach, and selection among the DTA-NMA models depends on the data to be analysed, such as the inclusion of different test thresholds, different study designs, and software familiarity.

Despite DTA-NMA models being an important contribution to the field of diagnostic tests, there are several limitations that may compromise their wide implementation. A key limitation is model complexity, and most of the methods are performed within a Bayesian setting. The statistical code is rarely reported in the publications, and the models are yet to be implemented in popular statistical software or user-friendly programs, which adds an extra barrier to their application. Some models require joint classification tables, which are often not reported in DTA studies and would require individual participant data to derive the relevant information. Also, complexity and convergence issues may occur in multivariate approaches with the increase in the number of tests. This is because the number of parameters to estimate in the model increases as the number of tests in the network increases, and this becomes even more challenging when there is a small number of studies in the network.

To our knowledge, this is the first comprehensive review summarizing the characteristics of all available DTA-NMA methods and assessing their performance empirically in 32 published networks. We followed the Joanna Briggs Institute methods manual for conducting our scoping review [19] and the PRISMA extension for scoping reviews for reporting [20]. Overall, our results showed that there is a steep increase in recent years in publications of comparative meta-analyses with three or more index tests, where the majority perform a meta-regression. However, a limitation of meta-regression is that it ignores the within-study correlation between tests, assuming that observations between tests assessed within the same participants in a study are independent. This assumption is unlikely to hold when studies using a within-participant design dominate a meta-analysis, thus leading to incorrect estimation of the standard errors of the summary sensitivities and specificities and of the between-study variances. Nevertheless, the meta-regression approach is conservative and is accessible to many review authors. We previously assessed DTA-NMA methods using a case study for the comparative accuracy of tests for
cervical cancer [43]. Our current findings are not limited to a specific disease area and showed that meta-regression provided on average higher estimates of specificity compared to the DTA-NMA models. Also, the Nyaga beta-binomial model [1] estimated lower between-study heterogeneities for sensitivity and specificity, which is probably an indication that this model fits the data better. The Owen model [42] showed that test results may differ depending on the threshold. Overall, our findings show that selection between the methods may affect the NMA results, especially for specificity.

Our review has a few limitations worth noting. First, although we conducted a very sensitive literature search to capture all relevant articles, we may not have retrieved all DTA-NMA methods and application papers, as some papers may not have been indexed using the search terms we included in our search strategy. We have previously conducted comprehensive methodology reviews of comparative accuracy meta-analyses and are active in this field of research [15,44]. Therefore, we believe the risk of missing relevant methodology papers is small. Second, we were able to assess only five approaches that use the 2 × 2 table per test per study. Joint classification tables of index tests are rarely reported in DTA studies, hence it was not feasible to empirically assess methods requiring such data in a large number of networks. Simulations are needed to assess the performance of the DTA-NMA methods to identify which methods perform best and under which circumstances. Third, our sample of application papers that performed meta-regression to compare multiple index tests was dominated by Cochrane reviews. This may indicate that our search missed non-Cochrane meta-regression papers. Fourth, we did not assess the validity of the consistency assumption in the DTA networks because more appropriate methods accounting for both sensitivity and specificity are necessary to assess consistency in DTA-NMA. Fifth, we did not account for the different study designs in our analyses, since this was outside the scope of the present review. A key consideration for future work would be to develop methods dealing with and accounting for different study designs in a DTA-NMA. Sixth, we assumed common variances across tests; that is, we estimated two between-study variances within each model, one for logit sensitivity and one for logit specificity. However, this can be a strong assumption, and different between-study variances between tests (for sensitivity and specificity separately) may need to be estimated when exploring the diagnostic accuracy of multiple tests in a DTA-NMA.

We expect that our findings will facilitate the selection of the most appropriate DTA-NMA method depending on the needs of investigators and data availability. We believe that this review will help increase application of the DTA-NMA methods through increased awareness of the available methods and their properties, and lead to the creation of user-friendly programs to make the methods accessible.

Acknowledgments

We thank Dr. Stella Zevgiti for helping screen studies for inclusion. We also thank Drs. Tom Trikalinos, Joris Menten, Victoria N. Nyaga, Alex Sutton, Enzo Cerullo, Wei Cheng, and Junhai Jia for providing data and method clarifications through email.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.jclinepi.2022.02.001.

References

[1] Nyaga VN, Arbyn M, Aerts M. Beta-binomial analysis of variance model for network meta-analysis of diagnostic test accuracy data. Stat Methods Med Res 2018;27:2554–66.
[2] Nyaga VN, Aerts M, Arbyn M. ANOVA model for network meta-analysis of diagnostic test accuracy data. Stat Methods Med Res 2018;27:1766–84.
[3] Dimou NL, Adam M, Bagos PG. A multivariate method for meta-analysis and comparison of diagnostic tests. Stat Med 2016;35(20):3509–23. PMID: 26940666. doi:10.1002/sim.6919.
[4] Menten J, Lesaffre E. A general framework for comparative Bayesian meta-analysis of diagnostic studies. BMC Med Res Method 2015;15:70.
[5] Trikalinos TA, Hoaglin DC, Small KM, Terrin N, Schmid CH. Methods for the joint meta-analysis of multiple tests. Res Synth Methods 2014;5:294–312.
[6] Hoyer A, Kuss O. Meta-analysis for the comparison of two diagnostic tests to a common gold standard: a generalized linear mixed model approach. Stat Methods Med Res 2018;27:1410–21.
[7] Hoyer A, Kuss O. Meta-analysis for the comparison of two diagnostic tests-A new approach based on copulas. Stat Med 2018;37:739–48.
[8] Cheng W SC, Trikalinos TA, Gatsonis CA. Network meta-analysis of diagnostic accuracy studies. Brown University; 2016. Ph.D. Dissertation. doi:10.7301/Z0HX1B3W.
[9] Ma X, Lian Q, Chu H, Ibrahim JG, Chen Y. A Bayesian hierarchical model for network meta-analysis of multiple diagnostic tests. Biostatistics 2018;19:87–102.
[10] Veroniki AA, Tsokani S, Rücker G, Mavridis D, Takwoingi Y. Challenges in comparative meta-analysis of the accuracy of multiple diagnostic tests. Methods Mol Biol 2022:299–316. PMID: 34550598. doi:10.1007/978-1-0716-1566-9_18.
[11] Veroniki AA, Tsokani S, Rücker G, Mavridis D, Takwoingi Y. Protocol for a scoping review to identify all available NMA-DTA models. 2019. Available at: https://esm.uoi.gr/wp-content/uploads/2020/05/DiagnosNMA_protocol.pdf, Accessed January 31st, 2022.
[12] Zarin W, Veroniki AA, Nincic V, Vafaei A, Reynen E, Motiwala SS, et al. Characteristics and knowledge synthesis approach for 456 network meta-analyses: a scoping review. BMC Med 2017;15:3.
[13] Petropoulou M, Nikolakopoulou A, Veroniki AA, Rios P, Vafaei A, Zarin W, et al. Bibliographic study showed improving statistical methodology of network meta-analyses published between 1999 and 2015. J Clin Epidemiol 2017;82:20–8.
[14] Cumpston M, Li T, Page MJ, Chandler J, Welch VA, Higgins JP, Thomas J. Updated guidance for trusted systematic reviews: a new edition of the Cochrane Handbook for Systematic Reviews of Interventions. Cochrane Database Syst Rev 2019;10:ED000142. PMID: 31643080. doi:10.1002/14651858.ED000142.
[15] Takwoingi Y, Partlett C, Riley RD, Hyde C, Deeks JJ. Methods and reporting of systematic reviews of comparative accuracy were deficient: a methodological survey and proposed guidance. J Clin Epidemiol 2020;121:1–14.
[16] Rücker G, B-Z G. Network meta-analysis of diagnostic test accuracy studies. Diagnostic meta-analysis. Cham: Springer; 2018.
[17] Takwoingi Y, Leeflang MM, Deeks JJ. Empirical evidence of the importance of comparative studies of diagnostic test accuracy. Ann Intern Med 2013;158:544–54.
[18] Yang B, Olsen M, Vali Y, Langendam MW, Takwoingi Y, Hyde CJ, et al. Study designs for comparative diagnostic test accuracy: a methodological review and classification scheme. J Clin Epidemiol 2021;138:128–38.
[19] Peters MD, Godfrey CM, Khalil H, McInerney P, Parker D, Soares CB. Guidance for conducting systematic scoping reviews. Int J Evid Based Healthc 2015;13(3):141–6. PMID: 26134548. doi:10.1097/XEB.0000000000000050.
[20] Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med 2018;169:467–73.
[21] Reitsma JB, Glas AS, Rutjes AW, Scholten RJ, Bossuyt PM, Zwinderman AH. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 2005;58:982–90.
[22] Rutter CM, Gatsonis CA. A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med 2001;20:2865–84.
[23] Wallace B, Small K, Brodley C, Lau J, Trikalinos T. Deploying an interactive machine learning system in an evidence-based practice center: abstrackr. In: Proc of the ACM International Health Informatics Symposium (IHI); 2012. p. 819–24.
[24] Nyaga VN, Arbyn M, Aerts M. CopulaDTA: An R package for copula-based bivariate beta-binomial models for diagnostic test accuracy studies in a Bayesian framework. J Stat Softw, Code Snippets 2017;82:1–27.
[25] Bates D, Machler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J Stat Softw 2015;67:1–48.
[26] Comprehensive R Archive Network. mada: meta-analysis of diagnostic accuracy. Available at: https://cran.r-project.org/web/packages/mada/index.html, Accessed January 31st, 2022.
[27] Guo J, Riebler A. meta4diag: Bayesian bivariate meta-analysis of diagnostic test studies for routine practice. J Stat Softw 2018;83:1–31.
[28] Huang H. Metatron: meta-analysis for classification data and correction to imperfect reference. R package version 0.1-1 2014. https://cran.r-project.org/web/packages/Metatron/index.html.
[29] Gasparrini A, Armstrong B, Kenward G. Multivariate meta-analysis for non-linear and other multi-parameter associations. Stat Med 2012;31:3821–39.
[30] Rabe-Hesketh S, Skrondal A. Multilevel and longitudinal modeling using Stata. College Station, TX: Stata Press; 2008.
[31] Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Stat Comput 2000;10:325–37.
[32] Lu B, Lian Q, James S, Chen Y, Chu H. NMADiagT: network meta-analysis of multiple diagnostic tests. R package version 0.1.2 2020. https://cran.r-project.org/web/packages/NMADiagT/index.html.
[33] Plummer M. rjags: Bayesian graphical models using MCMC. R package version 2019:4–10. Available at https://CRAN.R-project.org/package=rjags, Accessed 31 Jan 2022.
[34] White IR. Multivariate random-effects meta-regression: updates to mvmeta. Stata J 2011;11(2):255–70.
[35] Su YS, Yajima M. R2jags: Using R to Run “JAGS” (Version 0.6-1) [R package]; 2020. https://cran.r-project.org/web/packages/R2jags/index.html.
[36] Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, et al. Stan: a probabilistic programming language. J Stat Softw 2017;76(1):1–32. doi:10.18637/jss.v076.i01.
[37] Development Team Stan. RStan: the R interface to Stan. R package version 2.21.2 2020. Available at http://mc-stan.org/, Accessed 31 Jan 2022.
[38] Randall M, Egberts KJ, Samtani A, Scholten RJ, Hooft L, Livingstone N, et al. Diagnostic tests for autism spectrum disorder (ASD) in preschool children. Cochrane Database Syst Rev 2018;7:Cd009044.
[39] Owen RK, Cooper NJ, Quinn TJ, Lees R, Sutton AJ. Network meta-analysis of diagnostic test accuracy studies identifies and ranks the optimal diagnostic tests and thresholds for health care policy and decision-making. J Clin Epidemiol 2018;99:64–74.
[40] Lian Q, Hodges JS, Chu H. A Bayesian hierarchical summary receiver operating characteristic model for network meta-analysis of diagnostic tests. J Am Statist Assoc 2019;114:949–61.
[41] Ma X. Statistical methods for multivariate meta-analysis of diagnostic tests. University of Minnesota. 2015; Available at: https://conservancy.umn.edu/handle/11299/175241, Accessed January 31st, 2022.
[42] Owen RK, Cooper NJ, Quinn TJ, Lees R, Sutton AJ. Network meta-analysis of diagnostic test accuracy studies identifies and ranks the optimal diagnostic tests and thresholds for health care policy and decision-making. J Clin Epidemiol 2018;99:64–74.
[43] Veroniki AA, Tsokani S, Paraskevaidis E, Mavridis D. Evaluating multiple diagnostic tests: an application to cervical cancer. HJOG 2021;20:11–24.
[44] Takwoingi Y. Meta-analytic approaches for summarising and comparing the accuracy of medical tests. Univ Birmingham Res Arch; 2016. Available at https://etheses.bham.ac.uk/id/eprint/6759/1/Takwoingi16PhD.pdf, Accessed 31 Jan 2022.
[45] Boyang Lu, Qinshu Lian, James S. Hodges, Yong Chen and Haitao Chu (2020). NMADiagT: Network Meta-Analysis of Multiple Diagnostic Tests. R package version 0.1.2. https://CRAN.R-project.org/package=NMADiagT.
