How To Perform A Systematic Review and Meta-Analysis

Radiology Health Services Research
How to Perform a Systematic Review and Meta-analysis of Diagnostic Imaging Studies
Paul Cronin, MD, MS, Aine Marie Kelly, MD, MS, MA, Duaa Altaee, BA, Bradley Foerster, MD, PhD,
Myria Petrou, MD, MS, Ben A. Dwamena, MD
A systematic review is a comprehensive search, critical evaluation, and synthesis of all the relevant studies on a specific (clinical) topic
that can be applied to the evaluation of diagnostic and screening imaging studies. It can be a qualitative or a quantitative (meta-
analysis) review of available literature. A meta-analysis uses statistical methods to combine and summarize the results of several studies.
In this review, a 12-step approach to performing a systematic review (and meta-analysis) is outlined under the four domains: (1) Problem
Formulation and Data Acquisition, (2) Quality Appraisal of Eligible Studies, (3) Statistical Analysis of Quantitative Data, and (4) Clinical
Interpretation of the Evidence. This review is specifically geared toward the performance of a systematic review and meta-analysis of
diagnostic test accuracy (imaging) studies.
Key Words: Diagnostic accuracy; evidence-based medicine; evidence-based radiology; heterogeneity; literature search; meta-
analysis; meta-regression; publication bias; receiver operating characteristic analysis; ROC analysis; sensitivity analyses; systematic
review; subgroup analysis; threshold effect.
© 2018 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
Acad Radiol 2018; 25:573–593 (Academic Radiology, Vol 25, No 5, May 2018)
From the Department of Radiology, University of Michigan, B1 132G Taubman Center/5302, 1500 East Medical Center Drive, Ann Arbor, MI 48109 (P.C., A.M.K., B.F., M.P., B.A.D.); Department of Neurology, University of Michigan (D.A.); Nuclear Medicine Service, VA Ann Arbor Health Care System, Ann Arbor, Michigan (B.A.D.). Received August 10, 2017; revised November 21, 2017; accepted December 6, 2017. Address correspondence to: P.C. e-mail: pcronin@med.umich.edu
https://doi.org/10.1016/j.acra.2017.12.007

INTRODUCTION

Systematic reviews and meta-analyses have become popular in medicine and are very commonly applied to treatment trials. However, they are still less common for diagnostic imaging studies. Systematic reviews and meta-analyses aim to summarize the average result and the uncertainty of this average. For imaging tests, the result is a measure of diagnostic performance such as sensitivity or specificity. In radiology, the smaller patient sample sizes and limited methodological quality of the primary studies can limit the quality of the review and meta-analysis. However, systematic reviews and meta-analyses may be the best assessment of the published literature available at any point in time, especially in the absence of large, definitive trials. They may provide important information to guide patient care and direct future clinical research. Performing and interpreting systematic reviews in radiology can be challenging given the paucity of available clinical studies. However, if investigators adhere to proper methodology, systematic reviews may provide useful information from a comprehensive study of the literature with limited bias.

In this review, a 12-step framework for performing systematic reviews (and meta-analyses) is outlined under the four domains: (1) Problem Formulation and Data Acquisition, (2) Quality Appraisal of Eligible Studies, (3) Statistical Analysis of Quantitative Data, and (4) Clinical Interpretation of the Evidence (Table 1). We will subsequently use “systematic review” and “meta-analysis” to represent the whole process of evidence synthesis. The steps in “problem formulation and data acquisition” are “define the question and objective of the review,” “establish criteria for including studies in the review,” and “conduct a literature search to retrieve the relevant literature.” The steps in “quality appraisal of eligible studies” are “extract data on variables of interest,” “assess study quality and applicability to the clinical problem at hand,” and “summarize the evidence qualitatively and, if appropriate, quantitatively (meta-analysis).” The steps in “statistical analysis of quantitative data” are “estimate summary diagnostic test performance metrics and display the data,” “assess heterogeneity,” “investigate data for publication bias,” “assess the robustness of estimates of diagnostic accuracy using sensitivity analyses,” and “explore and explain heterogeneity in test accuracy using subgroup analysis (if applicable).” The step in “clinical interpretation of the evidence” is “graphically display how the evidence alters the posttest probability using a Fagan plot (Bayes nomogram), likelihood ratio scatter graph, or probability-modifying plot.” This review is tailored for radiologists who are new to the process of performing a systematic review and meta-analysis. However, we hope that those with
experience with systematic review and meta-analysis will also find new information in this article.

TABLE 1. An Outline of the Main Steps in Doing a Meta-analysis of Diagnostic Test Accuracy
1. Problem formulation and data acquisition
   Step 1. Define the question and objective of the review
   Step 2. Establish criteria for including studies in the review
   Step 3. Conduct a literature search to retrieve the relevant literature
2. Quality appraisal of eligible studies
   Step 4. Extract data on variables of interest
   Step 5. Assess study quality and applicability to the clinical problem at hand
   Step 6. Summarize the evidence qualitatively and, if appropriate, quantitatively (meta-analysis)
3. Statistical analysis of quantitative data
   Step 7. Estimate diagnostic accuracy and display the data
   Step 8. Assess heterogeneity
   Step 9. Assess for publication bias
   Step 10. Assess the robustness of estimates of diagnostic accuracy using sensitivity analyses (if applicable)
   Step 11. Explore and explain heterogeneity in test accuracy using subgroup analysis (if applicable)
4. Clinical interpretation of the evidence
   Step 12. Graphically display how the evidence alters the posttest probability

PROBLEM FORMULATION AND DATA ACQUISITION

Step 1. Define the Question and Objective of the Review

A good review question addresses a clinical problem for which there is uncertainty. Therefore, the first step is to identify the relevant clinical problem. This includes specifying the patient, the index test(s) and reference test being studied, and the outcome measurements (diagnostic test accuracy) (1). In evidence-based practice, these components can be abbreviated to PICO (Patient, Intervention, Comparator, and Outcome) or, in the Cochrane guidelines for diagnostic accuracy tests, to PICTS (Patient, Index test, Comparator test, Target disorder, and Study design) (2–4). Patients can refer to patients presenting with signs and symptoms of the disease (diagnostic studies), patients with the disease (prognostic studies), or a population at risk of the disease (screening studies). The index test is the test to be evaluated. A meta-analysis may consider and compare several index tests. The comparator test is standard practice, the reference standard, or the “gold standard” that the index tests are compared to. It is the test or procedure used to classify patients as having or not having the target condition or disease. The target disorder is the disease that one is trying to diagnose. Examples of PICO questions or statements are shown in Table A1. These include “In patients with symptomatic carotid stenosis, how does computed tomographic angiography (CTA) compare with magnetic resonance angiography (MRA) for the detection and quantification of carotid stenosis?” or “In patients with known or suspected coronary artery disease, how does CT coronary angiography compare with invasive catheter coronary angiography for identifying one (or more) potentially or probably hemodynamically significant (≥50% coronary artery luminal diameter) stenosis in terms of sensitivity, specificity and diagnostic accuracy?” or “In patients with a solitary pulmonary nodule, how well does dynamic contrast material–enhanced CT, dynamic contrast material–enhanced MR imaging, FDG PET, and 99mTc-depreotide SPECT compare for the diagnosis of malignancy (diagnostic accuracy)?” or “In patients with known or suspected rotator cuff tears, how does ultrasound compare to MRI for diagnosis?” or “Is low-dose CT colonography equivalent to optical colonoscopy in identifying clinically meaningful colonic polyps?” Evidence synthesis can be derailed by not asking a focused question. A focused research question is also important because it is used to direct the search.

Step 2. Establish Criteria for Including Studies in the Review

In the perfect diagnostic imaging study, all patients receive one or more index tests and the “gold” standard test. In practice, however, there can be important deviations from this ideal design. One example is the use of different sets of inclusion or exclusion criteria for those with and those without the target disease. Another example is verification of the index test results based on information that will only be available after inclusion in the study. These important issues should be considered when drawing up inclusion and exclusion criteria. Bias or variation may be introduced in five aspects of diagnostic imaging study design: first, the criteria used for study population selection; second, comparator test selection; third, index test and comparator test execution; fourth, index test and comparator test interpretation; and fifth, result analysis. The inclusion criteria should incorporate all relevant clinical characteristics of the target condition with which such patients would present. It may be important to include the imaging setting, as test accuracy can vary between primary, secondary, and tertiary care, and also between screening and diagnostic uses. The inclusion criteria for the index test may include details of the tests being evaluated, such as, but not limited to, the manufacturer, type of image processing, and generation of technology. This also applies to the comparator test.

Step 3. Conduct a Literature Search to Retrieve the Relevant Literature

Secondary research too is prone to biases, especially selection bias and publication bias. Selection bias, which the researcher has control over, is bias in the published studies included in the review. Publication bias, which the researcher does not have control over, is bias in the primary
studies published. Bias can occur in the way that the systematic review or meta-analysis results are summarized and presented, or in the level of importance attributed to the study results. To avoid selection bias, an objective, systematic, and comprehensive search strategy using several electronic databases should be used, together with a search protocol that clearly shows the methods used to search the literature. Searching for and identifying imaging studies are more difficult than searching for and identifying therapeutic studies. Imaging accuracy studies are not limited to a single study design (5). The most useful search terms for identifying diagnostic imaging studies are the diagnostic tests of interest and the clinical disorder. An iterative process is often required to maximize the search yield. It may be worthwhile to discuss the research question with a librarian or information specialist and to have them perform or assist with the literature search. Existing systematic reviews and health technology assessments can be used to expand or refine the search.

The literature search should encompass published and unpublished materials. A search of a single electronic database is not considered adequate for a systematic review as it may miss studies and lead to bias. Table A2 outlines examples of search sources. The search should involve several electronic databases such as MEDLINE, EMBASE, and the Cochrane Central Register of Controlled Trials. PubMed is a free search engine. It primarily accesses the MEDLINE database of references and abstracts. The database is maintained as part of the Entrez system of information retrieval by the United States National Library of Medicine at the National Institutes of Health. Entrez Global Query is an integrated search and retrieval system that provides access to all databases simultaneously with a single query string and user interface. EMBASE is a biomedical database of published literature produced by the medical and scientific publisher Elsevier. It contains over 28 million records from over 8400 published journals, from 90 countries, and with daily updates. The Cochrane Central Register of Controlled Trials (CENTRAL) is an excellent source of reports of randomized and quasi-randomized controlled trials. The majority of CENTRAL records are taken from MEDLINE and EMBASE, but records are also derived from other published and unpublished sources. CENTRAL records often include an abstract or summary of the article but do not contain the full text of the article. The search should also include relevant journals, conference proceedings, the reference lists of papers found in the searches, books of recently published abstracts presented at scientific meetings, and publications that summarize doctoral theses. It may involve personal contact with experts in the area. Two important reasons to do this are to identify published studies that might have been missed because they are in press or not yet indexed and to identify unpublished or gray literature. Controversy remains about including gray literature. Not including unpublished studies in the review increases the chances of publication bias, that is, the overrepresentation of studies with positive results. This can result in a systematic overestimation of test performance, a possible threat to the validity of the meta-analysis (6). Quality assessment of included studies may take precedence over attempts to locate unpublished research (7).

The articles are first screened (screening phase) at the abstract level for specified exclusions. The articles that are potentially eligible based on the abstracts alone are retrieved and then reviewed in their entirety (eligibility phase), with further exclusions applied at the full-text level. The remaining articles are included in the review (included phase) (Fig 1). What distinguishes a systematic review from a narrative review is documenting the literature search with sufficient detail so that it could be reproduced. A systematic search should minimize bias, producing more reliable estimates of diagnostic accuracy. Therefore, describe the search and provide an illustrative flow chart such as Figure 1. The description and chart should include the number of articles initially identified through database and other searching, and the number of articles after removal of duplicates. Then, describe and chart the number of articles screened and excluded, the number of full-text articles assessed for eligibility, and the number of excluded studies (with reasons for exclusion).

Figure 1. Example of a flowchart of a literature search.

QUALITY APPRAISAL OF ELIGIBLE STUDIES

Step 4. Extract Data on Variables of Interest

Extracted data usually include details of participants or patients, index test, comparator test, target disorder, study design, results, publications, and investigators. Review authors should plan a priori what data will be required for their systematic review, and develop a strategy for obtaining data. Information about patients should include the spectrum of patients who received the test, demographic data such as age and gender, co-morbid conditions, and information about the index and comparator test such as scan parameters and generation of technology. The reference standard should be capable of classifying the target condition correctly. Information about a disorder may be about a lesion (size, imaging characteristics, classification [benign vs malignant]) or about a disease. Basic study characteristics would include design (randomized control trial, prospective or retrospective cohort) and the duration of the study. Geographical regions may have important differences that could affect test accuracy or delivery. Technology differences or trends over time may also be important. Generic information that may be extracted from each study includes study citation, study’s first author name, year and journal of publication, and country or region where the study was performed. Other data, such as whether ethical approval was obtained, whether a sample size calculation was performed, funding sources, or potential conflicts of interest of the study authors, can indicate the quality of the study conducted.

Relevant test outcomes may be dichotomous (commonly), ordinal or categorical, or continuous. For dichotomous outcomes, disease positive or disease negative and test positive or test negative, there are four groups within the 2 × 2 table. These are true positive, false positive, false negative, and
true negative. Extracting these data allows calculation of study-specific sensitivity, specificity, positive- and negative-predictive values and accuracy, and likelihood ratios and diagnostic odds ratio (DOR) for the meta-analysis. Having sensitivities and specificities as proportions without numerators and denominators is not sufficient. This lack of raw data results in exclusion of otherwise eligible articles. However, if the study-specific sample size is known, it may be possible to derive the raw 2 × 2 data using the proportions provided in the article. Further challenges to data extraction particularly relevant to radiology involve studies reporting results from multiple readers, multiple sessions (eg, using different combinations of pulse sequences), or multiple interpretative approaches. The case of multiple readers is a particularly difficult issue commonly encountered in imaging systematic reviews and meta-analyses. Different strategies may be used to handle data from more than one reader, such as (1) using an average of the diagnostic accuracy results across readers within a study, (2) selecting data from the reader with the highest accuracy within a study, (3) selecting data from the reader with the most years of clinical experience, (4) treating each reader’s data within a study as an individual study, and (5) randomly selecting one reader within a study (8). Averaging the results of multiple readers minimizes heterogeneity from interobserver variability (8). Using data from the best or most experienced readers may overestimate diagnostic accuracy and not reflect daily practice (8). Using each reader’s data and treating each as an individual study will overrepresent the results of that single study in the sample, and the biases inherent to that study will be magnified in the pooled results (8). In addition, there are statistical challenges due to the paired nature of the data. McGrath et al. evaluated the handling of multiple readers in systematic reviews and meta-analyses of imaging diagnostic accuracy (8). The authors found that most meta-analysts do not report how they handled the issue. In 25% of meta-analyses, investigators averaged the results from multiple readers within the study, whereas in 50%, the results from each individual reader within a study were treated as a separate dataset (8). Optimal methods for handling multiple reader data are not available. What is needed are multilevel hierarchical models that account for between-observer variability within studies and for between-study variability, provided multiple reader data are reported consistently at the primary study level (8). Using such models, all readers would be included in the meta-analysis, interobserver variability at the primary study level would not be lost, and a single study would not be overrepresented (8).

There are no easily implemented statistically rigorous methods for dealing with continuous data or ordinal data. It is recommended that ordinal or continuous test results be
dichotomized by selecting a threshold using cut points based on criteria such as Youden’s index or Euclidean distance. Therefore, it is best to extract raw data and investigate the impact of choice of cut point in a sensitivity analysis.

Extracting data from the primary studies is an important but time-consuming part of a systematic review. A data collection form should be used and designed with data extraction in mind. Each included study should be read in its entirety. To minimize errors and reduce potential biases, it is strongly recommended that data are extracted from every study by more than one person. This is particularly important where there is subjective interpretation or where information extraction is critical to the interpretation of results. Research has shown that independent data extraction by two or more authors resulted in fewer errors than data extraction by a single author (9). One study observed a high prevalence of data extraction errors, whereas another study found that at least 7 of 27 reviews had substantial errors (10,11). Those involved in the data extraction should practice using the data extraction form. Pilot testing the form using a representative sample is necessary to identify important but missing extraction items. This should also minimize revising the form after data extraction has begun. There is potential for disagreement when more than one person is extracting data. There should be a procedure for resolving disagreements such as arbitration by a third party.

Step 5. Assess Study Quality and Applicability to the Clinical Problem at Hand

Once all relevant studies have been identified, the researcher must decide which studies are sufficiently well conducted to be included in the meta-analysis. A methodologically sound meta-analysis will use explicit and objective a priori criteria for inclusion or rejection of studies based on quality grounds to minimize selection bias (12). A good-quality review should include a critical appraisal. It should also document the quality of each of the studies included. The validity of the results and conclusions of the review may be threatened by unsatisfactory quality of the included studies.

For the quality assessment of studies included in a systematic review of diagnostic test (including imaging) accuracy, the QUADAS assessment tool is recommended (3). The QUADAS and updated QUADAS-2 tools are used to assess methodological quality, study validity, and risk of bias within the study (3,4,13). In 2003, the QUADAS tool for systematic reviews of diagnostic accuracy studies was developed. In 2011, QUADAS-2 was developed. Its function is to assess the quality of primary diagnostic accuracy studies. It should not be used to replace the data extraction process of the review and should be applied in addition to data extraction (13). This revised tool consists of a list of possible sources of bias for diagnostic accuracy studies within the domains of patient selection, index test, reference standard, and flow and timing (13). The tool is completed in four phases. First, the review question is stated. Second, review-specific guidance is developed. Third, the published flow diagram for the primary study is reviewed. If none is reported, a flow diagram is constructed. Fourth, a judgment of bias and applicability is made (13). The investigator uses signaling questions for scoring each article as high, low, or unclear risk of bias and applicability (13). It is essential to tailor QUADAS-2 to each review by adding or omitting signaling questions. Figure 2 shows an example of QUADAS-2 as a graphic illustration of the percentage of studies meeting each criterion. The QUADAS-2 tool is available at http://www.bristol.ac.uk/media-library/sites/quadas/migrated/documents/quadas2.pdf. At the website http://www.bristol.ac.uk/population-health-sciences/projects/quadas/quadas-2/, there is a table that summarizes QUADAS-2 and lists all signaling, risk of bias, and applicability rating questions. In the top row of the table are the domains of patient selection, index test, reference standard, and flow and timing. In the first column are description, signaling questions (yes/no/unclear), risk of bias (high/low/unclear), and concerns regarding applicability (high/low/unclear).

Figure 2. (a) Methodological quality, study validity, and risk of bias summary for each study showing authors’ judgments about each domain for each included study using the QUADAS tool. (b) Study quality scores. Graph illustrates study quality based on QUADAS criteria, expressed as a percentage of studies meeting each criterion. (c) Risk of bias and applicability summary for each study showing authors’ judgments about each domain for each included study using the QUADAS-2 tool. (d) Study quality scores. Graph illustrates study quality based on QUADAS-2 criteria, expressed as a percentage of studies meeting each criterion.

Step 6. Summarize the Evidence Qualitatively and, if Appropriate, Quantitatively

The evidence should be qualitatively summarized, that is, in a systematic review, and if appropriate, quantitatively summarized in a meta-analysis. There are situations when it may not be appropriate to perform a meta-analysis. These include situations in which the literature search and inclusion criteria yield only a small number of articles, there is heterogeneity between studies, or there is concern for bias. Meta-analyses may compound these errors and produce an erroneous result. Meta-regression may be useful to investigate how studies of poor methodological quality can bias results.

STATISTICAL ANALYSIS OF QUANTITATIVE DATA

Step 7. Estimate Diagnostic Accuracy

Meta-analysis of Diagnostic Test Accuracy Differs From Meta-analysis of Interventions

Unlike meta-analyses of interventions, which deal with one statistic such as relative risk, diagnostic test meta-analyses must account for two or more paired statistics. These are usually sensitivity and specificity, positive- and negative-predictive values, and positive and negative likelihood ratios for the respective imaging test results. Caution should be used when generating summary statistics of positive- and negative-predictive values as these vary with disease prevalence. Another difference is heterogeneity, which is common in meta-analyses of test accuracy studies. Therefore, random-effects models, which account for between-study variability beyond chance, are required. Undertaking meta-analyses accounting
for both sensitivity and specificity, the relationship between ty, investigators continue to use univariate methods. The use
them, and the heterogeneity in imaging test accuracy re- of univariate methods can potentially lead to overoptimistic
quires fitting hierarchical random-effects models such as conclusions of meta-analyses. Recently, McGrath et al. as-
summary receiver operator characteristic (SROC) curves and sessed whether authors of systematic reviews of diagnostic
may require biostatistical expertise to do this. accuracy studies published in imaging journals used the rec-
ommended methods for meta-analysis. The authors also
Analyzing the Data evaluated the effect of traditional methods on summary es-
The summary statistics for imaging test accuracy commonly timates of sensitivity and specificity (23). The authors found
used are sensitivity and specificity, positive- and negative- that bivariate methods are used a minority of the time, that
predictive values, and accuracy as determined by the area under this issue is not improving with time, and that univariate
the receiver operator characteristic (ROC) curve (Table A3). methods can lead to overestimation of diagnostic accuracy with
Other summary statistics include likelihood ratios (positive, narrower confidence intervals (CIs) than the recommended
negative, or multiple level) and DOR. Meta-analysis can also (HSROC or bivariate) methods (23). Of the 300 reviews from
address how imaging test accuracy varies with clinical and meth- January 2005 to May 2015 that met the authors’ inclusion
odological characteristics. criteria, only 39% used recommended meta-analysis methods
Model Fitting and Statistical Methods for Pooling Data (24). No change in the method used was observed over time.
Moses-Littenberg SROC curves.—The Moses-Littenberg method However, there was geographic, subspecialty, and journal het-
provides a simple model for deriving an SROC (14,15). It erogeneity (25). Fifty-one meta-analyses using univariate
was one of the earliest models to be proposed and has been random-effects methods were reanalyzed with the bivariate
used extensively in meta-analyses of diagnostic test accura- model. The average change in the summary estimate for sen-
cy. Its inability to provide estimates of the heterogeneity sitivity was 1.4% and for specificity was 2.5%. Both changes
between studies is a limitation. Therefore, more complex hi- were statistically significant. The average change in width of
erarchical models that properly allow random effects in diagnostic test accuracy have superseded it.

Hierarchical and Bivariate Models
To overcome the limitations of the Moses-Littenberg method, more statistically rigorous approaches that are based on hierarchical models have been proposed. These are the hierarchical SROC (HSROC) model proposed by Rutter and Gatsonis and the bivariate random-effects model, and are the currently preferred models (16–20). If heterogeneity between studies is thought to be caused primarily by a threshold effect, the HSROC model should be used (20). If heterogeneity is due to other causes, a bivariate random-effects model is recommended (18). The distribution of the observed pairs of sensitivity and specificity values from each study is modeled with both approaches. The average sensitivity and specificity, and a valid estimation of the underlying SROC curve can be obtained with both models (18,21). With the HSROC model, an explicit formula linking sensitivity and specificity through a threshold is assumed. This model accounts for the variability across studies. An advantage of this approach is that multiple summaries of the data, including an SROC curve, can be estimated. It can also estimate average values of accuracy measures, such as sensitivity and specificity. With the bivariate random-effects model, the average sensitivity and specificity are estimated. It also estimates the unexplained variation in these parameters and the correlation between them. These models are mathematically equivalent in the absence of covariates (21,22). The addition of covariates to the models enables exploration of explainable heterogeneity.
It is very important that the recommended HSROC or bivariate methods for pooling in meta-analyses of diagnostic accuracy studies be used rather than traditional univariate methods. Possibly for reasons of convenience or accessibili-

the CI was 7.7% for sensitivity and 9.9% for specificity. Similarly, both changes were statistically significant (24).

Display the Data

Forest Plot
Meta-analysis data may be presented pictorially using a forest plot (Fig 3). The findings from each individual study are displayed as a circle or square (26). The weight of the study in the meta-analysis is proportional to the area of the circle or square. This also reflects the precision of the study, which is roughly related to the sample size. A horizontal line is drawn through each study's square. This is usually the 95% CI and represents the uncertainty of the study sensitivity or specificity. All the studies are combined to calculate the summary statistic. This is usually displayed as a diamond at the bottom with its 95% CI (Fig 3). The plots for diagnostic imaging accuracy studies are known as coupled forest plots as they contain two graphical sections: one depicting sensitivity and the other depicting specificity (Fig 3).

Positivity Thresholds
Expressions of diagnostic imaging accuracy, such as sensitivity and specificity, are not always fixed test characteristics. They describe the performance of the test under a particular circumstance. The sensitivity and specificity are likely to change in a different population, in a different setting, or with a different strategy for pretesting. To classify test results as either positive or negative may require a decision rule or positivity threshold. A range of sensitivity and specificity pairs, obtained as the threshold criterion is varied, is often reported as an ROC curve. This is a graph of the sensitivity (true-positive rate) on the vertical axis against the false-positive rate (FPR) or 1 − specificity on the horizontal axis (27).
Figure 3. Forest plot showing study-specific and mean sensitivity and specificity. Each black square is a study-specific sensitivity and
specificity. The size of the black square reflects the weight of the study in the meta-analysis, and the horizontal line reflects the 95% con-
fidence interval (CI). The vertical broken line represents the pooled sensitivity or specificity and the boundaries of the hollow diamond displayed
at the bottom represent the 95% CI of the pooled results.
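The study-level rows of a coupled forest plot can be sketched in a few lines. The example below is a minimal illustration, not the method used for Figure 3: it assumes each study supplies a 2×2 table (TP, FP, FN, TN), uses the Wilson score interval as one reasonable choice of 95% CI, and the counts in `STUDIES` are hypothetical.

```python
import math

def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)

def forest_rows(studies):
    """One row per study: name, (sens, lo, hi), (spec, lo, hi)."""
    rows = []
    for name, tp, fp, fn, tn in studies:
        rows.append((name, wilson_ci(tp, tp + fn), wilson_ci(tn, tn + fp)))
    return rows

# Hypothetical 2x2 counts: (study, TP, FP, FN, TN)
STUDIES = [("Study A", 45, 5, 5, 45), ("Study B", 18, 4, 2, 36)]

if __name__ == "__main__":
    for name, sens, spec in forest_rows(STUDIES):
        print(f"{name}: sens {sens[0]:.2f} ({sens[1]:.2f}-{sens[2]:.2f}), "
              f"spec {spec[0]:.2f} ({spec[1]:.2f}-{spec[2]:.2f})")
```

Each tuple corresponds to one square and its horizontal line; the pooled diamond would come from a hierarchical model and is deliberately not computed here.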
Although constructed from sensitivity and specificity, ROC curves do not depend on the decision threshold. In an ROC curve, each point represents the sensitivity and FPR at a different decision threshold. The area under the ROC curve is an overall measure of the test’s accuracy. A perfect test has a value of 1, whereas a value of 0.5 is obtained if the test does no better than chance (27).

Summary ROC Plots
Summary ROC plots display the results of individual studies in ROC space; each study is plotted as a single sensitivity-specificity point. As discussed previously, the size of points depicts the precision of the estimate, which is the inverse of the standard error of the logit of sensitivity and logit of specificity.

Linked ROC Plots
Where two tests are evaluated in each study, linked ROC plots are used in analyses of pairs of tests. Points are plotted as in a normal summary ROC plot, but the two estimates, one for each test, from each study are joined by a line (Fig 4).

Test Results Are Available Only as a Dichotomy
If primary studies dichotomize data (disease present or absent) and therefore only provide sufficient information to estimate sensitivity and specificity, the mean sensitivity and the mean specificity can be estimated, possibly weighted by the sample size of each study.

Test Results Are Available in More Than Two Categories
If test results are measured as a continuum such as standardized uptake value, or as responses on an ordinal categorical scale such as with ventilation/perfusion scanning (normal to high probability of pulmonary embolism), other analytic techniques can be used. If no threshold or scaling differences between primary studies exist and test comparison is not an objective, then result-specific likelihood ratios can be obtained from logistic modeling procedures.

Step 8. Assess Heterogeneity
Meta-analyses should only include studies that exactly match the question. However, studies can differ by patients studied,
Figure 4. SROC curve with confidence and prediction regions around mean operating sensitivity and specificity point. Area under the summary ROC curve. Each circle represents an individual study result. The diamond in the center represents intersection of the summary sensitivity and specificity, the inner dashed line represents the 95% confidence interval of the summary sensitivity and specificity, and the outer dotted line represents the 95% predicted interval. SENS, sensitivity; SPEC, specificity; SROC, summary receiver operating characteristic; AUC, area under the curve.
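Study points such as those in an SROC plot are usually handled on the logit scale, where the precision of a point is the inverse of its standard error. The sketch below computes those quantities and, as one commonly used screen for a threshold effect, the Spearman rank correlation between logit(sensitivity) and logit(1 − specificity). The 0.5 continuity correction for zero cells and the counts in `THRESHOLD_STUDIES` are assumptions for illustration only.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def logit_point(tp, fp, fn, tn, cc=0.5):
    """Return logit(sens), logit(1 - spec) and their standard errors.

    If any cell is zero, cc is added to every cell (assumed convention).
    """
    if min(tp, fp, fn, tn) == 0:
        tp, fp, fn, tn = (x + cc for x in (tp, fp, fn, tn))
    sens = tp / (tp + fn)
    fpr = fp / (fp + tn)
    se_sens = math.sqrt(1 / tp + 1 / fn)
    se_fpr = math.sqrt(1 / fp + 1 / tn)
    return logit(sens), logit(fpr), se_sens, se_fpr

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical studies (TP, FP, FN, TN) in which sensitivity and FPR rise together
THRESHOLD_STUDIES = [(30, 5, 20, 45), (35, 10, 15, 40), (40, 15, 10, 35), (45, 25, 5, 25)]

if __name__ == "__main__":
    points = [logit_point(*s) for s in THRESHOLD_STUDIES]
    rho = spearman([p[0] for p in points], [p[1] for p in points])
    print(f"Spearman rho = {rho:.2f}")
```

A strongly positive coefficient is only suggestive; with few studies the estimate is unstable, and `pearson` will fail if all values in one list are identical.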
disease severity, co-morbidity, test methods, study design, and other factors. These systematic differences between studies can lead to heterogeneity between studies. In meta-analyses of diagnostic imaging studies, heterogeneity is the rule rather than the exception because of the nonrandomized design of most included studies and natural variation in sensitivity and specificity across positivity thresholds. Therefore, heterogeneity should be tested. Random-effects meta-analysis methods are recommended when data are heterogeneous, as in diagnostic imaging meta-analyses, and should be fitted by default. Random-effects (hierarchical) models provide an estimate of the average accuracy of the test and the variability in this effect. Where there are too few studies to estimate between-study variability, fixed-effect models can be used. Cochran Q is a commonly used test. It is a statistic based on the chi-square test (28). However, this test has low power and may fail to detect heterogeneity when it is present. Therefore, the Higgins’ I² statistic was developed to overcome this limitation (29). The I² test scores heterogeneity between 0% and 100%, with 25% corresponding to low heterogeneity, 50% to moderate, and 75% to high. A major advantage is that I² does not inherently depend on the number of studies in the meta-analysis (30). However, this test may also have insufficient power to detect heterogeneity if present and should be interpreted with some caution (28). The I² statistic was originally developed for univariate meta-analyses. However, diagnostic test analyses have two variables: sensitivity and specificity. The I² statistic can be estimated separately for sensitivity and specificity but this is not ideal. Zhou et al. derived an improved I² statistic measuring heterogeneity for dichotomous outcomes, such as with diagnostic test meta-analyses (31). For bivariate diagnostic meta-analyses, the authors derived a bivariate version of I² that is able to account for the correlation between sensitivity and specificity (31) and dependence of within-study variance on the value of binomial proportions.
Some have argued that heterogeneity is not quantified in systematic reviews of diagnostic test accuracy because (1) it is expected, and (2) tests for heterogeneity in sensitivity and specificity and estimates of the I² statistic do not account for heterogeneity explained by phenomena such as positivity threshold effects. Instead, heterogeneity should be assumed, and sources for heterogeneity explored. Therefore, another approach is to use subgroup analyses and multiple univariate meta-regression analyses. Meta-regression can also determine whether the heterogeneity is attributable to the covariates used. The modeling strategy should specify the criterion to decide whether or not a covariate is included, and the adding or removing of covariates.
If there is substantial heterogeneity, it may not be appropriate to calculate overall summary measures such as sensitivity or specificity. It is important to reiterate that if too much heterogeneity is encountered or there is a lack of high-quality
studies, it may be more appropriate to solely perform the systematic review and to refrain from a further meta-analysis. For example, in the following study, the authors performed a systematic review of the topic and then concluded that meta-analysis was not appropriate in light of the identified literature (32): In this study, a forest plot of sensitivity and specificity for the included studies in the diagnostic accuracy portion of the analysis was generated. However, pooling was not performed because of the relatively small number of studies, the relatively high risk of bias, and the inherent heterogeneity secondary to the varied study design among the included studies.

Heterogeneity Due to Threshold Effect
It should be noted that some of the variability in test performance between studies might relate to the selection of a different diagnostic threshold rather than true differences in test performance. Threshold effect is one of the primary causes of heterogeneity in meta-analyses of test accuracy studies. It occurs when different cutoffs or thresholds are used to define a positive (or negative) test result in the different studies included in the meta-analysis, producing differences in sensitivities and specificities. When threshold effect exists, there is a negative correlation between sensitivity and specificity or a positive correlation between sensitivity and 1 − specificity. It should be noted that correlation between sensitivity and specificity could arise due to a number of reasons other than threshold effect. These include partial verification bias, different spectra of patients, or different settings.
There are a number of ways to assess threshold effect in meta-analysis. The first option is to produce “paired” forest plots of sensitivity and specificity. If this forest plot is in ascending order of sensitivity along with the corresponding specificity, and as sensitivity increases, there is decreasing specificity, this could be explained by a threshold effect or vice versa. The same inverse relationship will be seen with positive and negative likelihood ratio. Similarly, one can assess the correlation of the logits of sensitivity and specificity. If there is a negative correlation, this suggests threshold effect. Alternatively, a positive correlation between the logit of sensitivity and the logit of 1 − specificity suggests the presence of a threshold effect. The second option is a representation of accuracy estimates from each study in ROC space. If a plot of sensitivity against 1 − specificity results in a typical ROC pattern sometimes referred to as a “shoulder arm” plot, this suggests threshold effect. A third option is a computation of Spearman correlation coefficient between the logit of sensitivity and logit of 1 − specificity. Threshold effect is suggested if there is a strong positive correlation. A fourth option is to create a chi plot. This is used to judge whether or not sensitivity and specificity are independent by augmenting the scatter plot with an auxiliary display. In the case of independence, the points will be concentrated in the central region, in the horizontal band indicated on the plot. In the case of interdependence, the points will be scattered and not concentrated in the central region. This suggests threshold effect. If a negative correlation between sensitivity and specificity or a positive correlation between sensitivity and 1 − specificity is found and a correlation coefficient is computed, it has been suggested that the square of the correlation coefficient is approximately equal to the amount of heterogeneity that can be attributed to threshold effect. For instance, if the correlation coefficient was 0.5, this squared is 0.25. Therefore, approximately 25% or a quarter of the heterogeneity observed could be attributed to threshold effect. When evidence of a threshold effect between studies in the systematic review and meta-analysis is observed, summary points alone should be avoided. Summary points such as summary sensitivity, specificity, or DOR may not correctly reflect the variability between studies and may miss important information regarding heterogeneity between studies (33). It is more appropriate to construct an SROC curve to show how the different sensitivities and specificities of primary studies are related to each other (34).

Heterogeneity Due to Non-threshold Effect
As stated previously, heterogeneity may be due to factors other than threshold effect. These include variations in study population such as severity of disease and co-morbidities, index test factors such as differences in technology and generations of technology, reference standard differences, and differences in the way a study was designed and conducted (35).
Heterogeneity among included studies in a meta-analysis can be assessed in three different ways. The first option is a visual inspection of “paired” forest plots of sensitivity and specificity. If studies are reasonably homogeneous, the sensitivity and specificity estimates from individual studies will lie along the line corresponding to the summary sensitivity and specificity estimate. However, if there are large deviations from this line, this indicates possible heterogeneity. A second option is based on statistical testing such as chi-square, Cochran Q, and the inconsistency index or I². The pros and cons of these are discussed previously. A third option is to assume heterogeneity is present and explore sources of heterogeneity with meta-regression.

Meta-regression
Univariate or multivariate regression analysis can be used in primary studies to assess the relationship between one or more covariates and a dependent variable. Essentially the same approach can be used with meta-analysis. This is called meta-regression. The difference here is that the covariates are at the level of the study rather than the level of the subject. The causes of heterogeneity should be investigated when detected. Patient characteristics, definitions of the test and reference standards, and operating characteristics of the test can result in heterogeneity in sensitivity and specificity. Meta-regression allows the exploration of which types of patient-specific factors or study design factors contribute to the heterogeneity. Meta-regression uses summary data from each trial, such as the accuracy. Covariates may be introduced into a regression with any test performance measure as the dependent variable. The
Figure 5. Assessing publication bias. Funnel plot with superimposed regression line. Formal testing for publication bias may be conducted by a regression of diagnostic log odds ratio against 1/square root of the effective sample size (1/root(ESS)), weighting by effective sample size, with P < .10 for the slope coefficient indicating significant asymmetry. The statistically nonsignificant P value (.89) for the slope coefficient suggests symmetry in the data and a low likelihood of publication bias. However, the test is known to have low power (41).
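The regression described in this caption can be sketched as a weighted least-squares fit of the log diagnostic odds ratio on 1/√ESS, weighted by ESS. The code below is an illustrative reading of that description, not the software used for Figure 5: the effective sample size formula 4·n₁·n₂/(n₁ + n₂), the 0.5 zero-cell correction, and the study counts are assumptions, and the P value (from a t distribution with k − 2 degrees of freedom) is left to a statistics library.

```python
import math

def ln_dor(tp, fp, fn, tn, cc=0.5):
    """Log diagnostic odds ratio; cc is added to all cells if any is zero (assumed)."""
    if min(tp, fp, fn, tn) == 0:
        tp, fp, fn, tn = (x + cc for x in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn))

def effective_sample_size(n_diseased, n_nondiseased):
    """ESS taken here as 4*n1*n2/(n1+n2) (assumed definition)."""
    return 4 * n_diseased * n_nondiseased / (n_diseased + n_nondiseased)

def weighted_regression(x, y, w):
    """Weighted least squares of y on x: returns intercept, slope, slope SE."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_slope = math.sqrt(rss / (len(x) - 2) / sxx)
    return intercept, slope, se_slope

def funnel_asymmetry(studies):
    """studies: list of (TP, FP, FN, TN); needs at least three studies."""
    if len(studies) < 3:
        raise ValueError("at least three studies are required")
    x, y, w = [], [], []
    for tp, fp, fn, tn in studies:
        ess = effective_sample_size(tp + fn, fp + tn)
        x.append(1 / math.sqrt(ess))
        y.append(ln_dor(tp, fp, fn, tn))
        w.append(ess)
    intercept, slope, se = weighted_regression(x, y, w)
    t_stat = slope / se if se > 0 else float("nan")
    return {"intercept": intercept, "slope": slope, "se": se,
            "t": t_stat, "df": len(studies) - 2}

# Hypothetical 2x2 counts (TP, FP, FN, TN)
FUNNEL_STUDIES = [(45, 5, 5, 45), (18, 4, 2, 36), (80, 10, 20, 90),
                  (30, 9, 10, 51), (12, 3, 3, 22)]

if __name__ == "__main__":
    print(funnel_asymmetry(FUNNEL_STUDIES))
```

A slope close to zero is consistent with a symmetric funnel; as the caption notes, the test has low power, so a nonsignificant slope does not rule out publication bias.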
sample size corresponds to the number of studies in the analysis. A small sample size limits the power to detect significant effects. The accuracy measure that is frequently used is DOR. It is a useful measure of diagnostic performance, as it is a single measure that encompasses both sensitivity and specificity and likelihood ratios. It can compare the overall diagnostic accuracy of different tests but is limited because it cannot be used directly in clinical practice (35). In primary studies, a minimum of 10 subjects is required for each covariate assessed. Similarly, in meta-analysis, a minimum of 10 studies is required for each covariate assessed. A lower number of included studies in the meta-analysis limits the number of covariates that can be included and the ability to perform meta-regression. Often, there are too few studies included to perform multivariate meta-regression and instead one or several univariate meta-regression analyses are performed. In a meta-analysis with fewer than 10 included studies, meta-regression may not be feasible. This means that causes of heterogeneity cannot be investigated. This may be a significant limitation to a meta-analysis of diagnostic test accuracy as heterogeneity is to be expected.
Another approach, using individual patient data, allows greater flexibility for the analysis and exploration of issues not covered in the published trials. However, obtaining the original patient data from the trials can be challenging.

Step 9. Assess Publication Bias
When authors or journals are more likely to publish research with positive results, this distorts the available evidence and is known as publication bias. However, null and negative results are just as valid, and meta-analyses should try to include unpublished studies. Publication bias can be assessed with a funnel plot in which effect size is plotted against the sample size (24,36,37). An inverted symmetrical funnel of dots is consistent with the absence of publication bias (38). An asymmetric plot suggests that some studies may have been missed by the meta-analysis. Asymmetry can also occur if small studies have larger effects (25,39). However, it can be difficult to detect asymmetry visually (40). Therefore, formal statistical methods, such as Egger’s regression, have been developed (36). Egger’s regression tests whether small studies have larger effect sizes than would be expected by chance, and whether small studies with small effects have not been published. For meta-analyses of diagnostic imaging accuracy studies, the best method for investigating publication bias has not yet been decided, and there is a paucity of research. Statistical tests detect funnel plot asymmetry rather than publication bias, and tests designed for meta-analyses of randomized trials are probably not applicable to diagnostic studies. Other regression tests are being developed to overcome the problem of small numbers of small studies with weakly positive effects (Fig 5) (24,41,42). Currently, there is no clear consensus on which test to use and when. Whatever test is used, findings should be interpreted with care. Further research is required. At present, the best method to assess publication bias is that proposed by Deeks et al. (41). The authors have shown that the regression test has greater power to detect publication bias than the rank correlation test (41). The authors recommend that systematic reviewers undertake funnel plot investigations to examine the possibility of publication and other sample size–related effects. They recommend testing for asymmetry using regression tests, plotting the log of DOR against the square root of 1/effective sample size (ESS) (41). Deeks et al. showed that these tests
Figure 6. Forest plot of results of multiple univariate meta-regression and subgroup analyses. This figure shows the results of a meta-regression of the dataset. Covariates, such as study design (prospective cohort [prodesign]), study size ≥ 30 patients (ssize 30), and partial verification bias (fulverif) may be introduced into a regression with any test performance measure as the dependent variable. In diagnostic studies, heterogeneity in sensitivity and specificity can result from many causes related to definitions of the test and reference standards, operating characteristics of the test, methods of data collection, and patient characteristics. Meta-regression is the use of regression methods to incorporate the effect of covariates on summary measures of performance and can be used to explore between-study heterogeneity. As with any meta-regression, however, the sample size will correspond to the number of studies in the analysis with small number of studies, limiting the power of regression to detect significant effects.
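For a single binary study-level covariate, a univariate meta-regression can be illustrated very simply. The sketch below uses a fixed-effect, inverse-variance model on the log DOR, where the regression coefficient equals the difference between the two subgroup estimates. This is a simplification chosen for brevity, not the bivariate model behind Figure 6; the covariate coding (1 = prospective design) and the counts are hypothetical.

```python
import math

def ln_dor_and_var(tp, fp, fn, tn, cc=0.5):
    """Log DOR and its approximate variance; cc added to all cells if any is zero."""
    if min(tp, fp, fn, tn) == 0:
        tp, fp, fn, tn = (x + cc for x in (tp, fp, fn, tn))
    estimate = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return estimate, variance

def pooled(effects):
    """Inverse-variance (fixed-effect) pooled estimate and its variance."""
    weights = [1 / v for _, v in effects]
    estimate = sum(w * y for w, (y, _) in zip(weights, effects)) / sum(weights)
    return estimate, 1 / sum(weights)

def binary_meta_regression(studies):
    """studies: list of (TP, FP, FN, TN, covariate) with covariate coded 0 or 1."""
    groups = {0: [], 1: []}
    for tp, fp, fn, tn, covariate in studies:
        groups[covariate].append(ln_dor_and_var(tp, fp, fn, tn))
    if not groups[0] or not groups[1]:
        raise ValueError("both covariate levels need at least one study")
    est0, var0 = pooled(groups[0])
    est1, var1 = pooled(groups[1])
    coef = est1 - est0                 # regression coefficient on the ln DOR scale
    se = math.sqrt(var0 + var1)
    return {"ln_dor_0": est0, "ln_dor_1": est1, "coef": coef, "se": se,
            "z": coef / se, "relative_dor": math.exp(coef)}

# Hypothetical data; the last column is 1 for prospective studies, 0 otherwise
DESIGN_STUDIES = [(45, 5, 5, 45, 0), (45, 5, 5, 45, 0),
                  (40, 10, 10, 40, 1), (40, 10, 10, 40, 1)]

if __name__ == "__main__":
    print(binary_meta_regression(DESIGN_STUDIES))
```

The `relative_dor` value is the ratio of pooled DORs between the two groups. With few studies the estimate has low power, and a random-effects or bivariate model would normally be preferred.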
are robust when used in meta-analyses of studies of diagnostic test accuracy (41).

Step 10. Assess the Robustness of Estimates of Diagnostic Accuracy Using Sensitivity Analyses
A sensitivity analysis explores how the main findings of the meta-analysis may change by varying the approach to aggregating the data. Examples include excluding unpublished or poor-quality studies. Some sensitivity analyses can be determined a priori. However, others cannot, because issues suitable for sensitivity analysis are identified only during the meta-analysis. Sensitivity analysis results are best reported in a systematic way with a summary table or figure (Fig 6). When the sensitivity analysis shows that the result and conclusions of the meta-analysis are not affected by the sensitivity analysis, the results of the review can be viewed with reasonable certainty. Where they are affected, further exploration should be deployed to resolve these uncertainties. Sensitivity analyses may generate areas for further investigations and future research.

Step 11. Explore and Explain Heterogeneity in Test Accuracy Using Subgroup Analysis
Although some sensitivity analysis may involve restricting the analysis to a subset or subgroup of the original studies, the two are not the same. Exploring and explaining heterogeneity in test accuracy are the aims of subgroup analysis. Subgroup analysis and sensitivity analysis differ in two ways. First, in subgroup analysis, estimates are produced for all groups, whereas in sensitivity analyses, the effect of the covariate in the group of studies removed from the analysis is not estimated. Second, in subgroup analysis, formal statistical comparisons are made across the subgroups, whereas in sensitivity analysis, informal comparisons are made.
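One simple sensitivity analysis is to drop each study in turn and recompute the heterogeneity statistics from Step 8. The sketch below does this for Cochran Q and I² on a univariate effect measure (for example, the log DOR) with inverse-variance weights. It is a univariate illustration only, so it ignores the correlation between sensitivity and specificity, and the effect sizes in `EFFECTS` are hypothetical.

```python
def cochran_q_i2(effects):
    """effects: list of (estimate, variance). Returns pooled, Q, df, I2 (%)."""
    weights = [1 / v for _, v in effects]
    pooled = sum(w * y for w, (y, _) in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, df, i2

def leave_one_out(effects):
    """Recompute the statistics with each study removed in turn."""
    results = []
    for i in range(len(effects)):
        subset = effects[:i] + effects[i + 1:]
        results.append((i,) + cochran_q_i2(subset))
    return results

# Hypothetical (estimate, variance) pairs, e.g. ln DOR and its variance
EFFECTS = [(1.0, 0.5), (3.0, 0.5), (2.0, 0.5)]

if __name__ == "__main__":
    print("all studies:", cochran_q_i2(EFFECTS))
    for index, pooled, q, df, i2 in leave_one_out(EFFECTS):
        print(f"without study {index}: pooled={pooled:.2f}, Q={q:.2f}, I2={i2:.0f}%")
```

If removing one study changes the pooled estimate or I² markedly, that study is a candidate for closer inspection; the comparison is informal, as the text notes for sensitivity analyses.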
Comparing Index Tests
A diagnostic imaging review may involve a comparison of the diagnostic accuracy of two or more index tests used to diagnose the same condition. Two approaches are generally adopted to compare tests. The first approach utilizes test accuracy data from all eligible studies that have evaluated one or both tests. The advantage of the first approach is that it maximizes the number of studies in the analysis. However, confounding may be an issue as the studies are likely to be heterogeneous. The second approach restricts the analysis to studies that have evaluated both tests. This can be in either the same individuals or individuals that have been randomized to undergo one or other of the two tests. The advantage of the second approach is a reduced likelihood of bias due to confounding. Therefore, results should be more reliable. However, the number of studies that report direct comparisons of tests is usually limited. This means that such an analysis may not be feasible or may only be considered as a sensitivity analysis.

Analysis With Small Numbers of Studies
When the number of studies included in a meta-analysis is small, it may be difficult to decide which is the most appropriate model. However, estimates of the variances of the random effects will be subject to a high level of uncertainty for both hierarchical models: the bivariate and HSROC models.

CLINICAL INTERPRETATION OF THE EVIDENCE

Step 12. Graphically Display How the Evidence Alters the Posttest Probability

Fagan Plot (Bayes Nomogram)
The clinical or patient-relevant utility of a diagnostic test is evaluated using the likelihood ratios to calculate posttest probability (PTP) based on Bayes theorem as follows:

\[
\text{Pretest probability} = \text{Prevalence of target condition}
\]
\[
\text{PTP} = \frac{\text{LR} \times \text{Pretest probability}}{1 - \text{Pretest probability} \times (1 - \text{LR})}
\]
Figure 7. Likelihood ratio or Fagan nomogram for different pretest probability of disease: 25%, 50% and 75% for two tests. Posttest probability is derived by drawing a straight line from the pretest probability vertical axis to the appropriate likelihood ratio and continuing the straight line to the vertical posttest probability axis. Where this line intersects the vertical posttest probability axis is the posttest probability. When Bayes theorem is expressed in terms of log-odds, the posterior log-odds are linear functions of the prior log-odds and the log-likelihood ratios. A Fagan plot consists of a vertical axis on the left with the prior log-odds, an axis in the middle representing the log-likelihood ratio and a vertical axis on the right representing the posterior log-odds (43).
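The arithmetic behind a Fagan nomogram can be checked numerically. The sketch below converts a pretest probability to odds, multiplies by the likelihood ratio, and converts back. It also shows the equivalent closed form and the additivity on the log-odds scale that the nomogram exploits. The pretest probability of 25% matches one of the values in Figure 7, while the likelihood ratio of 10 is an arbitrary illustrative choice.

```python
import math

def posttest_probability(pretest, lr):
    """Bayes theorem in odds form: posttest odds = pretest odds x LR."""
    if not 0 < pretest < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    odds = pretest / (1 - pretest) * lr
    return odds / (1 + odds)

def posttest_probability_closed_form(pretest, lr):
    """Algebraically equivalent single-step expression."""
    return lr * pretest / (1 - pretest * (1 - lr))

def log_odds(p):
    return math.log(p / (1 - p))

if __name__ == "__main__":
    pre, lr = 0.25, 10.0   # illustrative values
    post = posttest_probability(pre, lr)
    print(f"posttest probability = {post:.3f}")
    # On the nomogram, the three axes are related by simple addition:
    print(f"{log_odds(pre):.3f} + {math.log(lr):.3f} = {log_odds(post):.3f}")
```

Running the same function with a negative likelihood ratio (a value below 1) gives the probability of disease after a negative test.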
Figure 8. Likelihood ratio scatter graph shows summary point of likelihood ratios obtained as functions of mean sensitivity and specificity
in the right upper quadrant, suggesting that the test is useful for confirmation of presence of disease (when positive) and not for its exclu-
sion (when negative). Informativeness may also be represented graphically by a likelihood ratio scatter graph or matrix. It defines quadrants
of informativeness based on established evidence-based thresholds: Left upper quadrant, likelihood ratio positive > 10, likelihood ratio neg-
ative < 0.1, confirmation and exclusion, suggesting that the test is useful for confirmation of presence of disease (when positive) and for its
exclusion (when negative). Right upper quadrant, likelihood ratio positive > 10, likelihood ratio negative > 0.1, confirmation only, suggesting
that the test is useful for confirmation of presence of disease (when positive) and not for its exclusion (when negative). Left lower quadrant,
likelihood ratio positive < 10, likelihood ratio negative < 0.1, exclusion only, suggesting that the test is not useful for confirmation of pres-
ence of disease (when positive) but is for its exclusion (when negative). Right lower quadrant, likelihood ratio positive < 10, likelihood ratio
negative > 0.1, no exclusion or confirmation, suggesting that the test is neither useful for confirmation of presence of disease (when pos-
itive) nor for its exclusion (when negative) (44).
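The quadrant rule in this caption is easy to express in code. The sketch below derives the positive and negative likelihood ratios from a summary sensitivity and specificity and assigns the quadrant using the thresholds stated above (LR+ > 10, LR− < 0.1). The input values are hypothetical, and a value exactly on a threshold is treated as not meeting it, which is an assumption.

```python
def likelihood_ratios(sens, spec):
    """Return (LR+, LR-) from sensitivity and specificity."""
    if not (0 < sens < 1 and 0 < spec < 1):
        raise ValueError("sensitivity and specificity must be strictly between 0 and 1")
    return sens / (1 - spec), (1 - sens) / spec

def lr_quadrant(lr_pos, lr_neg, pos_threshold=10.0, neg_threshold=0.1):
    """Classify a point of the likelihood ratio scatter graph."""
    confirms = lr_pos > pos_threshold
    excludes = lr_neg < neg_threshold
    if confirms and excludes:
        return "left upper: confirmation and exclusion"
    if confirms:
        return "right upper: confirmation only"
    if excludes:
        return "left lower: exclusion only"
    return "right lower: no exclusion or confirmation"

if __name__ == "__main__":
    # Hypothetical summary estimates
    lr_pos, lr_neg = likelihood_ratios(0.80, 0.95)
    print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}: {lr_quadrant(lr_pos, lr_neg)}")
```

For a sensitivity of 0.80 and a specificity of 0.95, LR+ is about 16 and LR− about 0.21, so the point falls in the right upper quadrant (confirmation only).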
This concept is depicted visually with Fagan nomograms (43). When Bayes theorem is expressed in terms of log-odds, the posterior log-odds are linear functions of the prior log-odds and the log-likelihood ratios. A Fagan plot, as shown in Figure 7, consists of a vertical axis on the left with the prior log-odds, an axis in the middle representing the log-likelihood ratio, and a vertical axis on the right representing the posterior log-odds. Lines are then drawn from the prior probability on the left through the likelihood ratios in the center and extended to the posterior probabilities on the right (Fig 7).

Likelihood Ratio Scatter Graph
Informativeness may also be represented graphically by a likelihood ratio scatter graph or matrix (44). It defines quadrants of informativeness based on established evidence-based thresholds. The likelihood ratio scatter graph shows summary point of likelihood ratios obtained as functions of mean sensitivity and specificity (Fig 8) (5).

Predictive Values and Probability-modifying Plot
The conditional probability of disease given a positive (or negative) test, the so-called positive (or negative)-predictive values, is critically important to clinical application of a diagnostic procedure. It depends not only on sensitivity and specificity but also on disease prevalence (p). The probability-modifying plot is a graphical sensitivity analysis of predictive value across a prevalence continuum defining low- to high-risk populations (Fig 9). It depicts separate curves for positive and negative tests. The user draws a vertical line from the selected pretest probability to the appropriate likelihood ratio line and then reads the posttest probability off the vertical scale. General summary statistics have also been introduced when it may be of interest to evaluate the effect of p on predictive values: unconditional positive- and negative-predictive values, which permit prevalence heterogeneity (45). These measures are obtained by integrating their corresponding conditional (on p) versions with respect to a prior distribution for p. The prior posits assumptions about the risk level in a hypothetical population of interest, for example, low, moderate, or high risk, as well as the heterogeneity in the population. Figure 9 plots the relationship between pre- and posttest probability based on the likelihood of a positive (above diagonal line) or negative (below diagonal line) test result over the 0–1 range of pretest probabilities.

CONCLUSION
The information from systematic reviews and meta-analyses is important to clinicians, health policy makers, researchers
Figure 9. Probability-modifying plot with posttest probabilities for hypothetical populations with different prevalence of disease according to Bayes theorem. Dashed line indicates a positive test result, dashed line with dots indicate a negative result. Posttest probability for a positive result is derived by drawing a vertical line up to the dashed curved line and then across to the y-axis. Posttest probability for a negative result is derived by drawing a vertical line up to the dashed line with dots curved line and then across to the y-axis. The probability-modifying plot is a graphical sensitivity analysis of predictive value across a prevalence continuum defining low- to high-risk populations. It depicts separate curves for positive and negative tests. The user draws a vertical line from the selected pretest probability to the appropriate likelihood ratio line and then reads the posttest probability off the vertical scale.
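The two curves of a probability-modifying plot can be tabulated directly from Bayes theorem. The sketch below computes the positive and negative predictive values for a given prevalence and then sweeps the prevalence from low to high. The sensitivity and specificity of 0.90 are hypothetical inputs, and the posttest probability after a negative result is taken as 1 − NPV.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a given sensitivity, specificity and prevalence."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must be strictly between 0 and 1")
    p = prevalence
    ppv = sens * p / (sens * p + (1 - spec) * (1 - p))
    npv = spec * (1 - p) / (spec * (1 - p) + (1 - sens) * p)
    return ppv, npv

def probability_modifying_curve(sens, spec, steps=10):
    """Rows of (pretest, posttest after positive, posttest after negative)."""
    rows = []
    for i in range(1, steps):
        p = i / steps
        ppv, npv = predictive_values(sens, spec, p)
        rows.append((p, ppv, 1 - npv))
    return rows

if __name__ == "__main__":
    for pretest, post_pos, post_neg in probability_modifying_curve(0.90, 0.90):
        print(f"pretest {pretest:.1f}: positive -> {post_pos:.3f}, negative -> {post_neg:.3f}")
```

Plotting the second and third columns against the first gives the curves above and below the diagonal.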
and developers of diagnostic techniques, patients, and the general public. In this review, the procedural and analytic methods for conducting systematic reviews of diagnostic imaging studies have been discussed. A guide to constructing the research question, literature search strategies, and study selection is provided. Current recommendations for the evidence appraisal process and methodological quality assessment and recommendations on the use of study quality in quantitative synthesis are also discussed. Properties and limitations of the conventional meta-analytic technique of HSROC curves and mixed-effects models, which simultaneously synthesize sensitivity and specificity pairs, are discussed and summarized. The paper addresses the use of meta-regression to investigate unobserved heterogeneity and covariate effects. The graphical and statistical elements of a clinically useful report are discussed. Challenges confront investigators undertaking these reviews. However, we encourage radiological investigators to become familiar with these techniques and to collaborate with methodologists who can enhance the design and conduct of diagnostic imaging systematic reviews.

REFERENCES
1. Berman NG, Parker RA. Meta-analysis: neither quick nor easy. BMC Med Res Methodol 2002; 2:10.
2. Whiting P, Rutjes AW, Reitsma JB, et al. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med 2004; 140:189–202.
3. Whiting P, Rutjes AW, Reitsma JB, et al. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 2003; 3:25.
4. Whiting P, Rutjes AW, Dinnes J, et al. Development and validation of methods for assessing the quality of diagnostic accuracy studies. Health Technol Assess 2004; 8:iii, 1–234.
5. Leeflang MM, Deeks JJ, Gatsonis C, et al. Systematic reviews of diagnostic test accuracy. Ann Intern Med 2008; 149:889–897.
7. Lijmer JG, Mol BW, Heisterkamp S, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999; 282:1061–1066.
8. McGrath TA, McInnes MDF, Langer FW, et al. Treatment of multiple test readers in diagnostic accuracy systematic reviews-meta-analyses of imaging studies. Eur J Radiol 2017; 93:59–64.
9. Buscemi N, Hartling L, Vandermeer B, et al. Single data extraction generated more errors than double data extraction in systematic reviews. J Clin Epidemiol 2006; 59:697–703.
10. Jones AP, Remmington T, Williamson PR, et al. High prevalence but low impact of data extraction and reporting errors were found in Cochrane systematic reviews. J Clin Epidemiol 2005; 58:741–742.
11. Gotzsche PC, Hrobjartsson A, Maric K, et al. Data extraction errors in meta-analyses that use standardized mean differences. JAMA 2007; 298:430–437.
12. Cook DJ, Sackett DL, Spitzer WO. Methodologic guidelines for systematic reviews of randomized control trials in health care from the Potsdam Consultation on meta-analysis. J Clin Epidemiol 1995; 48:167–171.
13. Whiting PF, Rutjes AW, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011; 155:529–536.
14. Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med 1993; 12:1293–1316.
15. Midgette AS, Stukel TA, Littenberg B. A meta-analytic method for summarizing diagnostic test performances: receiver-operating-characteristic-summary point estimates. Med Decis Making 1993; 13:253–257.
16. van Houwelingen HC, Arends LR, Stijnen T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med 2002; 21:589–624.
17. Macaskill P. Empirical Bayes estimates generated in a hierarchical summary ROC analysis agreed closely with those of a full Bayesian analysis. J Clin Epidemiol 2004; 57:925–932.
18. Reitsma JB, Glas AS, Rutjes AW, et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 2005; 58:982–990.
19. Chu H, Cole SR. Bivariate meta-analysis of sensitivity and specificity with sparse data: a generalized linear mixed model approach. J Clin Epidemiol 2006; 59:1331–1332, author reply 1332–1333.
20. Rutter CM, Gatsonis CA. A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med 2001; 20:2865–2884.
21. Arends LR, Hamza TH, van Houwelingen JC, et al. Bivariate random effects meta-analysis of ROC curves. Med Decis Making 2008; 28:621–638.
22. Harbord RM, Deeks JJ, Egger M, et al. A unification of models for
6. Simes RJ. Publication bias: the case for an international registry of clin- meta-analysis of diagnostic accuracy studies. Biostatistics 2007; 8:239–
ical trials. J Clin Oncol 1986; 4:1529–1541. 251.
587
23. McGrath TA, McInnes MD, Korevaar DA, et al. Meta-analyses of diagnostic accuracy in imaging journals: analysis of pooling techniques and their effect on summary estimates of diagnostic accuracy. Radiology 2016; 281:78–85.
24. Mulrow CD. Rationale for systematic reviews. BMJ 1994; 309:597–599.
25. Lau J, Ioannidis JP, Terrin N, et al. The case of the misleading funnel plot. BMJ 2006; 333:597–600.
26. Lewis S, Clarke M. Forest plots: trying to see the wood and the trees. BMJ 2001; 322:1479–1480.
27. Hanley JA. Receiver operating characteristic (ROC) methodology: the state of the art. Crit Rev Diagn Imaging 1989; 29:307–335.
28. Ioannidis JP, Patsopoulos NA, Evangelou E. Uncertainty in heterogeneity estimates in meta-analyses. BMJ 2007; 335:914–916.
29. Higgins JP, Thompson SG, Deeks JJ, et al. Measuring inconsistency in meta-analyses. BMJ 2003; 327:557–560.
30. Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med 2002; 21:1539–1558.
31. Zhou Y, Dendukuri N. Statistics for quantifying heterogeneity in univariate and bivariate meta-analyses of binary data: the case of meta-analyses of diagnostic accuracy. Stat Med 2014; 33:2701–2717.
32. McInnes MD, Hibbert RM, Inacio JR, et al. Focal nodular hyperplasia and hepatocellular adenoma: accuracy of gadoxetic acid–enhanced MR imaging—a systematic review. Radiology 2015; 277:413–423.
33. Dinnes J, Deeks J, Kirby J, et al. A methodological review of how heterogeneity has been examined in systematic reviews of diagnostic test accuracy. Health Technol Assess 2005; 9:1–113, iii.
34. Lee J, Kim KW, Choi SH, et al. Systematic review and meta-analysis of studies evaluating diagnostic test accuracy: a practical review for clinical researchers—part II. Statistical methods of meta-analysis. Korean J Radiol 2015; 16:1188–1196.
35. Lijmer JG, Bossuyt PM, Heisterkamp SH. Exploring sources of heterogeneity in systematic reviews of diagnostic tests. Stat Med 2002; 21:1525–1537.
36. Egger M, Davey Smith G, Schneider M, et al. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997; 315:629–634.
37. Sterne JA, Egger M. Funnel plots for detecting bias in meta-analysis: guidelines on choice of axis. J Clin Epidemiol 2001; 54:1046–1055.
38. Egger M, Smith GD. Misleading meta-analysis. BMJ 1995; 311:753–754.
39. Sterne JA, Egger M, Smith GD. Systematic reviews in health care: investigating and dealing with publication and other biases in meta-analysis. BMJ 2001; 323:101–105.
40. Terrin N, Schmid CH, Lau J. In an empirical evaluation of the funnel plot, researchers could not visually identify publication bias. J Clin Epidemiol 2005; 58:894–901.
41. Deeks JJ, Macaskill P, Irwig L. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol 2005; 58:882–893.
42. Peters JL, Sutton AJ, Jones DR, et al. Comparison of two methods to detect publication bias in meta-analysis. JAMA 2006; 295:676–680.
43. Fagan TJ. Letter: nomogram for Bayes theorem. N Engl J Med 1975; 293:257.
44. Stengel D, Bauwens K, Sehouli J, et al. A likelihood ratio approach to meta-analysis of diagnostic studies. J Med Screen 2003; 10:47–51.
45. Li J, Fine JP, Safdar N. Prevalence-dependent diagnostic accuracy measures. Stat Med 2007; 26:3258–3273.
46. Dwamena B. MIDAS: Meta-analytical integration of diagnostic accuracy studies in Stata, West Coast Stata Users’ Group meetings. University of Michigan, 2007. MIDAS Web site. Published August 15, 2007. Available at: http://sitemaker.umich.edu/metadiagnosis/midas_home.
47. Dwamena B. MIDAS: Meta-analytical integration of diagnostic accuracy studies in Stata, North American Stata Users’ Group meetings. University of Michigan, 2007. MIDAS Web site. Published August 15, 2007. Available at: http://sitemaker.umich.edu/metadiagnosis/midas_home.
48. Dwamena BA. MIDAS: Stata module for meta-analytical integration of diagnostic test accuracy studies. Boston, MA: Boston College Department of Economics, 2008. Available at: http://ideas.repec.org/c/boc/bocode/s456880.html.
49. Van Houwelingen HC, Zwinderman KH, Stijnen T. A bivariate approach to meta-analysis. Stat Med 1993; 12:2273–2284.
50. Riley RD, Abrams KR, Lambert PC, et al. An evaluation of bivariate random-effects meta-analysis for the joint synthesis of two correlated outcomes. Stat Med 2007; 26:78–97.
51. Riley RD, Abrams KR, Sutton AJ, et al. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Med Res Methodol 2007; 7:3.
52. Rabe-Hesketh S. GLLAMM manual. University of California-Berkeley, Division of Biostatistics, Working Paper Series Paper No. 160. 2004.
53. Rabe-Hesketh S, Skrondal A, Pickles A. Reliable estimation of generalized linear mixed models using adaptive quadrature. Stata J 2002; 2:1–21.
54. Littenberg B, Moses LE. Estimating diagnostic accuracy from multiple conflicting reports: a new meta-analytic method. Med Decis Making 1993; 13:313–321.
55. Harbord R, Whitting P, Sterne J. metandi: Stata module for statistically rigorous meta-analysis of diagnostic accuracy studies. In: Methods for evaluating medical tests. Birmingham, UK: Department of Public Health, Epidemiology and Biostatistics, University of Birmingham, 2008. 1st Symposium; July 24–25, 23.
56. Zamora J, Abraira V, Muriel A, et al. Meta-DiSc: a software for meta-analysis of test accuracy data. BMC Med Res Methodol 2006; 6:31.
APPENDIX

SOFTWARE FOR DIAGNOSTIC ACCURACY META-ANALYSIS

We review some of the dedicated software for diagnostic test accuracy (imaging) meta-analysis.

midas

The graphical and statistical methods described in this article were implemented with midas, a comprehensive program of statistical and graphical routines for undertaking meta-analysis of diagnostic test performance in Stata (46–48). The index and reference tests (gold standard) are dichotomous. Primary data synthesis is performed within the bivariate mixed-effects regression framework focused on making inferences about average sensitivity and specificity. The bivariate approach was originally developed for treatment trial meta-analysis and modified for synthesis of diagnostic test data using an approximate normal within-study model (16,18,21,49–51). An exact binomial rendition of the bivariate model assumes independent binomial distributions for the true positives and true negatives conditional on the sensitivity and specificity in each study (19,21,51). Likelihood-based estimation of the exact binomial approach may be performed by adaptive Gaussian quadrature using the Stata-native xtmelogit command (Stata release 10) or the user-written gllamm command, both with readily available post-estimation procedures for model diagnostics and empirical Bayes predictions (52,53). Additionally, midas facilitates exploratory analysis of heterogeneity (unobserved, threshold-related, and covariate), publication, and other precision-related biases. Bayes nomograms, likelihood ratio matrices, and probability-modifying plots may be derived and used to guide patient-based diagnostic decision-making.

RevMan

RevMan (or Review Manager) facilitates preparation of protocols and full reviews; the literature review includes text, characteristics of studies, comparison tables, and study data. It can perform meta-analysis of the data entered and present the results graphically. In addition to reviews of studies of the effects of health-care interventions, it can also be used in reviews of diagnostic test accuracy studies. Both fixed-effects and random-effects models are included in RevMan. For random-effects models, the DerSimonian and Laird method is used. RevMan can be used easily by medical researchers
TABLE A1. Examples of PICOS (Patient, Intervention, Comparator, Outcome, and Study Design) or in the Cochrane Guidelines
for Diagnostic Accuracy Tests as PICTS (Patient, Index Test, Comparator test, Target Disorder and Study Design) Statements
(PICOS)—Patient,
Population, Problem Intervention Comparator Outcome Study design
(PICTS)—Patient,
Population, Problem Index test Comparator test Target disorder Study design
Symptomatic carotid Computed tomographic Magnetic resonance Sensitivity, specificity, and
stenosis angiography (CTA) angiography (MRA) diagnostic accuracy
Detection and quantification of
carotid stenosis
Known or suspected CT coronary angiography Invasive catheter Sensitivity, specificity, and
coronary artery coronary angiography diagnostic accuracy
disease Identifying one (or more) potentially
or probably hemodynamically
significant (≥50% coronary artery
luminal diameter) stenosis
A solitary pulmonary Dynamic contrast material– Histology Sensitivity, specificity, and
nodule enhanced CT diagnostic accuracy
Dynamic contrast material– Diagnosis of malignancy
enhanced MRI
FDG PET
99mTc-depreotide SPECT
Known or suspected Ultrasound MRI Sensitivity, specificity, and
rotator cuff tears diagnostic accuracy
Low-dose CT colonography Optical colonoscopy Sensitivity, specificity, and
(CTC) (OC) diagnostic accuracy
Clinically meaningful colonic
polyps
CT, computed tomography; MRI, magnetic resonance imaging.
TABLE A2. Examples of Search Sources
Computerized bibliographic databases (examples)

PubMed—www.ncbi.nlm.nih.gov/pubmed/
MEDLINE—www.medline.com
EMBASE https://embase.elsevier.com/
Health Technology Assessment (HTA)—www.york.ac.uk/inst/crd/crddatabases.htm#HTA
Database of Abstracts of Reviews of Effects (DARE)—www.york.ac.uk/inst/crd/crddatabases.htm#DARE
Turning Research into Practice (TRIP)—www.tripdatabase.com/Aboutus/Publications/index.html?catid=11
TRIP for guidelines see www.tripdatabase.com/Aboutus/Publications/index.html?catid=4
Aggressive Research Intelligence Facility (ARIF) www.arif.bham.ac.uk/
Cochrane Central Register of Controlled Trials (CENTRAL)—http://www.cochranelibrary.com/about/central-landing-page.html
Search Medica—www.searchmedica.com
Google Scholar—www.scholar.google.com
Google search engine—www.google.com
Yahoo search engine—www.search.yahoo.com
Science Citation Index—scientific.thomson.com/products/sci/
Web of Science—scientific.thomson.com/products/wos/
Web of Knowledge—isiwebofknowledge.com/
Scopus—info.scopus.com/overview/what/
Gale Directory of Online Portable and Internet Databases—http://library.dialog.com/bluesheets/pdf/bl0230.pdf
Continental and regional and national databases
Subject-specific databases
Full-text journals available electronically (examples)
Public Library of Science (PLoS)—www.plos.org/journals/
PubMed Central—www.pubmedcentral.nih.gov/
BiomedCentral—www.biomedcentral.com
Free Medical Journals—freemedicaljournals.com/
HighWire Press—highwire.stanford.edu/lists/freeart.dtl
Journal reference lists
Ancestor and descendent search
Always examine the reference lists of articles selected for inclusion in the meta-analysis to see whether they contain any
relevant studies of which the researcher is unaware.
Conference abstracts or proceedings (examples)
Biological Abstracts/RRM (Reports, Reviews, Meetings)—scientific.thomsonreuters.com/products/barrm/
BMC Meeting Abstracts (free)—www.biomedcentral.com
Conference Papers Index—www.csa.com/factsheets/cpi-set-c.php
Programs from professional meetings
Research registers
Dissertations and theses databases (examples)
ProQuest Dissertations & Theses Database: indexes more than 2 million doctoral dissertations and masters' theses and includes
US dissertations since 1861 and British dissertations since 1988—www.proquest.co.uk/products_pq/descriptions/pqdt.shtml
Letters to active researchers
Personal contact and peer consultation
Gray literature databases
Other reviews, (evidence-based) guidelines and sources of studies (examples)
National Guideline Clearinghouse (US)—www.guideline.gov/
Canadian Medical Association—Infobase: Clinical Practice Guidelines—www.cma.ca/index.cfm/ci_id/54316/1a_id/1.htm
National Library of Guidelines (UK)—www.library.nhs.uk
NICE Clinical Guidelines (UK)—www.nice.org.uk/aboutnice/whatwedo/aboutclinicalguidelines/about_clinical_guidelines.jsp
Australian National Health and Medical Research Council: Clinical Practice Guidelines—
www.nhmrc.gov.au/publications/subjects/clinical.htm
New Zealand Guidelines Group—www.nzgg.org.nz
Citation alerts
Handsearching
Web searching
Unpublished and ongoing studies
TABLE A3. The Commonly Used Summary Statistics for Test Accuracy, Including a 2 × 2 Contingency Table with Sensitivity, Specificity, Positive- and Negative-predictive Values, and Accuracy Calculated

                              Disease
Test outcome    True                        False
Positive        True positive (TP)          False positive (FP)         → Positive predictive value = TP/(TP + FP)
Negative        False negative (FN)         True negative (TN)          → Negative predictive value = TN/(TN + FN)
                ↓                           ↓
                Sensitivity                 Specificity
                = True-positive rate        = True-negative rate
                = True-positive fraction    = True-negative fraction
                = Detection rate
                = TP/(TP + FN)              = TN/(FP + TN)

→ Accuracy = (TP + TN)/(TP + FP + FN + TN)
→ Prevalence = (TP + FN)/(TP + FP + FN + TN)

FN, false negative; FP, false positive; TN, true negative; TP, true positive.
Sensitivity = TP/(TP + FN).
Specificity = TN/(TN + FP).
Positive-predictive value = TP/(TP + FP).
Negative-predictive value = TN/(TN + FN).
Accuracy = (TP + TN)/(TP + FP + FN + TN).
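The formulas in Table A3 can be sketched in a few lines of Python. This is a minimal illustration only: the `TwoByTwo` class name and the example counts are hypothetical and are not taken from any study in this review, and the sketch assumes that no denominator is zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one diagnostic accuracy study (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:
        # Proportion of diseased cases with a positive index test
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of nondiseased cases with a negative index test
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Positive predictive value
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Negative predictive value
        return self.tn / (self.tn + self.fn)

    def accuracy(self) -> float:
        # Proportion of all results that are true results
        return (self.tp + self.tn) / self.total

    def prevalence(self) -> float:
        # Proportion of the study population with disease
        return (self.tp + self.fn) / self.total


# Illustrative counts only (assumed for this example).
study = TwoByTwo(tp=90, fp=20, fn=10, tn=80)
for name in ("sensitivity", "specificity", "ppv", "npv", "accuracy", "prevalence"):
    print(f"{name}: {getattr(study, name)():.3f}")
```

Note that the predictive values and accuracy depend on the prevalence in the study sample, whereas sensitivity and specificity are computed within the diseased and nondiseased groups separately.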
who are nonstatisticians. For statisticians, RevMan is an easy tool for performing meta-analyses and generating graphs such as forest plots and funnel plots. RevMan 5 is the software used for preparing and maintaining Cochrane Reviews. Thousands of systematic reviews and meta-analyses published on the Cochrane Library are performed using RevMan. RevMan can be downloaded at http://community.cochrane.org/tools/review-production-tools/revman-5/revman-5-download.

dr-ROC

dr-ROC is a highly specialized, commercially available Microsoft Excel workbook file for meta-analysis of diagnostic tests. The standard version of the software is limited to 25 or fewer studies. A version that handles up to 100 studies is available, free of charge, to licensees who contact the publisher. Key strengths of dr-ROC include an easy-to-use, complete, self-contained solution for meta-analysis and automatic generation of publication-quality graphics. The statistical methodology of dr-ROC is based on the SROC approach to meta-analysis (14,54). Data analysis options are set using simple pull-down menus on the same page as data entry. Graph options are set with check boxes right next to the graphs. Analytic options include fixed-effects (Mantel-Haenszel) and random-effects (DerSimonian-Laird) meta-analysis of DORs, pooled sensitivity and specificity with CIs, and Spearman rank correlation and Pearson product-moment correlations for sensitivity vs specificity, along with their statistical significance. The coefficient of determination, r², measures the proportion of variation in specificity that would be accounted for by differences in sensitivity. Graphical options include forest plots of study-by-study sensitivity and specificity, and an ROC plot comparing SROC, random-effects, and fixed-effects meta-analysis results and showing random-effects or fixed-effects results on the SROC and logit plot. dr-ROC does heterogeneity analysis, statistical analysis of user-supplied study quality scores, and pretest or posttest probabilities, among other analyses.

metandi

metandi is a user-written Stata program for meta-analysis of diagnostic test accuracy studies in which both the index test under study and the reference test (gold standard) are dichotomous (55). It takes as input tp fp fn tn (the number of true positives, false positives, false negatives, and true negatives) within each study. It fits a two-level mixed logistic regression model, with independent binomial distributions for the true positives and true negatives conditional on the sensitivity and specificity in each study, and a bivariate normal model for the logit transforms of sensitivity and specificity between studies (19). Estimates are displayed for the parameters of both the bivariate model (18) and the hierarchical summary receiver operating characteristic (HSROC) model (20). In Stata 10, metandi fits the model using the built-in command xtmelogit by default. In Stata 8 or 9, it makes use of the user-written command gllamm. metandi does not allow covariates to be fitted, that is, meta-regression of diagnostic accuracy is not supported. metandiplot graphs the results from metandi on an SROC plot. By default, the display includes a summary point showing the summary sensitivity and specificity, a confidence contour outlining the confidence region for the summary point, one or more prediction contours outlining the prediction region for the true sensitivity and specificity in a future study, and the HSROC curve from the HSROC model. If the optional variables tp fp fn tn are included on the command line, the plot also includes study estimates, indicating the sensitivity and specificity estimated using the data from each study separately. If the model was fitted using gllamm, post-estimation predictions are obtained using gllapred. If the model was fitted using xtmelogit, the predictions are obtained using
predict; see the predict entry in [XT] xtmelogit postestimation. The module is available at http://ideas.repec.org/c/boc/bocode/s456932.html.

Metadas

Metadas is a SAS macro developed as a wrapper for Proc NLMIXED to implement hierarchical or multilevel methods for the meta-analysis of diagnostic accuracy studies. Metadas reduces the problem of selecting starting values for model parameters in Proc NLMIXED. The macro can run any number of tests consecutively and has several options, which include model choice (hierarchical summary receiver operating characteristic or bivariate model), predictions based on the empirical Bayes estimates, covariate inclusion, likelihood ratio tests, and model checking. The output of the analysis is summarized in a Word document with all parameter estimates in a format suitable for input into the Cochrane Collaboration software, RevMan 5, to produce SROC plots. In addition, estimates of summary measures of test accuracy such as the expected sensitivity, specificity, likelihood ratios, and DORs are produced, as well as relative measures when there is a covariate in the model. Metadas is a versatile program that renders meta-analysis of diagnostic accuracy studies in SAS more accessible. It has no graphical capability in terms of SROC plots but provides more flexibility in model fitting and result output. It is available from the authors at y.takwoingi@bham.ac.uk.

mada

The specialized software required and the technical difficulty associated with using hierarchic models may be a barrier to their use. The freeware package "mada" was released in R in 2012; this readily available software, with a clear and concise user guide, should considerably reduce the technical and economic barriers to hierarchic methods of data pooling for research groups. The open-source R-package mada provides some established and some current approaches to diagnostic meta-analysis, as well as functions to produce descriptive statistics and graphics. It is assumed that the reader is familiar with central concepts of meta-analysis, such as fixed- and random-effects models, and with the ideas behind diagnostic accuracy meta-analysis and (S)ROC. Once R is installed and an Internet connection is available, the package can be installed from CRAN on most systems by typing install.packages("mada") at the R prompt. Development of mada is hosted at http://r-forge.r-project.org/projects/mada/.

HSROC

The open-source R-package for joint meta-analysis of diagnostic test sensitivity and specificity with or without a gold standard reference is authored by Ian Schiller and Nandini Dendukuri. This package implements a model for joint meta-analysis of sensitivity and specificity of the diagnostic test under evaluation, while taking into account the possibly imperfect sensitivity and specificity of the reference test. This hierarchical model accounts for both within- and between-study variability. Estimation is carried out using a Bayesian approach, implemented via a Gibbs sampler. The model can be applied in situations where more than one reference test is used in the selected studies. It is available at http://cran.r-project.org/web/packages/HSROC/index.html.

Meta-DiSc

Meta-DiSc is a Windows-based, user-friendly, well-validated software package, freely available for academic use, for performing dedicated diagnostic meta-analysis (56). Zamora et al. described Meta-DiSc as (1) performing independent statistical pooling of sensitivities, specificities, likelihood ratios, and DORs using fixed- and random-effects models, both overall and in subgroups; (2) allowing exploration of heterogeneity, with a variety of statistics including chi-square, I-squared, and Spearman correlation tests; (3) implementing meta-regression techniques to explore the relationships between study characteristics and accuracy estimates; and (4) producing high-quality figures, including forest plots and linear regression–based summary receiver operating characteristic curves that can be exported for use in manuscripts for publication (56). All computational algorithms have been validated through comparison with different statistical tools and published meta-analyses (56). Meta-DiSc has a graphical user interface with roll-down menus, dialog boxes, and online help facilities. The software is publicly available at http://www.hrc.es/investigacion/metadisc-en.htm. Although Meta-DiSc has already been used and cited in several meta-analyses published in high-ranking journals, a note of caution is warranted: Meta-DiSc has no capacity to perform hierarchic methods (23). As stated previously, we reiterate that the recommended HSROC or bivariate methods for pooling in meta-analyses of diagnostic accuracy studies should be used rather than traditional univariate methods. McGrath et al. showed that in 120 reviews in which traditional univariate pooling methods for meta-analysis were used, the authors performed their analyses with Meta-DiSc; this represented nearly two-thirds of reviews in this category (23). Unfortunately, Meta-DiSc remains available online and continues to be touted as a tool for use in reviews of diagnostic test accuracy (23).

SENSITIVITY AND SPECIFICITY

Sensitivity and specificity are measures defined conditional on disease status, as they are computed as proportions of the number diseased and the number nondiseased, respectively. The sensitivity of a test is defined as the probability that the index test result will be positive in a diseased case. Sensitivity is also referred to as detection rate (DR), true-positive rate (TPR), or true-positive fraction (TPF) (see Table A3). The specificity of a test is defined as the probability that the index test result will be negative in a nondiseased case. Specificity is also
referred to as the true-negative rate or true-negative fraction (see Table A3). The terms false-positive rate (FPR) and false-positive fraction (FPF) are used for the complement of specificity. Both sensitivity and specificity are expressed either as a proportion or a percentage.

PREDICTIVE VALUES

Predictive values are measures defined conditional on the index test results, as they are computed as proportions of the total with positive and with negative index test results. The positive-predictive value of a test is defined as the probability that a person with a positive index test result has disease (see Table A3). The negative-predictive value of a test is defined as the probability that a person with a negative index test result does not have disease (see Table A3). Both positive- and negative-predictive values are reported either as a proportion or a percentage.

ACCURACY

Accuracy is also used as a statistical measure of how well a binary classification test correctly identifies or excludes a condition. Accuracy is the proportion of true results (both true positives and true negatives) in the population (see Table A3).

LIKELIHOOD RATIOS

Likelihood ratios can be used to update the pretest probability of disease using Bayes theorem, once the test result is known. The updated probability is referred to as the posttest probability. For a test that is informative, the posttest probability should be higher than the pretest probability if the test result is positive, and lower than the pretest probability if the test result is negative. The positive likelihood ratio describes how many times more likely positive index test results are in the group with disease compared to the nondiseased group. The negative likelihood ratio describes how many times less likely negative index test results are in the diseased group compared to the nondiseased group.

DIAGNOSTIC ODDS RATIOS

The DOR summarizes the diagnostic accuracy of the index test as a single number that describes how many times higher the odds are of obtaining a positive test result in a person with the disease compared to a person without the disease. Because it is a single number, it is easy to use this measure for meta-analysis. However, expressing accuracy in terms of ratios of odds has little direct clinical relevance. Therefore, it is rarely used as a summary statistic in primary studies.
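The likelihood ratio, posttest probability, and DOR definitions above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the example values (sensitivity 0.90, specificity 0.80, pretest probability 0.30, and the 2 × 2 counts) are invented for the example, and no continuity correction is applied, so the sketch fails if a denominator is zero.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)   # TPR / FPR
    lr_neg = (1.0 - sensitivity) / specificity   # FNR / TNR
    return lr_pos, lr_neg


def posttest_probability(pretest: float, lr: float) -> float:
    """Update a pretest probability with a likelihood ratio (odds form of Bayes theorem)."""
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1.0 + posttest_odds)


def diagnostic_odds_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """DOR = (TP x TN) / (FP x FN), which equals LR+ / LR-."""
    return (tp * tn) / (fp * fn)


# Illustrative values only (assumed for this example).
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.80)
pretest = 0.30
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
print(f"Posttest probability after a positive test: {posttest_probability(pretest, lr_pos):.3f}")
print(f"Posttest probability after a negative test: {posttest_probability(pretest, lr_neg):.3f}")
print(f"DOR = {diagnostic_odds_ratio(tp=90, fp=20, fn=10, tn=80):.1f}")
```

For an informative test, the positive result moves the posttest probability above the pretest value and the negative result moves it below, which matches the behavior described in the text.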