1 Office of Education Research, Mayo Medical School, Rochester, Minnesota, USA
2 Division of General Internal Medicine, Mayo Clinic, Rochester, Minnesota, USA
3 Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA

Correspondence: David A Cook MD, MHPE, Division of General Internal Medicine, Mayo Clinic College of Medicine, 200 First Street SW, Rochester, Minnesota 55905, USA. Tel: 00 1 507 266 4156; Fax: 00 1 507 284 5370; E-mail: cook.david33@mayo.edu
© Blackwell Publishing Ltd 2012. MEDICAL EDUCATION 2012; 46: 943–952
D A Cook & C P West
Conducting systematic reviews
including specific search terms for each indexing database, the databases and other sources searched, search dates and all search results should be carefully archived for subsequent reporting.

Using the initial search terms as a starting point, the research librarian and the first author of the review of simulation-based education6 worked in collaboration to identify a comprehensive search strategy. The search sensitivity was evaluated by comparing the articles identified against those already known to the authors and those cited in previous seminal reviews. If an article was missed, the title and abstract were carefully reviewed to identify terms that would improve the sensitivity. The search was repeated, with adaptations as needed, in MEDLINE, EMBASE, CINAHL, PsycINFO, ERIC, Web of Science and Scopus. In addition, all references cited in several seminal reviews, and the entire table of contents of two key journals, were added to the list of articles. Finally, the reference lists of randomly selected articles were hand-searched to identify additional articles; this continued until no additional articles were identified. The unabridged search strategy was published as an online appendix.

Decide on the inclusion or exclusion of each identified study

After identifying a pool of articles, reviewers include or exclude articles based on predefined inclusion and exclusion criteria. These criteria typically emerge naturally from the focused question and, again, the PICO framework can often be used to help to define the population (e.g. 'medical students'), intervention (e.g. 'problem-based learning'), comparison (e.g. 'no intervention') and outcome (e.g. 'any learning outcome'). Study design (e.g. 'any comparative design' or 'only randomised trials') may also be considered, although a more inclusive approach can allow subsequent evaluation of whether results differ depending on study design.

Additional restrictions may be placed on included work. Reviewers will occasionally exclude articles based on language (e.g. by excluding non-English publications), publication date (e.g. by excluding articles older than 20 years), length (e.g. by excluding abstracts) and rigour of peer review (e.g. by excluding graduate theses, papers presented at meetings, and other unpublished works which are collectively termed 'grey literature'). Decisions about restrictions are best made on conceptual grounds rather than convenience. For example, in a review of Web-based learning it made conceptual sense to begin the search at a date subsequent to the development of the World Wide Web.13 By contrast, there is rarely a good conceptual reason to limit the search to English-language publications only, because excellent research is often published in other languages.6 The inclusion of grey literature is more controversial; some correctly argue that non-peer-reviewed research may be of inferior quality, but others correctly argue that such studies can still, when properly analysed, contribute importantly to evidence-based decisions.

Defining inclusion and exclusion criteria

Regardless of the actual criteria selected, it is important to clearly define these criteria both conceptually (often by using a formal definition from a dictionary, theory or previous review) and operationally (by using detailed explanations and elaborations that help reviewers recognise the key concepts as reported by authors in published articles). Although some operational definitions will be defined from the outset, many of these may actually emerge during the process of the review as reviewers come across articles of uncertain inclusion status. Such cases should be discussed by the group with the goal not only of deciding on the inclusion or exclusion of that article, but also of defining a rule that will determine the triage of similar articles in the future. Such decisions, along with brief examples of what should and should not be included, can be catalogued in an explanatory document. Although the conceptual definitions should remain unchanged, the explanatory document and the operational definitions it contains often continue to evolve throughout the review process.

Involving the entire reviewer group in the development of the conceptual and operational definitions not only improves the likelihood that others will agree with the decisions made, but ensures that everyone will apply the criteria using this shared understanding. Yet even after the group development process, it remains essential to pilot-test the inclusion/exclusion form and process on a small subset of articles. After each round of pilot-testing, all reviewers compare their decisions and use points of discrepancy to refine the operational definitions and to recalibrate their own standards.

The inclusion and exclusion process

As with nearly all phases of the review process, inclusion and exclusion should involve at least two reviewers. Duplicate, independent review minimises
random error and helps to avoid idiosyncrasies that would bias the review.

The inclusion/exclusion process typically has two stages. In stage 1, reviewers look only at the title, abstract and – if available – the keywords. During this stage, if both reviewers are confident based on the title and abstract that the article is ineligible, it is excluded. If there is any doubt, such as in a case in which the abstract contains insufficient information, the article advances to stage 2. Reviewers typically do not reconcile disagreements at this stage. If either reviewer feels the paper should be included, it is duly advanced based on the rationale that resolving uncertainties is best done using the full text rather than the abstract alone.

During stage 2, reviewers read the full text of each article to make a final inclusion/exclusion decision. Here, two independent reviews are required in all cases. Reviewers initially attempt to resolve the inevitable coding disagreements through discussion and consensus, and appeal to another member of the review team if needed.

In the review of simulation-based education,6 inclusion and exclusion criteria had been defined with the writing of the study protocol. Included studies were required to have a comparison group, but no other design restrictions were imposed (i.e. both randomised and non-randomised studies were eligible). The investigators applied these criteria to each article identified in step 4. In stage 1, one or two authors reviewed each title and abstract; two negative votes were required to exclude an article, whereas one positive vote would advance the article to stage 2 (i.e. err on the side of inclusion). In stage 2, two investigators independently reviewed the full text of each article and resolved all disagreements by consensus. Whereas the original wording of the inclusion and exclusion criteria remained unchanged, the operational definitions of these criteria evolved over time. Over 30 articles were translated from non-English languages including Chinese, Japanese, Korean, Spanish, French, German, Swedish and Finnish. The authors kept a careful accounting of the reason for each inclusion and exclusion, and summarised this in a trial flow figure. All included articles were listed in an online appendix.

Defining the data abstraction elements

What information should be collected? The PICO framework can again provide guidance in planning which data to collect, including the key features of participants (number and key demographics), interventions (key elements of design, intensity, timing, duration and implementation), comparisons (similar to interventions) and outcomes. Information on outcomes should include details of both the measurement method (e.g. outcome classification, assessor blinding, timing in relation to intervention, score validity) and the actual results (mean and standard deviation, event rate, correlation coefficient, effect size, etc.).

Reviewers should also code information on study design, which might include the number of groups, method of group assignment (randomised versus non-randomised), timing of assessments (e.g. post-intervention versus pre- and post-intervention), enrolment and follow-up rates, and other features of study quality. These elements may vary for different study designs, but a focus on threats to study validity14 is common among them. Many instruments for assessing study quality have been described, including the Medical Education Research Study Quality Instrument (MERSQI)15 for education research, the Jadad scale16 for randomised trials, the Newcastle–Ottawa Scale13,17 for non-randomised studies, and the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2)18 for studies of assessment tools. All have strengths and weaknesses, and none has been universally accepted as a reference standard. More important than a score on any particular instrument is the assessment of possible bias and validity threats in each study in the systematic review.

The data abstraction process

A data abstraction form should be developed and iteratively refined. As with the inclusion/exclusion criteria, the elements of data to be abstracted must be defined both conceptually and operationally, and the development of an explanatory document with detailed definitions and examples is essential. In addition to the questions defined at the study outset, new questions often emerge as the review team reads articles during the inclusion process. These questions dictate the data to be abstracted.
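The PICO-driven elements listed above lend themselves to a structured abstraction record. The sketch below is purely illustrative: the `AbstractedStudy` class and its field names are our own assumptions for demonstration, not the form used in the cited review.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AbstractedStudy:
    """One record of a hypothetical data abstraction form (PICO plus design)."""
    study_id: str
    participants: str                 # number and key demographics
    intervention: str                 # key elements of design, intensity, timing
    comparison: str                   # similar elements, for the comparison arm
    outcome_measure: str              # classification, blinding, timing, validity
    mean: Optional[float] = None      # actual results, where reported
    sd: Optional[float] = None
    n: Optional[int] = None
    randomised: bool = False          # study design / quality elements
    assessor_blinded: bool = False
    notes: str = ""                   # operational-definition clarifications

record = AbstractedStudy(
    study_id="Example2010",
    participants="42 medical students",
    intervention="simulation-based training, four 1-h sessions",
    comparison="no intervention",
    outcome_measure="post-test knowledge score (assessor blinded)",
    mean=78.5, sd=9.2, n=42,
    randomised=True, assessor_blinded=True,
)
print(record.study_id, record.n)
```

Keeping such records in a uniform structure makes it straightforward to tabulate studies and to feed the numerical fields into a later synthesis.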
cycles of this process may be carried out thereafter as needed until a high degree of consistency is achieved. Data abstraction should ideally be carried out by two independent reviewers. Coding disagreements must be resolved, ideally by consensus and by appeal to a third party if necessary.

quality was evaluated using two complementary criteria. Investigators reviewed each article in duplicate to abstract this information using an electronic tool designed for this purpose (DistillerSR.com) and resolved discrepant codes by consensus.
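When reviewers pilot-test forms and compare their independent codes, it can help to quantify agreement. The article does not prescribe a statistic, but Cohen's kappa (chance-corrected agreement between two raters) is one common choice; the following minimal sketch uses hypothetical include/exclude decisions.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' categorical codes."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's marginal rates
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["include", "exclude", "exclude", "include", "exclude", "exclude"]
b = ["include", "exclude", "include", "include", "exclude", "exclude"]
print(round(cohens_kappa(a, b), 2))  # → 0.67
```

Low kappa after a pilot round signals that the operational definitions need refinement before full coding proceeds.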
statistically significant results (vote counting).22 Rather, synthesis involves pooling and exploring the results to provide a 'bottom-line' statement regarding what the evidence supports and what gaps remain in our current understanding. This requires reviewers to organise and interpret the evidence, anticipating and answering readers' questions about this topic, while simultaneously providing transparency that allows readers to verify the interpretations and arrive at their own conclusions.
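To make the idea of pooling results and quantifying between-study inconsistency concrete, here is a minimal sketch using the common DerSimonian–Laird random-effects method and the I² statistic. It is an illustration under assumed effect sizes, not the implementation used in the cited review.

```python
import math

def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling with Q and I^2 heterogeneity."""
    w = [1 / v for v in variances]                       # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                        # between-study variance
    w_re = [1 / (v + tau2) for v in variances]           # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # % inconsistency
    return pooled, se, i2

# Hypothetical standardised mean differences and their variances
effects = [0.80, 0.45, 1.10, 0.30]
variances = [0.04, 0.06, 0.05, 0.03]
pooled, se, i2 = pool_random_effects(effects, variances)
print(f"pooled={pooled:.2f}, "
      f"95% CI {pooled - 1.96 * se:.2f} to {pooled + 1.96 * se:.2f}, "
      f"I2={i2:.0f}%")
```

A high I² is an invitation to explore why results differ (e.g. through subgroup analyses), not merely to report the average.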
Reviewers must make a number of key decisions regarding the analysis. Firstly, should they attempt a statistical pooling of quantitative results (i.e. meta-analysis)? If so, further decisions about this process will refer to which statistical model to apply (e.g. a fixed-effects or random-effects model) and how to standardise outcomes across studies. Details on meta-analysis are beyond the scope of this article; Appendix S1 lists several helpful resources.

Secondly, how will reviewers explore heterogeneity (inconsistency) across studies? The most informative aspect of many reviews is not the average result across studies, but, rather, the exploration of why results differ from study to study. An explanation of between-study inconsistency should be part of all systematic reviews.

Finally, the authors should consider threats to the validity of their own review. By transparently reporting their methods, acknowledging key assumptions, exploring potential sources of bias and providing tables containing detailed information on each study, the reviewers encourage readers to verify and potentially reinterpret the information for themselves. Indeed, the degree to which reviewers explore the strengths, weaknesses, heterogeneity and gaps in the evidence determines in large part the value of the review.

The authors of the review of simulation-based education6 used meta-analysis to synthesise the results. They used I² statistics to quantify heterogeneity and subgroup analyses to explore this heterogeneity. They also performed a narrative synopsis of key study characteristics including trainees, clinical topics and study quality. In subsequent manuscripts (in press) on focused topics they have used narrative synthesis methods to identify and summarise salient themes.

The authors of the review of simulation-based education6 used DistillerSR.com to track inclusion/exclusion and for data abstraction. They used a Google Group to archive all e-mail communications, Google Docs to keep an ongoing list of articles in need of translation, Doodle.com to schedule teleconferences, Google Translate for some simple translation needs, and EndNote to manage references. They used SAS macros to perform meta-analysis.

REPORTING

The key elements in reporting systematic reviews and meta-analyses have been codified in guidelines such as the QUOROM (quality of reporting of meta-analyses),24 MOOSE (meta-analysis of observational studies in epidemiology)25 and, most recently, PRISMA (preferred reporting items for systematic reviews and meta-analyses)26 statements. We encourage reviewers to adhere to these guidelines (http://www.prisma-statement.org), but we will not repeat these in detail. We provide some practical advice for writing the manuscript itself in Table 3.

The review6 team cited the PRISMA guidelines and adhered to these during the planning, conduct and reporting of the review.

CONCLUSIONS

As the volume and quality of evidence in medical education continue to expand, the need for evidence synthesis will grow. By following the seven key steps outlined in this paper to complete a high-quality systematic review, authors will more meaningfully contribute to this knowledge base.