Basic Methods Handbook for Clinical
Orthopaedic Research
Volker Musahl • Jón Karlsson
Michael T. Hirschmann
Olufemi R. Ayeni • Robert G. Marx
Jason L. Koh • Norimasa Nakamura
Editors
Basic Methods
Handbook for Clinical
Orthopaedic Research
A Practical Guide and Case Based
Research Approach
Editors

Volker Musahl
UPMC Rooney Sports Complex
University of Pittsburgh
Pittsburgh, PA
USA

Jón Karlsson
Department of Orthopaedics
Sahlgrenska Academy
Gothenburg University
Sahlgrenska University Hospital
Gothenburg
Sweden

Michael T. Hirschmann
Department of Orthopaedic Surgery and Traumatology
Kantonsspital Baselland (Bruderholz, Laufen und Liestal)
Bruderholz
Switzerland

Olufemi R. Ayeni
McMaster University
Hamilton, ON
Canada

Robert G. Marx
Hospital for Special Surgery
New York, NY
USA

Jason L. Koh
Department of Orthopaedic Surgery
NorthShore University HealthSystem
Evanston, IL
USA

Norimasa Nakamura
Institute for Medical Science in Sports
Osaka Health Science University
Osaka
Japan
© ISAKOS 2019
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher,
whether the whole or part of the material is concerned, specifically the rights of translation,
reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any
other physical way, and transmission or information storage and retrieval, electronic adaptation,
computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are
exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in
this book are believed to be true and accurate at the date of publication. Neither the publisher nor
the authors or the editors give a warranty, express or implied, with respect to the material
contained herein or for any errors or omissions that may have been made. The publisher remains
neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer-Verlag GmbH, DE part
of Springer Nature.
The registered company address is: Heidelberger Platz 3, 14197 Berlin, Germany
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_1
1 What Is Evidence-Based Medicine?
E. Svantesson et al.
treatment, it was possible to assess the clinical effectiveness of such a treatment. He advocated that every patient who received treatment be monitored for a follow-up time long enough to establish whether the treatment led to a satisfactory outcome. If a treatment failure was determined based on the records of outcome, he held that the reason for the failure should be investigated and that proper actions should be taken to prevent similar future failures. This was a radical idea in the early twentieth century, and Dr. Codman’s work met with such strong criticism that he lost his staff position at the Massachusetts General Hospital. In contrast, we nowadays consider Dr. Codman’s approach a milestone for medicine based on empirical evidence. The term EBM, which has been regarded as one of the most important paradigm shifts in medical history [27], was defined in 1991 by Gordon Guyatt as a component of the medical residency program at McMaster University in Hamilton, Ontario, Canada [4]. The concept aimed to educate clinicians on how to appraise and interpret scientific evidence: assessing its credibility, critically appraising the results, and integrating the evidence into everyday work [10]. This concept was subsequently adopted worldwide, and several guides that serve as a foundation for understanding the EBM concept have been published [13, 15, 26]. One contributing factor to the almost explosive arrival and integration of EBM was the rapid development of modern technology during this time, which enabled the field of informatics to grow tremendously with, for example, large online databases and scientific journals. Therefore, once the concept was established, the requisites for implementing EBM in a variety of specialties enabled the emergence of a new era of a scientific approach to the practice of medicine.

1.2 What Defines the Practice of EBM?

The practice of EBM entails the integration of the current best evidence in the clinical decision-making process in terms of the care of every individual patient. Thus, it is about using scientific evidence conscientiously and objectively for this purpose. To achieve this, a sufficient amount of clinically relevant information must be acquired, which may include published literature such as basic science research, clinical trials, diagnostic testing, predictive factors, the efficacy of therapeutic interventions, etc. However, it could also include individual observations and expert opinions. The expertise of a highly experienced clinician should also be incorporated into the concept of EBM. EBM is about using the accumulated scientific and statistical knowledge derived from several of the aforementioned sources of information. Moreover, it involves critical appraisal of such sources to evaluate: what does the strongest evidence suggest in terms of decision-making in this clinical situation? Nonetheless, to rely solely on scientific evidence without clinical expertise for clinical decision-making would be a misconception of the EBM concept. An important component is also to incorporate patients’ perspectives and values, which is in line with providing the best care for every individual patient. Establishing a good relationship between the medical professional and the patient is important for this purpose, and the practice of EBM should be characterized by the synergistic values between the brain and the heart [21]. Thus, EBM is characterized by three equally important fundamental principles: the best available research, the clinical experience, and the patient’s perspective [24].

1.3 The Best Available Evidence

The large amount of available literature requires a systematic approach to evaluating and synthesizing data. Before the term EBM was even established, efforts to describe how to systematically examine the scientific literature for “critical appraisal” and for extraction of evidence were made by David Sackett of McMaster University [24]. The idea of summarizing evidence was developed as a cornerstone of EBM, and creating such summaries has been facilitated by the rapid development of venues for finding information, as well as
advanced degrees in library science to help extract data efficiently. Today, finding the best possible evidence is closer than ever with various software programs and databases that can generate information almost instantly. There are several valuable sources that provide clinicians with the best available evidence, including systematic reviews and evidence-based clinical guidelines. Perhaps the most extensive database providing in-depth systematic reviews on numerous topics is the Cochrane Database, which also comprises a list of randomized clinical trials in orthopedics and many other subspecialty areas.

1.3.1 The Hierarchy of Evidence

The hierarchy of evidence needs to be appreciated to facilitate interpretation of the large amount of literature on a topic. Moreover, a systematic quality appraisal needs to be carried out when searching for the best possible evidence because not all research is reliable research. The hierarchy of evidence has been established as levels of evidence, which mainly depend on the quality of study design and the expected risk of bias [5, 23, 28]. Although there are several versions of the hierarchy of evidence, the most frequently used version is the one available from the Oxford Centre for Evidence-Based Medicine website, www.cebm.net. In this version, randomized controlled trials (RCTs) are regarded as the highest level of evidence (level 1) of individual studies, while expert opinions and uncontrolled studies are considered to be of the lowest level (level 5). Controlled observational studies occupy a position in the middle of these two, and in turn, the levels further depend on whether a prospective or retrospective approach to the study design has been undertaken. Moreover, the assessment of level of evidence differs slightly depending on the study type, and specific criteria have been established for each study type. According to the systematic approach to level of evidence assessment proposed in the version by the Oxford Centre for Evidence-Based Medicine, study types are stratified in the following groups: therapeutic studies, prognostic studies, diagnostic studies, prevalence studies, and economic/decision analyses. Thus, each study type has its own system for determining the level of evidence, and subgroups within a certain level have also been established (e.g., level 1a, 1b, etc.). Fact Box 1.1 summarizes the hierarchy of evidence for therapeutic studies.

Fact Box 1.1: Levels of Evidence
Level  Study design
1      Meta-analyses
       Systematic reviews
       Randomized controlled trials
2      Cohort studies
3      Case-control studies
4      Retrospective case series
5      Expert opinions

The hierarchy of evidence is proposed to reflect the applicability, reproducibility, and generalizability of a study [9], and a study of high level of evidence should therefore show superiority in terms of these factors. Grant et al. [11] conducted a systematic review to evaluate the level of evidence among published articles in three major journals of sports medicine over the past 15 years. It was concluded that the percentage of level 1 and 2 studies had increased over time, and in 2010 nearly 25% of all studies were level 1 or 2. Although level 4 and 5 studies had decreased over time, these were still the most common levels of evidence in the sports medicine literature (53% in 2010) [11]. Similarly, another systematic review, specifically investigating the level of evidence among literature related to anterior cruciate ligament reconstruction, concluded that a minority of the literature published between 1995 and 2011 was of level 1 evidence (approximately 10%) [25]. However, it is important to point out that EBM does not solely rely on level 1 RCTs, which is a common misunderstanding of EBM [3]. All types of study designs contribute to EBM, and the strengths and disadvantages of every study design need to be considered when accumulating evidence. A valuable instrument for rating evidence quality is the Grading of Recommendations
review has already been scrutinized, it is the reader’s responsibility to critically evaluate the included studies and to relate them to the rationale of the systematic review and the specific question. A systematic review should be carried out based on strict, predefined inclusion and exclusion criteria for all eligible articles. The search strategy should be clearly illustrated, and it should be obvious and reproducible how the articles have been assessed for inclusion and critical quality appraisal [1, 14]. It is important that all articles found in the literature search are evaluated objectively and that all articles meeting eligibility criteria are included, regardless of what results the studies present. Otherwise, there is an evident risk of selection bias in the systematic review [20]. Finally, if a meta-analysis is performed, the reader should evaluate data extraction, pooling of data, and the statistical methodology for quantitative synthesis of data, including data heterogeneity assessment. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [19] has been developed to facilitate the conduct of a systematic review and could also be applied when evaluating a systematic review. In general, most journals require that the authors have applied the PRISMA statement to publish a systematic review. Other valuable resources for systematic reviews and meta-analyses are, for example, the Cochrane Database of Systematic Reviews (the Cochrane Collaboration) and the Database of Abstracts of Reviews of Effects (DARE; National Institute for Health Research).

A disadvantage of filtered resources is that it takes time to conduct such a synthesis of evidence and that more recent research therefore is at risk of being missed if solely relying on such sources of information. Moreover, primary studies related to, for example, newer conditions, interventions, or technologies are often too few to support a systematic review. Unfiltered resources, or primary studies, provide the most up-to-date research and can include important conclusions that might have been published since the filtered resources were published. Finding the best available evidence among primary studies demands that the clinician undertakes an evidence-based approach toward the topic on his or her own. This requires skills of the reader that may take time and education to master. Nevertheless, developing such skills is nearly mandatory in the modern world of EBM, and increased understanding of how to independently determine study validity and applicability is encouraged. Another useful instrument in the arsenal for evaluating study quality is the strength of recommendation taxonomy (SORT) [8] system, which is useful when the clinician has failed to find the requested information via randomized controlled trials, meta-analyses, or systematic reviews. The SORT system could be used alone or as a complement to the level of evidence hierarchy for quality appraisal of individual studies and yields a code ranging from A to C where A represents the best possible evidence (Fact Box 1.4).

Fact Box 1.4: The SORT Rating System
Code  Definition
A     Consistent, high-quality, patient-oriented evidence
B     Inconsistent or limited-quality patient-oriented evidence
C     Consensus, disease-oriented evidence, usual practice, expert opinion, or case series for studies of diagnosis, treatment, prevention, or screening

After undertaking a thorough review of the literature and appreciating what is the best available evidence, the most important thing for a clinician is to subsequently incorporate this into clinical practice. This entails a combination of having the actual requisites to provide such health-care and also ascertaining that the patient’s needs and preferences are incorporated in the algorithm of treatment. A complete utilization of EBM relies on patient-centered health-care as well as a scientific approach to treatment. This means that the clinician has a responsibility to function as the expert of the area, presenting evidence-based alternatives to treatment, while the ultimate decision is characterized by shared decision-making between the clinician and the patient.
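The two rating schemes described in this chapter, the levels of evidence in Fact Box 1.1 and the SORT codes in Fact Box 1.4, can be written down as simple lookup tables. The following Python sketch is only an illustration under stated assumptions: the function names and the list of studies in the usage note are hypothetical and are not taken from the Oxford Centre for Evidence-Based Medicine or from the SORT publication, and the sketch covers only the therapeutic-study hierarchy without the sublevels (1a, 1b, etc.).

```python
# Levels of evidence for therapeutic studies, as listed in Fact Box 1.1.
# A lower number means a higher level of evidence.
LEVEL_BY_DESIGN = {
    "meta-analysis": 1,
    "systematic review": 1,
    "randomized controlled trial": 1,
    "cohort study": 2,
    "case-control study": 3,
    "retrospective case series": 4,
    "expert opinion": 5,
}

# SORT codes and their definitions, as listed in Fact Box 1.4.
SORT_DEFINITIONS = {
    "A": "Consistent, high-quality, patient-oriented evidence",
    "B": "Inconsistent or limited-quality patient-oriented evidence",
    "C": ("Consensus, disease-oriented evidence, usual practice, expert "
          "opinion, or case series for studies of diagnosis, treatment, "
          "prevention, or screening"),
}


def level_of_evidence(design: str) -> int:
    """Look up the level (1-5) for a study design name (hypothetical helper)."""
    return LEVEL_BY_DESIGN[design.strip().lower()]


def rank_by_level(designs: list[str]) -> list[str]:
    """Order study designs from the highest to the lowest level of evidence."""
    return sorted(designs, key=level_of_evidence)


def share_of_level_1_or_2(designs: list[str]) -> float:
    """Fraction of studies at level 1 or 2, the kind of summary reported by
    reviews that track levels of evidence in a body of literature."""
    if not designs:
        raise ValueError("at least one study is required")
    high = sum(1 for d in designs if level_of_evidence(d) <= 2)
    return high / len(designs)
```

As a usage note, for a hypothetical reading list `["expert opinion", "cohort study", "randomized controlled trial", "retrospective case series"]`, `rank_by_level` places the randomized controlled trial first and the expert opinion last, and `share_of_level_1_or_2` returns 0.5. Such a ranking is a starting point only; as emphasized above, all study designs contribute to EBM and each study still requires critical appraisal.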
2017;390(10092):415–23. https://doi.org/10.1016/s0140-6736(16)31592-6.
8. Ebell MH, Siwek J, Weiss BD, Woolf SH, Susman J, Ewigman B, et al. Strength of recommendation taxonomy (SORT): a patient-centered approach to grading evidence in the medical literature. J Am Board Fam Pract. 2004;17(1):59–67.
9. Evans D. Hierarchy of evidence: a framework for ranking evidence evaluating healthcare interventions. J Clin Nurs. 2003;12(1):77–84.
10. Evidence-Based Medicine Working Group. Evidence-based medicine. A new approach to teaching the practice of medicine. JAMA. 1992;268(17):2420–5.
11. Grant HM, Tjoumakaris FP, Maltenfort MG, Freedman KB. Levels of evidence in the clinical sports medicine literature: are we getting better over time? Am J Sports Med. 2014;42(7):1738–42. https://doi.org/10.1177/0363546514530863.
12. Green SHJ. Glossary of terms in the Cochrane Collaboration. The Cochrane Collaboration. 2015. http://www.cochrane.org/sites/deafault/files/uploads/glossary.pdf.
13. Guyatt G, Rennie D, Meade OM, Cook JD. Users’ guide to the medical literature: a manual for evidence-based clinical practice. Boston, MA: McGraw-Hill; 2014.
14. Harris JD, Quatman CE, Manring MM, Siston RA, Flanigan DC. How to write a systematic review. Am J Sports Med. 2014;42(11):2761–8. https://doi.org/10.1177/0363546513497567.
15. Jackson R, Ameratunga S, Broad J, Connor J, Lethaby A, Robb G, et al. The GATE frame: critical appraisal with pictures. Evid Based Med. 2006;11(2):35–8. https://doi.org/10.1136/ebm.11.2.35.
16. Lind J. A treatise of scurvy. In three parts. Containing an enquiry into the nature, causes and cure, of that disease. Together with a critical and chronological view of what has been published on the subject. Edinburgh: Printed by Sands, Murray, and Cochran; 1753.
17. Lochner HV, Bhandari M, Tornetta P 3rd. Type-II error rates (beta errors) of randomized trials in orthopaedic trauma. J Bone Joint Surg Am. 2001;83-a(11):1650–5.
18. Matthews J. Quantification and quest for medical certainty. Princeton, NJ: Princeton University Press; 1995.
19. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol. 2009;62(10):1006–12. https://doi.org/10.1016/j.jclinepi.2009.06.005.
20. Murad MH, Montori VM, Ioannidis JP, Jaeschke R, Devereaux PJ, Prasad K, et al. How to read a systematic review and meta-analysis and apply the results to patient care: users’ guides to the medical literature. JAMA. 2014;312(2):171–9. https://doi.org/10.1001/jama.2014.5559.
21. Oxman AD, Sackett DL, Guyatt GH. Users’ guides to the medical literature. I. How to get started. The Evidence-Based Medicine Working Group. JAMA. 1993;270(17):2093–5.
22. Paynter R, Banez LL, Berliner E, Erinoff E, Lege-Matsuura J, Potter S, et al. EPC methods: an exploration of the use of text-mining software in systematic reviews. Rockville, MD: Agency for Healthcare Research and Quality (US); 2016.
23. Sackett DL. Rules of evidence and clinical recommendations on the use of antithrombotic agents. Chest. 1986;89(Suppl 2):2s–3s.
24. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn't. BMJ. 1996;312(7023):71–2.
25. Samuelsson K, Desai N, McNair E, van Eck CF, Petzold M, Fu FH, et al. Level of evidence in anterior cruciate ligament reconstruction research: a systematic review. Am J Sports Med. 2013;41(4):924–34. https://doi.org/10.1177/0363546512460647.
26. Straus ES, Glasziou P, Richardson W, Haynes R. Evidence-based medicine. How to practice and teach EBM. Edinburgh: Churchill Livingstone; 2005.
27. Watts G. Let’s pension off the “major breakthrough”. BMJ. 2007;334(Suppl 1):s4. https://doi.org/10.1136/bmj.39034.682778.94.
28. Wright JG, Swiontkowski MF, Heckman JD. Introducing levels of evidence to the journal. J Bone Joint Surg Am. 2003;85-a(1):1–3.
2 What Is the Hierarchy of Clinical Evidence?
Vishal S. Desai, Christopher L. Camp, and Aaron J. Krych
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_2
question to which a study design can later be tailored. This model serves to clearly outline key variables of the question so that subsequent data collection and analysis can be performed in a focused manner [26, 43] (Fig. 2.2: Patient, Problem, Population; Intervention; Comparison, Control; Outcome). Focused around the topic of anterior cruciate ligament (ACL) reconstruction outcomes, we will introduce and parse out the hierarchy of clinical evidence as well as demonstrate how its components can be applied differently to answer clinical questions. Let us begin first with a discussion of primary (unfiltered) research including expert opinion, case reports, cross-sectional study, case-control study, cohort study, and randomized controlled trials.

2.3 Primary/Unfiltered Research

2.3.1 Expert Opinion

Prior to popularization of evidence-based medicine and peer review, expert opinion was the predominant form of “clinical evidence” used to communicate findings to a larger audience. In fact, a significant portion of orthopedic literature that has been influencing therapeutic decision-making of surgeons for many years is based in expert opinion and case series [18]. As it is reliant on the observations and experiences of an individual (or group) and may lack a reproducible or methodical design, some argue that it does not belong in the hierarchy of evidence at all [14]. To reduce the dependence of
[Figure: evidence pyramid, image not recoverable. Surviving labels: Critically-Appraised Individual Articles (Article Synopses); Cohort Study; Case-Control Study; Unfiltered Information; axis labels for evidence and quality]

P: Patient, Population, Problem | I: Intervention | C: Comparison | O: Outcome
Fig. 2.2 The PICO format allows the examiner to methodically develop all components of a complete clinical question
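The four PICO components in Fig. 2.2 map naturally onto a small record type. The following Python sketch is a minimal illustration under stated assumptions: the class name, the sentence template, and the example question are hypothetical and are not taken from the chapter or from any published PICO tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PicoQuestion:
    """One clinical question split into the four PICO components."""
    population: str    # P: patient, population, or problem
    intervention: str  # I: intervention under study
    comparison: str    # C: comparison or control
    outcome: str       # O: outcome of interest

    def as_sentence(self) -> str:
        """Assemble the components into a single focused question."""
        return (
            f"In {self.population}, does {self.intervention}, "
            f"compared with {self.comparison}, affect {self.outcome}?"
        )


# Hypothetical example built around the chapter's ACL theme.
example = PicoQuestion(
    population="adults with an ACL rupture",
    intervention="double-bundle ACL reconstruction",
    comparison="single-bundle ACL reconstruction",
    outcome="graft failure at 2 years",
)
```

Calling `example.as_sentence()` yields one sentence that names the population, intervention, comparison, and outcome, which makes a missing component easy to spot before a study design is chosen.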
the results on simply the observations of an individual, expert opinion can be strengthened by pooling together and appropriately summarizing the opinions of multiple, well-respected members of a scientific community. Consider the study by Zantop et al. [49], in which 20 panelists from around the world were surveyed on their preferred technique and rehabilitation protocol for ACL reconstruction. The purpose of the study was to summarize the collective practices of this panel of experts to inform clinicians of the current technical conventions predominantly used in ACL reconstruction. The authors defined “expert” as those identified, in the specific subject matter, as having contributed to the National Library of Medicine or as having presented clinical evidence at international meetings [49]. Expert opinions provide an efficient way for the clinical community at large to be updated on current trends and pitfalls of practice. However, based on the design of expert opinion articles, it becomes clear that panelist bias, small sample sizes, and the lack of rigorous data collection protocols limit their overall impact and relative significance [14, 47, 49]. Despite their shortcomings, the information presented in a high-quality, thorough expert opinion may have more practical utility than that of a poorly designed clinical study, and therefore, expert opinions will remain a crucial component of the medical literature [47].

2.3.2 Case Report

Case reports document events or outcomes of individual patients or small groups of patients that the author finds to be of significance. Whether the presented findings are positive or negative, they may be useful in dictating future patient care [14]. Huth outlined four major applications of case reports in his discussion on medical literature [27]:

1. A unique case that may represent a previously unknown syndrome or disease
2. A case with the previously unreported association of two distinct diseases, suggesting a possible relationship between them
3. An “outlier” with features strikingly outside the realm of what is usually seen with a particular disease
4. An unexpected response or course suggesting a previously unrecognized therapeutic or adverse effect of intervention

In the realm of orthopedic surgery, surgical complications may be of significance. Although surgical complications remain a traditionally underreported topic in orthopedic literature, case reports may serve to inform surgeons of potential risks and how to avoid them. For example, a case report by Heng et al. described a 35-year-old man who developed a peri-ACL, distal femoral fracture running through the two femoral tunnels following double-bundle ACL reconstruction. Furthermore, the authors go on to identify this case as the first of its kind following a thorough literature review [24]. On the one hand, a clinician, after having read this, may refine surgical technique or be more cognizant of specific risk factors in his/her patients that may predispose them to a similar outcome. However, although the authors did allude to a possible mechanism of this complication, without an appropriately conducted scientific experiment, the basis of their commentary largely remains conjecture. Case reports may inspire a testable hypothesis for future studies and can succeed in providing short and succinct learning points for the reader [17]; however, they provide limited value when attempting to investigate and answer specific clinical questions. Further limitations of case reports include a lack of generalizability, the inability to establish cause-effect relationships, publication bias, and the risk of “overinterpreting” the findings beyond their simple anecdotal intent [38].

2.3.3 Cross-sectional Study

The cross-sectional study attempts to answer the question, “what is happening right now?” One of the most common applications of the cross-sectional study is in determining the prevalence of a condition or diagnosis at a particular time.
Let us return to our theme of ACL reconstruction and consider the following cross-sectional study. Farber et al. [16] investigated, “Which management strategies for ACL injury are currently the predominant forms of therapy applied by team physicians in Major League Soccer (MLS)?” The authors selected a focused question that attempts to answer what is currently happening rather than trying to establish a temporal, causative, or correlative relationship of any sort. In order to answer this question, the investigators sent out a survey which required respondents to provide their particular surgical technique when treating MLS players [16]. It is important to note that while this study may have employed a survey-based model just as the expert opinion example did, the two can be differentiated by how participants were selected and the types of questions they are trying to answer. Rather than a more loosely defined group of “experts,” cross-sectional studies identify a specific group, in this case MLS team physicians, and select a specific question for them to answer. Their responses can then be statistically pooled and homogeneously formatted. The primary shortcoming of cross-sectional studies is their inability to generate temporal or causative relationships between exposures and outcomes. In our example case, although the study was able to identify which reconstruction techniques are currently prevalent among MLS team physicians, it does not begin to address the relative outcomes of the patients who underwent these interventions or the risk factors that may have predisposed them to the injury in the first place. Therefore, while they do not necessarily serve to show trends or relationships between multiple variables, cross-sectional studies are effective for gathering and presenting large volumes of generalizable data.

2.3.4 Case-Control Study

Case-control studies begin with a sample of patients in whom a selected outcome has already occurred and been identified. Patients with and without this outcome are then compared for differences in exposures and risk factors between the two groups. Because case-control studies begin with the outcome, they are always inherently retrospective in design. In addition, because the patient outcome is selected by the investigator, case-control studies serve as the ideal study design for analyzing root causes and risk factors of rare diseases. For example, you may have identified a group of your patients who failed single-bundle (SB) ACL reconstruction. And as you think about why this may have happened, the question arises, “what predisposed these patients to failing surgery?” A well-designed case-control study, such as the following, may be suited to answer this question. Parkinson et al. [39] identified 123 SB ACL reconstruction patients with long-term follow-up who met their inclusion criteria. Patients who demonstrated failure at 2-year follow-up had higher rates of medial and lateral meniscus deficiency, shallower femoral tunnel positioning, and younger patient age. The authors were able to conclude that medial meniscus deficiency was the most significant factor to predict graft failure in SB anatomic ACL reconstructions [39] (Fact Box 2.1: Odds Ratio). While they serve a very important role in clinical research, case-control studies are not without limitation. In particular, they are prone to recall bias, in which there are discrepancies in the accuracy and thoroughness with which prior exposures are recalled among study participants [30]. For example, in a study of risk factors for chronic ACL deficiency, participants with ACL pathology are likely to search their memories for injuries and underlying causes more thoroughly than their unaffected counterparts, thus creating bias in the collected data. Another important shortcoming of case-control studies is their limited ability to demonstrate causation. As causation is typically best demonstrated in prospective studies, the retrospective nature of a case-control study makes it a poor basis for causal inference between exposure and outcome. In the case of our above example, the study we highlighted more effectively showed a correlation between medial meniscus deficiency and ACL reconstruction failure rather than showing that medial meniscus deficiency caused a failure in ACL reconstruction.
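Because a case-control study samples patients by outcome, the association between an exposure and that outcome is usually summarized as an odds ratio computed from a 2 x 2 table. The following Python sketch shows the calculation under stated assumptions: the function names are hypothetical, the counts in the usage note are invented for illustration and are not the data of Parkinson et al. [39], and the confidence interval uses the standard log (Woolf) approximation, which is one common choice rather than the method of any study cited here.

```python
import math


def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int) -> float:
    """Odds of exposure among cases divided by odds of exposure among controls.

    With a = exposed cases, b = unexposed cases, c = exposed controls,
    and d = unexposed controls, this is (a / b) / (c / d) = (a * d) / (b * c).
    """
    if min(exposed_cases, unexposed_cases,
           exposed_controls, unexposed_controls) <= 0:
        raise ValueError("all four cell counts must be positive")
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def odds_ratio_ci(exposed_cases: int, unexposed_cases: int,
                  exposed_controls: int, unexposed_controls: int,
                  z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval on the log-odds scale (z = 1.96 for 95%)."""
    estimate = odds_ratio(exposed_cases, unexposed_cases,
                          exposed_controls, unexposed_controls)
    standard_error = math.sqrt(
        1 / exposed_cases + 1 / unexposed_cases
        + 1 / exposed_controls + 1 / unexposed_controls
    )
    log_estimate = math.log(estimate)
    return (math.exp(log_estimate - z * standard_error),
            math.exp(log_estimate + z * standard_error))
```

As a usage note with invented counts, suppose 20 of 30 patients with graft failure (cases) and 30 of 90 patients without failure (controls) had the exposure of interest. Then `odds_ratio(20, 10, 30, 60)` returns 4.0, meaning the odds of exposure were four times higher among cases. In keeping with the limitation discussed above, such a value describes an association and does not by itself show that the exposure caused the failure.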
decide to seek surgical intervention through another avenue. What now are we to do with this
data which has been altered by patient movement between our groups? The ITT principle contends
that patients should be analyzed in the group to which they were initially randomized. If the ten
patients who left the control group are excluded from its analysis, the remaining 40 patients in
that group now represent a clinically healthier and skewed sample of patients, thus biasing the
results. In other words, patients who are faring more poorly in the control group are more likely
to abandon it. The Spine Patient Outcomes Research Trial (SPORT) of 2006 beautifully illustrates
a real-life example of this dilemma. A group of 501 patients who were all surgical candidates for
lumbar discectomy for intervertebral disk herniation were randomized to either undergo standard
open discectomy or non-operative treatment for their persistent signs and symptoms. Although
patients were initially randomized evenly between the two groups, only 50% of patients assigned
to the surgery group had undergone surgery at 3 months; in this same 3-month period, 30% of those
assigned to the non-operative arm of the study underwent surgery. The examiners used ITT analysis
on the data, which did show small and not statistically significant improvement in the surgical
group over the nonsurgical. However, the substantial crossover of patients between the two groups
led the authors to conclude that superiority or equivalence of treatments could not be established
and thus illustrated a key limitation of executing RCTs [46]. If all patients are analyzed in the
groups to which they are initially assigned, as per ITT analysis, examiners can improve protocol
adherence and decrease bias. As a result of implementing this principle, the examiner’s findings
tend to become more conservative as non-compliant and withdrawn patients potentially dilute
positive data [20]. ITT is not without limitations, and critics have argued that it may fail to
provide an accurate picture of the data if large portions of patients change groups [25, 44]. An
alternative to ITT is “as-treated” analysis in which subjects are compared with the treatment
regimen that they have received regardless of which group they were originally assigned to [15].
The SPORT trial illustrates side by side how these two analytical methods can yield drastically
different results from the same patient set. In addition to the ITT analysis performed (as
described above), the examiners also analyzed the data using the as-treated method which showed
strong, statistically significant advantages for surgery at all follow-up times through 2 years
[46]. While this does go to show that ITT can underestimate the true effect of surgery in certain
cases, one clear limitation of as-treated analysis is that it does not provide the same
protection against confounding variables that can be achieved by following through with the
original randomization protocol [46].

Despite representing the highest level of primary evidence, RCTs are not without shortfalls. A
discussion of the limitations of RCTs would be incomplete without an understanding of “clinical
equipoise.” This principle states that the examiner and medical community at large genuinely
cannot definitely say that one arm of an experiment is superior to the other. Ideally, RCTs
performed under the assumption of clinical equipoise uphold a higher ethical benchmark as they
prevent the deliberate withholding of a treatment from patients which is otherwise known to be
superior to the control arm of the study [7, 37]. Equipoise is necessary to ethically justify the
use of randomization to select which treatment (or lack thereof) a patient will receive in a
situation where the optimal treatment method is not yet agreed upon by the expert community [12].
This concept, in the context of orthopedic surgery, can be well illustrated with a discussion of
sham surgery. Sham surgeries are analogous to “placebo drugs” in that they create the illusion to
a patient that he or she is being treated; however, they in fact omit the therapeutically
significant step in surgery [37]. On the one hand, they allow us to accurately compare the
efficacy of the treatment arm; but on the other hand, they raise the possibility of denying
appropriate therapy to patients of the control arm. Current literature holds that within the
confines of ethical practices, equipoise, and clear, informed consent, sham surgeries do have a
place in orthopedic clinical trials [12, 34].
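The contrast between ITT and as-treated analysis can be made concrete with a small numerical sketch. The patient records below are invented for illustration only (they are not SPORT data), and the function name `group_difference` is a hypothetical helper. The sketch assumes a simple difference in mean outcome scores as the effect measure.

```python
from statistics import mean

# Hypothetical records: (assigned arm, treatment actually received, outcome score).
# Higher score = better outcome. All values are invented for illustration.
patients = [
    ("surgery", "surgery", 30), ("surgery", "surgery", 28),
    ("surgery", "nonop", 12), ("surgery", "nonop", 10),    # assigned surgery, never operated
    ("nonop", "nonop", 14), ("nonop", "nonop", 12),
    ("nonop", "nonop", 10), ("nonop", "surgery", 26),      # crossed over to surgery
]

def group_difference(records, key_index):
    """Mean outcome of the surgery group minus the non-operative group.

    key_index = 0 groups patients by assigned arm (intention-to-treat);
    key_index = 1 groups patients by treatment received (as-treated).
    """
    surgery = [r[2] for r in records if r[key_index] == "surgery"]
    nonop = [r[2] for r in records if r[key_index] == "nonop"]
    return mean(surgery) - mean(nonop)

itt_effect = group_difference(patients, 0)         # respects randomization
as_treated_effect = group_difference(patients, 1)  # ignores randomization

print(f"ITT difference:        {itt_effect:.1f}")
print(f"As-treated difference: {as_treated_effect:.1f}")
```

With these invented numbers the ITT difference (4.5 points) is much smaller than the as-treated difference (16.4 points), mirroring the dilution described above. The as-treated estimate no longer benefits from randomization, so it may also reflect differences between the patients who chose each treatment.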
18 V. S. Desai et al.
A major challenge with RCTs lies in the expense, manpower, and resources needed to perform them.
In the absence of ethically managed protocols and high reporting standards, RCTs may provide
erroneous information which can subsequently compromise patient care. Despite being considered
the “gold standard” of study design, RCTs are not the only option to generate reliable and
evidence-based results. It is recommended that if clinical questions can be answered accurately
and more efficiently with a so-called “lower-impact” study design (as the ones discussed above),
those avenues should be considered [6].

2.4 Secondary/Filtered Research

Secondary evidence primarily takes the form of two study designs: the systematic review and the
meta-analysis. Rather than searching for and interpreting data directly, secondary evidence
relies upon pooling of primary evidence that is already available and organizing it in a
meaningful and novel fashion. Because it represents a summary of the best available evidence on a
particular topic [21], secondary evidence is traditionally classified as the highest level on the
hierarchy of evidence. Despite the perceived value of secondary evidence, critics have called
into question the validity of its findings and suggest that the medical community may be
overvaluing their contribution compared to well-designed, primary studies [2, 3, 5, 8]. The
inclusion of methodologically flawed and nonrandomized trials into systematic reviews, due to the
unavailability of high-quality RCTs in the orthopedic literature, has been a recent topic of
discussion [5]. The findings from a systematic review or meta-analysis are only as strong as the
studies that comprise it; thus, the inclusion of low-quality studies compromises the validity of
reported results. The inclusion of clinically irrelevant studies and “old” information has
weakened the quality of conclusions, if any, that can be drawn from many meta-analyses. As a
result, they do little to advance the standard of care or best practices [8]. Audigé et al.
reviewed the orthopedic literature and found a preponderance of systematic reviews substantiated
by nonrandomized and heterogeneous studies; they recommended a degree of caution when
interpreting findings of secondary evidence [2].

2.4.1 Systematic Review

Systematic reviews offer a comprehensive collection of all literature available on a particular
topic followed by a discussion and analysis of findings. They allow the examiner to answer the
question, “What does the current literature, as a whole, say about my question?” Returning to our
theme of ACL reconstruction, consider the following. In assessing how outcomes of ACL
reconstruction surgery differ between male and female patients, Ryan et al. performed a
systematic review of literature and found 13 studies that met their inclusion criteria.
Stratifying their findings based on “graft failure risk,” “contralateral ACL injury risk,” “knee
laxity,” and “patient-reported outcomes,” their review found no evidence of clinically important
difference in outcomes between male and female patients [42]. While systematic reviews do allow
for a simplified, qualitative discussion of large volumes of data, what they generally lack is a
head-to-head statistical analysis. Another limitation of systematic reviews is that while they
do require an exhaustive search of literature, the inclusion criteria of studies that ultimately
make it to the analysis stage can be doctored to the point that you now portray a biased
perspective on an otherwise generalizable question. In other words, the author has the freedom to
alter the inclusion criteria to include (or exclude) studies that may reinforce their argument.

2.4.2 Systematic Review with Meta-analysis

Like a systematic review, meta-analysis also begins with a comprehensive collection of literature
pertinent to the examiner’s question. However, in order to be classified as meta-
analysis, the examiner must be able to pool statistics [14]. Therefore, meta-analysis requires a
relative degree of homogeneity in data. When studies are combined in a meta-analysis, they are
generally done using either a “fixed effects” model or “random effects” model. In the
Mantel-Haenszel fixed model, the examiner assumes that one true effect size underlies all studies
in the analysis and that any differences among the included studies are due to chance [31]. In
the DerSimonian and Laird random effects model, the degree of heterogeneity between studies is
considered to render a fixed effect model implausible; the random effects model is thus the
preferred selection when there is heterogeneity between studies [10].

Because data on the same subject matter may be presented using different metrics, units, outcome
scores, etc., meta-analysis is more difficult to execute. Say one were interested in finding a
comprehensive analysis comparing outcome scores and physical exam findings for all patients who
underwent single-bundle versus double-bundle ACL reconstruction that are documented in the
literature. A meta-analysis by Desai et al. identified 970 patients from 15 studies which met
inclusion criteria and allowed the examiner to compare a homogenized data set. Their results
showed a statistical improvement in knee kinematics of the double-bundle technique but no
significant improvement over single-bundle reconstruction on clinically meaningful outcomes [11].
Just as in systematic reviews, meta-analysis is also limited by its vulnerability to bias based
on the author’s decision to include or exclude studies.

2.4.3 PRISMA

How does one go about conducting an exhaustive review of literature? In 2009, Moher et al.
released the “preferred reporting items for systematic reviews and meta-analyses” (PRISMA)
statement. The article outlines a meticulous methodology by which to design and execute
systematic reviews and meta-analyses with the intent of improving and standardizing the quality
of secondary evidence available in the literature [35]. Researchers preparing a manuscript of
this design are highly encouraged to review and adhere to the PRISMA guidelines prior to
beginning. To see an outline of how to systematically filter the literature to isolate the
studies to more thoroughly examine for inclusion in a study, please refer to the Additional
Resources section.

2.5 Levels of Evidence

Although the “levels of evidence” will be discussed at length in Chap. 5, we felt they merited a
brief mention here as they are modeled off of a similar structure as the hierarchy of evidence.
Figure 2.3 adapted from an editorial by Wright et al. in the Journal of Bone and Joint Surgery
elegantly outlines the levels of evidence and the study designs that underlie them [48]. Not only
do they demonstrate the relative strength of data generated by different types of studies; they
serve to uniformly grade the quality of recommendations derived from that data.

2.6 Trends in the Evolution of Orthopedic Literature

Over the past several decades, orthopedic literature has continued to change in both quantity and
quality. Cunningham et al. [9] published a 2013 study in which they reviewed literature in eight
of the highest impact journals in orthopedic medicine over the past 10 years to identify trends
in the volume and level of evidence that has been published. In short, they found that the annual
volume of published studies continuously increased between the years 2000 and 2010 and the number
of Level I and Level II studies significantly increased during that same period. However, despite
these increases in volume, they reported that much of the literature still contains Level III and
Level IV studies and advocated for researchers to publish the highest level of evidence available
for a given question [9] (Fig. 2.4). These findings were later corroborated in a study by Grant
et al. [18] who also identified an increase in the volume of Level I and Level II evidence
studies. In addition, they highlighted that the
Level I
  Therapeutic studies:
    1. Randomized controlled trial
       a. Significant difference
       b. No significant difference but narrow confidence intervals
    2. Systematic review² of Level-I randomized controlled trials (studies were homogeneous)
  Prognostic studies:
    1. Prospective study¹
    2. Systematic review² of Level-I studies
  Diagnostic studies:
    1. Testing of previously developed diagnostic criteria in series of consecutive patients
       (with universally applied reference “gold” standard)
    2. Systematic review² of Level-I studies
Level II
  Therapeutic studies:
    1. Prospective cohort study³
    2. Poor-quality randomized controlled trial (e.g., <80% follow-up)
    3. Systematic review²
       a. Level-II studies
       b. Nonhomogeneous Level-I studies
  Prognostic studies:
    1. Retrospective study⁴
    2. Study of untreated controls from a previous randomized controlled trial
    3. Systematic review² of Level-II studies
  Diagnostic studies:
    1. Development of diagnostic criteria on basis of consecutive patients
       (with universally applied reference “gold” standard)
    2. Systematic review² of Level-II studies
Level III
  Therapeutic studies:
    1. Case-control study⁵
    2. Retrospective cohort study⁴
    3. Systematic review² of Level-III studies
  Prognostic studies:
    Case series
  Diagnostic studies:
    1. Study of nonconsecutive patients (no consistently applied reference “gold” standard)
    2. Systematic review² of Level-III studies
Level IV
  Therapeutic studies:
    Case series (no, or historical, control group)
  Diagnostic studies:
    1. Case-control study
    2. Poor reference standard
Level V
  Therapeutic studies: Expert opinion
  Prognostic studies: Expert opinion
  Diagnostic studies: Expert opinion
Figure adapted from The Journal of Bone & Joint Surgery: 85-A(1), 2003
Fig. 2.3 This table outlines the clinical levels of evidence and the study designs that underlie them in the context of
therapeutic, prognostic, and diagnostic studies [48]
[Figure 2.4 panels: number of articles and proportion of articles (%) by level of evidence (LoE I to IV) and year (2000, 2005, 2010) for AJSM, Arth, FAI, JHS, JoT, JPO, JSES, and Spine]
Fig. 2.4 Despite an increase in the volume of orthopedic literature published on an annual basis over the past 20 years,
a high proportion of the literature is still comprised of Level III and Level IV studies [9]
largest increase in volume has been seen in diagnostic studies, compared to more modest increases
in prognostic and therapeutic ones [18].

Take-Home Message
• Despite this hierarchical construct that has been developed to organize evidence based on
  strength, the most important factor to consider when designing a study is “what is the question
  that I want to answer?”
• A seemingly “stronger” study may add no additional benefit to one’s objective if it does a poor
  job in answering the specific question that the examiner sought to address in the first place.
• While RCTs may represent the gold standard of clinical evidence, they may not always be the
  ideal choice for a particular question.
• While we encourage publishing the highest quality of evidence feasible, we stress the
  importance of ensuring that every scientific study begins with a well-crafted clinical question
  and only then should a study design be implemented.
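The fixed effects and random effects pooling models described in Sect. 2.4.2 can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the method used in any cited study: the fixed effect pooling here uses generic inverse-variance weights (a simpler stand-in for the Mantel-Haenszel method named in the text), the between-study variance uses the DerSimonian and Laird moment estimator, and the study effects and variances are invented.

```python
import math

def fixed_effect(effects, variances):
    """Inverse-variance fixed effect pooling: one true effect is assumed."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se

def dersimonian_laird(effects, variances):
    """Random effects pooling with the DerSimonian and Laird tau^2 estimator."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    pooled_fixed, _ = fixed_effect(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed effect estimate.
    q = sum(wi * (yi - pooled_fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, floored at zero
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Hypothetical effect sizes (e.g., standardized mean differences) and variances.
effects = [0.10, 0.30, 0.50, 0.90]
variances = [0.01, 0.02, 0.04, 0.02]

fe_pooled, fe_se = fixed_effect(effects, variances)
re_pooled, re_se, tau2 = dersimonian_laird(effects, variances)
print(f"fixed:  {fe_pooled:.3f} (SE {fe_se:.3f})")
print(f"random: {re_pooled:.3f} (SE {re_se:.3f}), tau^2 = {tau2:.3f}")
```

When the studies disagree more than chance alone would explain, the estimated between-study variance is positive, the weights become more even across studies, and the random effects standard error is larger than the fixed effect one. When the studies agree, the two models give the same pooled estimate.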
© ISAKOS 2019 23
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_3
24 N. Roselaar et al.
one RCT according to criteria used by the authors [17]. Therefore, the selection of procedures
when designing a study sometimes does not occur based on evidence from the literature.

3.2.2 Confounding Bias

The basis for confounding is the presence of one or more risk factors independently related to
both the exposure and outcome of a study [24]. Known as confounding variables, confounding
factors, or confounders, these variables primarily affect non-randomized observational studies.
When unseen, these risk factors can affect the association between the exposure and outcome in a
study. Randomized controlled trials are affected by confounding factors to a lesser degree,
because the confounding effects are greatly reduced or eliminated by the process of
randomization. This is one of the most important reasons why randomized controlled trials are
considered the gold standard in clinical research design [30].

3.2.2.1 Measuring and Controlling Confounding Factors
It is impossible to completely eliminate the effects of all confounding factors. However, steps
can be taken to reduce the confounding effects of certain variables. If the confounding factor in
question is measurable, the effects can be decreased or eliminated [23]. Mitigating the effects
of confounding factors is possible only to the degree of precision that the factor is measurable
[23]. Confounding bias can be measured using methods such as restriction and matching and can be
controlled for using multivariate analysis [10]. Both restriction and matching methods can be
used to mitigate confounding factors at the research design stage [4].

3.2.2.2 Restriction
The principle of restriction involves limiting the ability of a variable to vary in the study
population [10]. As Gerhard explains, “A variable has to be able to vary to be associated with an
exposure.” In a study of only non-smokers, smoking cannot act as a confounding variable.

Clinical Vignette
A basic orthopedic example of restriction used to negate a measurable confounding variable may be
found in the study of ACL injuries in young athletes. Sports may be a confounding factor when
considering the association between sex and suffering from an ACL injury in young athletes. Types
of sports are associated with both the exposure (sex) and with the outcome (ACL injury
frequency). Some sports such as American football or rhythmic gymnastics are either predominantly
male or female, while cutting and pivoting sports like soccer and lacrosse have the highest rates
of ACL injury. Because the type of sport that an athlete plays can be precisely determined
(measured), the study design can be modified to eliminate this confounding effect. Limiting the
study to include only participants who play a specific sport (or sports) through the use of
specific inclusion and exclusion criteria is one way to control this confounding variable.
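The effect of restriction in the clinical vignette can be sketched with a toy calculation. The athlete counts below are entirely hypothetical and were chosen only so that sport is associated with both sex and ACL injury risk; `risk_ratio` is an illustrative helper, not a function from any library.

```python
# Hypothetical cohort: counts of athletes and ACL injuries by sex and sport.
# All numbers are invented to illustrate confounding by type of sport.
cohort = [
    {"sex": "F", "sport": "soccer",   "athletes": 800, "injuries": 48},
    {"sex": "M", "sport": "soccer",   "athletes": 200, "injuries": 8},
    {"sex": "F", "sport": "swimming", "athletes": 200, "injuries": 3},
    {"sex": "M", "sport": "swimming", "athletes": 800, "injuries": 8},
]

def risk_ratio(rows, exposed="F", unexposed="M"):
    """Injury risk in the exposed group divided by risk in the unexposed group."""
    def risk(sex):
        athletes = sum(r["athletes"] for r in rows if r["sex"] == sex)
        injuries = sum(r["injuries"] for r in rows if r["sex"] == sex)
        return injuries / athletes
    return risk(exposed) / risk(unexposed)

# Crude analysis: sport is free to vary, so it can confound the comparison.
crude_rr = risk_ratio(cohort)

# Restriction: only soccer players are eligible, so sport cannot vary.
soccer_only = [r for r in cohort if r["sport"] == "soccer"]
restricted_rr = risk_ratio(soccer_only)

print(f"crude risk ratio (all sports): {crude_rr:.2f}")
print(f"risk ratio restricted to soccer: {restricted_rr:.2f}")
```

In this invented cohort the crude risk ratio is about 3.19, while the ratio within soccer players alone is 1.5. The gap arises because more of the female athletes play the higher-risk sport. The price of restriction is a smaller study population and findings that apply only to the included sport.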
3 Bias and Confounding 25
[Figure labels: All TKA patients; Cohort selection: female TKA patients over age 65; Consented; Compliant: completed surveys, not lost to follow-up]
ferences in the patients who did or did not select surgery (such as level of activity, severity
of injury, or age) which affect their outcome as much or more than the treatment itself. Some of
these confounding factors could be controlled by collecting relevant data and accounting for them
in the analysis. However, other factors that may be systematically associated with either the
treatment or control group, such as motivation to achieve pre-injury level of activity, may be
difficult or impossible to detect or measure. Still other factors that result in patients
choosing surgical versus nonsurgical treatment, and that affect outcome, may be unknown.

Regardless of the study design, choosing the correct control or comparison group is critical for
mitigating selection bias. Whenever possible, historical controls should be avoided, as they
always introduce some bias due to trends over time. While more common in observational studies,
selection bias can also occur in randomized trials. For example, if recruitment occurs through a
tertiary care center, as is often the case for RCTs, patients who are recruited may have
different disease or socioeconomic characteristics, which make their outcomes systematically
different from patients who would seek care at a community hospital.

Consenting to randomization can be a significant hurdle to achieving unbiased selection in RCTs
in orthopedics (and other surgical specialties). Not surprisingly, few patients are willing to
consent to be randomized to receive either surgery or no surgery or even a specific type of
surgery. For example, in the Spine Patient Outcomes Research Trial (SPORT) [28], a much lower
proportion of eligible patients enrolled in the randomized surgical/nonsurgical arm of the study
compared with the parallel observational cohort (Fig. 3.3). That the proportion of patients
willing to be randomized is small suggests that this group may represent individuals with a
specific personality, disease, or other characteristics that may affect outcome in a way that is
different from other patients, so that their findings are not generalizable to the general
population of patients with back pain. Furthermore, the low number of patients in the trial
diminishes the likelihood that randomization can be effective in mitigating the effect of unknown
confounders.
[Figure: Percent of Patients Consented in SPORT RCTs vs. Parallel Observational Cohort (y-axis: Percent of Patients)]

Retention bias occurs when patients who drop out of a study differ from those not lost to
follow-up. Examples include patients who die or leave the study due to poor health or because
they are unsatisfied with the treatment. Patients may be lost due to non-compliance. A
disproportionate loss of patients in one group or another is particularly problematic, as that
may introduce error in the analysis of the outcome. For example, if patients leave a study
investigating the outcome of a surgical procedure because they undergo revision surgery at an
outside hospital, this would cause an underestimation of the rate of complications. Documenting
the reasons that patients leave a study can mitigate this type of bias.

Even loss of patients in both groups relatively equally will compromise the generalizability of a
study, as the loss may cause the study population

Informational biases can stem from detection or measurement bias by the observer or due to
response bias by the study subjects.

3.4.1 Detection or Measurement Bias

Detection bias can occur during the data collection phase of a study when outcomes assessment
tools are inappropriate or used inappropriately [27]. Detection bias can arise from poor
calibration of a measurement instrument, prejudice in the interviewer or grader, as well as
misclassification of exposure or outcome. In orthopedics
[Figure 3.4 bar chart: percent of patients (y-axis) for the surgical, non-surgical, and parallel observational cohort groups; bar labels shown: 66%, 60%, 44%]
Fig. 3.4 The similarity in the rate of surgery in the surgical and nonsurgical groups made
differentiating the outcomes in the two groups impossible in the SPORT study. The proportion of
patients in the surgical group who actually underwent surgery was unexpectedly low because of
non-compliance. At the same time, the proportion of patients who started out in the nonsurgical
group, but who crossed over and underwent surgery, was unexpectedly high
research, a common cause of detection bias is the failure to use a standardized method to
diagnose the condition being studied. This type of bias is more common in retrospective studies,
where researchers do not control which diagnostic tools are used for each patient in the study
and may not have information on what data or process led to a specific diagnosis [27]. In
prospective studies, detection bias can be mitigated by standardizing data collection protocols,
including, but not limited to, protocols for how diagnoses are made.

In a study on the association of patellofemoral pain (PFP) and patellofemoral cartilage
composition, researchers in the Netherlands compared patients with PFP to healthy controls [26].
All subjects included in the study were diagnosed with PFP based on the presence of at least
three criteria from a standardized list. Cartilage composition was assessed in all patients by a
single validated magnetic resonance imaging (MRI) method. In this case, researchers minimized the
risk of detection bias by using a prospective study design. This allowed them to specify the
diagnostic criteria for subject inclusion in the study and standardize the MRI protocol for
assessment of cartilage composition in both patients with PFP and the healthy controls.

Blinding can be used to prevent interviewer or grader prejudice from introducing measurement
bias. Using clear definitions, validating findings with multiple resources and standardizing
measurement protocols are other methods to avoid biases that stem from observer errors.

3.4.2 Response Bias

Subjects in a study may inadvertently give biased responses. The Hawthorne effect refers to a
phenomenon where people tend to show bigger improvements, simply on the basis of being observed
[29]. Another example of response bias is the tendency for respondents to please the interviewer,
especially if the interviewer is the doctor or someone associated with their doctor. Studies
asking for responses on sensitive topics can also result in response bias, if patients are too
embarrassed to tell the truth.

During the data collection phase of retrospective observational study designs such as case
series, case-control studies, and retrospective cohort studies, recall bias can be particularly
problematic [30]. Recall bias occurs when surveys or interviews require subjects to recall their
own history, which they may incorrectly remember and report, inadvertently introducing error into
the study [25]. Prospective cohort studies avoid recall bias because subject-obtained data is
collected in real time [12].

To investigate the magnitude of recall bias in a typical type of question that might be included
in an orthopedic study, Gabbe et al. [9] compared retrospective, self-reported injury histories
of 70 community-level Australian football players with prospective injury surveillance records
for the same 12-month period and found a significant difference. They reported that recall bias
increased with the level of detail that was requested [9].

The proportion of community football players who were accurately able to recall their injuries in
the past 12 months declined as the level of detail, which was asked about the injury history,
increased. All respondents were accurately able to recall the presence or absence of any injury,
but fewer than 80% were able to recall the number of injuries or body region where the injury
occurred correctly. Accuracy decreased further when respondents were asked to recall number of
injuries, where on the body the injury occurred and diagnosis.

The study of Gabbe et al. [9] not only showed how much recall bias could affect research findings
but suggested that asking only simple questions could increase the accuracy of responses, when
researchers need to ask respondents for their medical history.

Other strategies for mitigating response bias include asking questions related to sensitive
topics later in a questionnaire or interview, using validated survey instruments, and seeking
responses as early as possible after the exposure or treatment being studied.

3.5 Biases in the Analysis and Data Interpretation Phase

The data analysis and interpretation phases offer opportunities to avoid confounding and other
types of bias, which may inadvertently or unavoidably have been introduced in earlier phases.
Stratification/subgroup analysis can be used to reveal confounding. Multivariable modeling can be
used to adjust for known potential confounders. Time-to-event analyses can help mitigate bias due
to loss to follow-up. Propensity score matching can be effective in mitigating selection bias.

3.5.1 Propensity Scores

Propensity scores are generated from logistic regression models including all potentially
important information that could influence a clinician’s treatment decision. Therefore, the model
is estimating the likelihood of a patient going down a treatment pathway, rather than an outcome.
The coefficients are used to assign each patient a propensity score ranging from 0 to 1
representing the patient’s likelihood of receiving a reference treatment (usually the standard of
care). A score of 0 indicates no chance of receiving treatment A, while a score of 1 represents
absolute certainty of receiving treatment A. A score of 0.5 would represent an unbiased random
allocation.

Once propensity scores are assigned to each study subject, they can be used in a variety of ways.
For example, a propensity score “match” could be performed, which simulates randomization. A
patient with a propensity score of 0.576 for a specific surgery would be matched to a patient
from the other treatment type with a propensity score nearest to 0.576. A number of methods exist
to optimize this matching process, but when appropriately performed, relevant characteristics of
the two groups will be balanced, minimizing potential selection bias. This can be visualized by
plotting the standardized differences for each variable before and after matching (Fig. 3.5). A
standardized difference is used to compare balance between treatment groups for each independent
variable in the dataset. It was proposed by Cohen [3] as Cohen’s effect size index for the
comparison of two sample mean values [8, 18]. A standardized difference of >0.1 is considered to
represent an imbalance in the variable. As can be seen from the example, 14 variables were
unbalanced before the match and all variables were balanced after the match. In fact, for all but
two of the variables, the balance improved.
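The matching step and the balance check described above can be sketched in a few lines. This is a simplified illustration, not the procedure used to produce Fig. 3.5: the propensity scores are assumed to have been estimated already (for example, by logistic regression), the patients and their ages are invented, and the matching is a greedy 1:1 nearest-neighbor match without replacement, which is only one of several available matching methods. The helper names are hypothetical.

```python
import math
from statistics import mean, variance

def standardized_difference(treated, control):
    """Absolute difference in means in units of the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return abs(mean(treated) - mean(control)) / pooled_sd

def nearest_neighbor_match(treated, control):
    """Greedy 1:1 matching without replacement on the propensity score ('ps')."""
    available = list(control)
    pairs = []
    for t in treated:
        if not available:
            break
        best = min(available, key=lambda c: abs(c["ps"] - t["ps"]))
        pairs.append((t, best))
        available.remove(best)  # each control can be used only once
    return pairs

# Hypothetical patients with pre-computed propensity scores and one covariate.
treated = [
    {"id": "T1", "ps": 0.576, "age": 62},
    {"id": "T2", "ps": 0.70, "age": 58},
    {"id": "T3", "ps": 0.45, "age": 66},
]
control = [
    {"id": "C1", "ps": 0.58, "age": 61},
    {"id": "C2", "ps": 0.20, "age": 78},
    {"id": "C3", "ps": 0.68, "age": 59},
    {"id": "C4", "ps": 0.10, "age": 82},
    {"id": "C5", "ps": 0.47, "age": 67},
    {"id": "C6", "ps": 0.30, "age": 74},
]

pairs = nearest_neighbor_match(treated, control)
matched_ids = [(t["id"], c["id"]) for t, c in pairs]

treated_age = [t["age"] for t in treated]
smd_before = standardized_difference(treated_age, [c["age"] for c in control])
smd_after = standardized_difference(treated_age, [c["age"] for _, c in pairs])

print(matched_ids)
print(f"standardized difference in age before: {smd_before:.2f}, after: {smd_after:.2f}")
```

In this invented sample, the patient with a score of 0.576 is paired with the control whose score is 0.58, and the standardized difference in age falls from above the 0.1 threshold to below it. Note that the three unmatched controls are discarded, which illustrates why matching can require a large pool of comparison patients.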
[Figure 3.5 plot. Legend: Before Propensity Score Matching; After Propensity Score Matching.
Variables shown, in order: Race - Hispanics; Chronic pulmonary disease; Medicaid; Commercial;
Charlson-Deyo comorbidity index - 3+; Lymphoma; Solid tumor w/out metastasis; Race - Black; Liver
disease; Pulmonary circulation disease; Paralysis; Peripheral vascular disease; Hypertension;
Charlson-Deyo comorbidity index - 1–2; Congestive heart failure; Charlson-Deyo comorbidity index -
None; DMAG; Valvular disease; Coagulopathy; Fluid and electrolyte disorders; Race - Asian; Race -
White; Renal failure; Performing Surgeon’s last TKA volume; Hypothyroidism; Year of surgery -
2009; ASA - 1-2; ASA - 3-4; Other neurological disorders; Race - All others; OA diagnosis; Year
of surgery - 2007; Systemic inflammatory disease diagnosis; Year of surgery - 2008; Gender; BMI;
Year of surgery - 2011; Selfpay; Year of surgery - 2010; Medicare; Age; Year of surgery - 2012]
Fig. 3.5 Standardized differences were calculated for the 42 variables (along x-axis) for which
patients in a sample cohort undergoing two different knee procedures were matched. The
standardized difference (also known as Cohen’s effect size index, along the y-axis) compares two
sample mean values in units of the pooled standard deviation so that a standardized difference of
0.1 or greater denotes meaningful imbalance in the variable. In this example, there is imbalance
(standardized difference > 0.1) in 15 variables among patients in the two treatment groups before
propensity score matching (dark blue diamonds). However, after propensity score matching (light
blue circles), the likelihood these variables affected each patient’s assignment to a particular
treatment became more balanced
Whereas randomization eliminates bias, even from unknown confounders, in a strict sense,
propensity matching can only balance for known factors. However, because the matching process
balances all measured variables, the effects of unmeasured confounders are often also minimized.
Despite this, it is incumbent on orthopedic researchers to review the existing literature to
determine a standardized list of variables that best represents the most important information
for creating a propensity score. Propensity score matching has been used in orthopedics including
arthroplasty and trauma [5, 15].

A drawback of propensity score matching is that a very large number of patients may be needed,
especially in the standard of care group
[11]. Moreover, matching frequently omits a large 2. Cassel C, Sarndal C, Wretman J. Some uses of sta-
proportion of the available study population when tistical models in connection with the nonresponse
problem. In: Madow W, Olkin I, editors. Incomplete
comparison groups are being created. Therefore, data in sample surveys, Symposium on incomplete
inverse probability of treatment weighting (IPTW) data, proceedings, vol. 3. New York, NY: Academic;
is proposed as an alternative to matching to adjust 1983.
for confounders [2, 13, 21]. With IPTW, the pro- 3. Cohen J. Statistical power analysis for the behavioral
sciences. 2nd ed. New York: Routledge.
pensity score is used as weights in a weighted 4. Delgado-Rodríguez M, Llorca J. Bias. J Epidemiol
regression. Weights restore balance in the clinical Community Health. 2004;58:635–41.
characteristics of the two treatment groups and 5. Duchman K, Gao Y, Pugely A, Martin C, Callaghan
allow for the use of the entire sample, rather than J. Differences in short-term complications between
unicompartmental and total knee arthroplasty: a
the subset of matched patients. propensity score matched analysis. J Bone Joint Surg
3.6 Publication Bias

Despite diligent attention to study design and execution, bias can be introduced at the stage of publication. Reviewers are also human and studies have shown that prejudice can affect which studies are published. Emerson et al. (Arch Intern Med 2010) used fabricated manuscripts to show that those with positive outcomes were more likely to be accepted for publication than those with negative ones. This article also reported the presence of positive outcome bias in the Journal of Bone and Joint Surgery (JBJS) and Clinical Orthopaedics and Related Research (CORR) [7]. Another type of bias was reported by Okike et al. (JBJS 2008), who found that submissions to JBJS were more likely to be accepted if they were from the USA or Canada [19]. These studies indicate that readers should be aware of bias, even when reading peer-reviewed papers.

Take-Home Message
• When performing orthopedic research, the effects of confounding factors and bias must be acknowledged.
• Techniques such as randomization, matching, and choice of study design may help mitigate the effects of confounding and bias.

References
1. Amin AK, Clayton RAE, Patton JT, Gaston M, Cook RE, Brenkel IJ. Total knee replacement in morbidly obese patients: results of a prospective, matched study. J Bone Joint Surg [Br]. 2006;88-B(10):1321–6.
2. Cassel C, Sarndal C, Wretman J. Some uses of statistical models in connection with the nonresponse problem. In: Madow W, Olkin I, editors. Incomplete data in sample surveys, Symposium on incomplete data, proceedings, vol. 3. New York, NY: Academic; 1983.
3. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. New York: Routledge.
4. Delgado-Rodríguez M, Llorca J. Bias. J Epidemiol Community Health. 2004;58:635–41.
5. Duchman K, Gao Y, Pugely A, Martin C, Callaghan J. Differences in short-term complications between unicompartmental and total knee arthroplasty: a propensity score matched analysis. J Bone Joint Surg Am. 2014;96(16):1387–94.
6. Dunn WR, Lyman S, Marx R. ISAKOS scientific committee report research methodology. Arthroscopy. 2003;19(8):870–3.
7. Emerson GB, Warme WJ, Wolf FM, Heckman JD, Brand RA, Leopold SS. Testing for the presence of positive-outcome bias in peer review: a randomized controlled trial. Arch Intern Med. 2010;170(21):1934–9.
8. Flury B, Riedwyl H. Standard distance in univariate and multivariate analysis. Am Stat. 1986;40:249–51.
9. Gabbe BJ, Finch CF, Bennell KL, Wajswelner H. How valid is a self reported 12 month sports injury history? Br J Sports Med. 2003;37(6):545–7.
10. Gerhard T. Bias: considerations for research practice. Am J Health Pharm. 2008;65:2159–68.
11. Guo S, Fraser M. Propensity score analysis: statistical methods and applications. 2nd ed. Thousand Oaks, CA: Sage; 2015.
12. Hennekens C, Buring J. Epidemiology in medicine. 1st ed. Boston: Little, Brown and Company; 1987.
13. Hirano K, Imbens G. Estimation of causal effects using propensity score weighting: an application to data on right heart catheterization. Health Serv Outcomes Res Methodol. 2001;2:259–78.
14. JBJS. JBJS Inc. Journals Level of Evidence [Internet]. J Bone Joint Surg. 2015. https://journals.lww.com/jbjsjournal/Pages/Journals-Level-of-Evidence.aspx.
15. Jenkinson R, Kiss A, Johnson S, Stephen D, Kreder H. Delayed wound closure increases deep-infection rate associated with lower-grade open fractures: a propensity-matched cohort study. J Bone Joint Surg Am. 2014;96(5):380–6.
16. Jennings JM, Sibinga E. Understanding and identifying bias in research studies. Pediatr Rev. 2010;31(4):161–2.
17. Lim HC, Adie S, Naylor JM, Harris IA. Randomised trial support for orthopaedic surgical procedures. PLoS One. 2014;9(6):e96745.
18. Normand S, Landrum M, Guadagnoli E, Ayanian J, Ryan T, Cleary P, et al. Validating recommendations for coronary angiography following an acute myocardial infarction in the elderly: a matched analysis using propensity scores. J Clin Epidemiol. 2001;54:387–98.
19. Okike K, Kocher MS, Mehlman CT, Heckman JD, Bhandari M. Publication bias in orthopaedic
research: an analysis of scientific factors associated with publication in The Journal of Bone and Joint Surgery (American Volume). J Bone Joint Surg Am. 2008;90(3):595–601.
20. Pannucci CJ, Wilkins EG. Identifying and avoiding bias in research. Plast Reconstr Surg. 2010;126(2):619–25.
21. Rosenbaum P. Model-based direct adjustment. J Am Stat Assoc. 1987;82:387–94.
22. Shapiro S. Causation, bias and confounding: a hitchhiker’s guide to the epidemiological galaxy: Part 1. Principles of causality in epidemiological research: time order, specification of the study base and specificity. J Fam Plann Reprod Health Care. 2008a;34(2):83–7.
23. Shapiro S. Causation, bias and confounding: a hitchhiker’s guide to the epidemiological galaxy: Part 2. Principles of causality in epidemiological research: confounding, effect modification and strength of association. J Fam Plann Reprod Health Care. 2008b;34(3):185–90.
24. Shapiro S. Causation, bias and confounding: a hitchhiker’s guide to the epidemiological galaxy: Part 3. Principles of causality in epidemiological research: statistical stability, dose- and duration-response effects, internal and external consistency, analogy and biological plausibility. J Fam Plann Reprod Health Care. 2008c;34(4):261–4.
25. Smith J, Noble H. Bias in research. Evid Based Nurs. 2014;17(4):100–1.
26. van der Heijden RA, Oei EHG, Bron EE, van Tiel J, van Veldhoven PLJ, Klein S, et al. No difference on quantitative magnetic resonance imaging in patellofemoral cartilage composition between patients with patellofemoral pain and healthy controls. Am J Sports Med. 2016;44(5):1172–8.
27. Viswanathan M, Berkman ND, Dryden DM, Hartling L. Assessing risk of bias and confounding in observational studies of interventions or exposures: further development of the RTI item bank. Agency Healthc Res Qual. 2013:1–22.
28. Weinstein JN, Tosteson TD, Lurie JD, Tosteson ANA, Hanscom B, Skinner JS, et al. Surgical vs nonoperative treatment for lumbar disk herniation the Spine Patient Outcomes Research Trial (SPORT): a randomized trial. JAMA. 2006;296(20):2441–50.
29. Wickstrom G, Bendix T. The “Hawthorne effect” – what did the original Hawthorne studies actually show? Scand J Work Environ Health. 2000;26(4):363–7.
30. Zlowodzki M, Jönsson A, Bhandari M. Common pitfalls in the conduct of clinical research. Med Princ Pract. 2006;15:1–8.
4 Ethical Consideration in Orthopedic Research
Jason L. Koh and Diego Villacis
© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_4
for the purpose of this section, we will focus on developments from the modern era, starting with the aftermath of the atrocities suffered during World War II.
The Nuremberg Military Tribunal after World War II shed light on the sadistic and horrifying “research” conducted by Nazi German scientists. Nazi research included forced human exposure to the effects of freezing, incendiary devices, mustard gas, and other weaponized agents. The rulings handed down from the Nuremberg trial served as an outline for the required elements of conducting ethical human research.

Fact Box 4.1
Initially known as rules for “Permissible Medical Experiments,” this became known as the “Nuremberg Code.” The leading ethical principles that emerged were a requirement for voluntary consent, evidence of scientific merit, benefits outweighing the risks, and the ability of research subjects to terminate participation in the study at any time [23, 27].

Although not formally adopted into law or as any part of a professional ethical code, it served as a significant influence on the development of contemporary ethical guidelines.

Fact Box 4.2
Subsequently, the Declaration of Helsinki in 1964 was developed by the World Medical Association (comprised of almost all national medical associations) to provide guiding principles regarding human subject research. Key among these principles are a respect for the individual and their right to make informed choices regarding participation in research [9]. The individual’s interests are placed above those of “science and society.”

Multiple revisions have been made over the years; however, the principles set forth by the “Nuremberg Code” remain the foundation for the conduct of ethical research all over the world.
During the early period of modern human research ethics, public outcry was also growing in the United States. Ethical concern was mounting regarding the 1936 publication of a paper funded by the federal government entitled “The Tuskegee Study of Untreated Syphilis in the Negro Male.” The purpose of the study, which started in 1932, was to investigate the effects of untreated syphilis, essentially tracking its natural history. The study would remain in progress until 1972. What would make the study infamous was the continued withholding of treatment even after the acceptance of penicillin as the standard of care in 1945. Subjects were led to believe that they were being treated when in fact nothing was being done. It took nearly 40 years from the initial publication of the study until stories in the press created public outcry on a national level, and the study was concluded [13].
The subsequent attention gained by the shocking truths of the Tuskegee study led to congressional hearings to address the matter of ethical conduct in human subject research. As a direct response to these atrocities and those of other similar situations of ethical abuse, the National Research Act of 1974 was passed to address ethical concerns [7]. Congress called for the establishment of “The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research” and the establishment of institutional review boards (IRBs). The “National Commission” was tasked with identifying the basic ethical principles underlying human subject research and developing guidelines for ensuring that these ethical principles are followed [3]. The commission met between 1975 and 1978, releasing a series of reports; its last report, a summary of its deliberations regarding ethical principles, was released in 1979. This final report was entitled “Ethical Principles and Guidelines for the Protection of Human Subjects of Research” and would come to be known as the Belmont Report.
Although it was to have an initial impact on health-care providers and insurance carriers, it would also go on to affect human subject research through its impact on protection of patient privacy. This focus on protection of patient privacy was expanded with a provision in 2003 requiring all “covered entities,” essentially any personnel with access to patient information, to be compliant with the HIPAA privacy rule. The goal of the HIPAA privacy rule is to regulate the use and disclosure of specific “individually identifiable health information,” termed protected health information (PHI) [6]. A patient’s PHI is considered to be information pertaining to their past, present, or future physical or mental health, the provision of health care to the individual, and the past, present, or future payment for the provision of health care to the individual [17]. Legislation similar to HIPAA was enacted in the European Union in 2016 [24]. In order to help ensure the privacy of PHI during human subject research, the IRB is tasked with allowing the transmission of only the minimal amount of information necessary. The IRB must also make sure that this information is adequately protected when transmitted or stored electronically with tools such as encryption or password protection. In the United States, violations of this act can be punished by public disclosure of the breach, limitations on research, fines, and imprisonment (Fig. 4.2).

4.5 Informed Consent

The process of informed consent is a fundamental pillar of ethical human subject research. In essence the participant must be able to both understand and clearly verbalize a study’s purpose, methods, risks, benefits, and alternatives to participation in order to be considered “informed.” The participant must be allowed to freely decide whether to participate in the study both at the beginning and throughout the study with the option to discontinue at any time. For situations in which the potential subject is a minor or an adult of otherwise limited mental capacity, a proxy can be used with the same requirements of informed consent [22]. Informed consent is designed so as to demonstrate respect for the autonomy of potential and enrolled participants. Therefore, informed consent must be obtained in a voluntary and noncoercive manner. This goal of voluntary enrollment can be challenging when considering potential vulnerability to autonomy in situations where there is a large inequality in authority between researcher and enrolled or potential participant (Fig. 4.3).
Fig. 4.3 Paragraph VIII.A. of the American Academy of Orthopaedic Surgeons’ Code of Medical Ethics and Professionalism for Orthopaedic Surgeons [1]
1. The orthopaedic surgeon must explain to the patient in terms the patient can understand the proposed treatment, its likely effect on the patient, and purpose of the research. Orthopaedic surgeons must provide at least the degree of information that is required by applicable state and federal law, which will include at a minimum information on the purpose of the research, its potential side effects, alternatives and risks of the proposed treatment as well as the method, purpose, conditions of participation and the opportunity to withdraw from the research protocol without penalty.
2. The patient must understand for what they are providing consent. The orthopaedic surgeon must ensure that the patient has understood the basic information and has engaged in rational decision-making in deciding to participate in the research; and
3. The patient’s consent must be voluntary. Voluntary consent requires that the patient agreeing to participate in the project has a full understanding of all alternative treatments beyond the research protocol. The orthopaedic surgeon must believe that the patient’s consent is free from undue or overbearing influences, e.g., fear of the loss of care or medical benefits if the patient declines to participate.
4.9 Funding and Potential Conflict of Interest (COI)

Traditionally, there was minimal support from industry for orthopedic research. Research projects came from academic institutions as well as research-active individuals or practices. However, in the past 3 decades, orthopedic surgery has undergone a transformation with the development and flourishing of an “orthopedic industry” [5]. Much of the industry-fueled innovation has come in terms of implant development. During the same period, research funding has shifted from predominantly charitable or government sources to a situation in which industry funding represents the majority [5]. This includes industry support for education, meetings, and conferences. Although increased industry support clearly has the power to drive positive advancements in the treatment of patients, it also produces a potential for bias and conflict. The potential for conflicts has led most societies and journals in the orthopedic community to require full statements of disclosure when presenting any research work. However, successfully navigating the potential conflicts of interest in regard to industry funding is still an active challenge.
In order to better understand the arena, it is important to grasp the three parties who have distinct interests in industry-funded research: the researcher, the research institution, and the corporation funding the research [1]. These interests can vary drastically and do not necessarily run in parallel. Thankfully, many institutions have created structures that assist in negotiating research funding so as to help eliminate or minimize conflicts of interest. As may seem obvious, “ethical problems may arise when the researcher or the research institution have direct financial interest in the research program” [1]. However, for the researcher or research institution, financial bias may not be obvious and can often be subtle. The Academy (AAOS) references two ethical principles for researchers to fall back on when faced with economic conflicts of interest (refer to Table 4.3) [1]. One can conclude from these two principles set forth by AAOS that once someone has received funding from a corporation they should not buy or sell the corporation’s stock during the entirety of their involvement in the project. Also, someone who has developed an implant from which they will receive royalties should avoid doing research on that implant, leaving the research to a disinterested third party who has no potential financial interest from use of the device [1]. That does not exclude researchers from being able to serve as consultants for a corporation, given that the compensation is in line with their efforts.
Table 4.3 Ethical principles for economic conflict of interest [1]
1. A researcher ethically may share the economic rewards of his or her efforts. If a drug, device, or other products becomes financially remunerative, the researcher may receive profits that reasonably resulted from his or her contribution. The Academy’s Standards of Professionalism on Orthopaedic Surgeon-Industry Relationships and the Code of Medical Ethics and Professionalism for Orthopaedic Surgeons explicitly permit an orthopedic surgeon to receive royalties. However, ethically the researcher may not reap profits that are not justified by the value of his or her actual efforts
2. Potential sources of bias in research should be eliminated, particularly where there is a direct relationship between a researcher’s personal interests and potential outcomes of the research

Clinical Vignette
Below are examples of unethical behavior regarding conflict of interests as stated by the AAOS:
• Knowingly negotiating for more funding than is appropriate to support the project and related institutional and departmental overhead costs
• A researcher’s selling or purchasing stock in a company whose orthopedic device is being tested by that orthopedic surgeon-researcher
• A researcher accepting financial incentives to alter data
9. Declaration of Helsinki History Website. Ethical principles for medical research. JAMA Netw. Accessed 26 Feb 2018.
10. Emanuel EJ, Wendler D, Grady C. What makes clinical research ethical? JAMA. 2000;283:2701–11.
11. Friedman EA. Ethical issues in clinical research. In: Supino PG, Borer JS, editors. Principles of research methodology: a guide for clinical investigators. 1st ed. New York: Springer; 2012. p. 233–54.
12. Health Insurance Portability and Accountability Act of 1996. Public Law 104–191. 104th Congress. 1996. http://www.gpo.gov/fdsys/pkg/PLAW-104publ191/pdf/PLAW-104publ191.pdf. Accessed 30 Jan 2018.
13. Heller J. Syphilis victims in U.S. study went untreated for 40 years. New York Times (New York). 1972;1:8.
14. The hippocratic oath: today. Doctors’ diaries. WGBH Educational Foundation. 1964. http://www.pbs.org/wgbh/nova/body/hippocratic-oath-today.html. Accessed 28 Jan 2018.
15. Hrobjartsson A, Gotzsche PC. Is the placebo powerless? An analysis of clinical trials comparing placebo with no treatment. N Engl J Med. 2001;344:1594–602. Erratum in: N Engl J Med;345:304.
16. Hrobjartsson A, Gotzsche PC. Placebo interventions for all clinical conditions. Cochrane Database Syst Rev. 2004;3:CD003974.
17. HHS. Summary of the HIPAA Privacy Rule. http://www.hhs.gov/ocr/privacy/hipaa/understanding/summary/privacysummary.pdf. Accessed 30 Jan 2018.
18. International Committee of Medical Journal Editors. Clinical Trials. http://www.icmje.org/recommendations/browse/publishing-and-editorial-issues/clinical-trial-registration.html. Accessed 27 Feb 2018.
19. Mazur DJ. Evaluating the science and ethics of research on humans: a guide for IRB members. Baltimore: The Johns Hopkins University Press; 2007.
20. Mehta S, Myers TG, Lonner JH, Huffman GR, Sennett BJ. The ethics of sham surgery in clinical orthopaedic research. J Bone Joint Surg Am. 2007;89:1650–3.
21. Moseley JB, O’Malley K, Petersen NJ, Menke TJ, Brody BA, Kuykendall DH, Hollingsworth JC, Ashton CM, Wray NP. A controlled trial of arthroscopic surgery for osteoarthritis of the knee. N Engl J Med. 2002;347:81–8.
22. NIH & Clinical Research. Ethics in Clinical Research. http://clinicalresearch.nih.gov/ethics_guides.html. Accessed 26 Jan 2018.
23. Nuremberg Code [from Trials of War Criminals Before the Nuremberg Military Tribunals Under Control Council Law No. 10. Nuremberg, October 1946–April 1949. Washington, DC: U.S. G.P.O, 1949–1953].
24. Official Journal of the European Union. Legislation L119. http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ%3AL%3A2016%3A119%3ATOC. Accessed 27 Feb 2018.
25. Owsei T, Temkin C. Ancient medicine. Selected papers of Ludwig Edelstein. Baltimore: Johns Hopkins University Press; 1987.
26. Public, Protection of Human Subjects, Basic HHS Policy for Protection of Human Research Subjects, Title 45 CFR Part 46, Subpart A. 2005. http://ohsr.od.nih.gov/guidelines/45cfr46.html. Accessed 28 Jan 2018.
27. Shuster E. Fifty years later: the significance of the Nuremberg Code. N Engl J Med. 1997;337:1436–40.
28. Sussman MD. Ethical requirements that must be met before the introduction of new procedures. Clin Orthop Relat Res. 2000;378:15–22.
29. Sussman MD. Ethical standards in the treatment of human subjects involved in clinical research. J Pediatr Orthop. 1998;18:701–2.
30. Wolf BR, Buckwalter JA. Randomized surgical trials and “sham” surgery: relevance to modern orthopaedics and minimally invasive surgery. Iowa Orthop J. 2006;26:107–11.
5 Conflict of Interest
Michael Hantes and Apostolos Fyllos
© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_5
there is a risk that professional judgment may be influenced more by secondary interests than by primary interests [10].
Conflict of interest is a worthwhile subject considering the simple truth that financial ties between industry and academic researchers contain the inherent risk that a guideline development process or research results may be compromised. Conflict of interest policies typically and reasonably focus on financial gain and financial relationships. An “intellectual conflict of interest” is also possible and has been defined by Guyatt et al. [9]:

Academic activities that create the potential for an attachment to a specific point of view that could unduly affect an individual’s judgment about a specific recommendation.

Potential conflicts of interest can arise because of the professional position held by the author, such as being employed by a private clinic or sitting on advisory boards related to specific treatments. Researchers can feel pressured to obtain funding, publish, advance careers, or obtain tenure. Universities add to this pressure by pushing researchers to succeed [16]. These personal interests also have the potential to bias researchers when they conflict with research. Intellectual COI is further characterized as “important” when it includes authorship of original studies or peer-reviewed grant funding by the government or nonprofit organizations that directly relate to a recommendation, which should be frowned upon, and as “less important” when it involves, for example, participation in previous guideline panels, which must be acknowledged but does not necessarily preclude participation in developing new recommendations. The reason for the focus on financial COI is not that financial gains are necessarily more corrupting than the other interests but that they are relatively more objective and quantifiable [10].

Fact Box 5.2
The three main elements of the definition of a conflict of interest are the primary and secondary interest and their conflict. Secondary interests are objectionable only when they have greater weight than the primary interest in professional decision-making.

5.3 Recognition and Consequences of Conflict of Interest

Financial conflicts of interest and intellectual bias can influence recommendations for clinical practice. The issue of orthopedic surgeons having industry-related conflicts of interest that could potentially affect the scientific reports that are used to guide orthopedic patient care remains important for both patients and the community. To begin with, dissemination of orthopedic research can be affected by industry funding-related conflict of interest. Rates of citation may be influenced by characteristics other than the impact factor of the publishing journal. It has been found that high citation rates in medical literature are associated with, among other factors, industry funding and the reporting of an industry-favoring result, in addition to journal impact factor [11]. Even more recently, in a study focused exclusively on orthopedic journals, high level of evidence, large sample size, representation from multiple institutions, and conflict of interest disclosure itself involving a nonprofit organization or for-profit company were found to be associated with higher rates of citation in orthopedics [18]. A possible explanation offered is that researchers who secure external funding may be able to publish articles that are scientifically superior and therefore more likely to be cited.
Biases from financial and/or intellectual COIs may result from the ability of such conflicts to affect decision-making in a way that is completely hidden from the person making the decision. A related concern involves access to data. In industry-supported research, it is often the case that the investigator may lack full access to the study data. This creates a further source of unintentional bias on the researcher’s part. Industry sponsors may even go as far as preventing publication of findings not in their favor. Conflict of interest relates to people, not to companies or organizations. Most people
involved in research do not recognize the effect of these conflicts on their judgments [5, 23]. Declaring or acknowledging conflicts does not mitigate their effects. Moral licensing occurs when disclosure of a COI reduces feelings of guilt of the advisor, resulting in more biased advice because advisees have been warned [4, 15]. Even though a researcher could be unbiased and declare no conflict of interest whatsoever, one should not fail to mention the potential conflict of interest of editors and reviewers in relation to the review or publication of research.

Fact Box 5.3
Biases from financial and/or intellectual COIs may result from the ability of such conflicts to affect decision-making in a way that is completely hidden from the person making the decision.

A distinction exists between an actual conflict of interest, an apparent conflict of interest, and a potential conflict of interest. A perceived conflict of interest may be as important as an actual conflict of interest. Financial COI is an important factor impacting the public’s trust in research. A perceived conflict of interest can undermine the public’s opinion and trust in the quality of public sector services, institutional processes, and public institutions [1].
Fear of biased results or even fake results is legitimate. Some researchers argue that disclosure policies of scientific journals about researchers’ financial ties to their work do little to help with bias prevention or identification. Furthermore, valuable research can be discarded by readers if financial ties are declared, even though results were not affected by funding [6]. Another way of affecting readers’ perception is bias in control group choice in prospective randomized controlled trials. Such bias can favor products that are made by the company funding the research, and this might be due to the selection of inappropriate comparator products and publication bias. In a recent study looking into possible influences on the findings, protocols, and quality of drug trials financed by pharmaceutical companies, the methodological quality of trials funded by for-profit organizations was not found to be worse than that of trials funded through other more independent sources. It was however suggested that protocols were influenced to favor the trial product and results were interpreted more favorably [22]. Another way of looking into this subject is that bias lies in funding decisions made by companies toward studies that are likely to promote their interests and in manipulating the process of research [19]. Finally, patients, professionals, and researchers perceive financial ties to for-profit organizations as compromising the quality of research evidence, and such ties should be disclosed in order to enable readers to interpret results from their own perspective [14, 17].

5.4 Management of Conflict of Interest

Institutions, professional organizations, and governments establish policies to address the problem of conflict of interest on behalf of the public in order to achieve and maintain high standards of care. Such policies work best when they aim at prevention or mitigation of its impact on research integrity rather than penalty after occurrence [20]. Although penalizing researchers who violate disclosure policies may also assist with prevention, the presence of a conflict of interest does not imply that any individual is improperly motivated. To avoid these and similar mistakes and to provide guidance for formulating and applying such policies, a framework for analyzing conflicts of interest is desirable. Conflict of interest policies should not only address concerns that financial relationships with industry may lead to bias or a loss of trust but should also consider the potential benefits of such relationships in specific situations.
There are three key steps in managing conflict of interest: (a) identifying relevant suspicious relationships, (b) resolving such existing relationships, and (c) disclosing relevant relationships
to learners [7]. Eliminating the conflict may not be possible for ethical or practical reasons.
Requirements for the registration of clinical trials are, in part, a response to concerns about conflict of interest in industry-sponsored research and research reporting. The registration of clinical trials ensures that basic methods for the conduct and analysis of the findings of a study as well as the primary clinical end points to be assessed and reported are specified before the trial begins and before data are analyzed [10]. In the USA, management of COI and objectivity in research is regulated federally by the Department of Health and Human Services [8].
Essentially all medical journals require disclosure of potential conflicts of interest from their authors. Unfortunately the format of that disclosure is not consistent among journals. The management of conflict of interest in academic publishing involves multiple parties: those who research, those who edit and publish, and those who read the research. Awareness of the possibility of conflict of interest is the first step to recognition, and the difficult part in managing this complex issue is declaration of conflict of interest. The responsibility for managing conflict of interest in publishing is equally spread across authors, reviewers, editors, and readers.
For researchers and authors, there is a need to be aware of those situations that have the potential to bias results or perspectives. Potential conflicts may include financial or in-kind support from vested interests. Most clearly and easily identified are research funding sources directly related to the publication, but this support could also include conference scholarships, paid speaking engagements, and prizes. Managing this potential conflict of interest in the first instance involves awareness and acknowledgement of potential or real conflict of interest and declaration on submission. Adherence to sound academic principles, such as being true to the data, not over-claiming, and not under-disclosing, is a protection that can help manage potential conflicts. In several countries, in addition to reporting funding, researchers must disclose COIs in presentations and publications, as well as to colleagues and, in some cases,

In reviewing manuscripts, reviewers need to keep the potential for conflict of interest in mind and remain vigilant for it. Reviewers need to check that references used to support statements are being appropriately utilized, that the claims made are reasonable given the findings of the study, that negative findings are fairly reported, and that conclusions and recommendations are commensurate with the implications that can be drawn from the study. Editors also need to consider their own individual conflicts in a similar way to authors and reviewers. They have the job of developing policy and ensuring that authors’ and reviewers’ interests do not distort publication decisions. It is the readers, the healthcare professionals, who have the most difficult part in this sort of conflict, since they have to develop their own approach to the interpretation of the results. Awareness is the key, and the responsibility is theirs to question whether published material may be influenced by interests of authors, reviewers, or editors [17].

Fact Box 5.4
There are three key steps in managing conflict of interest: (a) identifying relevant suspicious relationships, (b) resolving such existing relationships, and (c) disclosing relevant relationships to learners. Eliminating the conflict may not be possible for ethical or practical reasons.

Take-Home Message
• It is important to understand that having a conflict does not mean wrongdoing.
• A conflict is a situation that increases the potential or risk for bias to occur, and steps should be taken to assure the integrity of the research.
• The goal is to increase transparency, primarily through disclosure and regulation, which allows consumers of the research to consider potential influences related to the conflict.
• Sometimes careful judgment is required based
research subjects [12]. on the facts and the nature of the research.
© ISAKOS 2019 49
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_6
50 N. Roselaar et al.
6.2.1 Declaration of Helsinki

The Declaration of Helsinki (DOH) has been considered the gold standard for ethics in clinical research [26]. In its current form, the DOH applies to human subjects, data, and material (WMA). Central tenets of the DOH protect the health and rights of all patients involved in clinical research and advocate for the continuous evaluation of safety, effectiveness, efficiency, accessibility, and quality of human subject research [46]. It was originally composed of 14 short statements outlining ethical guidelines for conducting human subject research [47]. Since its inception, the DOH has been revised seven times, in 1975 (Tokyo), 1983 (Venice), 1989 (Hong Kong), 1996 (Somerset, South Africa), 2000 (Edinburgh), 2008 (Seoul), and 2013 (Fortaleza, Brazil), and clarified twice [46]. By 2014, the DOH included 37 detailed principles [46]. The basis of the declaration stems from the Nuremberg Code [5]. This seminal code of ethics was established at the conclusion of the Nuremberg trials for Nazi war crimes, including horrifically violent human medical experiments on Holocaust victims [34].

6.2.2 Common Rule

In the United States, the Department of Health and Human Services (HHS) issues regulations on the ethical conduct of research on humans [22]. The HHS Code of Federal Regulations Title 45 Part 46, Protection of Human Subjects, was developed in 1981 and updated in 2009 [40]. The policy is known colloquially as the “Common Rule” and protects human subjects in research conducted or supported by a federal department or agency [40]. It requires researchers to obtain informed, written consent and to provide full disclosure of the benefits and foreseeable risks of the proposed study, along with a statement addressing subject rights to refuse participation at any point [40]. Considered vulnerable populations, pregnant women, human fetuses (defined as the “product of conception from implantation until delivery”), neonates, prisoners, and children are offered additional protections under the Common Rule [40]. Revisions to the Common Rule were proposed in 2015 by HHS and 15 other federal departments and agencies [22]. The updates are designed to reflect changes in research over the 35 years since the inception of the Common Rule [22]. Goals include enhancing respect and strengthening informed consent, particularly for the long-term use of de-identified biospecimens; enhancing safeguards by specifying privacy and security measures concerning identifiable information; streamlining Institutional Review Board (IRB) review by clarifying levels of risk and the IRB process for multisite studies; and calibrating oversight [40]. The HHS offered the opportunity for public comments on revisions until January 2016 [42].

6.2.3 Institutional Review Board

The Common Rule also states that research conducted or supported by organizations outside a federal department or agency must be compliant with the host’s Institutional Review Board (IRB) [40]. Like the Common Rule, IRBs aim to provide ethical and regulatory oversight for research with human subjects. On an institutional level, they ensure compliance with external laws, policies, and regulations [27]. Both the Common Rule and IRBs operate to uphold ethical principles defined in the Belmont Report of 1979 [27, 40]. The primary principles include respect for persons, beneficence, and justice [27]. Research approved by an IRB is subject to annual continuing reviews.

6.2.4 HIPAA

In addition to the protection of subjects themselves, information that identifies subjects must also be protected. The Health Insurance Portability and Accountability Act (HIPAA) of 1996 seeks to increase privacy protection for human research participants [10]. This includes the privacy and security of information that could be used to identify subjects in a particular study such as name, medical record number, birthdate, social
6 Ethics in Clinical Research 51
security number, address, or identifying photograph. As electronic medical records become more prevalent, and security and privacy issues extend to the online storage of identifiable data, regulations must also change. HIPAA was modified and extended in 2000, 2004, 2009, and 2013 to reflect technological advances [41].

The HIPAA Privacy Rule, which protects identifiable health information, was introduced in 2000 and mandated nationally in 2003 [29]. By protecting ownership and transfer of specific protected health information (PHI), the Privacy Rule aims to safeguard health information associated with individuals while facilitating data flow to maximize the quality of health care [43]. PHI comprises any information that can be used to identify a study participant [43]. The implementation of the HIPAA Privacy Rule in clinical research has also provoked criticism. In 2007, JAMA published a study in which more than two-thirds of surveyed epidemiologists perceived “substantial, negative influence on the conduct of human subjects health research” after the implementation of the HIPAA Privacy Rule [29]. To encompass the protection of electronic PHI (e-PHI), the HIPAA Security Rule (HSR) was enacted in 2003 [44].

6.3 Ethical Population Representation

The lack of diverse racial and ethnic representation in clinical research prevents the best possible treatment for disease outcomes in heterogeneous populations [30]. To address this issue, Congress passed the National Institutes of Health (NIH) Revitalization Act of 1993, intended to catalyze the diversification of participants in clinical research [28]. With minimal exceptions, the Revitalization Act requires NIH-funded clinical research to include women and members of minority groups [28].

Twenty years after the introduction of regulatory laws to diversify participation in clinical research, minority participation in cancer clinical trials remained minimal [7]. The lack of minority representation persists across many other fields in clinical research including cardiovascular disease, respiratory disease, mental health services, and substance abuse [2, 3, 24, 33].

Despite legislation, barriers in inclusion criteria may prevent trials from including minority populations. English language fluency is required for clinical trials more and more frequently [16]. The mistrust of healthcare professionals and the lack of understanding of clinical research also contribute to low rates of minority participation [2]. Decreased access to academic institutions pursuing clinical research by minorities influences recruitment diversity [14].

6.4 Ethical Publishing Practices

6.4.1 Data Fraud and Misconduct

One of the most highly publicized fraudulent studies is Dr. Andrew Wakefield’s case series suggesting an association between the MMR vaccine and autism [45]. Published in The Lancet, a British medical journal, in 1998, Dr. Wakefield’s paper caught the attention of mainstream media [31]. Consequently, the rate of MMR vaccination for toddlers in the United Kingdom decreased from 83.1% in 1997 to 69.9% in 1998 [39]. A one-page commentary titled “Retraction of an Interpretation” was published in 2004 by 10 of the 13 authors of the original article [25]. Simultaneously, editors at The Lancet acknowledged a lack of financial disclosures by Dr. Wakefield et al. and reaffirmed “the paper’s suitability, credibility, and validity for publication” [21]. In 2010, The Lancet retracted Dr. Wakefield’s article.

This example demonstrates many important facets of the ethics of clinical research. The researchers failed to report accurate findings and drew speculative conclusions from a small, nonrepresentative case series [31]. These unethical actions by the authors were compounded by irresponsible publishing practices at The Lancet. The publishers failed to require proper disclosure of conflicts of interest, specifically those that revealed Dr. Wakefield’s financial gains related to the research conclusions [12]. Furthermore, The Lancet only retracted the fraudulent article
12 years after its initial publication [15]. After the retraction of the article, investigative journalist Brian Deer published multiple articles in the BMJ revealing Dr. Wakefield’s connection to lawyer Richard Barr [12]. Barr, who was working to file a lawsuit against vaccine manufacturing companies, provided Dr. Wakefield with £400,000 through the Legal Aid Fund while also representing the anti-vaccine organization Justice, Awareness, and Basic Support (JABS) [13]. Barr used his connection to JABS to find patients for Dr. Wakefield’s study [12].

However, cases such as Dr. Wakefield’s are not common. Although data fraud is difficult to monitor and likely underreported, confirmed cases of data fabrication, falsification, and plagiarism exist among 0.01% of scientists according to the US Public Health Service [19]. Dr. Wakefield’s willful deceit through data falsification is classified as fraud, while misconduct refers to honest errors in ethical research practices [19]. In addition to data fraud and misconduct, there are many other aspects of clinical research for which good clinical practices must be observed. They include conflict of interest disclosure, self-citation, and distinguishing predatory journals.

6.4.2 Conflict of Interest

Conflicts of interest in clinical orthopedic research are any instances of overlapping personal, financial, or academic involvement that may bias or influence a participant’s work. Investigators are required to submit conflict of interest statements for project proposals, manuscript publications, and presentations at conferences. This benefits consumers of the research as it provides context for the circumstances under which research is conducted. It is the responsibility of authors, editors, peer reviewers, and any other staff members who play a role in determining the publication or presentation of a study to disclose any relevant conflicts of interest [23]. Internationally, many orthopedics journals follow the “Recommendations for the Conduct, Reporting, Editing and Publication of Scholarly Work in Medical Journals” established by the International Committee of Medical Journal Editors (ICMJE). The conflicts of interest covered by the ICMJE conflict of interest form include financial activities related to the work as well as relevant financial activities outside the submitted work. Relevant financial activities may include relationships with a pertinent entity such as a government agency, foundation, academic institution, or commercial sponsor; grants; personal fees; royalties; leadership positions; and nonfinancial support [23].

6.4.3 Self-Citation

Self-citation refers to referencing an article from the same journal [8]. The rate of self-citation for a medical journal is defined by the number of self-citations divided by the number of total references made by that journal in a specified time period [20]. In orthopedics journals, many factors influence the differences in self-citation rates. Self-citation rates are highest in subspecialized journals due to their specificity [36]. Specialized orthopedics journals including Spine, Arthroscopy, and FAI have self-citation rates two and three times higher than the general orthopedics journals CORR and JBJS (Am), respectively [36]. Rates of self-citation are categorized as “high” if they are at or above 20% (JCR).

Self-citation rates are relevant in the calculation of medical journal impact factors [17]. According to the Science Citation Index, the impact factor for medical journals “measures the average number of citations received in a particular year by papers published in the journal during the two preceding years” [8]. Practices surrounding the manipulation of self-citation rates introduce bias into impact factors [20]. For journals in which self-citations dominate the references, the true contribution to the journal’s discipline may be misrepresented [18].

6.4.4 Predatory Journals

An increase in the existence of open-access publications with minimal or no peer review has been attributed to the rise of spam emails [4]. Known
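The two ratios defined in Section 6.4.3 can be written out as a short calculation. The sketch below is illustrative only: the function names and all counts are hypothetical, and recomputing the impact factor with self-citations removed is an assumption made here to show how self-citation can shift the figure, not a method described in the text.

```python
def self_citation_rate(self_citations: int, total_references: int) -> float:
    """Self-citations divided by all references made by the journal in the period."""
    if total_references <= 0:
        raise ValueError("total_references must be positive")
    if not 0 <= self_citations <= total_references:
        raise ValueError("self_citations must lie between 0 and total_references")
    return self_citations / total_references


def is_high_self_citation(rate: float, threshold: float = 0.20) -> bool:
    """Apply the 'high' cut-off quoted in the text (at or above 20%)."""
    return rate >= threshold


def impact_factor(citations_in_year: int, papers_prior_two_years: int) -> float:
    """Average citations received in one year by papers from the two preceding years."""
    if papers_prior_two_years <= 0:
        raise ValueError("papers_prior_two_years must be positive")
    return citations_in_year / papers_prior_two_years


# Hypothetical journal: every number below is invented for illustration.
rate = self_citation_rate(self_citations=150, total_references=600)
jif = impact_factor(citations_in_year=1200, papers_prior_two_years=400)

# Assumption: 150 of the 1200 citations came from the journal itself.
self_citations_to_window = 150
jif_without_self = impact_factor(1200 - self_citations_to_window, 400)

print(f"self-citation rate: {rate:.0%}, high: {is_high_self_citation(rate)}")
print(f"impact factor: {jif:.3f}, excluding self-citations: {jif_without_self:.3f}")
```

Comparing the two impact factor values gives a rough sense of how much of a journal's figure depends on its own citing behavior.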
cal trials among cancer center leaders, investigators, research staff, and referring clinicians: enhancing minority participation in clinical trials (EMPaCT). Cancer. 2014;120(Suppl 7):1097–105.
15. Eggertson L. Lancet retracts 12-year-old article linking autism to MMR vaccines. CMAJ. 2010;182(4):e199–200.
16. Egleston BL, Pedraza O, Wong YN, Dunbrack RL, Griffin CL, Ross EA, et al. Characteristics of clinical trials that require participants to be fluent in English. Clin Trials. 2015;12(6):618–26.
17. Frandsen TF. Journal self-citations: analysing the JIF mechanism. J Informetr. 2007;1(1):47–58.
18. Garfield E. Journal self-citation in the Journal Citation Reports – Science Edition [Internet]. Clarivate Analytics. 2002 [cited 2018 Nov 13]. https://clarivate.com/essays/journal-self-citation-jcr/.
19. George SL, Buyse M. Data fraud in clinical trials. Clin Invest (Lond). 2015;5(2):161–73.
20. Hakkalamani S, Rawal A, Hennessy MS, Parkinson RW. The impact factor of seven orthopaedic journals. J Bone Joint Surg [Br]. 2006;88(2):159–62.
21. Horton R, Murch S, Walker-Smith J, Wakefield A, Hodgson H. A statement by the editors of The Lancet. Lancet. 2004;363:P820–1.
22. Hudson KL, Collins FS. Bringing the common rule into the 21st century. N Engl J Med. 2015;373(24):2293–6.
23. ICMJE. Recommendations for the conduct, reporting, editing, and publication of scholarly work in medical journals. 2017.
24. McGarry ME, McColley SA. Minorities are underrepresented in clinical trials of pharmaceutical agents for cystic fibrosis. Ann Am Thorac Soc. 2016;13(10):1721–5.
25. Murch S, Anthony A, Casson D, Malik M, Mark B, Dhillon A, et al. Retraction of an interpretation. Lancet. 2004;363:750.
26. Muthuswamy V. The new 2013 seventh version of the declaration of Helsinki: more old wine in a new bottle? Indian J Med Ethics. 2014;11(1):2–4.
27. National Institute of Environmental Health Sciences. Institutional Review Board [Internet]. National Institutes of Health. 2015 [cited 2017 Oct 21]. https://www.niehs.nih.gov/about/boards/irb/index.cfm.
28. National Institutes of Health. NIH policy and guidelines on the inclusion of women and minorities as subjects in clinical research [Internet]. Office of Extramural Research. 2017 [cited 2017 Dec 12]. https://grants.nih.gov/grants/funding/women_min/guidelines.htm.
29. Ness RB. Influence of the HIPAA privacy rule on health research. JAMA. 2007;298(18):2164–70.
30. Oh SS, Galanter J, Thakur N, Pino-Yanes M, Barcelo NE, White MJ, et al. Diversity in clinical and biomedical research: a promise yet to be fulfilled. PLoS Med. 2015;12(12):e1001918.
31. Rao T, Andrade C. The MMR vaccine and autism: sensation, refutation, retraction, and fraud. Indian J Psychiatry. 2011;53(2):95–6.
32. Rockwell DH, Yobs AR, Moore B Jr. The Tuskegee study of untreated syphilis: the 30th year of observation. Arch Intern Med. 1964;114(6):792–298.
33. Santiago CD, Miranda J. Progress in improving mental health services for racial-ethnic minority groups: a ten-year perspective. Psychiatr Serv. 2014;65(2):180–5.
34. Shuster E. Fifty years later: the significance of the Nuremberg code. N Engl J Med. 1997;337:1436–40.
35. Shyam A. Predatory journals: what are they? J Orthop Case Rep. 2015;5(4):1–2.
36. Siebelt M, Siebelt T, Pilot P, Bloem RM, Bhandari M, Poolman RW. Citation analysis of orthopaedic literature; 18 major orthopaedic journals compared for impact factor and SCImago. BMC Musculoskelet Disord. 2010;11(4):1–7.
37. Sorokowski P, Kulczycki E, Sorokowska A, Pisanski K. Predatory journals recruit fake editor. Nature. 2017;543:481–3.
38. Strielkowski W. Predatory journals: Beall’s list is missed. Nature. 2017;544:416.
39. Thomas DR, Salmon RL, King J. Rates of first measles-mumps-rubella immunisation in Wales (UK). Lancet. 1998;351:1927.
40. U.S. Department of Health and Human Services. Code of Federal Regulations Title 45 Part 46 [Internet]. Office for Human Research Protections. 2009 [cited 2017 Oct 21]. https://www.hhs.gov/ohrp/regulations-and-policy/regulations/45-cfr-46/index.html#subparta.
41. U.S. Department of Health and Human Services. HIPAA for Professionals [Internet]. Office for Civil Rights. 2017 [cited 2017 Oct 21]. https://www.hhs.gov/hipaa/for-professionals/index.html.
42. U.S. Department of Health and Human Services. NPRM for revisions to the common rule. Federal Register. 2015.
43. U.S. Department of Health and Human Services. Summary of the HIPAA Privacy Rule [Internet]. Office for Civil Rights. 2013 [cited 2017 Oct 21]. https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html.
44. U.S. Department of Health and Human Services. Summary of the HIPAA Security Rule [Internet]. Office for Civil Rights. 2013 [cited 2017 Oct 21]. https://www.hhs.gov/hipaa/for-professionals/security/laws-regulations/index.html.
45. Wakefield A, Murch S, Anthony A, Linnell J, Casson D, Malik M, et al. RETRACTED: ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children. Lancet. 1998;351:P637–41.
46. World Medical Association. WMA Declaration of Helsinki: ethical principles for medical research involving human subjects. Helsinki; 1964.
47. World Medical Organization. Declaration of Helsinki (1964). Br Med J. 1996;313(7070):1448–9.
48. Yale University Library. Choosing a journal for publication of an article: list of suspicious journals and publishers [Internet]. 2018 [cited 2017 Dec 19]. https://guides.library.yale.edu/c.php?g=296124&p=1973764.
Part II
How to Get Started with Clinical Research?
How to Get Started: From Idea
to Research Question
7
Lachlan M. Batty, Timothy Lording,
and Eugene T. Ek
© ISAKOS 2019 57
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_7
Interesting, Novel, Ethical, and Relevant. Each of these elements should be met when developing an idea into a research question as failure to recognize any one element may impede progression of any investigation.

7.3 Types of Research Questions

Research questions can take many forms; however, many authors divide them into four main categories [9, 11]. These pertain to (1) the safety and efficacy of an intervention, (2) the etiology of a pathological process, (3) the diagnosis of a condition or pathological process, or (4) the prognosis of a condition.

Investigating an “intervention” may include an operation, modification to an operation, a medication, or other forms of treatment, such as a rehabilitation program. Examples of research questions assessing interventions include:

[...]

• What are the risk factors for infection after shoulder arthroscopy? [3]

Studies investigating the utility of diagnostic modalities of a condition or pathological process may include evaluation of clinical examination techniques and biochemical, hematological, pathological, or radiological investigations. Some examples include:

• Following medial opening wedge high tibial osteotomy, does high-resolution computed tomography have higher sensitivity in diagnosis of lateral hinge fracture compared to plain radiography? [13]
• What are the sensitivity and specificity of the Lachman, anterior drawer, and pivot shift tests for clinical diagnosis of anterior cruciate ligament rupture? [2]
• Can preoperative magnetic resonance imaging predict irreparable rotator cuff tears? [12]
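The diagnostic questions above ask for sensitivity and specificity. As a minimal sketch of how those two quantities are computed from a 2×2 table of test results against a reference standard, the following uses hypothetical names and invented counts; it is not data from any of the cited studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticCounts:
    """2x2 table of an index test against a reference standard (hypothetical layout)."""
    true_positives: int   # test positive, condition present
    false_negatives: int  # test negative, condition present
    true_negatives: int   # test negative, condition absent
    false_positives: int  # test positive, condition absent


def sensitivity(counts: DiagnosticCounts) -> float:
    """Proportion of patients with the condition whom the test correctly identifies."""
    diseased = counts.true_positives + counts.false_negatives
    if diseased == 0:
        raise ValueError("no patients with the condition in the sample")
    return counts.true_positives / diseased


def specificity(counts: DiagnosticCounts) -> float:
    """Proportion of patients without the condition whom the test correctly clears."""
    healthy = counts.true_negatives + counts.false_positives
    if healthy == 0:
        raise ValueError("no patients without the condition in the sample")
    return counts.true_negatives / healthy


# Invented counts for a clinical test compared with a reference standard.
example = DiagnosticCounts(true_positives=85, false_negatives=15,
                           true_negatives=90, false_positives=10)
print(f"sensitivity = {sensitivity(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
```

Keeping the four cells explicit makes it harder to confuse the denominators: sensitivity is calculated only among patients with the condition, specificity only among those without it.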
string graft. If the diameter of the graft is found to be less than 7.5 mm, semitendinosus will be tripled (five-strand graft) providing a greater graft diameter. Femoral tunnels will be drilled using an anteromedial portal technique, with suspensory femoral fixation. Tibial fixation will be provided by interference screw
• Outcome
  – Primary: graft failure at 24 months. This is defined as symptomatic instability requiring revision ACL surgery or a positive pivot shift or asymmetrical pivot shift greater than that of the contralateral side
  – Secondary: patient-reported, radiological, and clinical examination findings are listed

Fact Box 7.1: FINER Criteria for a Good Research Question
F  Feasible
   • Adequate number of subjects
   • Adequate technical expertise
   • Affordable in time and money
   • Manageable in scope
I  Interesting
   • Getting the answer intrigues investigator, peers, and community
N  Novel
   • Confirms, refutes, or extends previous findings
E  Ethical
   • Amenable to a study that institutional review board will approve
R  Relevant
   • To scientific knowledge
   • To clinical and health policy
   • To future research
Reproduced with permission from: Farrugia, P., et al., Research questions, hypotheses and objectives. Canadian Journal of Surgery, 2010. 53(4): p. 278

Fact Box 7.2: PICOT Criteria
P  Patient/population
   • What specific patient population are you interested in?
I  Intervention
   • What is your investigational intervention?
C  Comparison
   • What is the main alternative to compare to the intervention?
O  Outcome
   • What do you intend to accomplish, measure, improve, or affect?
T  Time
   • What is the appropriate follow-up time to assess the outcome?
Reproduced with permission from: Farrugia, P., et al., Research questions, hypotheses and objectives. Canadian Journal of Surgery, 2010. 53(4): p. 278

Take-Home Messages
• A structured and answerable research question is an essential prerequisite to the development of a research project.
• Confirmation of a knowledge gap is the initial step in the process from idea to research question and requires reviewing the available literature and ongoing research.
• The PICO approach provides a basic structure for a well-formed clinical question.
• Attention to detail and clear definitions are required for each element of the PICO model.
• Time spent developing a well-considered research question before commencing the research trial can prevent problems down the track.

7.7 Resources/Websites

https://clinicaltrials.gov: a database of privately and publicly funded clinical studies conducted around the world.
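The PICOT elements from Fact Box 7.2 can also be held as a small structured record, which makes it easy to check that no element has been left undefined. The sketch below is one possible layout: the class and method names are hypothetical, and the example values are paraphrased from the trial title in reference [14] and the primary outcome in the vignette above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class PicotQuestion:
    """One research question broken into its five PICOT elements."""
    population: str
    intervention: str
    comparison: str
    outcome: str
    time: str

    def missing_elements(self) -> List[str]:
        """Names of elements that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def as_question(self) -> str:
        """Assemble the elements into a single question (one phrasing of many)."""
        return (f"In {self.population}, does {self.intervention} "
                f"compared with {self.comparison} "
                f"change {self.outcome} at {self.time}?")


question = PicotQuestion(
    population="individuals at high risk of graft failure",
    intervention="ACL reconstruction with lateral extra-articular tenodesis",
    comparison="ACL reconstruction without lateral extra-articular tenodesis",
    outcome="the rate of graft failure",
    time="24 months",
)

if question.missing_elements():
    print("Undefined elements:", ", ".join(question.missing_elements()))
else:
    print(question.as_question())
```

A draft question with a blank element, for example an empty `time`, is reported by `missing_elements()` before any study planning is built on it.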
References

1. Ardern CL, Taylor NF, Feller JA, Webster KE. Return-to-sport outcomes at 2 to 7 years after anterior cruciate ligament reconstruction surgery. Am J Sports Med. 2012;40(1):41–8.
2. Benjaminse A, Gokeler A, van der Schans CP. Clinical diagnosis of an anterior cruciate ligament rupture: a meta-analysis. J Orthop Sports Phys Ther. 2006;36(5):267–88.
3. Cancienne JM, Brockmeier SF, Carson EW, Werner BC. Risk factors for infection after shoulder arthroscopy in a large medicare population. Am J Sports Med. 2018;46(4):809–14.
4. Chalmers PN, Salazar DH, Steger-May K, Chamberlain AM, Stobbs-Cucchi G, Yamaguchi K, et al. Radiographic progression of arthritic changes in shoulders with degenerative rotator cuff tears. J Shoulder Elb Surg. 2016;25(11):1749–55.
5. Farrugia P, Petrisor BA, Farrokhyar F, Bhandari M. Research questions, hypotheses and objectives. Can J Surg. 2010;53(4):278.
6. Health UNIo. ClinicalTrials.gov. 2012.
7. Hewison CE. STABILITY Study: a multicentre RCT comparing ACL reconstruction with and without lateral extra-articular tenodesis for individuals at high risk of graft failure. Electronic Thesis and Dissertation Repository, 3199. 2015. https://ir.lib.uwo.ca/etd/3199.
8. Hewison CE, Tran MN, Kaniki N, Remtulla A, Bryant D, Getgood AM. Lateral extra-articular tenodesis reduces rotational laxity when combined with anterior cruciate ligament reconstruction: a systematic review of the literature. Arthroscopy. 2015;31(10):2022–34.
9. Hoffmann T, Bennett S, Del Mar C. Evidence-based practice across the health professions-e-Book. Amsterdam: Elsevier Health Sciences; 2013.
10. Hulley SB, Cummings SR, Browner WS, Grady DG, Newman TB. Designing clinical research. Philadelphia: Lippincott Williams & Wilkins; 2013.
11. Kaura A. Crash course evidence-based medicine: reading and writing medical papers-e-Book. Amsterdam: Elsevier Health Sciences; 2013.
12. Kim JY, Park JS, Rhee YG. Can preoperative magnetic resonance imaging predict the reparability of massive rotator cuff tears? Am J Sports Med. 2017;45(7):1654–63.
13. Lee O-S, Lee YS. Diagnostic value of computed tomography and risk factors for lateral hinge fracture in the open wedge high tibial osteotomy. Arthroscopy. 2018;34(4):1032–43.
14. Multicenter randomized clinical trial comparing anterior cruciate ligament reconstruction with and without lateral extra-articular tenodesis in individuals who are at high risk of graft failure 2018. https://clinicaltrials.gov/ct2/show/NCT02018354.
15. Oh JH, Seo HJ, Lee Y-H, Choi H-Y, Joung HY, Kim SH. Do selective COX-2 inhibitors affect pain control and healing after arthroscopic rotator cuff repair? A preliminary study. Am J Sports Med. 2018;46(3):679–86.
16. Rios LP, Ye C, Thabane L. Association between framing of the research question using the PICOT format and reporting quality of randomized controlled trials. BMC Med Res Methodol. 2010;10(1):11.
17. Sackett DL, Rosenberg WM, Gray JM, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn’t. BMJ. 1996;312(7023):71–2.
18. Sanders TL, Kremers HM, Bryan AJ, Fruth KM, Larson DR, Pareek A, et al. Is anterior cruciate ligament reconstruction effective in preventing secondary meniscal tears and osteoarthritis? Am J Sports Med. 2016;44(7):1699–707.
19. Thabane L, Thomas T, Ye C, Paul J. Posing the research question: not so simple. Can J Anesth. 2009;56(1):71.
20. Von Porat A, Roos E, Roos H. High prevalence of osteoarthritis 14 years after an anterior cruciate ligament tear in male soccer players: a study of radiographic and patient relevant outcomes. Ann Rheum Dis. 2004;63(3):269–73.
21. Willits K, Amendola A, Bryant D, Mohtadi NG, Giffin JR, Fowler P, et al. Operative versus nonoperative treatment of acute Achilles tendon ruptures: a multicenter randomized trial using accelerated functional rehabilitation. JBJS. 2010;92(17):2767–75.
How to Write a Study Protocol
8
Lukas B. Moser and Michael T. Hirschmann
L. B. Moser · M. T. Hirschmann (*)
Department of Orthopaedic Surgery and Traumatology, Kantonsspital Baselland (Bruderholz, Liestal, Laufen), Bruderholz, Switzerland
University of Basel, Basel, Switzerland
e-mail: michael.hirschmann@ksbl.ch; michael.hirschmann@unibas.ch

8.1 Introduction

This first step of getting the research idea organized may be difficult for the beginner. However, it is also an important step. In the further course of the research, the young scientist will find out that writing a brief, concise, and comprehensive study protocol will save time and prevent problems in the course of the study. If done properly it is definitely one of the most rewarding parts of the research project as it facilitates later execution of the study and the subsequent writing process (see Chapter 11.9).

Many junior researchers have difficulties starting their first research project. This might be due to a number of reasons.

Firstly, most of the scientific project ideas evolve from problems more senior orthopedic surgeons encounter during their daily practice. The seniors pass the study idea over to the younger researchers, who often do not fully understand it at first sight as they do not have the necessary broad and deep orthopedic knowledge. Hence, good mentorship from senior orthopedic researchers is of utmost importance and will facilitate the first steps into the unknown territory of research [4].

Secondly, in many countries young orthopedic trainees may not be prepared with a basic scientific skillset at medical school. Specific courses may be rare and often do not focus enough on the challenges of junior researchers [3].

Thirdly, from a young researcher’s perspective, it appears to be a waste of valuable time to prepare a study protocol. It is the time of publish or perish in which the young researcher is driven by the ultimate goal to get the study done, to present at a conference, and finally get it published.

Fourthly, there is always a next conference and submission deadline to meet. This might mislead young researchers into gathering data hastily in order to write their abstract in time. Please do not feel rushed by upcoming deadlines. Rushing inevitably decreases the quality of the scientific work, and the abstract might even get rejected [3].

Clearly, there is no alternative to writing a proper study protocol. Therefore, writing a good study protocol is time well spent facilitating your further study. It is also mandatory for ethical committee and grant applications.

8.2 What Is a Study Protocol?

A study protocol is the central documentation of the research project including scientific, ethical, and regulatory considerations. It should provide
© ISAKOS 2019 65
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_8
detailed information on planning and execution of the research project. The study protocol serves as a comprehensive guide and also represents the main document for external evaluation of the study (e.g., ethical committee, grant authorities).

However, the purpose of the study protocol is to give a concise description of the study idea, plan, and further analysis. The writing style should be brief and concise. It needs to be easy to understand even for laypersons without a medical background.

Think of a study protocol as a recipe, which should enable the cook/reader to prepare an identical meal [9].

As a study is often the result of an interdisciplinary collaboration and involves a variety of different professions such as clinicians, scientists, or statisticians, all groups should be involved at this early stage. Significant input from all groups involved is necessary.

A well-written study protocol is necessary for:

• Application for ethical approval at the local ethical committee.
• Application for a scientific grant from a grant authority.
• It helps to structure and define the study idea in the discussion with the collaborators.
• It prompts rethinking the study plan and reveals possible problems and obstacles at an early stage.
• Clearly defines the responsibility of each researcher involved.
• Defines a budget and funding plan for the study.

Every research protocol follows a similar basic structure. However, it is important to know that the structure of a research protocol might vary from one study to another. This is because study protocols written for ethical committee approval might differ slightly from protocols written for grant authorities. It is also mandatory to follow exactly the instructions given by the grant authority or local ethical committee [9].

Generally, the study protocol can be divided into the following parts:

• Title page
• Background
• Study objectives (aims)
• Hypothesis
• Material and methods
  – Study design
  – Subjects
  – Sample size
  – Study procedures
  – Outcome instruments
  – Data collection
• Data management
• Statistical analysis
• Ethical considerations
• Time points and timeline
• Conflict of interest
• Insurance
• Reference section
• Annexes

To help get started, the researcher should consider the following questions to be answered:

• What is the clinical problem to be addressed by this study?
• What is already known about the problem and topic?
• What is the study design?
• What are the inclusion and exclusion criteria?
• Defines a clear timeline for each step of the • What study subjects are investigated?
study. • What are the outcome instruments chosen?
• Helps to monitor the milestones and study • What are the primary or secondary endpoints
progress. of the study?
• Facilitates writing of the later scientific • What are the interventions?
article [14]. • What is the experimental setup?
8 How to Write a Study Protocol 67
• How is data collection organized?
• How is the data processed and analyzed?
• What are the statistical methods?
• Are there any ethical problems to be considered?
• What is the timeline?
• What milestones are set?
• How is funding organized [22]?

Having specific answers to the aforementioned questions helps to fill every part of the study protocol. Hence, we can move on and get it started step by step.

8.3 Title Page

The title page contains the most important information at a single glance. Here, the researcher should give the full title of the study along with the names and affiliations of all people involved [4].

The project title needs to be wisely chosen as it should closely reflect the content of the study. Keep it brief and concise. The title should not pose a question, but summarize the main objective including the study type such as "randomized controlled trial." It might also mention the subjects to be investigated [14].

The affiliations of each study team member need to be complete. For every member of the study team, the contact information including e-mail address and telephone and fax number should be given here.

If applicable, state the study sponsor, in case the study is supported by a company or grant authority [4].

8.4 Background

The background of the topic should be clearly described. Meticulously guide the reader into the topic while avoiding irrelevant information. Do not spend more than two pages on the background information. As a rule of thumb, also limit the number of cited scientific articles to fewer than 20–30. Only describe the most important ones here. Always ask if this reference is really needed to lead the reader to the purpose of the study. It also helps to stay focused [14].

It is of utmost importance to clarify why the researcher is conducting the study. The researcher should make the reader understand why the research project is planned and what the original study idea is. As this requires a good understanding of the current knowledge in this field, a systematic review of the current literature should be performed. It facilitates the summary of current evidence and allows the study idea to be put into the broader scientific context [16]. The common thread of the background section is to start from a bird's-eye view and come closer to the topic with every sentence written [9].

Defining the research question is definitely the heart of the protocol. The research question proposed should be able to fill an existing gap of knowledge. Ideally, it should represent the logical consequence of the background explained. The question asked here should be as precise as possible [16].

This part decides the fate of the study. Clearly, if the researcher is not able to give a proper study question and explain the hypothesis here, it is better to reconsider than to waste personal resources and time on this study [14] (Tables 8.1 and 8.2).

Table 8.1 The mnemonic FINER will help the researcher with identification of a good research question and study idea [7]
F  Feasible in terms of recruitment, expertise, funding, resources, and scope
I  Interesting to the scientific community
N  Novel, providing new findings and confirming or rejecting previous findings
E  Ethical approval should be possible
R  Relevant to scientists, clinicians, and future research

Table 8.2 The PICO acronym helps to formulate a testable question [15]
P  Patient, problem
I  Intervention or exposure
C  Comparison
O  Outcome

A major pitfall of study protocols is that many do not have a single study question and others just have too many. There might be several study
questions of interest with regard to the research project that need to be answered. However, the researcher should stay focused on the most important research question. Secondary research questions are often exploratory in nature, because sample size calculation is based on the primary research question. Do not overload the project with too many research questions. The researcher will then lose the common thread and the reader will be confused. The researcher also has to consider that each research question requires a hypothesis. Defining the research questions and objectives in advance helps the researcher to avoid reporting significant results in preference to negative results (outcome reporting bias) [16].

8.5 Study Objectives (Aims)

After performing the literature review with a focus on the research question, the researcher needs to define the objectives, outcomes, and hypothesis. A clear definition of these factors is recommended [9] (Table 8.3).

Table 8.3 It helps when the researcher defines the objectives of the study with regard to the SMART criteria [6]
S  Specific
M  Measurable
A  Achievable
R  Realistic
T  Time-related

Choose the study objectives wisely and restrictively. If the researcher decides to use more than one objective, (s)he should distinguish between primary and secondary objectives or outcomes. Altogether, do not choose more than three to five objectives [4].

Practical Case Example
A clinical study investigating if the femoral prosthetic design in total knee arthroplasty (TKA) influences the patellar loading and functional scores in TKA with unresurfaced patellae
The primary purpose of the study is to investigate if the femoral shape of the different TKA models, (group P) and (group A), influences the BTU distribution pattern at SPECT-CT in TKA with unresurfaced patellae. Since femoral TKA malrotation, patellar height, and TT-TG are known to be possible causes of patellar overstuffing inducing a higher patellar BTU, these possible bias parameters will be calculated and compared between the two groups.
The secondary purpose is to compare postoperative functional results measured at 1 and 2 years postoperatively.

An exact definition of outcomes is mandatory to standardize the outcome measures. Note the time of assessment and the unit of measurement. If previous definitions or validations are used, do not forget to quote the references. If the researcher compares outcomes, (s)he needs to state the overall goal of the comparison in the protocol. The reader should understand whether the goal is to show superiority, equivalence, or non-inferiority [16].

Sometimes, outcomes are not easy to determine. Some outcomes might be subjective (e.g., pain); others are more objectively measurable (e.g., range of motion).

Surrogate endpoints are alternative endpoints that are faster to assess than long-term clinical endpoints [8]. The effect of the intervention on the surrogate endpoint has to correspond to the effect on clinical outcome. However, this effect is often difficult to predict, and therefore surrogate endpoints should be used only with caution [12].

Furthermore, the researcher needs to identify all potential confounders. Confounders are defined as additional factors that distort the effect of the treatment or exposure on a predicted outcome. They have an impact on both the exposure and the outcome, while not being part of the causal pathway. Confounding factors might produce a spurious correlation between treatment and outcome or hide a true relationship. Therefore, confounders should be carefully considered. Ideally, confounders are equally distributed in randomized controlled studies. Be aware that this is not the case in observational data [18].

Clinical Vignette
Age being a confounder in a study investigating the association of physical activity and knee pain
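To make the clinical vignette on age as a confounder more tangible, the following minimal sketch compares a crude and an age-stratified analysis. The cohort counts, the names `COHORT`, `risk_difference`, and `stratified_risk_differences`, and the choice of the risk difference as effect measure are assumptions made purely for illustration; they are not taken from any study cited in this chapter.

```python
# Hypothetical cohort (invented counts, not study data).
# Each row: (age_group, physically_active, n_subjects, n_with_knee_pain).
# Within each age group the risk of knee pain is the same for active and
# inactive subjects, but younger subjects are more often active and less
# often in pain.
COHORT = [
    ("young", True, 400, 40),
    ("young", False, 100, 10),
    ("old", True, 150, 60),
    ("old", False, 350, 140),
]


def risk(rows):
    """Proportion of subjects with knee pain among the given rows."""
    total = sum(n for _, _, n, _ in rows)
    cases = sum(pain for _, _, _, pain in rows)
    return cases / total


def risk_difference(rows):
    """Risk of knee pain in active subjects minus risk in inactive subjects."""
    active = [r for r in rows if r[1]]
    inactive = [r for r in rows if not r[1]]
    return risk(active) - risk(inactive)


def stratified_risk_differences(rows):
    """Risk difference computed separately within each age group."""
    groups = sorted({r[0] for r in rows})
    return {g: risk_difference([r for r in rows if r[0] == g]) for g in groups}


crude = risk_difference(COHORT)
by_age = stratified_risk_differences(COHORT)

print(f"crude risk difference: {crude:+.3f}")
for group, rd in by_age.items():
    print(f"risk difference within {group} subjects: {rd:+.3f}")
```

In this invented cohort, the crude analysis suggests that active subjects have a knee pain risk about 15 percentage points lower, whereas within each age group the difference is zero. The apparent association is produced entirely by age, which is why potential confounders should be named in the protocol and handled in the analysis plan.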
8.9 Subjects

In this section, the researcher should clearly describe what is going to be done with the subjects, how, and when. Provide a detailed description of the study procedure with the planned interventions.

Fig. 8.1 Flowchart showing enrollment process (steps: invite subject to participate in study; consent process successful?; subjects enrolled?)

needs to be based on a sample size calculation. Keep in mind that the inclusion of vulnerable populations (children, cognitively impaired adults) requires special justification [5].

Provide a detailed and unambiguous list of the eligibility criteria as in the example below:

Enclose a statement that the participation of the patient is entirely voluntary and that withdrawal without explanation is possible at any time and has no impact on patient management [16].

Practical Case Example
Any subject is entitled to withdraw from this clinical investigation for any reason without obligation or prejudice to further treatment.

Declare how the patient consented to participate in the study.

Practical Case Example
The patient shall give his written consent to participate in the clinical documentation. The signed consent form remains within the clinical study case file.

8.10 Sample Size

Choosing the appropriate study sample is paramount for successful research. The study sample ensures the sufficient power of the study and might allow conclusive inferences [16]. Consider that a study with a small sample size might not be powerful enough to detect a difference, even though a true difference exists. However, too large a sample size might be a waste of resources [14].

8.12 Data Collection

Describe clearly who obtains the data and how, how it is recorded, and how long it will be stored [4]. Name the locations where the data is going to be collected and the relevant events regarding recruitment, exposure, or follow-up [20].

State the data-collecting instruments (e.g., electronic forms or paper forms). Explain the validation procedure for the instruments used. For outcome instruments, provide original references. If the researcher is going to develop a novel type of data-collecting instrument, such as a checklist or a questionnaire, the researcher needs to provide all information. Do not forget to annex the original document to the study protocol.

Practical Case Example
This is a retrospective case series study. No additional examinations other than the clinical routine will be conducted. All data is stored in a centralized electronic database at the orthopedic research unit (…). The data has already been collected in our ethical committee-approved registry.

8.13 Data Management

The researcher needs to make a comprehensive statement on the data management including data entry and monitoring. Describe how the researcher plans to ensure privacy protection (e.g., anonymization) and how long the data is stored. Declare who is able to access the data, and state that all people involved are subject to medical confidentiality. Add the statement by the IEC and regulatory authorities for permitting data access, if applicable [16].

Mention how the data is entered into the database. Describe what programs the researcher will use and how the data analysis will be performed. The best approach is to make a data analysis plan
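
Picking up the sample size considerations of Sect. 8.10, the trade-off between too few and too many subjects can be illustrated with a small calculation. The sketch below uses the widely taught normal-approximation formula for comparing the means of two equally sized groups, n = 2 (z_{1-α/2} + z_{power})² σ² / δ². The function name `n_per_group`, the parameter names, and the example numbers are assumptions for illustration only; the chapter does not prescribe a particular formula, and a real protocol should have its calculation checked by a statistician.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Subjects per group for a two-sided comparison of two means.

    Normal approximation with equal group sizes and a common standard
    deviation. `delta` is the smallest difference worth detecting and `sd`
    the assumed standard deviation of the outcome (both hypothetical inputs).
    """
    if delta == 0 or sd <= 0:
        raise ValueError("delta must be non-zero and sd must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_power = NormalDist().inv_cdf(power)          # 1 - type II error
    n = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    return math.ceil(n)


# Hypothetical example: detect a 5-point difference in a score with SD 10.
print(n_per_group(delta=5, sd=10))              # 80% power
print(n_per_group(delta=5, sd=10, power=0.90))  # 90% power
print(n_per_group(delta=10, sd=10))             # larger expected difference
```

Halving the difference to be detected roughly quadruples the required number of subjects, which is one reason the primary research question and its expected effect should be fixed before recruitment is planned.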
[Figure: example study timeline. Columns: Jan to Dec. Rows: writing the study protocol; data collection; data analysis; discussion of results among research staff; writing of scientific article; submission to peer-reviewed journal]
8.17 Conflict of Interest

Do not forget to disclose conflicts of interest, such as nonfinancial or financial relationships with the industry. If applicable, state the study sponsor. Declare how the sponsor contributed and what benefits the researcher will obtain by this agreement [4].

These disclosures facilitate the identification of possible sponsor bias. Most journals require a disclosure form from all contributing authors. A short statement of relevant information will be listed on the paper in abbreviated form [17].

Practical Case Example 1
One of the authors (XY) is a consultant for (company).

Practical Case Example 2
The study was supported by a financial grant from (company). The external funding source did not have any influence on the investigation.

Practical Case Example 3
The authors of this paper declare no conflict of interest.

8.18 Insurance

The sponsor of the study is obliged to provide clinical trial insurance. If the researcher is conducting a clinical study, (s)he needs to compensate for serious adverse events (such as injuries) that the patients suffer during their participation. However, rules and requirements of trial insurance vary from country to country. Therefore, the researcher needs to check the local insurance policy [10].

Practical Case Example 1
The sponsor will secure and maintain in full force and effect, throughout the duration of the clinical investigation, a clinical trial insurance as required by national regulations, which will be evidenced by an insurance certificate.

Practical Case Example 2
No subject insurance is required as it is a retrospective study and the patient is not put at any risk and does not have to show up for follow-up.

8.19 Reference Section

The reference section should be organized as in a scientific paper (see Chapter 11.9). Provide original references, and number them consecutively or in the order requested. For every statement made, a citation should be given. It is important that only articles are cited which really add to the study protocol. There should not be more than 30–40 references in the reference list [4].
National Research Act of 1974 to identify ethical principles and guidelines for conducting human research in response to the Tuskegee syphilis experiment [5]. The Tuskegee syphilis experiment was a prospective clinical study conducted between 1932 and 1972 by the US Public Health Service to study the natural history of untreated syphilis in rural African-American males in Alabama. The men were never told of their diagnosis, and although penicillin was found to be an effective cure for the disease, treatment was withheld [16]. A whistle-blower by the name of Peter Buxtun revealed the story to the press, leading to public outrage and major changes in US law and regulation on how human research is conducted [2, 16]. Other controversial research projects in the United States during this era include the Stanford prison experiment (1971), where the participants were unable to withdraw from the study [20], and Project MKUltra (1950s–1973), where subjects were not informed of their participation in the studies [14].

In 1978, the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research issued the Belmont Report, which established three core principles (Fact Box 9.2) for research involving human subjects: respect for persons, beneficence, and justice [18]. The applications of these principles led to the consideration of the following requirements: informed consent, risk/benefit analysis, and patient selection [17].

Fact Box 9.2: Belmont Report, Three Core Principles for Research Involving Human Subjects [18]
1. Respect for persons
2. Beneficence
3. Justice

9.3 Common Rule (Part 46, Protection of Human Subjects, of Title 45, Public Welfare, in Code of Federal Regulations)

The National Research Act of 1974 established a set of guidelines for research involving human subjects, introducing the concept of the Institutional Review Board (IRB). The basic provisions of the IRB are outlined by the Federal Policy for the Protection of Human Subjects, or "Common Rule," which was published and codified by 15 federal departments and agencies [3]. The Common Rule also outlines the basic provisions for IRB, informed consent, and assurances of compliance [3]. Any research that is conducted by or for these federal agencies must abide by the "basic policy for protection of human research subjects" (also known as Subpart A, Part 46, Protection of Human Subjects, of Title 45, Public Welfare, in Code of Federal Regulations (45 CFR 46)).

The IRB is an independent, administrative group tasked with the responsibility of reviewing and approving research on human subjects, with the purpose of protecting the rights and welfare of human subjects. IRBs, and human subject research in general, are regulated by the Office for Human Research Protections (OHRP), an organization within the Department of Health and Human Services (HHS). The goal of the IRB is to ensure that proper informed consent is obtained and documented, risks to subjects are minimized, research design is sound and does not unnecessarily expose subjects to risk, patient selection is equitable, appropriate data monitoring provisions are in place, and privacy and confidentiality of the subjects are maintained [3]. The IRB also has the power to terminate or suspend any research that is not in accordance with the policy [3]. The IRB is composed of scientists, lay community members, physicians, and lawyers [3]. The average size of the IRB is 14 members. One study found internal medicine to be the most commonly represented and orthopedic surgery the least represented specialty among physician members of IRBs [10].

The Common Rule also includes regulations for addressing vulnerable populations, such as pregnant women, fetuses, in vitro fertilization, prisoners, and children. Subpart B of 45 CFR 46 affords special protections to pregnant women, fetuses, and neonates of uncertain viability or nonviable neonates. Research must directly benefit the mother and/or the fetus; if not, risks to the fetus must be minimal, and the purpose of the study must be "the development of important
78 K. Takamura and F. Petrigliano
biomedical knowledge that cannot be obtained by any other means." Subpart C of 45 CFR 46 affords special protections to prisoners, to ensure that they are not exploited, but at the same time are given equal opportunity to participate in research studies.

9.4 Research in Children and Adolescents

Children and adolescents comprise an important group of study participants in orthopedics. Importantly, this is a vulnerable population that warrants additional protections. The risks and benefits must be carefully evaluated, to prioritize the welfare of the patient, while recognizing the potential benefits of research. This risk/benefit evaluation often impacts participant inclusion/exclusion criteria and consideration for how the standard of care may be affected by research activities [8].

One of the major differences in terms of research in children is that, by definition, they are unable to provide informed consent [5]. Instead, children can provide assent, which is defined in the policy as "a child's affirmative agreement to participate in research" (§46.402 [b]), and parents or legal guardians can grant permission for their child to participate (§46.402 [c]). The process of obtaining assent should address the developmental stage of the child and provide opportunities for the child to discuss their willingness or unwillingness to participate, the degree to which the child has control over the participation decision, and whether certain information will or will not be shared with the parents [8]. Typically, local IRBs will provide guidelines for the age and conditions where a child's assent is required.

If the research involves acute illnesses or injuries, the investigators and IRB should provide for an "ongoing process for permission and assent" to accommodate the evolving understanding of changes in the child's medical and mental condition [8]. Waivers of parental permission for adolescent participation should be considered by the IRB when the "research is important to the health and well-being of adolescents and cannot reasonably or practically be carried out without the waiver" or if the research involves treatments that adolescents can receive without parental permission (may differ by state law) [3, 8]. Additionally, the investigators also need to present evidence that the adolescents have the ability to understand the research and their rights as participants, and the research protocol must contain safeguards to protect the interests of the adolescent, consistent with the risk presented to them [8].

9.5 IRB Submission and Approval Process

The definition of research involving human subjects set forth by the Common Rule [45 CFR 46.102 (d)] states that it is "[a] systematic investigation, including research development, testing and evaluation, designed to develop or contribute to generalizable knowledge" [3]. A majority of research involving human subjects adheres to this definition and, consequently, requires IRB approval, but there is a subset of research that appears to adhere to this definition that does not require IRB oversight. This includes certain quality improvement and quality assurance initiatives, case reports, and case reviews [13]. The Food and Drug Administration (FDA) provides two general guidelines indicating that a quality improvement or quality assurance initiative constitutes human research if the investigators will seek publication in a scientific/national journal or presentation at a national meeting or if the results will be applicable in a wider setting [13]. With regard to case series or case reports, there is no regulatory guidance on this particular issue [13]. If there are any questions of whether IRB approval is required for a particular study, it is advisable to consult with the IRB prior to starting the study.

The IRB submission process can be an arduous and time-consuming task that may involve multiple modifications and revisions. It can especially be cumbersome in multicenter trials involving multiple IRBs, and the variability between institutions can hinder multi-institutional
9 The Ethical Approval Process 79
research [4]. One study in the United Kingdom found that only 24% of studies submitted were approved without modifications [11]. Common reasons for proposal rejection were an improperly designed consent form, poor study design, unacceptable risk to subjects, and ethical and legal reasons (Fact Box 9.3) [10]. Some suggestions to the young investigator navigating the IRB include collaborating with an experienced mentor, familiarizing oneself with the IRB guidelines and procedures of the research site(s), and discussing the protocol with the IRB prior to submission [10].

Fact Box 9.3: Common Reasons for Proposal Rejection [10]
1. Improperly designed consent form
2. Poor study design
3. Unacceptable risk to subjects
4. Ethical and legal reasons

There are three levels of review that are identified by federal regulation: expedited review, full or convened review, and exemption from review. If the study poses only "minimal risk" to the subject, the study may be suitable for expedited review. "Minimal risk" is defined as such "that the probability and magnitude of harm or discomfort anticipated in the research are not greater in and of themselves than those ordinarily encountered in daily life or during the performance of routine physical or psychological examination or tests" [45 CFR 46.102(i)]. Studies that may be suitable for expedited review based on the aforementioned definition are shown in Table 9.1. For minimal risk studies, there is considerable variability in the IRB process [7]. These studies are not reviewed by the full IRB and are typically reviewed by a subcommittee or administratively within the office [13].

Table 9.1 OHRP expedited review categories [3]
1. Clinical studies of drugs and medical devices only if one of the following conditions is met:
   (a) Research on drugs for which an investigational new drug application is not required
   (b) Research on medical devices for which an investigational device exemption application is not required or the medical device is cleared and approved for marketing and is being used for the purpose for which it has been cleared and approved
2. Collection of blood samples (finger stick, heel stick, ear stick, venipuncture)
3. Prospective collection of biological samples for research purposes by noninvasive means
4. Collection of data through noninvasive measures (excluding X-rays and microwaves)
5. Research involving materials that have been collected or will be collected for non-research purposes
6. Collection of voice, video, digital, or image recordings for research
7. Research on individual or group characteristics or behavior or research involving interviews, surveys, etc.
8. Continuing review of research previously approved by IRB if:
   (a) Enrollment of new subjects is closed or if all subjects have completed the research-related interventions or if the research remains active only for long-term follow-ups
   (b) No subjects have been enrolled and no additional risks have been identified
   (c) Remaining research activities are limited to data analysis
9. Continuing review of research (not conducted under an investigational new drug application or investigational device exemption that does not fit under items 2 through 8) for which the IRB has determined that the research involves no greater than minimal risk and no additional risks have been identified
* This concise summary is modified and abbreviated from the OHRP Expedited Review Categories [3]. Refer to the OHRP for complete information

A study may be exempt from review if it meets one of six of the federally defined exempt categories. Examples include research conducted in established or commonly accepted educational settings, retrospective review of existing data, documents, specimens, and taste and food quality evaluation (for full list, see 45 CFR 46.101 (b)). However, the determination that a study is exempt must be made by the IRB, and no further notification is required from the IRB if that status is granted. A study requires full review if it poses "greater than minimal risk." Examples include Phase I, II, and III clinical trials, studies involving vulnerable populations, and studies including investigational devices.
© ISAKOS 2019 83
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_10
84 Y. Hoshino and A. Barnechea
Patient-reported outcome instruments: These outcome instruments are subjective and are reported by the patient, such as:

• Degree of shoulder pain at rest measured by the visual analogue scale
• Ability to perform activities such as turning a door knob, twisting a jar cap, or walking to the bathroom from the bedroom
• Degree of return to the patient's previous sports level

Also, patient-reported experience measures (PREMs) are instruments for real-time feedback about the patients' experience of the quality of care. Three domains have been described: patient safety (includes hygiene and physical safety), clinical effectiveness (results of procedures and aftercare), and patient experience (compassion, dignity, respect, etc.). It has been shown that there is a weak association between PREM and PROM results: In an English study, this was shown after hip and knee surgery using institutional PREMs and well-validated PROMs (Oxford hip and knee scores) [5].

– General health status: They report the patient's overall well-being and functionality; they can be applied to multiple medical etiologies and across multiple patients with different cultural and educational backgrounds [14]
– Disease/joint/region specific: They measure a condition's effect on the patient. Disease-specific measurements focus on a subgroup of patients affected by a condition and can measure the effect of changes in it. They have the advantage of measuring general effects of a disease other than those related to its location. Region-specific (or joint-specific) measurements capture the effect of any condition that affects a specific part of the body; these measurements can be applied to diverse etiologies that affect a specific region. The disadvantage sometimes is that they cannot differentiate the etiology if more than one is present (e.g., the ASES self-evaluation form score can be affected if the patient has both a rotator cuff injury and a secondary frozen shoulder). They have the advantage that they can measure small changes in a specific region and are relatively easy to apply [14]
10 How to Assess Patient’s Outcome? 85
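The text on this page describes the minimal detectable change (MDC) as calculated from the ICC and the standard deviation of the outcome measure, and contrasts it with the MCID. A minimal sketch of that arithmetic follows. It assumes the commonly used convention SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM; the chapter itself does not spell out the formula, so this convention, the function names, and the example numbers are assumptions for illustration only.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the SD of the score and its ICC."""
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc(sd, icc, z=1.96):
    """Minimal detectable change; z = 1.96 gives the usual 95% level (MDC95)."""
    return z * math.sqrt(2.0) * sem_from_icc(sd, icc)


def interpret_change(change, mdc_value, mcid):
    """Classify an observed change along the lines described in the text."""
    size = abs(change)
    if size < mdc_value:
        return "within measurement error"
    if size < mcid:
        return "real change, patient-perceived relevance undetermined"
    return "likely perceived by the patient"


# Hypothetical outcome score: SD 10 points, ICC 0.91, MCID 12 points.
mdc95 = mdc(sd=10, icc=0.91)
print(f"SEM   = {sem_from_icc(10, 0.91):.2f}")
print(f"MDC95 = {mdc95:.2f}")
for observed in (5, 10, 15):
    print(observed, "->", interpret_change(observed, mdc95, mcid=12))
```

With these invented values the MDC95 is a little over 8 points, so a 5-point change cannot be distinguished from measurement error, a 10-point change is real but of uncertain relevance to the patient, and a 15-point change exceeds the assumed MCID.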
The MDC is calculated from the ICC and the standard deviation of the outcome measure and indicates the range of possible measurement error. If the change in an outcome measure is lower than its MDC, the change cannot be distinguished from measurement error. Even if the change exceeds the MDC, it remains undetermined whether the patient perceives an improvement or a deterioration. A difference greater than the MCID is highly likely to affect the patient’s impression significantly. Each outcome measure has its own MDC and MCID [1, 2, 11, 13, 15–17, 22, 23, 25, 28], but these values are not always available in the literature, especially for newer outcome measures. Researchers should be aware of the responsiveness of their selected outcome measure and should preferably select measures whose MDC and MCID are available.
The selection process then moves on to practical administration. Factors to consider include usability, applicability, and affordability. Questionnaires should be easy to use and understand; only then can lay persons such as patients answer them without trouble. Language problems may arise in non-English-speaking countries. Because the majority of outcome instrument questionnaires are written in English, proper translation and cross-cultural adaptation of the questionnaires are necessary. A properly translated version of the questionnaire is very helpful, but one is not always available for every language. Mistranslation or misunderstanding by patients might impair the quality of the outcome instrument. If no translated version of the questionnaire is available, the researcher should produce a translation and take it through a validation process. The scoring should also be simple, so that the results can be interpreted with minimal effort.
When you design a clinical study, at least one PROM and one clinician-based outcome instrument that are widely accepted in your field of interest should be used. Otherwise, the results cannot be compared with those reported in previous papers. Rarely used outcome measures might need other additional outcome measures to assess the clinical and/or scientific impact of the study results. Most clinical outcome measures do not incur an administration cost, but in rare cases (e.g., SF-36) additional cost is required for collecting and analyzing the data, which limits the application of the outcome measure.
Fact Box 10.2
Here are the issues which should be considered to find the best selection of outcome instruments for your study. A technically solid and practically usable PROM can be selected through these items.
• Technical issues
–– Is it reliable?
ICC and SEM should be checked for continuous data, and Cohen’s kappa coefficient for discrete data.
–– Is it valid for your research purpose?
–– Is it responsive enough to answer your research question?
MDC and MCID of the outcome measure should be checked.
• Practical issues
–– Is it commonly used in the same field of research?
–– Is it simple to use and analyze?
–– Is it readable and understandable for your patients?
–– Is it available and affordable for you?
Clinical Vignette
A physician developed a new postoperative rehabilitation protocol for use after ACL reconstruction. He wanted to demonstrate the clinical advantage of the newly developed rehabilitation method and planned a clinical study. The purpose of this study was to compare the short-term (i.e., 2-year) clinical outcome after ACL reconstruction between conventional and
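The relationship between the ICC, the standard deviation, the MDC, and the MCID described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' method: the chapter states only that the MDC is derived from the ICC and the standard deviation, so the formulas SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM are an assumption (a commonly used convention). The function names and all numeric values are hypothetical.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sd: float, icc: float) -> float:
    """Minimal detectable change at 95% confidence (assumed convention)."""
    return 1.96 * math.sqrt(2.0) * sem_from_icc(sd, icc)


def interpret_change(change: float, mdc: float, mcid: float) -> str:
    """Classify an observed change against the MDC and MCID of a measure."""
    magnitude = abs(change)
    if magnitude < mdc:
        # Cannot be distinguished from measurement error.
        return "within measurement error"
    if magnitude < mcid:
        # Real change, but the patient may not perceive it.
        return "real change, patient relevance undetermined"
    return "likely clinically important"


# Hypothetical score with SD = 10 points and ICC = 0.91.
print(round(mdc95(10.0, 0.91), 2))
print(interpret_change(9.0, mdc=8.3, mcid=10.0))
```

In practice the MDC and MCID should be taken from validation studies of the specific instrument and population rather than computed from illustrative values such as these.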
19. MacDermid JC, Turgeon T, Richards RS, Beadle M, Roth JH. Patient rating of wrist pain and disability: a reliable and valid measurement tool. J Orthop Trauma. 1998;12(8):577–86.
20. Martin RL, Philippon MJ. Evidence of validity for the Hip Outcome Score in hip arthroscopy. Arthroscopy. 2007;23:822–6.
21. Martin RL, Irrgang JJ, Burdett RG, Conti SF, Van Swearingen JM. Evidence of validity for the Foot and Ankle Ability Measure (FAAM). Foot Ankle Int. 2005;26:968–83.
22. Marx RG, Jones EC, Allen AA, Altchek DW, O’Brien SJ, Rodeo SA, et al. Reliability, validity, and responsiveness of four knee outcome scales for athletic patients. J Bone Joint Surg Am. 2001;83-A(10):1459–69.
23. Mohtadi NG, Griffin DR, Pedersen ME, Chan D, Safran MR, Parsons N, et al. Multicenter Arthroscopy of the Hip Outcomes Research Network. The Development and validation of a self-administered quality-of-life outcome measure for young, active patients with symptomatic hip disease: the International Hip Outcome Tool (iHOT-33). Arthroscopy. 2012;28(5):595–605.
24. Nilsdotter AK, Lohmander LS, Klässbo M, Roos EM. Hip disability and osteoarthritis outcome score (HOOS) – validity and responsiveness in total hip replacement. BMC Musculoskelet Disord. 2003;4:10.
25. Papou A, Hussain S, McWilliams D, Zhang W, Doherty M. Responsiveness of SF-36 Health Survey and Patient Generated Index in people with chronic knee pain commenced on oral analgesia: analysis of data from a randomised controlled clinical trial. Qual Life Res. 2017;26(3):761–6.
26. Richards RR, An KN, Bigliani LU, Friedman RJ, Gartsman GM, Gristina AG, et al. A standardized method for the assessment of shoulder function. J Shoulder Elb Surg. 1994;3:347–52.
27. Roos EM, Lohmander LS. The Knee injury and Osteoarthritis Outcome Score (KOOS): from joint injury to osteoarthritis. Health Qual Life Outcomes. 2003;1:64.
28. SooHoo NF, Li Z, Chenok KE, Bozic KJ. Responsiveness of patient reported outcome measures in total joint arthroplasty patients. J Arthroplast. 2015;30(2):176–91.
29. Tegner Y, Lysholm J. Rating systems in the evaluation of knee ligament injuries. Clin Orthop Relat Res. 1985;198:43–9.
30. Walton MK, Powers JH III, Hobart J, Patrick D, Marquis P, Vamvakas S, et al. International Society for Pharmacoeconomics and Outcomes Research Task Force for Clinical Outcomes Assessment. Clinical Outcome Assessments: Conceptual Foundation-Report of the ISPOR Clinical Outcomes Assessment – Emerging Good Practices for Outcomes Research Task Force. Value Health. 2015;18(6):741–52.
31. Ware JE Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30:473–83.
Basics of Outcome Assessment
in Clinical Research
11
Monique C. Chambers, Sarah M. Tepe,
Lorraine A. T. Boakye, and MaCalus V. Hogan
M. C. Chambers
Foot and Ankle Injury Research [F.A.I.R] Group, Department of Orthopaedic Surgery, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
S. M. Tepe · L. A. T. Boakye · M. V. Hogan (*)
Department of Orthopaedic Surgery, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
e-mail: hoganmv@upmc.edu
11.1 Introduction
The World Health Organization defines an outcome measure as the “change in the health of an individual, group of people, or population that is attributable to an intervention or series of interventions” [13]. Measuring clinical outcomes is essential to track progress as healthcare institutions continue to shift toward more value-based policies, programs, and practice. Outcome measures are largely motivated by national standards of care and financial incentives to minimize waste and cost burden associated with care delivery. Clinical outcome measures may include health-related factors, health system-related factors that serve as a proxy for medical resource utilization, and/or the subjective perspective of patients. Tracking clinical outcomes allows healthcare organizations to identify variations in care delivery to properly align healthcare practices within an organization. Additionally, it provides a model to assess evidence-based practices and determine which medical interventions are best suited for the local patient population. Finally, transparency in clinical outcomes empowers healthcare institutions to make healthcare decisions that help to improve patient care and ultimately drive down cost.
There has been an increased focus on various outcome measures because of clinical interventions. Much of the focus on clinical and translational research seeks to satisfy the scientific curiosity of the clinician and/or scientist. However, as research guides clinical management, it is important to apply the appropriate scientific methods to accurately assess how medical interventions impact patient outcomes. In this chapter, we will discuss these outcome measures to guide clinicians who are pursuing research and to equip young investigators with the necessary tools to adequately design studies that utilize the appropriate measure to assess outcomes based on the clinical problem.
11.2 Principles of Outcome Measures
An outcome measure is the result of a test that is used objectively to determine the baseline function of a patient at the beginning of treatment [13]. Once treatment has commenced, the same metric can be used to determine progress and
© ISAKOS 2019 89
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_11
90 M. C. Chambers et al.
treatment efficacy. Appropriate outcome measures allow one to assess the quality of medical interventions. The most useful outcomes should be measurable based on discrete parameters and distinct time points. The measurement tool should be accurate within a small range of error. The outcome measure should also be reliable, with the ability to achieve the same result if applied again by a well-trained provider. The tool should be validated to ensure consistency with other measures used to track outcomes [10]. As such, the goal of measuring outcomes is to provide a baseline assessment for the improvement of patient outcomes, patient satisfaction, and the cost burden associated with healthcare.
11.3 Considering Various Data Sources
Regulatory agencies seek to streamline the collection and utilization of data sources; therefore, clinical investigators should understand which data sources are best suited to answer their clinical question. Mandates related to the meaningful use of electronic medical records led the Food and Drug Administration (FDA) to establish aims of electronic data sources. It is also important to understand how various data sources are generated. Common data sources are generated from administrative claims and health insurance enrollment, clinical encounters with providers, patient surveys, and clinical trials, among other sources. There are many advantages/disadvantages to these sources (Table 11.1). The appropriate collection, maintenance, and utilization of data sources will result in the accurate assessment of medical treatments and outcomes that are specific to the patient population of interest. Clinical investigators should have a critical eye for the data source used and should seek to constantly improve the thorough collection of medical data and outcomes assessment.
Fact Box 11.2: Qualitative Research
• Seeks to ask a question
• Emphasizes people and process
• Acknowledges that how clinical practice occurs is more important than doing a task
• Results should be placed within the context of the qualitative research question
11.5 Performance-Based Outcome Measures
Performance-based outcome measures have been used for decades in orthopedics, including in biomechanical and kinematic studies. Performance-based outcome measures usually involve observation of the patient in some capacity to assess functional status. The advantage of collecting objective data is that the results should be reproducible and without bias [16].
Despite the widespread use of performance-based outcomes, many have not been validated or correlated with clinical outcomes. Clinical outcome measures may be applied inappropriately to a population that the outcome measures were not designed or validated to assess. Such measures should be modified and validated for the current population of interest prior to use in a clinical investigation. Nonetheless, performance-based outcomes present an objective method to determine a patient’s functional capacity.
11.6 Self-Reported Outcome Measures
Outcomes in healthcare are not only representative of the quality of care provided but also reflect the service as experienced by the patient [1]. Self-reported outcome measures were developed as a means of gathering information based on patient experience and perception. Many of these measures were designed to assess treatment effectiveness and to emphasize the importance of the individual’s own perspective when evaluating care [2, 8]. These outcomes may include pain scores, patient satisfaction scores, as well as patient-reported outcomes (PROs). The challenge is to encourage healthcare professionals (HCPs) to focus on individual patient perceptions of disease impact as well as disease activity measures, with the aims of providing patient-centered care through shared decision-making and improving outcomes considered important by patients themselves [3].
status and well-being [2]. The collection of PROs has become more prevalent in physician practices across the United States. The use of PROs is a valuable tool for physicians, patients, and clinical investigators. PROs have been collected by physicians for many years; however, they have historically been used only to determine the efficacy of various treatments [2].
In the past few years, the focus of a patient’s self-assessment has shifted towards the idea that the patient perspective and input is an integral aspect of clinical assessment and progress. Patients can actively participate in medical decision-making to help guide their treatment and understand improvement or changes that may occur. Each patient responds to treatment in a unique way. Therefore, it is important that, while standardized, the questionnaires are answered accurately by the patient. While physicians remain the educators, patients can maintain a certain degree of autonomy by self-reporting symptoms, progress, and health-related quality of life [3].
Medical professionals, researchers, hospital administrators, and even insurers still use the information as a guide to what treatments are most clinically and financially effective. The constant progression toward value-based policies and implementation will lead to increased value of PRO utilization. Orthopaedic surgery will likely be impacted, especially given the higher rate of elective procedures within our field [4].
11.7.2 Collecting PROs Within a Systematic Performance Platform
establish systems designed to optimize data collection, analysis, and interpretation. The use of the EMR affords the opportunity to optimize PRO data collection.
The development of a system-wide performance platform allows for uniformity in the collection process. To optimize PRO collection, there should be a standardized system in place for patients to complete specific PRO questionnaires at distinct time periods within the disease process and an episode of care. The platform can be built into the EMR database with predetermined or “red flag” reminders to alert healthcare providers when patients are due for the next set of surveys. This information is then stored within the EMR and can be pulled as part of a large data set to assess the impact on the patient population. This information can also be retrieved and viewed as part of the individual patient’s EMR to track the patient’s progress over time. Access to the PRO results over time allows the physician to use the patient’s daily activities and overall health status in the shared decision-making process for treatment.
Despite the vast benefits of collecting and utilizing PROs within the clinical and research settings, there remain many challenges that prevent PROs from being implemented across the healthcare system. In order to ensure the accurate assessment of PRO data, the information should be thorough, complete, and collected at key stages within the disease state [7]. Any variations within the data set create limitations to both the use and the validity of outcomes data.
11.7.3 Utilization of PROs
Patient-reported outcomes create an environment that fosters shared decision-making. Decisions made with patient input often make that patient feel more at ease because he or she can take on a more active role with regard to guiding treatment. Although tracking PROs has become more commonplace in orthopaedic research, there has been difficulty in translating the results into changes within clinical practice. To best serve as a guide for clinical decision-making, the collection and assessment of PRO outcomes should take clinical relevance into account. To accomplish this goal, specific PROs should be validated based on the specific orthopedic disorder or disease process, particularly for PROs that measure general health status, which may or may not be greatly impacted by the orthopedic condition.
Clinical Vignette
A patient with an acute ankle injury may have SF-12 and PROMIS-10 scores that reflect poor health status at baseline. If this patient were to undergo surgical intervention, there may be a limited change in their SF-12 or PROMIS-10 scores. This patient would be more accurately assessed based on foot- and ankle-specific outcome measures, such as the AOFAS, GRC, and FAAM scores. Additionally, if this patient were to be included in a study that assessed scores at 3-, 6-, and 12-month follow-up, there would be a large selection bias. Therefore, the change in PRO measures as the result of a medical intervention is equally, if not more, important to evaluate as the raw pre- and post-operative scores.
While this example is not the direct result of a research study, the continuous use of PROs in the clinical decision-making process can help to shape clinical guidelines and impact the results of larger population studies as it relates to the utilization of PROs. This type of clinical relevance should be within the aims of clinical investigations related to all clinical outcome measures. With clinical implications at the forefront of outcome research study design, researchers are better equipped to develop the right questions and collect the appropriate data to address the root cause of outcome discrepancies.
11.7.4 Linking PROs to Comparative Effectiveness Research
To improve the efficiency of PRO collection, the EMR provides new opportunities to have meaningful use of the data collected and pooled in ways that
are innovative and necessary for the shared decision-making process. Comparative effectiveness research (CER) attempts to synthesize current medical investigations on various medical interventions with the goal of identifying the optimal management plan for several orthopedic injuries [11]. As such, the purpose of CER is multifaceted and aligns well with the utilization of PRO data. Linking PRO results provides an evidence-based approach to the diagnosis, treatment, prevention, and monitoring of orthopaedic conditions [5]. A comprehensive approach must be adopted to ensure that the diverse patient population has been accounted for in medical research. Using PROs that account for diversity within our population is the only way to effectively educate patients on which treatment has the most appropriate outcomes for their specific condition and demographics. Unfortunately, many patient populations that would be impacted by such medical interventions are excluded from or opt out of participation in scientific studies. As a scientific investigator and clinical practitioner, there should be consideration for the clinical implications that can be applied based on the results of any given study. A better understanding of the added value of PROs for CER will help to develop best practices and define more formally when PROs have demonstrated utility [9].
Take-Home Message
• The shift from volume-based, physician-focused care to value-based, patient-centered models has sparked a change in the way treatment effectiveness is assessed.
• The use of clinical outcome measures that reflect the true value of high-quality and cost-efficient care should be considered when evaluating medical interventions and determining if treatment modifications are necessary in distinct patient populations.
• These principles should serve as a guide to the basis of outcome assessment and study design for clinical outcome investigations.
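Returning to the collection platform described in Sect. 11.7.2, the “red flag” reminder logic can be sketched as follows. This is only an illustration under stated assumptions: the 3-, 6-, and 12-month schedule is borrowed from the clinical vignette, the function names are hypothetical, and no real EMR interface is implied.

```python
import calendar
from datetime import date

# Assumed follow-up schedule in months after the index date (e.g., surgery).
FOLLOW_UP_MONTHS = (3, 6, 12)


def add_months(start: date, months: int) -> date:
    """Return the date `months` later, clamping to the last day of the month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def due_surveys(index_date: date, completed_months: set, today: date) -> list:
    """List the follow-up points (in months) that are due but not yet completed."""
    return [
        m for m in FOLLOW_UP_MONTHS
        if m not in completed_months and add_months(index_date, m) <= today
    ]


# Hypothetical patient: surgery on 15 Jan 2023, 3-month survey already completed.
print(due_surveys(date(2023, 1, 15), {3}, today=date(2023, 8, 1)))
```

A real platform would also need to define an acceptable window around each time point, since surveys completed far from the scheduled date introduce the kind of variation in the data set that the text warns against.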
© ISAKOS 2019 97
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_12
98 J. F. Vega and K. P. Spindler
large randomized trials and as a primary outcome in meta-analysis [31, 58]. Also like the ODI, NDI has been demonstrated to have acceptable psychometric properties [27, 73].
12.5.3 Disabilities of the Arm, Shoulder, and Hand Score (DASH) and Quick-DASH (Hand, Wrist, and Elbow)
The Disabilities of the Arm, Shoulder, and Hand Score (DASH) was developed by the Upper Extremity Collaborative Group (UECG), a joint initiative made up of members from AAOS, the Council of Musculoskeletal Specialty Societies (COMSS), and the Institute for Work and Health. It debuted in 1996 and was designed to evaluate disorders of the upper limb, from the shoulder to the hand [2, 32]. The DASH consists of 38 questions, each of which is scored on a five-point Likert scale. Lower scores indicate less impairment. What makes the DASH rather unique is that patients are assessed on their ability to complete a task regardless of which limb is needed to perform that action. This is considered both a strength and a limitation of the instrument.
The Quick-DASH is an abbreviated version that was developed and released in 2005. The Quick-DASH includes only 11 questions but still maintains excellent correlation with its parent survey [4].
12.5.4 American Shoulder and Elbow Surgeons Standardized Shoulder Outcome Score (ASES; Shoulder)
The American Shoulder and Elbow Surgeons Standardized Shoulder Outcome Score (ASES) is the second half of a dual outcome measure. The first half of the assessment includes a set of physician-rated questions. These are typically not reported in the literature. The second half, which is completed by patients, consists of ten questions scored from 0 to 3 and assesses pain, instability, and ability to perform activities of daily living with the affected shoulder. The total score is converted to a 100-point scale, with higher scores indicating better outcomes [59]. The ASES shoulder outcome score has been validated for use in patients with a wide range of shoulder diagnoses, including shoulder instability [50].
12.5.5 Western Ontario Shoulder Instability Index (WOSI; Shoulder)
The Western Ontario Shoulder Instability Index (WOSI) was developed in 1998 for use in patients with complaints related to shoulder instability. It consists of 21 questions that evaluate physical symptoms, sports/recreation/work, lifestyle, and emotions. Each question is scored on a 100-mm visual analog scale (VAS), making the maximum total score 2100. Lower scores indicate less impairment and fewer symptoms [40, 66]. Because this instrument was designed with a rather narrow focus (for use in shoulder instability only), it is reasonable to report the minimal clinically important difference (a concept that will be discussed shortly), which has been reported as 220 (10.4%) in the literature [39].
12.5.6 Oxford Shoulder Score (OSS; Shoulder)
The Oxford Shoulder Score (OSS) is a 12-item PROM developed in 1996 by a group of researchers at Oxford University with the intention of measuring the effect of surgical intervention for a variety of shoulder diagnoses, except instability [17]. Each item is scored on a five-point scale. The sum of the 12 questions is then converted such that higher scores indicate improved outcomes [18]. The OSS has been shown to be valid, reliable, and sensitive to changes over time after shoulder surgery [79].
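As a worked illustration of the WOSI arithmetic in Sect. 12.5.5, the sketch below sums 21 VAS items (0 to 100 mm each, maximum 2100) and checks whether an improvement reaches the reported MCID of 220 points. The function names and patient values are hypothetical; the snippet is a sketch of the arithmetic described in the text, not the instrument's official scoring software.

```python
WOSI_ITEMS = 21
WOSI_MAX_PER_ITEM = 100                            # mm on the visual analog scale
WOSI_MAX_TOTAL = WOSI_ITEMS * WOSI_MAX_PER_ITEM    # 2100
WOSI_MCID = 220                                    # value quoted in the text [39]


def wosi_total(items_mm):
    """Sum the 21 VAS items; lower totals indicate less impairment."""
    if len(items_mm) != WOSI_ITEMS:
        raise ValueError("WOSI requires exactly 21 item scores")
    if any(not 0 <= value <= WOSI_MAX_PER_ITEM for value in items_mm):
        raise ValueError("each item must lie between 0 and 100 mm")
    return sum(items_mm)


def wosi_percent_of_max(total):
    """Express a total score as a percentage of the maximum possible score."""
    return 100 * total / WOSI_MAX_TOTAL


def improvement_reaches_mcid(baseline_total, follow_up_total):
    """Improvement is a decrease in score; compare it with the MCID."""
    return (baseline_total - follow_up_total) >= WOSI_MCID


# Hypothetical patient: every item marked at 50 mm at baseline.
baseline = wosi_total([50] * WOSI_ITEMS)
print(baseline, wosi_percent_of_max(baseline))
print(improvement_reaches_mcid(baseline, 800))
```

Because lower WOSI scores are better, the sign of the change matters: a positive difference between baseline and follow-up represents improvement.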
12.5.7 Hip Disability and Osteoarthritis Outcome Score (HOOS) and HOOS Joint Replacement (HOOS JR; Hip)
The Hip Disability and Osteoarthritis Outcome Score (HOOS) is a 40-item outcome measure developed by Roos and colleagues in the early 2000s. It includes the Western Ontario McMaster osteoarthritis score (WOMAC) in its entirety and adds two dimensions that assess sports/recreation and hip-related quality of life, thus allowing the HOOS to capture five domains in total (including pain, other symptoms, and activities of daily living, which are captured by the WOMAC) [52].
One aspect that makes the HOOS (and the KOOS) unique is that a total score is not calculated. Rather than being combined, each domain is scored separately, producing five subscale scores that range from 0 to 100, with higher scores indicating better outcomes. This is both a strength and a limitation of the HOOS (and KOOS), as the subscales allow for more granular assessment of changes to particular domains, while the inability to calculate a total score leads to more significant floor/ceiling effects and limits one’s ability to compare HOOS scores to other PROMs that generate a single score. The HOOS has been shown to have excellent psychometric properties [41]. Another advantage of the HOOS is that it is responsive enough for use in both short- and long-term studies.
The HOOS JR is a six-question short-form version of the HOOS that was developed by Lyman and colleagues at the Hospital for Special Surgery. It utilizes questions from the “pain” and “activities of daily living” domains and has been validated for use in patients undergoing total hip arthroplasty (THA) [46].
Like the HOOS, the Knee injury and Osteoarthritis Outcome Score (KOOS) was developed by Roos et al. in the late 1990s. It too includes the WOMAC in its entirety and thereafter adds the same two domains that were developed to create the HOOS (sports/recreation and knee-related quality of life) [61]. In some ways, the inclusion of the WOMAC represents a limitation of the KOOS (and the HOOS), as two of the most commonly performed knee procedures (anterior cruciate ligament reconstruction and arthroscopic partial meniscectomy) are typically done on younger populations that do not have osteoarthritis. However, the benefit of having the WOMAC built into the questionnaire is that the KOOS (or the HOOS) can be used in prospective cohort studies or longitudinal databases involving populations in which osteoarthritis (whether primary or posttraumatic) is expected to occur.
The KOOS has been validated for use in a variety of knee diagnoses ranging from osteoarthritis to ACL rupture [5, 22, 36]. The KOOS subscales are also kept separate, although a “global” score, dubbed the KOOS4, has been utilized in large randomized trials (despite a lack of evidence supporting the use of the KOOS subscales in this fashion) [24]. The KOOS4 represents an average of four KOOS subscales (likewise, the KOOS5 represents an average of all five KOOS subscales) and is useful for assessing outcomes for interventions that are expected to affect multiple KOOS subscales simultaneously. The KOOS has been demonstrated to have excellent psychometric properties as well [15]. Like the HOOS, the KOOS is suitable for use in both short- and long-term studies.
The KOOS JR is a seven-question short-form version of the KOOS that was developed by the same HSS group that pioneered the HOOS JR and has been validated for use in total knee arthroplasty (TKA) [45].
The International Knee Documentation Committee Subjective Knee Form (IKDC-SKF) debuted in the early 2000s, after years of development by Irrgang et al., and features ten questions that address symptoms, sports activities, and function [34]. It was intended to be a PROM that could be used for any condition involving the knee and has been shown to be valid, reliable, and responsive across multiple diagnoses [26, 30, 35]. The raw score is transformed to a 100-point scale, with higher scores indicating better outcomes.
12 Types of Scoring Instruments Available 103
currently able to participate. Scores range from 0 (unable to work because of knee problems) to 10 (competitive sports such as soccer, football, rugby on a national/elite level) [10, 72].
12.5.12 Foot and Ankle Ability Measure (FAAM; Foot and Ankle)
12.5.15 Achilles Tendon Total Rupture Score (ATRS; Foot and Ankle)
The Achilles Tendon Total Rupture Score (ATRS) was developed to assess outcomes of patients having suffered total Achilles tendon ruptures. It features ten questions and has shown satisfactory psychometric properties [53].
identify the driver of the poor outcome (e.g., the patient could have pain, poor function, additional symptoms, or a combination of the three).
The SANE is typically administered as an adjunct to other PROMs such as the IKDC or the ASES rather than as a standalone PROM.
12.6.2 Patient Acceptable Symptom State (PASS)
Physical exam measurements can be separated into two major categories: traditional “hands-on” techniques and instrumented techniques.
12.7.2 Traditional “Hands-On” Techniques
Traditional “hands-on” techniques include basic physical exam variables such as range of motion (ROM) and strength, as well as more nuanced examination techniques or special tests (e.g., Lachman exam, meniscal testing, special examination maneuvers of the shoulder, etc.). When choosing which, if any, of these OROMs to include when designing a study, it is important to consider characteristics such as reliability (both inter- and intra-rater), repeatability, and reproducibility [3].
12.7.3 Instrumented Techniques
Instrumented techniques, such as the use of the KT-1000 arthrometer or the Telos SD 900 for the measurement of knee laxity, are used to minimize measurement error (thus increasing the reliability of a measurement) [8]. While the increase in reliability is beneficial, instrumented techniques typically increase the cost of conducting a study considerably when compared to using “hands-on” techniques.
Imaging measurement techniques are other commonly used OROMs. Examples of such measures include the Kellgren-Lawrence grading scheme and the Osteoarthritis Research Society International (OARSI) classification score, both of which are used to assess radiographic changes associated with knee osteoarthritis [42]. Additional examples include the modified Outerbridge scale, the Whole-Organ Magnetic Resonance Imaging Score (WORMS), and the Knee Osteoarthritis Scoring System (KOSS), all of which can be used to evaluate magnetic resonance images in a way that is standardized and reliable [55, 78].
Take-Home Message
• Numerous scoring instruments are available for use in clinical research.
• It is important to recognize the strengths and limitations of the instrument(s) that will be used when designing a clinical research study.
• No perfect scoring instrument exists, and the soundest clinical research typically utilizes a combination of multiple instruments including both PROMs (general and joint-specific) and OROMs.
References
1. Abrams GD, Harris JD, Gupta AK, et al. Functional performance testing after anterior cruciate ligament reconstruction. Orthop J Sports Med. 2014;2(1):2325967113518305. https://doi.org/10.1177/2325967113518305.
2. Angst F, Schwyzer H-K, Aeschlimann A, Simmen BR, Goldhahn J. Measures of adult shoulder function: Disabilities of the Arm, Shoulder, and Hand Questionnaire (DASH) and its short version (QuickDASH), Shoulder Pain and Disability Index (SPADI), American Shoulder and Elbow Surgeons (ASES) Society Standardized Shoulder Assessment Form, Constant (Murley) Score (CS), Simple Shoulder Test (SST), Oxford Shoulder Score (OSS), Shoulder Disability Questionnaire (SDQ), and Western Ontario Shoulder Instability Index (WOSI). Arthritis Care Res. 2011;63(S11):S174–88. https://doi.org/10.1002/acr.20630.
3. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol. 2008;31(4):466–75. https://doi.org/10.1002/uog.5256.
4. Beaton DE, Wright JG, Katz JN, Upper Extremity Collaborative Group. Development of the QuickDASH: comparison of three item-reduction approaches. J Bone Joint Surg Am. 2005;87(5):1038–46. https://doi.org/10.2106/JBJS.D.02060.
5. Bekkers JEJ, de Windt TS, Raijmakers NJH, Dhert WJA, Saris DBF. Validation of the Knee Injury and Osteoarthritis Outcome Score (KOOS) for the treatment of focal cartilage lesions. Osteoarthr Cartil. 2009;17(11):1434–9. https://doi.org/10.1016/j.joca.2009.04.019.
6. Berlowitz DR, Foy CG, Kazis LE, et al. Effect of intensive blood-pressure treatment on patient-reported outcomes. N Engl J Med. 2017;377(8):733–44. https://doi.org/10.1056/NEJMoa1611179.
7. Bhatt A. Evolution of clinical research: a history before and beyond James Lind. Perspect Clin Res. 2010;1(1):6–10.
8. Boyer P, Djian P, Christel P, Paoletti X, Degeorges R. [Reliability of the KT-1000 arthrometer (Medmetric) for measuring anterior knee laxity: comparison with Telos in 147 knees]. Rev Chir Orthop Reparatrice Appar Mot. 2004;90(8):757–64.
9. Brand RA. Ernest Amory Codman, MD, 1869–1940. Clin Orthop. 2009;467(11):2763–5. https://doi.org/10.1007/s11999-009-1047-8.
14. Collier R. Legumes, lemons and streptomycin: a short history of the clinical trial. Can Med Assoc J. 2009;180(1):23–4. https://doi.org/10.1503/cmaj.081879.
15. Collins NJ, Prinsen CA, Christensen R, Bartels EM, Terwee CB, Roos EM. Knee Injury and Osteoarthritis Outcome Score (KOOS): systematic review and meta-analysis of measurement properties. Osteoarthr Cartil. 2016;24(8):1317–29. https://doi.org/10.1016/j.joca.2016.03.010.
16. Davis JC, Bryan S. Patient Reported Outcome Measures (PROMs) have arrived in sports and exercise medicine: why do they matter? Br J Sports Med. 2015;49(24):1545–6. https://doi.org/10.1136/bjsports-2014-093707.
17. Dawson J, Fitzpatrick R, Carr A. Questionnaire on the perceptions of patients about shoulder surgery. J Bone Joint Surg Br. 1996;78(4):593–600.
18. Dawson J, Rogers K, Fitzpatrick R, Carr A. The Oxford shoulder score revisited. Arch Orthop Trauma Surg. 2009;129(1):119–23. https://doi.org/10.1007/s00402-007-0549-7.
19. Devlin NJ, Brooks R. EQ-5D and the EuroQol group: past, present and future. Appl Health Econ Health Policy. 2017;15(2):127–37. https://doi.org/10.1007/s40258-017-0310-5.
20. Eechaute C, Vaes P, Van Aerschot L, Asman S, Duquet W. The clinimetric qualities of patient-assessed instruments for measuring chronic ankle instability: a systematic review. BMC Musculoskelet Disord. 2007;8:6. https://doi.org/10.1186/1471-2474-8-6.
21. Emerson Kavchak AJ, Cook C, Hegedus EJ, Wright AA. Identification of cut-points in commonly used hip osteoarthritis-related outcome measures that define the patient acceptable symptom state (PASS). Rheumatol Int. 2013;33(11):2773–82. https://doi.org/10.1007/s00296-013-2813-1.
22. Engelhart L, Nelson L, Lewis S, et al. Validation of the knee injury and osteoarthritis outcome score subscales for patients with articular cartilage lesions of the knee. Am J Sports Med. 2012;40(10):2264–72. https://doi.org/10.1177/0363546512457646.
23. Fairbank JC, Couper J, Davies JB, O’Brien JP. The Oswestry low back pain disability questionnaire.
10. Briggs KK, Steadman JR, Hay CJ, Hines SL. Lysholm Physiotherapy. 1980;66(8):271–3.
score and Tegner activity level in individuals with nor- 24. Frobell RB, Roos EM, Roos HP, Ranstam J,
mal knees. Am J Sports Med. 2009;37(5):898–901. Lohmander LS. A randomized trial of treatment
https://doi.org/10.1177/0363546508330149. for acute anterior cruciate ligament tears. N Engl J
11. Browne JP, Cano SJ, Smith S. Using patient-reported Med. 2010;363(4):331–42. https://doi.org/10.1056/
outcome measures to improve health care: time for a NEJMoa0907797.
new approach. Med Care. 2017;55(10):901. https:// 25. Garratt A, Schmidt L, Mackintosh A, Fitzpatrick
doi.org/10.1097/MLR.0000000000000792. R. Quality of life measurement: bibliographic study
12. Carcia CR, Martin RL, Drouin JM. Validity
of patient assessed health outcome measures. BMJ.
of the foot and ankle ability measure in ath- 2002;324(7351):1417.
letes with chronic ankle instability. J Athl Train. 26.
Greco NJ, Anderson AF, Mann BJ, et al.
2008;43(2):179–83. Responsiveness of the International Knee
13. Codman EA. The classic: the registry of bone sarco- Documentation Committee Subjective Knee
mas as an example of the end-result idea in hospital Form in comparison to the Western Ontario and
organization. Clin Orthop. 2009;467(11):2766–70. McMaster Universities Osteoarthritis Index, modi-
https://doi.org/10.1007/s11999-009-1048-7. fied Cincinnati Knee Rating System, and Short Form
12 Types of Scoring Instruments Available 107
36 in patients with focal articular cartilage defects. 38. Kazis LE, Ren XS, Lee A, et al. Health status in
Am J Sports Med. 2010;38(5):891–902. https://doi. VA patients: results from the Veterans Health Study.
org/10.1177/0363546509354163. Am J Med Qual. 1999;14(1):28–38. https://doi.
27. Hains F, Waalen J, Mior S. Psychometric properties org/10.1177/106286069901400105.
of the neck disability index. J Manip Physiol Ther. 39. Kirkley A, Griffin S, Dainty K. Scoring systems for the
1998;21(2):75–80. functional assessment of the shoulder. Arthroscopy.
28. Hale SA, Hertel J. Reliability and sensitivity of the 2003;19(10):1109–20. https://doi.org/10.1016/j.
foot and ankle disability index in subjects with chronic arthro.2003.10.030.
ankle instability. J Athl Train. 2005;40(1):35–40. 40. Kirkley A, Griffin S, McLintock H, Ng L. The devel-
29. Hays RD, Bjorner JB, Revicki DA, Spritzer KL,
opment and evaluation of a disease-specific quality
Cella D. Development of physical and mental health of life measurement tool for shoulder instability. The
summary scores from the patient-reported outcomes Western Ontario Shoulder Instability Index (WOSI).
measurement information system (PROMIS) global Am J Sports Med. 1998;26(6):764–72. https://doi.org
items. Qual Life Res. 2009;18(7):873–80. https://doi. /10.1177/03635465980260060501.
org/10.1007/s11136-009-9496-9. 41. Klässbo M, Larsson E, Mannevik E. Hip dis-
30. Higgins LD, Taylor MK, Park D, et al. Reliability ability and osteoarthritis outcome score. An
and validity of the International Knee Documentation extension of the Western Ontario and McMaster
Committee (IKDC) Subjective Knee Form. Joint Bone Universities Osteoarthritis Index. Scand J Rheumatol.
Spine. 2007;74(6):594–9. https://doi.org/10.1016/j. 2003;32(1):46–51.
jbspin.2007.01.036. 42. Kohn MD, Sassoon AA, Fernando ND. Classifications
31. Hu Y, Lv G, Ren S, Johansen D. Mid- to long- in brief: Kellgren-Lawrence classification of osteoar-
term outcomes of cervical disc arthroplasty ver- thritis. Clin Orthop. 2016;474(8):1886–93. https://
sus anterior cervical discectomy and fusion for doi.org/10.1007/s11999-016-4732-4.
treatment of symptomatic cervical disc disease: a 43. Kvien TK, Heiberg T, Hagen KB. Minimal clinically
systematic review and meta-analysis of eight pro- important improvement/difference (MCII/MCID)
spective randomized controlled trials. PLoS One. and patient acceptable symptom state (PASS):
2016;11(2):e0149312. https://doi.org/10.1371/jour- what do these concepts mean? Ann Rheum Dis.
nal.pone.0149312. 2007;66(Suppl 3):iii40–1. https://doi.org/10.1136/
32. Hudak PL, Amadio PC, Bombardier C. Development of ard.2007.079798.
an upper extremity outcome measure: the DASH (dis- 44. Laucis NC, Hays RD, Bhattacharyya T. Scoring the
abilities of the arm, shoulder and hand) [corrected]. The SF-36 in orthopaedics: a brief guide. J Bone Joint Surg
Upper Extremity Collaborative Group (UECG). Am J Am. 2015;97(19):1628–34. https://doi.org/10.2106/
Ind Med. 1996;29(6):602–8. https://doi.org/10.1002/ JBJS.O.00030.
(SICI)1097-0274(199606)29:6<602::AID-AJIM4> 45. Lyman S, Lee Y-Y, Franklin PD, Li W, Cross MB,
3.0.CO;2-L. Padgett DE. Validation of the KOOS, JR: a short-
33. Ingelsrud LH, Granan L-P, Terwee CB, Engebretsen form knee arthroplasty outcomes survey. Clin Orthop.
L, Roos EM. Proportion of patients reporting 2016;474(6):1461–71. https://doi.org/10.1007/
acceptable symptoms or treatment failure and their s11999-016-4719-1.
associated KOOS values at 6 to 24 months after 46. Lyman S, Lee Y-Y, Franklin PD, Li W, Mayman
anterior cruciate ligament reconstruction: a study DJ, Padgett DE. Validation of the HOOS, JR: a
from the Norwegian Knee Ligament Registry. short-form hip replacement survey. Clin Orthop.
Am J Sports Med. 2015;43(8):1902–7. https://doi. 2016;474(6):1472–82. https://doi.org/10.1007/
org/10.1177/0363546515584041. s11999-016-4718-2.
34. Irrgang JJ, Anderson AF, Boland AL, et al.
47. Martin RL, Irrgang JJ. A survey of self-reported out-
Development and validation of the international knee come instruments for the foot and ankle. J Orthop
documentation committee subjective knee form. Am J Sports Phys Ther. 2007;37(2):72–84. https://doi.
Sports Med. 2001;29(5):600–13. https://doi.org/10.11 org/10.2519/jospt.2007.2403.
77/03635465010290051301. 48. Martin RL, Irrgang JJ, Burdett RG, Conti SF,
35. Irrgang JJ, Anderson AF, Boland AL, et al.
Van Swearingen JM. Evidence of validity for
Responsiveness of the International Knee the foot and ankle ability measure (FAAM).
Documentation Committee Subjective Knee Form. Foot Ankle Int. 2005;26(11):968–83. https://doi.
Am J Sports Med. 2006;34(10):1567–73. https://doi. org/10.1177/107110070502601113.
org/10.1177/0363546506288855. 49. Marx RG, Stump TJ, Jones EC, Wickiewicz TL,
36. Johnson DS, Smith RB. Outcome measurement in Warren RF. Development and evaluation of an activ-
the ACL deficient knee—what’s the score? Knee. ity rating scale for disorders of the knee. Am J Sports
2001;8(1):51–7. Med. 2001;29(2):213–8.
37. Jones D, Kazis L, Lee A, et al. Health status assessments 50. Michener LA, McClure PW, Sennett BJ. American
using the veterans SF-12 and SF-36: methods for evalu- Shoulder and Elbow Surgeons Standardized Shoulder
ating otucomes in the Veterans Health Administration. Assessment Form, patient self-report section: reli-
J Ambul Care Manage. 2001;24(3):68–86. ability, validity, and responsiveness. J Shoulder Elb
108 J. F. Vega and K. P. Spindler
numeric evaluation method and two shoulder rating arthroscopic correlation. J Bone Joint Surg Am.
scales. Outcomes measures after shoulder surgery. 2014;96(14):1145–51. https://doi.org/10.2106/
Am J Sports Med. 1999;27(2):214–21. https://doi.org JBJS.M.00929.
/10.1177/03635465990270021701. 79. Younis F, Sultan J, Dix S, Hughes PJ. The range of the
75. Williams GN, Taylor DC, Gangel TJ, Uhorchak JM, Oxford Shoulder Score in the asymptomatic popula-
Arciero RA. Comparison of the single assessment tion: a marker for post-operative improvement. Ann R
numeric evaluation method and the Lysholm score. Coll Surg Engl. 2011;93(8):629–33. https://doi.org/1
Clin Orthop. 2000;373:184–92. 0.1308/003588411X13165261994193.
76. Wright AA, Hensley CP, Gilbertson J, Leland JM III, 80. Zaina F, Tomkins-Lane C, Carragee E, Negrini
Jackson S. Defining patient acceptable symptom state S. Surgical versus non-surgical treatment for
thresholds for commonly used patient reported out- lumbar spinal stenosis. Cochrane Database
comes measures in general orthopedic practice. Man Syst Rev. 2016;(1):CD010264. https://doi.
Ther. 2015;20(6):814–9. https://doi.org/10.1016/j. org/10.1002/14651858.CD010264.pub2.
math.2015.03.011. 81. Zimerman AL. Evidence-based medicine: a short his-
77.
Wright RW, Baumgarten KM. Shoulder out- tory of a modern medical movement. Virtual Mentor.
comes measures. J Am Acad Orthop Surg. 2013;15(1):71. https://doi.org/10.1001/virtualmentor.
2010;18(7):436–44. 2013.15.1.mhst1-1301.
78. Wright RW, Ross JR, Haas AK, et al. Osteoarthritis
classification scales: interobserver reliability and
13 Health Measurement Development and Interpretation

Andrew Firth, Dianne Bryant, Jacques Menetrey, and Alan Getgood

A. Firth · A. Getgood (*)
Fowler Kennedy Sport Medicine Clinic, 3M Centre, University of Western Ontario, London, ON, Canada
e-mail: afirth5@uwo.ca; alan.getgood@uwo.ca

D. Bryant
Faculty of Health Sciences, Elborn College, University of Western Ontario, London, ON, Canada
e-mail: dianne.bryant@uwo.ca

J. Menetrey
Centre de medicine du sport et de l’exercice, Hirslanden Clinique La Colline, University Hospital of Geneva, Geneva, Switzerland
e-mail: Jacques.menetrey@lacolline.ch

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_13

13.1 What Is an Outcome Measure, and Why Measure Health Outcomes?

In 1948, the World Health Organization defined health as ‘a state of complete physical, mental, and social well-being’ [28]. This idealistic concept of health as a comprehensive, theoretical framework is problematic for those attempting to quantify health within an individual to inform decision-making or within or between groups to inform broader clinical recommendations. Despite the complexity of health measurement, outcome measures are designed with one of three purposes in mind: discrimination, evaluation, or prediction. In addition, outcome measures may be surrogates for important outcomes or may themselves measure outcomes that are of direct importance to patients.

13.1.1 Discriminative, Predictive, and Evaluative Outcomes

To select an appropriate outcome measure for research, the objective of the measurement must be clear and aligned with the objectives of the study, the population being investigated, and the study methodology [7]. For example, an instrument used to discriminate between individuals might be used to determine a patient’s eligibility for study participation or may be a diagnostic tool designed to classify individuals as negative or positive for disease. For instance, the Kellgren and Lawrence system is used to classify the severity of radiographic knee osteoarthritis (OA) [10] and is used in conjunction with clinical symptoms to discriminate between individuals with advanced OA and those with mild OA.

A predictive instrument, on the other hand, is used to predict future events. For example, there is some evidence that a test of a patient’s landing biomechanics after anterior cruciate ligament (ACL) reconstruction can predict reinjury [6, 18], and it is therefore used by some clinicians as a guide for making recommendations regarding return to sport.

Finally, an evaluative measurement tool is used to assess change and is most commonly used to measure the effect of a therapy. For example, if an investigator wishes to determine whether a regimen of aerobic exercise will improve the quality of life, pain, and mobility in
patients with osteoarthritis of the knee, she/he will administer each of the evaluative measures before the intervention and then again after the intervention and assess whether scores have improved. Similarly, if a researcher’s objective is to compare the effectiveness of two or more therapies, each group of patients will complete the evaluative measure at the endpoint of interest so that between-group comparisons can be made.

13.1.2 Surrogate- and Patient-Important Outcomes

Surrogate outcomes are not necessarily meaningful to patients but are believed to be proxies for outcomes that are directly important to patients [2]. Early orthopaedic research often relied solely on surrogate outcomes like imaging, performance-based tests, or physiological tests to provide evidence of a treatment’s success. Today, surrogate outcomes are most appropriate for smaller, explanatory, or proof-of-concept studies prior to investing in larger, more pragmatic studies that will apply to broader populations. For example, shoulder range of motion is a proxy for shoulder function but is not a direct measure of function, since an individual may not require full motion to complete desired tasks or can find a way to compensate for a loss of motion and still accomplish desired tasks in a fulfilling way. Other examples of common surrogate outcomes are radiographic imaging of joint space narrowing, a surrogate for pain and impaired function; bone mineral density, a surrogate for risk of fragility fracture; strength, a surrogate for functional ability; etc.

Although surrogate measures are important determinants of health, and help to provide explanations for impairments or predict future patient-important health issues, pragmatic studies that aim to make recommendations for practice change should measure effectiveness using a measure of direct importance to patients, like the incidence of adverse outcomes (e.g. death, MI, stroke, revision surgery, etc.) and patient-reported outcome measures (PROMs).

A PROM is a subjective assessment completed by the patient in which they are asked to score aspects of their own perception of their health [20]. Some PROMs include only questions related to functional ability, called patient-reported functional ability questionnaires (e.g. Lower Extremity Functional Scale (LEFS), American Shoulder and Elbow Surgeons (ASES) Score), while others attempt to measure health from a more comprehensive perspective and include items querying physical, mental, and social well-being (e.g. 12-Item Short-Form Survey (SF-12), Western Ontario Rotator Cuff (WORC) Index). The latter are referred to as health-related quality-of-life (HRQOL) instruments.

HRQOL measures often fall into one of two categories: disease-specific or generic. Disease-specific measures are designed to ask patients about constructs of health directly affected by the disease in question (e.g. the Western Ontario Rotator Cuff (WORC) Index is specific to patients with rotator cuff pathology). Generic measures, on the other hand, ask questions about health from a non-specific, broadly applicable perspective. Thus, while disease-specific measures are more sensitive to changes in health [27], generic measures have a larger scope and can therefore be interpreted across different health states, including those who are healthy [7].

Often, the measurement properties of the most widely accepted outcome measures are extensively published and provide evidence that the measure is accurate, precise, and able to detect change in the population of interest. If this evidence is not available, those hoping to use the instrument may elect to first assess its measurement properties before implementing its use in clinic or as part of a research study, or else run the risk of collecting uninterpretable data. In the next section, we describe some of the most common measurement properties.
advances in imaging became available, like magnetic resonance imaging and ultrasound, these imaging methods were compared to observations during surgery to determine their accuracy. In this example, the observations made during surgery served as the gold standard. Statistics like sensitivity and specificity are often used to communicate accuracy.

True gold standards are common for measures of structures, physiology, physical function, and performance; however, they rarely exist for constructs like HRQOL. For this more abstract paradigm, we evaluate validity using a theoretical framework of health against which we make assumptions about how the instrument should function if it’s accurately capturing the construct. This type of validity is referred to as construct validity [9].

To assess construct validity, we hypothesize the magnitude and direction of the correlation between the new outcome measure and related aspects of health to determine whether the new test behaves as expected. For example, we may hypothesize that as the severity of disease increases, the quality of life score will decrease (directional hypothesis) and that the association between severity and quality of life will be small (expected magnitude). As part of constructing these hypotheses, it’s common to compare the performance of the new instrument to outcome measures designed to assess a similar attribute (convergent validity) [17, 23]. For example, we may hypothesize that as the score on our new disease-specific quality of life questionnaire increases, so too will the physical component score (PCS) of the generic health-related quality of life instrument, the 12-Item Short-Form Survey (SF-12).

Conversely, discriminant validity assesses whether the outcome measure performs better than an instrument designed to assess a general or unrelated construct when measuring the outcome of interest [17, 23]. For instance, the investigator could compare a new knee OA-specific questionnaire to a generic measure of mental health, like the mental component score (MCS) of the SF-12, and expect that mental health is only very weakly associated with OA-related quality of life.

Both criterion and construct validity can be evaluated at a single point in time as just described, called cross-sectional validity, or over time, to evaluate whether changes in the new instrument are associated with changes in existing measures [3, 11, 13, 25]. If we return to our example of a new knee OA-specific questionnaire, we could hypothesize that, over time, the mental component score (MCS) of the SF-12 may remain relatively unchanged despite small to moderate changes in the knee OA-specific questionnaire. Selecting outcome measures that have demonstrated longitudinal construct validity within the relevant study population is important when designing clinical trials where instruments are used to assess health status prior to and following treatment.

Although important, validity is insufficient for an investigator wanting to make clinical recommendations based on the results of an evaluative outcome measure. While validity shows that the outcome measure is associated with changes in health, it lacks interpretability. Interpretability of changes in health states begins with our final two measurement properties, sensitivity to change and responsiveness. Sensitivity to change refers to the ability of an outcome measure to detect changes in health when change has occurred but does not provide any insight as to the significance of the change to those being treated [13]. Conversely, responsiveness describes the outcome measure’s ability to detect meaningful change from the patient’s perspective, no matter how small the change is [13].

13.2.3 Responsiveness

Assessing responsiveness requires a marker of clinical importance. Traditional methods of determining responsiveness use either an anchor-based (patient defined) or distribution-based (statistical) approach. The goal of either method is to define the minimal clinically important difference (MCID). The MCID, originally proposed by Jaeschke et al. [8] in 1989, is the smallest change in the outcome of interest that informed parties (patient/clinician) consider important, whether
favourable or harmful, which would cause those involved to consider changing treatment [21].

Using the anchor-based approach requires the patient to complete the new questionnaire before and after change is expected and to complete a global rating of change (GRC) questionnaire. A GRC requires the patient to indicate whether they have improved, remained unchanged, or worsened; if they have changed, they next indicate by how much they have changed. An estimate of the MCID is made by calculating the average change score of patients who indicated that they changed by a ‘small but important amount’ on the GRC.

When determining responsiveness from a distribution-based perspective, investigators may construct two normal distributions: the first constructed from the change scores of patients who should have remained stable over the two measurements and the second constructed from the change scores of patients who are expected to have changed over the two measurements. The value of the MCID is said to be greater than the majority of scores in the first distribution and less than the majority of scores in the second distribution.

13.3 Interpreting the Results: Bias and Precision

Once the appropriate outcome measures have been used to collect data, the challenge becomes presenting the data in a meaningful way. Simply presenting the results of statistical tests of the data (i.e. p-values) or the summary measures (e.g. mean change pre- to post- or mean difference between groups) is often meaningless to those without extensive experience using the same outcome measure, which is extremely rare outside of intensive research programmes.

The problem with only presenting p-values (the probability of a result at least as extreme as the one observed, given that the null hypothesis is true) [26] is that p-values are highly sample-specific because they are influenced by the sample’s variability and sample size. A p-value does not express the magnitude of effect, the reproducibility of the result, or the importance of the difference (i.e. is the difference sufficient to change practice?). Specifically, a study result may achieve statistical significance even though the difference is small and unimportant if the sample is highly homogeneous and/or quite large. On the other hand, an important difference may never reach statistical significance in a study with a heterogeneous sample and/or small sample size. Clinicians scrutinizing or reporting results should place little weight on the p-value and instead consider indicators of precision, like sample size and confidence intervals. Understanding the play of chance or random sampling error is also extremely important in deciding how much weight to place on the results of a study.

13.3.1 Random Sampling Error and Sample Size

Producing a study with results that are applicable to the population is the overarching goal of clinical research. When we conduct a study, the participants are only a sample of the population. The smaller our sample, the more likely we are, just by chance, to recruit a nonrepresentative sample. The more representative the sample, the more likely that estimates of treatment effect observed with the sample will also apply to the population. As the sample size becomes larger (closer to the size of the population) or the results of repeated studies are pooled together (e.g. in a meta-analysis), the probability of a random sampling error decreases. A sample size estimate takes into consideration the variability between individuals in the population and the size of the expected difference and provides an estimate of the size of the sample required to overcome random sampling error. Although a larger sample size will help to overcome the risk of random sampling error, it cannot overcome sampling bias in the results that arises from purposefully recruiting an unrepresentative sample (e.g. selecting only those most likely to be responsive, adherent, etc.).

In the next section, we introduce the importance of including estimates of precision, like confidence intervals (CIs), when reporting or interpreting the results of a study. But even confidence intervals cannot overcome random sampling error; thus, one should always be wary of the applicability of studies with a small sample size (usually defended by a sample size calculation with an unreasonably large expected effect size) or a highly selective method of sampling.

13.3.2 Confidence Intervals (CIs)

A CI represents the range in which the true value is expected to lie. Selecting a 95% CI means that, if repeated representative samples were studied from the population, about 95% of the intervals constructed in this way would contain the true population value. CIs should be reported no matter the summary measure being used to present the results (e.g. relative risk, odds ratio, mean difference, etc.). Then, instead of depending on the p-value to reach conclusions about the results of a study, one should interpret the lower and upper boundaries provided by the CI. If the lower and upper boundaries of the CI imply similar conclusions, then the study may make definitive conclusions, assuming the study sample is sufficiently large and representative of the population.

For example, if the 95% CI around the relative risk of revision surgery is 3.5–8.6 (where p < 0.05), both sides of the CI relay the same message: the risk of revision surgery is much greater in one group than in the other. In this situation, the study can make definitive conclusions; in a large study with a representative sample, this result convincingly demonstrates the benefit of one intervention over the other. If, on the other hand, the 95% CI around the relative risk of revision surgery is 1.01–10.5, the lower boundary of the CI implies that there is almost no additional risk of revision surgery in one group versus the other, whereas the upper boundary of the CI implies that there is a 10× greater risk of revision surgery in one group compared to the other. In this situation, even though p < 0.05, the message is unclear; the study cannot make definitive conclusions.

Interpreting the CIs around the results of studies that use PROMs presents an additional challenge. Specifically, how do we know whether the difference being presented represents an important finding?

13.4 Reporting and Interpreting the Results
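One input to that judgement is the MCID. As a minimal sketch of the anchor-based estimate described in Sect. 13.2.3 (the mean change score of patients who report a ‘small but important’ change on the GRC), the calculation might look as follows. The function name, the record layout, and the patient scores are hypothetical and chosen only for illustration; a real analysis would also report the precision of the estimate.

```python
from statistics import mean


def anchor_based_mcid(records, anchor_label="small but important"):
    """Estimate the MCID as the mean pre-to-post change score of the
    patients whose global rating of change (GRC) equals anchor_label.

    Each record is a dict with hypothetical keys 'pre', 'post' and 'grc'.
    """
    changes = [r["post"] - r["pre"] for r in records if r["grc"] == anchor_label]
    if not changes:
        raise ValueError("no patient selected the anchor category")
    return mean(changes)


# Hypothetical questionnaire scores before and after treatment
records = [
    {"pre": 40, "post": 42, "grc": "unchanged"},
    {"pre": 35, "post": 45, "grc": "small but important"},
    {"pre": 50, "post": 58, "grc": "small but important"},
    {"pre": 30, "post": 42, "grc": "small but important"},
    {"pre": 45, "post": 75, "grc": "large"},
]

mcid = anchor_based_mcid(records)
print(mcid)  # mean of the change scores 10, 8 and 12
```

Only the three patients in the anchor category contribute to the estimate; patients who reported no change or a large change are deliberately excluded.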
LSI needs to change by approximately 14 cm MCID available in the literature for most PROMs
for the clinician to be certain that the change iswill have been predominantly determined using
real and not measurement error. In our scenario, within-group change (as described in our reliabil-
the difference in LSI between time one and time ity section), which is appropriate for interpreting
two is (95–80) 15 cm, and thus, the clinician can pre-post-intervention studies, but not appropriate
be certain that true change has occurred. Further,for interpreting between-group studies. Here’s
the lower boundary of the 95% CI around the why; consider the amount of change that will be
second LSI is 88%, which is much closer to observed from pre- to postoperative total knee
the desired 90% in terms of recommending that replacement. Following recovery from a TKA,
the patient returns to sport. we expect quite remarkable improvements. For
example, in a study by Kahn et al. [9] that
included 172 patients who underwent TKA, the
13.4.2 Interpreting Change Within preoperative total WOMAC score was around 35
a Group of Patients points, and the postoperative score was around 12
points; 95% CI around the mean change is
To report the results of a within-group change, approximately 20–24 points. If the MCID is
one could report the proportion of patients who approximately 15 points [4], then the study can
have changed by an amount greater than the conclude that patients undergoing a TKA will
MDC threshold or, more meaningfully, could experience an important improvement.
present the mean change from pre- to post- and However, what is the average difference
use the MCID to interpret the 95% CI around between groups when both groups are undergo-
the difference. For example, if the MCID for a PROM is 10 points and the average pre- to post-change score is 15 points (95% CI 5 points to 25 points), then the lower boundary of the CI implies that not all patients will achieve an important change and, thus, the study cannot conclude that the intervention will provide important improvements for all patients. On the other hand, if the 95% CI around the pre- to post-change score is 11–19 points, then (if the study is sufficiently large) the study can confidently conclude that the intervention is highly likely to provide important improvements for most patients.

13.4.3 Interpreting Change Between Groups of Patients

To report the results of a between-group comparison, one could compare the proportion of patients within each group who change by an amount greater than the MDC threshold or, more meaningfully, could present the mean between-group difference and use the MCID to interpret the 95% CI around this difference. However, the latter presents a problem because the value of the [...]

[...]ing TKA? Perhaps the comparison is between two different types of implants. In this situation, all patients are expected to benefit from TKA by an important amount. However, the research question is about measuring the difference between groups. Surely we do not expect the difference between groups to be as large as the change within groups, especially when both groups are receiving active treatment (versus no treatment). A 1993 study by Goldsmith et al. [5] determined that the magnitude of the between-group MCID is significantly smaller than the within-group MCID (between 20 and 40%). Thus, if the MCID for a between-group comparison is around 5 points, then in the study by Victor et al. [24] that compared two different implants for TKA (n = 131), where the mean difference between groups was 3 points (95% CI -10 points to 15 points), we can determine that the study is grossly underpowered because the interval between the lower and upper boundaries of the CI includes 5 points (the possibility of an important difference); unfortunately, the two boundaries favour opposite implants. To make a definitive conclusion that the two implants offer a similar outcome (as measured by the WOMAC), both boundaries of the CI would need to exclude a difference of five WOMAC points.
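The reasoning used in the examples of this section can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `interpret_ci`, the three labels, and the treatment of a boundary that exactly equals the MCID are assumptions made here, not part of any published method. The lower boundary of the implant example is taken as -10 points, on the assumption that the sign was lost in print (a mean difference of 3 points cannot lie outside its own CI).

```python
def interpret_ci(lower, upper, mcid):
    """Compare a 95% CI for a mean change or difference with an MCID.

    Returns a hypothetical label:
      "important"     - the whole CI lies at or beyond the MCID (either direction)
      "not important" - the whole CI lies strictly within -MCID..+MCID
      "inconclusive"  - the CI includes both important and unimportant values
    """
    if lower > upper:
        raise ValueError("lower boundary must not exceed upper boundary")
    if mcid <= 0:
        raise ValueError("MCID must be positive")
    if lower >= mcid or upper <= -mcid:
        return "important"
    if -mcid < lower and upper < mcid:
        return "not important"
    return "inconclusive"


# Within-group examples from the text (MCID = 10 points)
print(interpret_ci(5, 25, 10))    # inconclusive
print(interpret_ci(11, 19, 10))   # important

# Between-group implant example (MCID = 5 points, lower bound assumed to be -10)
print(interpret_ci(-10, 15, 5))   # inconclusive
```

A CI such as -2 to 4 points with a between-group MCID of 5 points would be labelled "not important", which corresponds to the situation in which a study could conclude that two implants offer a similar outcome.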
13 Health Measurement Development and Interpretation 119
3. Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures. Statistics and strategies for evaluation. Control Clin Trials. 1991;12(4 Suppl):142S–58S.
4. Escobar A, Quintana JM, Bilbao A, Aróstegui I, Lafuente I, Vidaurreta I. Responsiveness and clinically important differences for the WOMAC and SF-36 after total knee replacement. Osteoarthr Cartil. 2007;15(3):273–80.
5. Goldsmith CH, Boers M, Bombardier C, Tugwell P. Criteria for clinically important changes in outcomes: development, scoring and evaluation of rheumatoid arthritis patient and trial profiles. OMERACT Committee. J Rheumatol. 1993;20(3):561–5.
6. Hewett TE, Myer GD, Ford KR, Heidt RS, Colosimo AJ, McLean SG, et al. Biomechanical measures of neuromuscular control and valgus loading of the knee predict anterior cruciate ligament injury risk in female athletes: a prospective study. Am J Sports Med. 2005;33(4):492–501.
7. Jackowski D, Guyatt G. A guide to health measurement. Clin Orthop Relat Res. 2003;413:80–9.
8. Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10(4):407–15.
9. Kahn TL, Soheili A, Schwarzkopf R. Outcomes of total knee arthroplasty in relation to preoperative patient-reported and radiographic measures: data from the osteoarthritis initiative. Geriatr Orthop Surg Rehabil. 2013;4:117–26.
10. Kellgren JH, Lawrence JS. Radiological assessment of osteo-arthrosis. Ann Rheum Dis. 1957;16(4):494–502.
11. Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chronic Dis. 1985;38(1):27–36.
12. Laupacis A, Sackett DL, Roberts RS. An assessment of clinically useful measures of the consequences of treatment. N Engl J Med. 1988;318(26):1728–33.
13. Liang MH. Longitudinal construct validity: establishment of clinical meaning in patient evaluative instruments. Med Care. 2000;38(9 Suppl):II84–90.
14. Marshall G, Hays R, Nicholas R. Evaluating agreement between clinical assessment methods. Int J Methods Psychiatric Res. 1994;4:249–57.
15. Marx RG, Menezes A, Horovitz L, Jones EC, Warren RF. A comparison of two time intervals for test-retest reliability of health status instruments. J Clin Epidemiol. 2003;56(8):730–5.
16. McDowell I, Newell C. Measuring health: a guide to rating scales and questionnaires. 2nd ed. New York: Oxford University Press; 1996.
17. Messick S. Validity. In: Linn R, editor. Educational measurement. Phoenix: Oryx Press; 1993. p. 13–103.
18. Paterno MV, Schmitt LC, Ford KR, Rauh MJ, Myer GD, Huang B, et al. Biomechanical measures during landing and postural stability predict second anterior cruciate ligament injury after anterior cruciate ligament reconstruction and return to sport. Am J Sports Med. 2010;38(10):1968–78.
19. Reid A, Birmingham TB, Stratford PW, Alcock GK, Giffin JR. Hop testing provides a reliable and valid outcome measure during rehabilitation after anterior cruciate ligament reconstruction. Phys Ther. 2007;87(3):337–49.
20. Rothman ML, Beltran P, Cappelleri JC, Lipscomb J, Teschendorf B, Group MFP-ROCM. Patient-reported outcomes: conceptual issues. Value Health. 2007;10(Suppl 2):S66–75.
21. Schünemann HJ, Guyatt GH. Commentary: goodbye M(C)ID! Hello MID, where do you come from? Health Serv Res. 2005;40(2):593–7.
22. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86(2):420–8.
23. Streiner D, Norman G. Health measurement scales: a practical guide to their development and use. Oxford: Oxford University Press; 1995.
24. Victor J, Ghijselings S, Tajdar F, Van Damme G, Deprez P, Arnout N, et al. Total knee arthroplasty at 15-17 years: does implant design affect outcome? Int Orthop. 2014;38(2):235–41.
25. Ware J, Kosinski M, Keller SD. A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care. 1996;34(3):220–33.
26. Wasserstein R, Lazar N. The ASA’s statement on p-values: context, process and purpose. Am Stat. 2016;70(2):129–33.
27. Wiebe S, Guyatt G, Weaver B, Matijevic S, Sidwell C. Comparative responsiveness of generic and specific quality-of-life instruments. J Clin Epidemiol. 2003;56(1):52–60.
28. World Health Organization. Preamble to the constitution of the World Health Organization as adopted by the international health conference, Geneva; 1948.
14 How to Document a Clinical Study and Avoid Common Mistakes in Study Conduct?
Although these authorizations may differ from one country or hospital to another, a study should not be started until all legal requirements are fulfilled.

Regulatory compliance can sometimes be laborious and time-consuming, with the consequence of delaying the start of a clinical study. However, this waiting time may be used efficiently by the investigator to establish the organizational dimension of the study. Combined with efficient communication, this will limit time loss and help the study stay on track to be completed and published in a timely manner. Insufficient preparation may have several unintended consequences during study conduct: early discontinuation of the study, a delay in the recruitment of subjects, a high rate of patients lost to follow-up, low data quality, or a delay in data analysis. Greater awareness of the entire process of the study conduct and its common mistakes is therefore the key to success for a high-quality study, which may be considered for publication in a high-ranked journal.

The aim of this chapter is (1) to give an overview of the legal requirements to be respected during study conduct as well as (2) to provide practical advice to make good use of the legally required on-site documents, to organize the study, and to avoid common mistakes in the monitoring of the study or data handling. The link between each phase of the study conduct and how it may prevent the investigator from publishing will also be discussed.

14.2 On-Site Documents: Definitions and Use

14.2.1 Protocol, Case Report Form (CRF), Information to the Patient, Informed Consent Form (ICF), and Amendments

This section should include the initial study protocol as submitted to the IEC/IRB as well as all available subsequent amendments. The latter are documented changes to the protocol initially approved by the IEC/IRB.

The CRF is a printed or electronic document used to record all data required by the study protocol such as patient demographic data, study interventions, outcomes, and adverse events. It should not be confused with source documents, which will be described later in this chapter. A blank copy of the case report form (CRF) as well as a blank copy of the patient-reported outcome measures (PROMs) and adverse event (AE) report must be available on-site. This section of the binder should also include the latest version of all documents used for the recruitment of participants, that is to say the information provided to the patient, the informed consent form (ICF), and all advertisements used to recruit participants (newspaper, radio, posters, and flyers) [16].

As for the protocol, the CRF or the information to the patients may be subject to changes during the study conduct (e.g., following an amendment to the protocol). To avoid time loss, it is important to evaluate properly how a possible change in the study protocol may affect the other study documents (e.g., CRF, PROM, and AE) and submit all new versions together.
The screening and enrollment log reports all patients screened and informed about the study. It is used to report the outcome of the screening (as screening failures can occur). It usually includes the following items: investigator name, site, patient initials, date informed consent signed, version of consent, date of screening, reasons for screening failure, reason for withdrawal/exclusion, study code assigned, and staff initials. Any withdrawal or loss to follow-up during the study conduct will be reported separately on the dropout log.

Logs can help to conduct the study in a more efficient and organized manner. Documentation of the reasons why the screening has failed (e.g., patient refused to participate in the study, patient older than the upper limit of age considered) can provide information regarding the ability of the study team to enroll patients and retain them. This information may be useful to adapt the recruitment period.

14.2.8 Subject Identification Code List

This log enables the PI to track the correspondence between enrolled subject name and allocated study code. This list is kept confidential, well secured, and/or protected by password if it is digital. Only the PI and delegated staff should have access to this list.

14.2.9 Study Staff and Training Log

The PI can delegate tasks to his/her staff through the signature sheet. The latter documents signatures and initials of all persons authorized (co-investigators) to screen subjects, assess inclusion/exclusion criteria, gather patient consent, make entries and/or corrections on CRFs, perform measurements, etc. It can be completed with a training log to prove that all staff members have been trained for the study procedures.

[...] These documents help to document the qualifications and eligibility of investigators to conduct a study and/or provide medical supervision of subjects.

14.2.11 Other On-Site Documents According to the Study

In addition to the previously cited documents, if the protocol includes known technical procedures, it is recommended to report any update to testing procedures that may occur during the study conduct, including updates on normal value(s), certification, accreditation, or quality control. From a scientific point of view, it may help to trace any outlier/abnormal value when analyzing the data and help to correctly interpret the data.

If biological samples are collected and stored during the study, it is important to document the location and identification process of retained samples in case a test has to be repeated.

If the study includes the investigation of a product (e.g., medical device), the investigator’s brochure and updates, instructions for handling the investigational product, marketing authorization (e.g., CE mark), and procedures for shipment and storage must be available on-site.

Finally, if the study is being sponsored, other documents may apply, such as up-to-date insurance, financial agreements between the parties involved, monitoring visit reports documenting the visits and findings of the monitor, and any relevant communication other than site visits, including letters, meeting notes, and notes of telephone calls.

14.3 Monitoring the Study

Once the regulatory binders contain all required documents, the study can theoretically start. Beyond GCP requirements, it may be recommended to conduct the study in line with the quality of the manuscript foreseen. For example, the PI must be aware that a loss to follow-up greater than 20% will inevitably lead to a lower level of evidence of the results. He/she should thus ensure that the follow-up of patients as well as all aspects [...]

The above list is not exhaustive but includes the main responsibilities of the PI during study conduct. Further guidance on investigator responsibilities has been made available by the FDA [5].
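The screening and enrollment log described earlier in this chapter, and the 20% loss-to-follow-up figure mentioned under Sect. 14.3, lend themselves to a simple electronic summary. The following Python sketch is one possible way to do this; the class `ScreeningEntry`, its field names, and the helper functions are hypothetical and only cover a subset of the log items listed in the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScreeningEntry:
    """One row of a screening and enrollment log (hypothetical subset of fields)."""
    patient_initials: str
    screening_date: str                   # e.g. "2024-03-01"
    enrolled: bool
    failure_reason: Optional[str] = None  # filled in for screening failures
    study_code: Optional[str] = None      # assigned only on enrollment


def summarize_screening(log: List[ScreeningEntry]) -> dict:
    """Count screened and enrolled patients and tally screening failure reasons."""
    screened = len(log)
    enrolled = sum(1 for entry in log if entry.enrolled)
    reasons = Counter(
        entry.failure_reason or "not recorded"
        for entry in log
        if not entry.enrolled
    )
    return {
        "screened": screened,
        "enrolled": enrolled,
        "enrollment_rate": enrolled / screened if screened else 0.0,
        "failure_reasons": dict(reasons),
    }


def exceeds_loss_threshold(enrolled: int, lost: int, threshold: float = 0.20) -> bool:
    """Return True if the proportion lost to follow-up is above the threshold."""
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    return lost / enrolled > threshold


log = [
    ScreeningEntry("AB", "2024-03-01", True, study_code="S001"),
    ScreeningEntry("CD", "2024-03-02", False, failure_reason="refused"),
    ScreeningEntry("EF", "2024-03-05", False, failure_reason="above age limit"),
    ScreeningEntry("GH", "2024-03-07", True, study_code="S002"),
]
print(summarize_screening(log))
print(exceeds_loss_threshold(enrolled=100, lost=21))  # True
```

Reviewing such a summary at regular team meetings is one way to notice early that recruitment is slower than planned or that losses to follow-up are approaching the 20% mark.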
126 C. Mouton et al.
[...]cated person of the team for further questions on the study and/or inform the patient that he/she will be contacted by a team member (naming a specific person may be useful here so that the patient is expecting this phone call).

14.3.3.4 How to Report Prescreened Patients?

Screening logs should remain restricted in content but should document as many eligibility criteria as possible [15]. Since there is no informed consent from the patient, research procedures or interventions should not yet take place.

Practically, the site may create a prescreening sheet to keep track of screened/contacted patients. This sheet should include screening date, eligibility criteria that can be assessed using the medical record (e.g., patients planned for ACL reconstruction, age, type of graft planned), and contact date. Only procedures that are performed as part of the routine clinical practice may be looked at, and only results used for determining study eligibility should be screened before obtaining consent [17].

14.3.3.5 How to Instruct the Patient About the Study?

Participants should receive information both in written and oral formats (call, visit) in nontechnical language. The PI or the delegated person should capture the patient’s perspective and take time to answer his/her questions. Patients are indeed less likely to enter studies that they find difficult to understand and that require multiple follow-ups [29]. It may be worth mentioning whether visits are part of the clinical routine or not and explicitly mentioning additional visits and procedures. Patients may indeed not realize that some procedures will solely be performed for research purposes and are not required for their medical care.

Information given to patients should include information about:

–– The study: its purpose, participation duration and number of visits, subject’s involvement and responsibilities, side effects, risks and benefits, and alternatives to treatment.
–– The participation in the study: participation on a voluntary basis, the possibility to refuse to participate and to withdraw from the study at any time without penalty or loss of benefits, any compensation and amount if applicable, reasons for possible early termination of participation/study (e.g., patient not compliant with visits, medical condition interfering with study protocol), and new information if new findings become available that may be relevant to the subject’s willingness to continue participation in the study.
–– Data protection: confidentiality of personal information recorded and records identifying the subject, protection of privacy, and access to pseudonymized data from third parties if applicable.
–– Additional medication administered if needed, procedure, and insurance in case of damage.
–– Approval of the study by EC/IRB.
–– Principal investigator/funder of study/contact person for any question related to the study and in case of adverse event.

14.3.3.6 How to Record Patient Consent?

Informed consent is documented by means of a written, signed, and dated ICF. Each consent form (usually a minimum of 2) must be dated and signed both by the patient and the investigator. The PI will be responsible for any misconduct in the ICF process (wrong date, falsified signature, missing consent). One consent form is then kept for the regulatory binders, and one is given to the participant. The hospital may also require a copy for the electronic health record.

14.3.3.7 What If New Information About the Medical Device/Drug Becomes Available During the Study or If the Protocol Procedure Changes?

If new information on risk and/or benefits arises or if major protocol amendments occur during the study, the PI should ensure that subjects are informed and re-consent to participate in the study.
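The consent requirements of Sects. 14.3.3.6 and 14.3.3.7 (both signatures dated, and re-consent after new information or a major amendment) can be checked systematically. The Python sketch below is a hedged illustration only: the class `ConsentRecord`, its fields, and the idea of using the ICF version as a marker for "re-consent needed" are assumptions introduced here, not requirements stated by GCP or by the chapter.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ConsentRecord:
    """Consent information for one participant (hypothetical structure)."""
    study_code: str
    icf_version: str                     # version of the ICF that was signed
    patient_signed: Optional[date]       # date of patient signature, if any
    investigator_signed: Optional[date]  # date of investigator signature, if any


def consent_problems(record: ConsentRecord, current_icf_version: str) -> List[str]:
    """List the issues found for one consent record (empty list = no issue)."""
    problems = []
    if record.patient_signed is None:
        problems.append("missing patient signature date")
    if record.investigator_signed is None:
        problems.append("missing investigator signature date")
    # Assumption: a new ICF version is issued whenever re-consent is required.
    if record.icf_version != current_icf_version:
        problems.append("re-consent needed: signed ICF version is not current")
    return problems


record = ConsentRecord("S001", "1.0", date(2024, 3, 1), None)
for problem in consent_problems(record, current_icf_version="1.1"):
    print(record.study_code, problem)
```

Running such a check over all enrolled participants after each approved amendment gives the PI a list of subjects who still have to be informed and re-consented.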
14.3.3.8 What to Do If the Recruitment Is Slower than Expected?

Poor subject recruitment and retention is one of the 15 common reasons for failure in clinical research [11]. The PI should be able to predict the recruitment rate according to his/her facilities, patients, and protocol. He/she should also be able to identify any factors that could prevent the team from properly recruiting for the study. During the study conduct, the PI should thus review the recruitment on a regular basis (e.g., with the help of the screening log). If the recruitment rate is lower than expected, a discussion with the study staff may help to identify the difficulties met in practice. The study team may also consider adapting the protocol and the information given to the patient if they are too complex or lead to confusion (e.g., too many visits, divergence from routine care) or increasing the duration of the recruitment period [29].

14.3.4 Study Visits: Compliance with the Approved Protocol and Protocol Amendments

14.3.4.1 Study Visits

The PI should consider providing the study staff with a brief worksheet/checklist indicating procedures to be performed at each visit. It can serve as a reminder and ensure that procedures are completed in a timely manner without any missing data.

To efficiently organize the study visits, the study team should consider the following aspects: visit windows (range of days in which a subject visit can occur according to the study protocol), room and staff availability, availability of support departments (e.g., X-ray), and whether procedures are part of the clinical routine or not.

The study staff should be careful to avoid long waiting times for study visits. Ideally, the visits should coincide with routine visits if applicable and be short [23]. The follow-up visits may also be proposed at a location convenient for the patient and/or outside working hours. The appointments should be scheduled as soon as possible (e.g., at discharge after surgery), and the team should keep track of visits and windows in a sheet/file. Questionnaires to be filled in by the patient, if applicable, may be sent some time before the visit, together with a reminder of the appointment. If the patient does not come to the visit, the study staff should make all possible efforts to contact the patient to avoid a loss to follow-up. These efforts should be documented. Any definitive loss to follow-up should be reported in the dropout log.

14.3.4.2 Compliance with Protocol and Protocol Amendments

Adherence to the study protocol is essential. Any deviation from or violation of the protocol should be documented [8, 28]. Deviations from the protocol are defined as changes or noncompliance with the study protocol that do not have a significant effect on the participant’s rights, safety, or welfare (e.g., missing visits or data). Protocol violations may affect the participant’s rights, safety, or welfare (e.g., inclusion/exclusion criteria not met, failure to obtain valid informed consent).

If substantial protocol modifications become necessary during the study conduct (either to avoid protocol deviation/violation or for other reasons), amendments to the protocol must first be approved by the IEC/IRB. Overall, about two-thirds of protocols are reported to require one or more amendments [14, 19], although the latter have an additional impact on study costs, timeline, and resources [19, 24]. One-third of these amendments are considered to be avoidable if inconsistencies/errors in the protocol and difficulties in recruiting study volunteers are better anticipated.

The definition of “substantial” may vary according to legal authorities but generally includes all modifications that may impact the safety or physical or mental integrity of the subjects or the scientific value of the study. According to the EU guidance (Sects. 14.3.3 and 14.3.4) [12], substantial amendments include (non-exhaustive list):

–– Amendments to the protocol: any changes in the population studied, procedures, and monitoring visits, including changes of the main objective or endpoints or changes in the recruitment procedure (inclusion/exclusion criteria, additional group of patients, etc.), are considered as substantial changes. This does not include modification of the title, addition/deletion of tertiary endpoints, minor increase in the duration of the study (<10% of the overall time), or increase of >10% of the overall time of the study provided that monitoring visits are unchanged (i.e., no additional visit).
–– Amendments concerning the product studied: any new information on the study product or any new information made available by the manufacturer.
–– Amendments to other documents: any changes of sponsor/principal investigator or revocation of the product marketing authorization.

Substantial amendments must be approved by the IEC/IRB. As for non-substantial amendments, it is possible to record them and submit them simultaneously with the notification of a substantial amendment or at least inform the IEC/IRB.

For each new protocol version, date and version identifier should be properly reported, modifications should be highlighted, and a detailed summary of protocol changes including old text, new text, and rationale for change should be provided (SPIRIT guidelines, Item 3: date and version identifier) [10]. It is usually recommended to submit both a tracked-changes version and a clean version of the protocol to the IRB/IEC, including a document history.

[...] All AEs should be followed, and detailed written reports should be provided until the risk is eliminated or the AE resolved. If judged necessary by the authorities, the protocol may be amended and the ICF updated to inform patients about new risks. Alternatively, the study may be terminated prematurely.

Fact Box 14.2
The PI assumes the responsibility for proper conduct of the study even if tasks are delegated.
Communicate not only with the study team, but inform all departments/doctors involved in the patient’s care about the ongoing study.
Aspects to organize for an efficient study conduct:

–– Identification of eligible patients.
–– Contact with patients and information about the study (written and oral).
–– Consent signature and on-site documentation (screening log, study code allocated to the patient).
–– Study visits (according to visit windows, room and staff availabilities, and clinical routine).
–– Procedures to be performed at each visit.
–– Contact with patients missing a visit.
Fact Box 14.3
–– Respect the confidentiality of data on the CRF: use the participant study code.
–– Control data: regularly identify missing data, control inclusion/exclusion criteria, and check for abnormal values. All information should be accurate and verifiable (data entered in the CRF should be identical to the data in the patient medical record). No field should be left blank (report “not done,” “not applicable,” “unknown”). Any change or correction of the CRF should be dated, initialed, and explained.
–– Respect privacy: only release information to authorized team members and parties (e.g., sponsor if applicable).

Take-Home Message
• Principal investigators are responsible for the entire study conduct, independent of whether study-related tasks are delegated or not.
• Compliance with legal requirements and GCPs should be ensured during the entire duration of the study.
• Regulatory binders should be up to date, the informed consent process should be performed according to the participants’ rights, and data confidentiality and privacy should be maintained.
• The organization of the study conduct from patient screening to data quality will inherently follow the Deming circle: Plan (communication with the team, agreement on task delegation, planning of patient enrollment and study visits, and respect of confidentiality); Do (recruitment, data collection, and restriction of the access to study records); Check (regular discussion with the team to identify problems, e.g., adverse events or low rate of enrollment, and regular check of the database to correct for missing data and inconsistencies); and Act (adapt procedures if necessary).

Bibliography
1. ICH tripartite guideline for good clinical practices E6 (R1). 1996. http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E6/E6_R1_Guideline.pdf.
2. NCCIH clinical research toolbox: essential documents/regulatory binder. https://nccih.nih.gov/grants/toolbox#binder.
3. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Official Journal L 119, 4.5.2016, p. 1–88.
4. U.S. Department of Health and Human Services, Food and Drug Administration. Guidance for industry: electronic source data in clinical investigations. 2013. https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM328691.pdf.
5. U.S. Department of Health and Human Services, Food and Drug Administration. Guidance for industry: investigator responsibilities, protecting the rights, safety, and welfare of study subjects. 2009. https://www.fda.gov/downloads/Drugs/.../Guidances/UCM187772.pdf.
6. Bargaje C. Good documentation practice in clinical research. Perspect Clin Res. 2011;2(2):59–63.
7. Bellary S, Krishnankutty B, Latha MS. Basics of case report form designing in clinical research. Perspect Clin Res. 2014;5(4):159–66.
8. Bhatt A. Protocol deviation and violation. Perspect Clin Res. 2012;3(3):117.
9. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ. 2015;351:h5527.
10. Chan AW, Tetzlaff JM, Gotzsche PC, Altman DG, Mann H, Berlin JA, et al. SPIRIT 2013 explanation and elaboration: guidance for protocols of clinical trials. BMJ. 2013;346:e7586.
11. Clark GT, Mulligan R. Fifteen common mistakes encountered in clinical research. J Prosthodont Res. 2011;55(1):1–6.
12. Communication from the commission: detailed guidance on the request to the competent authorities for authorisation of a clinical trial on a medicinal product for human use tnos.
13. Theisen D, Moksnes H, Hardy C, Engebretsen L, Seil R. How to organise an international register in compliance with the European GDPR: walking in the footsteps of PAMI (Paediatric ACL Monitoring Initiative).
14. Decullier E, Lhéritier V, Chapuis F. The activity of French Research Ethics Committees and characteristics of biomedical research protocols involving humans: a retrospective cohort study. BMC Med Ethics. 2005;6:9.
15. Elm JJ, Palesch Y, Easton JD, Lindblad A, Barsan W, Silbergleit R, et al. Screen failure data in clinical trials: are screening logs worth it? Clin Trials. 2014;11(4):467–72.
16. FDA. Guidance for Institutional Review Boards and clinical investigators: recruiting study subjects, information sheet. 2018. https://www.fda.gov/RegulatoryInformation/Guidances/ucm126428.htm.
17. FDA. Guidance for Institutional Review Boards and clinical investigators: screening tests prior to study enrollment, information sheet. 2018. https://www.fda.gov/RegulatoryInformation/Guidances/ucm126430.htm.
18. Getz KA, Wenger J, Campo RA, Seguine ES, Kaitin KI. Assessing the impact of protocol design changes on clinical trial performance. Am J Ther. 2008;15(5):450–7.
19. Getz KA, Zuckerman R, Cropp AB, Hindle AL, Krauss R, Kaitin KI. Measuring the incidence, causes, and repercussions of protocol amendments. Drug Inf J. 2011;45(3):265–75.
20. Haidich AB, Ioannidis JP. Effect of early patient enrollment on the time to completion and publication of randomized controlled trials. Am J Epidemiol. 2001;154(9):873–80.
21. Hewison J, Haines A. Overcoming barriers to recruitment in health research. BMJ. 2006;333(7562):300–2.
22. Junghans C, Feder G, Hemingway H, Timmis A, Jones M. Recruiting patients to medical research: double blind randomised trial of “opt-in” versus “opt-out” strategies. BMJ. 2005;331(7522):940.
23. Kaur M, Sprague S, Ignacy T, Thoma A, Bhandari M, Farrokhyar F. How to optimize participant retention and complete follow-up in surgical research. Can J Surg. 2014;57(6):420–7.
24. Lösch C, Neuhäuser M. The statistical analysis of a clinical trial when a protocol amendment changed the inclusion criteria. BMC Med Res Methodol. 2008;8:16.
25. Moher D, Hopewell S, Schulz KF, Montori V, Gotzsche PC, Devereaux PJ, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c869.
26. Ross S, Grant A, Counsell C, Gillespie W, Russell I, Prescott R. Barriers to participation in randomised controlled trials: a systematic review. J Clin Epidemiol. 1999;52(12):1143–56.
27. Schroy PC 3rd, Glick JT, Robinson P, Lydotes MA, Heeren TC, Prout M, et al. A cost-effectiveness analysis of subject recruitment strategies in the HIPAA era: results from a colorectal cancer screening adherence trial. Clin Trials. 2009;6(6):597–609.
28. Sweetman EA, Doig GS. Failure to report protocol violations in clinical trials: a threat to internal validity? Trials. 2011;12:214.
29. Thoma A, Farrokhyar F, McKnight L, Bhandari M. Practical tips for surgical research: how to optimize patient recruitment. Can J Surg. 2010;53(3):205–10.
15 Framework for Selecting Clinical Outcomes for Clinical Trials
Adam J. Popchak, Andrew D. Lynch, and James J. Irrgang
evaluate outcomes at the level of an impairment. a child) that rely on self-perception of the symp-
Likewise, a trial with a purpose of reducing a dis- toms, impairments, and abilities [21]. Patient-
ability should have an outcome measure capable reported outcomes of health-related quality of
of assessing levels of disability. Ultimately, the life (HRQOL) can be either general or specific,
outcome measure should serve the purpose of the with pros and cons being present with each type
research trial to provide information necessary to of measures.
draw a conclusion (Fig. 15.1). Generic health status measures are applicable
to diverse populations and usually measure mul-
tiple aspects of health such as physical, emo-
15.2 Measures of Activity and Participation

There are two main approaches to assessing activity and participation: performance-based testing and patient-reported measures [21]. Performance-based measures (PBM) rely on a rater’s assessment of a patient’s performance on specific physical tasks [21]. Performance may be assessed qualitatively, with rating scales for the level of assistance needed or the quality of movement, or quantitatively, based on the amount of time required to complete a task, the tolerance for completing a task, or the measurable results of a test such as jumping distance, balance, speed, or time. For example, when considering outcomes related to movement in the context of acute care or rehabilitation settings, level of assistance works particularly well. Advantages of PBM of physical function compared to patient-reported measures include superior reproducibility, greater sensitivity to change (responsiveness), and less vulnerability to external influences or biases [8, 21].

Patient-reported outcomes (PROs) are completed by the patient or a proxy (e.g. a parent for

Fig. 15.1 Determining the primary outcome measure. [Matrix of level of intervention (rows: impairment, activity limitation, participation restriction) against level of outcome measurement (columns: impairment, activity limitation, participation restriction); two outcome levels are marked with an X for each intervention level.]

tional, and social. The common examples of a general health measure are the Medical Outcomes Study 36-Item Short Form (SF-36) [25] and 12-Item Short Form (SF-12) [35]. A more contemporary measure is the PROMIS Global-10 [9], which utilizes item response theory (see chapter on adaptive testing). General health measures permit comparisons across populations with different health conditions and are more likely to detect unexpected effects of an intervention. However, they are less responsive than specific measures of health status; are susceptible to ceiling and floor effects, in which high- or low-functioning individuals in both the treatment and control groups score as good as possible or as bad as possible, so that the groups cannot be distinguished; generally have content that is less relevant to the patient and clinician; and tend to be longer and more difficult to score.

Specific health status measures focus on content specific to the primary condition or population of interest, potentially creating a more responsive instrument. To achieve this, specific health status measures include only the aspects of
HRQOL that are relevant to the condition or population being studied. Specific health status measures have the distinct advantages of improved responsiveness, lower respondent burden, and ease of scoring and interpretation, and they are more likely to be accepted by patients and clinicians because of their greater relevance to the condition of interest. However, specific health measures do not measure all aspects of health that may influence the overall status, nor do they allow for comparison between different disease states and/or populations.

Disease-, region-, or patient-specific scales exist as specific health status measures. Disease-specific scales are designed for a particular disease process or pathology. The content reflects symptoms, activity limitations, and participation restrictions that are experienced by the individual with the disease. Examples of disease-specific scales include the Lysholm [33] and Cincinnati Knee Rating Scale [27] (knee ligament scales), the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) [4] (osteoarthritis), and the Western Ontario Rotator Cuff Index (WORC) [17] (rotator cuff dysfunction).

Region-specific scales are designed for use on a wide variety of disorders or impairments that affect a particular region. The content reflects all possible symptoms, activity limitations, and participation restrictions that can arise from impairment of a specific region. Examples of region-specific scales include the Neck Disability Index (NDI) [34]; Penn Shoulder Score (PSS) [22]; Disabilities of the Arm, Shoulder, and Hand (DASH) [12]; Oswestry Disability Index (ODI) [7]; and the Knee Outcome Survey Activities of Daily Living Scale (KOS-ADLS) [15].

Patient-specific scales are defined by the patient, often with the patient providing a list of 3–5 relevant activities they are unable to do or have difficulty performing. Generally, the activities are then given a numerical rating, often on an 11-point scale, where 0 is “unable to do” and 10 is “able to do at preinjury level”. Specifically, the Patient-Specific Functional Scale (PSFS) is used primarily in patients with musculoskeletal disorders and can be used in patients with varying levels of independence [11]. The PSFS has been shown to be a valid, reliable, and responsive outcome measure for a variety of musculoskeletal problems [10]. Patient-specific scales are applicable to a large number of conditions, are efficient and easy to administer, and have adequate psychometric characteristics, in particular responsiveness to change over time. However, patient-specific scales limit comparisons between patients because the content items are determined by each individual and therefore lack uniformity.

As outcome measures are being utilized in clinical trials, the measures should be standardized to ensure the ability to compare results, thereby working with a “common currency” of effect. In the absence of such standardization, results and conclusions of studies cannot be compared, and additional evidence to support or refute a conclusion remains missing. There has been some effort to document core outcome sets for a body region or condition. A core outcome set establishes a minimum data set that should be collected for a particular condition to allow for comparison amongst studies.

15.3 Psychometric Considerations

Psychometric considerations are vitally important when selecting outcome measures. Primarily, investigators must be concerned with the outcome measure’s reliability, validity, and responsiveness. Additional considerations include the purpose of the measurement, the relevance to the patient population, and the clinician or respondent burden. All of these factors should be considered when selecting an appropriate outcome measure for clinical trials.

15.3.1 Reliability

Reliability is the consistency of measurement and how much error one can expect in the chosen outcome. Acceptable reliability levels are necessary to ensure that the error associated with the measurement is small enough to detect
actual changes in what is being measured [31]. Therefore, reliability can be conceptualized as the dependability or the predictability of a measure [30]. Reliability is fundamental to clinical research. In the absence of it, the researcher cannot be confident in the data that are collected, and no definitive conclusion can be drawn from them [30].

Reliability can be assessed on multiple levels. When assessing whether multiple items measure the same construct, such as with questionnaires and interviews, reliability is related to internal consistency [30]. Internal consistency is the degree to which all items on a scale consistently measure the underlying condition [30]. Internal consistency applies to measures that consist of multiple items and therefore is related to errors of measurement that are linked to content sampling. Two primary approaches exist to measure internal consistency: split-half reliability and test item reliability [30]. Split-half reliability measures the extent to which all parts of the test contribute equally to what is being measured and is based on dividing the items on an instrument into two halves and correlating the results [30]. The split-half method is a quick and relatively easy way to establish reliability. However, its use is limited to larger questionnaires which only measure one construct. Test item reliability assesses internal consistency through item analysis, where each item on the test is examined to determine how it relates to every other item on the test and to the instrument as a whole [30]. Because it examines how each item relates to the others, the test item method does not require nearly as long a test as the split-half method.

Test-retest reliability (intra-tester and inter-tester) is the degree to which the score remains stable when there is no change in the underlying condition being measured. Test-retest reliability is estimated by measuring individuals two or more times over an interval when the individual’s condition is expected to remain stable. The time between the repeat measurements is an important consideration. When the condition being measured is expected to change quickly, the time between repeat measurements should be short. When the condition is not expected to change rapidly, the time between repeat measurements should be longer [30]. As such, test-retest reliability is not a fixed property of an instrument but rather a degree of measurement consistency when applied to certain populations under particular measurement conditions. Therefore, the population in the study which evaluated test-retest reliability must be representative of the target population of interest. The reliability coefficients used for test-retest reliability depend on the type of data that is present. For interval or ratio data, the Pearson correlation coefficient and the intraclass correlation coefficient (ICC) are commonly used for normally distributed data. Test-retest reliability for ordinal or nominal data is measured with percent agreement or Cohen’s kappa. Reliability values are placed on a common scale of 0–1.0. If a measure is perfect, without error, the reliability is 1.0. If the measures are full of error, the reliability is 0.0. Generally, coefficient values of less than 0.50 indicate poor reliability, 0.50–0.75 represent moderate reliability, and those greater than 0.75 suggest good reliability [30]. In clinical trials, to ensure the valid interpretation of the findings, reliability coefficients should generally exceed 0.90. However, acceptable reliability is often a judgement call that depends on the knowledge of how precise the measurements must be in order to be used in a meaningful way [30].

In clinical testing, measurements are rarely perfectly reliable [30]. In addition to the limitations of measurement instruments, human subject research adds a level of inconsistency that must be acknowledged [30]. The standard error of measurement (SEM) is a measure of precision that is used to determine such limitations. The SEM estimates how well repeated measures are distributed around a true score. This standard error is directly related to the reliability of a test, with a larger SEM being associated with lower reliability and less precision in the scores obtained.

Precision of the measurement can also be assessed via confidence intervals. A confidence interval gives an estimated range of values, which is likely to contain the true score for a number of variations in the population [30]. The specific
limits of the confidence interval are determined by the variability in the data as well as the level of confidence the researcher wants to assign to the point estimate [30]. Confidence intervals typically are presented with 95% confidence but can also be reported as 90 or 99%. The confidence interval is reflective of the SEM. The half-width of the traditionally used 95% confidence interval is obtained by multiplying the SEM by 1.96, so that the interval is the observed score ± 1.96 × SEM. Likewise, the 90% and 99% confidence intervals are calculated by multiplying the SEM by 1.64 and 2.58, respectively.

The minimal detectable change (MDC) is an absolute measure of reliability or error and is used to determine the threshold for true change in a measure, i.e. what amount of change must be seen to be sure that the difference is not related to measurement error of the instrument [3, 19]. When interpreting scores on an outcome measure, knowing the minimal detectable change is important to ensure the change is not the result of measurement error alone. The MDC can be determined with knowledge of the SEM and can be applied to findings in clinical research trials.

15.3.2 Validity

Traditionally, validity describes the degree to which an instrument measures what it is intended to measure. Validity emphasizes the objectives of a test and the ability to make inferences from the associated measurements. In essence, validity determines what you are able to do with the test results [30]. Implied in validity is that the measurement has an acceptable level of error or is reliable. Inaccurate or unreliable measurements cannot provide meaningful measurements [30]. Establishing validity of an outcome measure is not as straightforward as establishing reliability [30]. Often there is no obvious manner to determine if an outcome is measuring exactly what it is intended to measure. Therefore, the researcher must determine through various means if the measure has enough validity to be utilized in research and practice.

Validation procedures are based on the type of evidence that can be provided to determine an outcome’s validity [30]. Content, criterion-related, and construct validity are essential elements that a researcher must determine to some degree before using an outcome measure.

Content validity is the degree to which items on the instrument adequately reflect the content domain that is being measured. Specifically, content validity addresses the question “is all important item content included on the instrument and all irrelevant item content excluded?” Content validity is useful with questionnaires and inventories [30].
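As a hedged illustration of the precision statistics in Sect. 15.3.1, the sketch below derives the SEM from a reliability coefficient, builds a 95% confidence interval around an observed score with the 1.96 multiplier given in the text, and computes a minimal detectable change. The chapter does not state formulas for the SEM or the MDC; the commonly used forms SEM = SD × √(1 − r) and MDC95 = 1.96 × √2 × SEM are assumed here, and the function names and example values (SD of 10, reliability of 0.91, observed score of 70) are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r). Assumed standard formula, not given in the chapter."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(score, sem, z=1.96):
    """Observed score +/- z * SEM (z = 1.96 for 95%, 1.64 for 90%, 2.58 for 99%)."""
    half_width = z * sem
    return score - half_width, score + half_width


def minimal_detectable_change(sem, z=1.96):
    """MDC = z * sqrt(2) * SEM. Assumed standard formula; the sqrt(2) term
    reflects measurement error in both the first and the second score."""
    return z * math.sqrt(2.0) * sem


# Hypothetical example values, chosen only for illustration.
sem = standard_error_of_measurement(sd=10.0, reliability=0.91)
low, high = confidence_interval(70.0, sem)
mdc95 = minimal_detectable_change(sem)
print(f"SEM = {sem:.2f}, 95% CI = ({low:.2f}, {high:.2f}), MDC95 = {mdc95:.2f}")
```

Under these assumptions, a change smaller than the computed MDC95 could plausibly be explained by measurement error alone, which is the interpretation the text gives to the MDC.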
Criterion-related validity is the degree to which the score on the instrument reflects the current or future standing on a gold standard. Essentially, this is an issue that is related to prognosis and how well the score predicts the status of the individual. When appropriate criterion validity is present, the outcomes obtained from one test could be used as a substitute for an established gold standard [30]. There are two standard sub-forms of criterion validity. One sub-form of criterion validity is concurrent validity, which establishes validity when two measures are obtained at a similar time. Establishing concurrent validity is important when one test is considered more efficient, economical, or practical compared to the established gold standard [30]. The other form of criterion validity is predictive validity, which specifically establishes that the outcome on one test can be used to predict the score of the criterion test. Strong criterion validity allows the researcher or clinician to utilize the more efficient and generalizable outcome while acknowledging the similarities to the gold standard.

Finally, construct validity is the degree to which scores on the instrument reflect the underlying construct that one is intending to measure. Construct validity requires one to demonstrate hypothesized relationships with other measures of the construct.

15.3.3 Responsiveness

Responsiveness has two major elements: internal and external responsiveness [13]. Internal responsiveness is the degree to which the score changes as the underlying condition that is being measured by the scale changes [13]. Essentially it is the instrument’s ability to detect change over time. External responsiveness reflects the extent to which change in a measure corresponds to actual change in a reference measure of health status [13]. With external responsiveness, the measure is not of primary interest but rather the change in the external health status standard [13]. In contrast to internal responsiveness, external responsiveness will depend on the choice of the reference health standard and not on the treatments under investigation [13].

There is a lack of consensus on the appropriate statistic for assessing responsiveness; thus
more than one statistic is often reported [13]. As the patient’s condition improves or deteriorates, the score on the measure should change in a similar manner. However, there are a number of factors that can affect the magnitude of change for a given instrument. Factors that affect responsiveness include the patient group (e.g. an acute versus a chronic condition), the type of treatment, the timing of the data collection, and the construct of change [2]. Responsiveness statistics include the effect size, the standardized response mean (SRM), the minimal clinically important difference (MCID), and the patient acceptable symptom state (PASS) [2].

The effect size relates change to the standard deviation of the initial scores and is expressed in standard deviation units. It is a way of quantifying the extent of change without confounding it with sample size. An effect size of 0.5 implies the average change score is equal to one-half of the standard deviation of the initial scores. A general interpretation of effect sizes is that small effect sizes are on the order of 0.20, medium are 0.50, and large are around 0.80 [5].

The SRM is an alternative to the effect size and is used to gauge the responsiveness of instruments to actual clinical change. The SRM is determined by dividing the mean change score by the standard deviation of the change score. In clinical research and in determining which outcome measure to use, the measure with the larger SRM will be more able to detect clinical change [1, 23].

The MCID [16] was first described in order to better determine if statistically significant change in an outcome measure also had clinical signifi-
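The effect size and SRM defined in Sect. 15.3.3 can be computed directly from paired baseline and follow-up scores. The sketch below is illustrative only: the function names and the six pairs of scores are hypothetical, and the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify which estimator to use.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Per-patient change: follow-up score minus baseline score."""
    return [after - before for before, after in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of the baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of the change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


# Hypothetical scores for six patients, for illustration only.
baseline = [40, 45, 50, 55, 60, 50]
follow_up = [48, 50, 60, 58, 70, 56]

es = effect_size(baseline, follow_up)
srm = standardized_response_mean(baseline, follow_up)
print(f"effect size = {es:.2f}, SRM = {srm:.2f}")
```

In this made-up sample the changes are fairly uniform across patients, so the standard deviation of the change scores is small and the SRM comes out larger than the effect size; the two statistics can rank instruments differently for this reason.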
15. Irrgang JJ, Snyder-Mackler L, Wainner RS, Fu FH, Harner CD. Development of a patient-reported measure of function of the knee. J Bone Joint Surg Am. 1998;80(8):1132–45.
16. Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10(4):407–15.
17. Kirkley A, Alvarez C, Griffin S. The development and evaluation of a disease-specific quality-of-life questionnaire for disorders of the rotator cuff: the Western Ontario Rotator Cuff Index. Clin J Sport Med. 2003;13(2):84–92.
18. Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Clin Epidemiol. 1985;38(1):27–36.
19. Kovacs FM, Abraira V, Royuela A, Corcoll J, Alegre L, Tomás M, et al. Minimum detectable and minimal clinically important changes for pain in patients with nonspecific neck pain. BMC Musculoskelet Disord. 2008;9(1):43.
20. Kvien TK, Heiberg T, Hagen KB. Minimal clinically important improvement/difference (MCII/MCID) and patient acceptable symptom state (PASS): what do these concepts mean? Ann Rheum Dis. 2007;66(Suppl 3):iii40–1.
21. Latham NK, Mehta V, Nguyen AM, Jette AM, Olarsch S, Papanicolaou D, et al. Performance-based or self-report measures of physical function: which should be used in clinical trials of hip fracture patients? Arch Phys Med Rehabil. 2008;89(11):2146–55.
22. Leggin B, Iannotti J. Shoulder outcome measurement. Disorders of the shoulder: diagnosis and management. Philadelphia: Lippincott Williams & Wilkins; 1999. p. 1024–40.
23. Liang MH, Lew RA, Stucki G, Fortin PR, Daltroy L. Measuring clinically important changes with patient-oriented questionnaires. Med Care. 2002;40(4):II-45–51.
24. McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res. 1995;4(4):293–307.
25. McHorney CA, Ware JE Jr, Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care. 1993;31:247–63.
26. Nagi SZ. A study in the evaluation of disability and rehabilitation potential: concepts, methods, and procedures. Am J Public Health Nations Health. 1964;54(9):1568–79.
27. Noyes FR, Barber SD, Mooar LA. A rationale for assessing sports activity levels and limitations in knee disorders. Clin Orthop. 1989;(246):238–49.
28. Paulsen A, Odgaard A, Overgaard S. Translation, cross-cultural adaptation and validation of the Danish version of the Oxford Hip Score: assessed against generic and disease-specific questionnaires. Bone Joint Res. 2012;1(9):225–33.
29. Porter ME. What is value in health care? N Engl J Med. 2010;363(26):2477–81.
30. Portney L, Watkins M. Foundation of clinical research: application to practice. Norwalk: Appleton & Lange; 1993.
31. Rankin G, Stokes M. Reliability of assessment tools in rehabilitation: an illustration of appropriate statistical analyses. Clin Rehabil. 1998;12(3):187–99.
32. Stucki G, Liang M, Stucki S, Katz J, Lew R. Application of statistical graphics to facilitate selection of health status measures for clinical practice and evaluative research. Clin Rheumatol. 1999;18(2):101–5.
33. Tegner Y, Lysholm J. Rating systems in the evaluation of knee ligament injuries. Clin Orthop. 1985;(198):42–9.
34. Vernon H, Mior S. The neck disability index: a study of reliability and validity. J Manipulative Physiol Ther. 1991;14(7):409–15.
35. Ware JE Jr, Kosinski M, Keller SD. A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care. 1996;34(3):220–33.
36. International Classification of Functioning, Disability and Health (ICF) [press release]. Geneva: World Health Organization; 2001.
16 Advances in Measuring Patient-Reported Outcomes: Use of Item Response Theory and Computer Adaptive Tests

Andrew D. Lynch, Adam J. Popchak, and James J. Irrgang
example, the Patient-Reported Outcome Measurement Information System (PROMIS) Physical Function Item Bank is designed to measure the physical function of individuals in the general population and to distinguish between someone who is functioning “normally” and someone who is limited in some fashion. It was not designed to accurately measure the function of either elite athletes or those who are nearly bedridden. One way to assess the measurement capability of a CAT is to plot location against precision. We graph the location of the score along the x-axis and the precision associated with that score on the y-axis. Generally, a U-shaped graph is seen, with a range of scores associated with a precision (standard error) of less than 3.3 in the center of the graph and less precise scores at either end.

However, some measures do not follow this U-shape, as can be seen for the PROMIS Pain Interference CAT and Physical Function CAT in Fig. 16.1. Each measure is associated with good precision (i.e., standard error of less than 3.3) for scores between 40 and 60 (thetas from −1 to 1). However, for Pain Interference t-scores less than 40 (scores indicating that pain interferes little with daily function), there is poor precision (SE > 5). Similarly, for Physical Function scores greater than 55, precision is again poor. Considering these results, the Pain Interference CAT may not be suitable for individuals for whom pain is only a minor inconvenience, and the Physical Function CAT may not be suitable for individuals who function at the highest end of the spectrum. Each of these scales may benefit from replenishment of the item bank to expand the range of the latent trait being measured (see below).

Fig. 16.1 Sample data (previously unpublished) for PROMIS CAT scores in individuals with orthopedic knee conditions. [Plot titled “Standard Error (Y) vs. Theta (X): Assessment Center Administered Follow-Up CATs”; x-axis: theta, from −2 to 2; y-axis: standard error.]

16.3 CAT Administration Parameters

When setting up the CAT parameters, the user can create stopping rules for administration of items. The most intuitive of these stopping rules is the number of items to administer. Generally, the user can state the minimum and maximum number of items to administer, which generally is set at a minimum of 4 items and a maximum of 12 items for the PROMIS instruments.

The second stopping rule involves the overall precision of the location estimate. When estimating the position of the individual on the scale of the latent trait, we want to know how precise that estimate is. Having an estimate that your patient is functioning at the 50th percentile of mobility is helpful if the 90% confidence interval (i.e., the
true location) is between the 47th and 53rd percentiles. However, if the 90% confidence interval ranges from the 30th to the 70th percentiles, we are much less confident in our estimate. Therefore, the administration setup often allows the user to identify a level of precision with which they are comfortable for the location estimate. Depending on the desired use of the data, this can range from two to ten standard error points. There is a tradeoff between precision and the number of items: with greater required precision, generally more items must be administered.

It is important to understand that the number of items to administer generally overrides the precision requirement. If the precision requirement is met after three items, but the minimum was set to four, the algorithm will administer a fourth item. If the maximum number of items is administered, but the precision requirement has not been met, the algorithm will stop administering items.

Fact Box 16.1: Computer Adaptive Testing
• Computer adaptive testing (CAT) uses an algorithm to choose an item from an IRT-calibrated item bank that will provide the most information about an individual. Therefore, two individuals can complete the same CAT-administered outcome measure and respond to different items.
• The score on a CAT for an IRT-calibrated measure considers only the responses to individual items that are administered to the individual and not responses to a complete item bank.
• Therefore, a CAT does not need to administer all items to arrive at a score.
• Because the result of the administration is an estimate of a score, there is an associated standard error with the score. The lower the standard error, the more precise the estimate.
• Most often, a minimum of 4 and maximum of 12 items are administered; once four items have been administered and an acceptable level of precision for the score estimate has been achieved, testing stops. This makes test administration more efficient than administration of a full item bank or fixed-length outcome measure.

16.4 Interpreting Score Reports from Computer Adaptive Tests

Scores are typically expressed as a t-score. These standardized scores are used when a large, normative data set is available, as it is for the PROMIS measures [6, 16], and encourage comparison of the individual to the population average. When expressed as a t-score, the expected mean is 50, and the expected standard deviation is 10. Therefore, we can expect that 68% of individuals will score between 40 and 60, about 95% between 30 and 70, and 99.7% between 20 and 80. We also then know that a difference of 10 points on a t-score between individuals represents about one standard deviation.

Fact Box 16.2: t-Score Interpretation
• A t-score normalizes an individual score relative to a large, representative population who have also completed the outcome measure.
• On a t-score, 50 represents the population average and 10 points is 1 standard deviation.
• Because t-scores are normally distributed, 68% of individuals will score between 40 and 60, about 95% between 30 and 70, and 99.7% between 20 and 80.
• Both CAT administration and short form administration will provide a t-score.

As mentioned in Chap. 15, psychometric properties of an instrument are not fixed. Reliability, responsiveness, and patient acceptable symptom state may change depending on the population. Prior to choosing a CAT measure, the investigator should assess whether it has been used in a similar population. The PROMIS Physical Function measures have been used routinely in orthopedic populations [1, 2, 12, 15, 18, 21].

16.5 Assessment with Short Forms

Computer adaptive tests obviously require computers and, depending on the mode of administration (e.g., Research Electronic Data Capture System (REDCap) or measure-specific web site), an Internet connection. However, there are workarounds for scenarios in which an Internet-connected device capable of administering the measure is not available.

Using the item characteristic curves, it is possible to create static short forms that resemble classic fixed-length patient-reported outcome surveys. A short form has a fixed number of items, to which the patient responds. There are scores associated with each of the items; however, scoring a short form is not as simple as totaling the points on the form or expressing the achieved points as a percentage of the total points possible, as is typically done with legacy outcome measures.

After totaling the points from all items, the person scoring the form must refer to a conversion table to arrive at a t-score and standard error. Conversion tables are typically available in administration manuals.

Reviewing a conversion table prior to choosing a short form can help to identify if a particular short form will meet the needs of the clinician-researcher. Because each short form “total score” is associated with a t-score and a standard error, the clinician-researcher can review the range of t-scores that are associated with a particular short form and the associated standard errors. As an example, multiple PROMIS Physical Function Short Forms are available from HealthMeasures.org (e.g., Physical Function 4a, Physical Function 6b, and Physical Function 8b). The maximum t-score for each of these measures is 57, 59, and 60.1, respectively; however, each is associated with a standard error of 5.9 or greater. These short forms are all capable of measuring physical function that is above the population average of 50; however, the location estimate for someone who achieves the highest possible score on the form is not very precise. On the other hand, raw summed scores that are not at the maximum are typically associated with a standard error of 3 or less, which is usually regarded as acceptable. Therefore, these forms would have an obvious ceiling effect when measuring physical function at the highest level of function.

In the instance where a broad range of a latent trait must be measured, it is possible to select two versions of the short form: one for the low end of the spectrum and one for the high end. It is up to the clinician to judge which version would best serve the individual patient. In this case, there may be a total of 24 items that could be administered, but any given individual will only be asked to complete half of them to arrive at a t-score with an acceptable level of precision. Alternatively, to determine if the high or low version of the short form should be administered, a single screening item that is highly discriminative may be administered first. The response to the item would determine if the high or low version of the short form is administered.

16.6 Expanding and Improving Content Coverage

IRT-calibrated item banks are superior to classically created, fixed-length assessments for measuring at the extremes of function, but the item bank must cover the complete range of the intended trait for precise measurement [9, 11, 14]. The ability to improve the psychometric properties of a measure by adding and calibrating new items, referred to as item bank replenishment, is a significant advantage of IRT-based measures [10]. In the scenario where an item bank is deemed insufficient to measure a certain aspect of function, a replenishment study may be completed to augment the item bank. Replenishment studies typically seek to expand the upper or lower boundaries of the item bank to improve the range of measurement; however, it is also conceivable that an item bank may need better coverage in the middle of the spectrum of the latent trait.
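Returning briefly to the stopping rules of Sect. 16.3, the interaction between the item-count limits and the precision requirement can be sketched in a few lines. This is a hypothetical illustration, not the PROMIS or Assessment Center algorithm: the function name, the default precision threshold of 3.0 t-score points, and the example standard errors are assumptions; only the minimum of 4 and maximum of 12 items come from the text.

```python
def should_stop(items_administered, standard_error,
                min_items=4, max_items=12, target_se=3.0):
    """Decide whether a CAT administration should stop (hypothetical sketch).

    The item-count limits override the precision requirement, as the text
    describes: the precision check only applies between the two limits.
    """
    if items_administered >= max_items:
        return True   # maximum reached: stop even if precision was not met
    if items_administered < min_items:
        return False  # minimum not reached: continue even if precision was met
    return standard_error <= target_se


# Hypothetical standard errors of the score estimate after each item.
se_after_each_item = [6.1, 4.2, 2.9, 2.7, 2.5]

stopped_at = None
for item_number, se in enumerate(se_after_each_item, start=1):
    if should_stop(item_number, se):
        stopped_at = item_number
        break

print(f"testing stopped after item {stopped_at}")
```

In this made-up sequence the precision target is already met after the third item, but a fourth item is still administered because the minimum is four, mirroring the example in the text.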
In a replenishment study, additional items are identified via comparison to other existing item banks and legacy outcome measures and through interviews with relevant stakeholder groups [3, 4, 7, 10]. For instance, an item bank may be excellent at measuring general mobility, but may not have items that appropriately measure mobility that is assisted by a cane, walker, or wheelchair. In this case, existing measures could be consulted for example items, and individuals who routinely get around with assistance may be asked to provide input on what types of items should be included or how the items should be worded. This provides a group of candidate items that may be used to augment the existing bank.
The item bank calibration method is a more direct method to calibrate new items. In this instance, an individual is asked to respond to a large portion of the existing item bank (possibly the entire item bank) and to each new candidate item. Because there is more information with which to calibrate the new items, fewer overall participants are needed. However, each participant assumes a greater burden by responding to a large number of items. Regardless, an item bank is not a completely static entity. It may be augmented and refined through multiple administrations.

16.7 Using CATs in Conjunction with Legacy Measures

Presently, the majority of IRT-calibrated measures that can be administered as CATs are general measures. As an example, the PROMIS has tools to measure fatigue, pain intensity, pain interference, physical function, sleep disturbance, anxiety, depression, and ability to participate in social roles and activities as its primary measures. These are not disease- or condition-specific measures, and therefore, they will not ask specific questions about how a recent rotator cuff tear affects sleep or physical function. If disease-specific or body region-specific information is needed, an additional legacy measure should be considered in addition to the general measure.
It can be argued that a region-specific measure related to the hip or a condition-specific measure about osteoarthritis gives more information about a particular patient’s situation. However, it is not possible to compare scores on a shoulder-specific measure and a knee-specific measure. It is possible to determine how each of those individuals is impacted in general physical function when measured by a general IRT-based CAT.

16.8 Limitations Associated with Computer Adaptive Testing

Legacy patient-reported outcome measures, with a fixed number of items that are always administered, are easy to program into the electronic medical record (EMR) for administration in the clinic. However, CAT-based measures require specific, proprietary algorithms to be programmed into the EMR. This is not yet a standard practice for the EMR; therefore, adoption in clinical practice is difficult and has not yet become routine. Alternatively, research-based programs such as the Research Electronic Data Capture (REDCap) System have the PROMIS CATs built into their system. Additionally, the vendors of each program have websites and applications that can administer the CAT-based PROs, usually at a cost.
The CAT-based PRO may also lack specificity to the patient or clinician [5]. Because it is a general measure of a specific trait, it contains general items. This is unlike legacy PROs, which may be joint or region specific and therefore contain items about a specific injury, symptom, or function. Some individuals comment that the CAT-based PROs are not specific to their current situation, and therefore they do not see value in completing them. To address and overcome this concern, it is imperative that the clinician look at the PRO and discuss the results with the individual.
16 Advances in Measuring Patient-Reported Outcomes: Use of Item Response Theory and Computer… 149
16.9 Resources and Web Sites

• PROMIS Resources via Healthmeasures.net: http://www.healthmeasures.net
• Activity Measure for Post-Acute Care: http://am-pac.com/category/home/

References
1. Beleckas CM, Padovano A, Guattery J, Chamberlain AM, Keener JD, Calfee RP. Performance of patient-reported outcomes measurement information system (PROMIS) upper extremity (UE) versus physical function (PF) computer adaptive tests (CATs) in upper extremity clinics. J Hand Surg Am. 2017;42:867–74. https://doi.org/10.1016/j.jhsa.2017.06.012.
2. Bernholt D, Wright RW, Matava MJ, Brophy RH, Bogunovic L, Smith MV. Patient reported outcomes measurement information system scores are responsive to early changes in patient outcomes following arthroscopic partial meniscectomy. Arthroscopy. 2018;34:1113–7. https://doi.org/10.1016/j.arthro.2017.10.047.
3. Bruce B, Fries J, Lingala B, Hussain YN, Krishnan E. Development and assessment of floor and ceiling items for the PROMIS physical function item bank. Arthritis Res Ther. 2013;15:R144. https://doi.org/10.1186/ar4327.
4. Bruce B, Fries JF, Ambrosini D, Lingala B, Gandek B, Rose M, Ware JE Jr. Better assessment of physical function: item improvement is neglected but essential. Arthritis Res Ther. 2009;11:R191. https://doi.org/10.1186/ar2890.
5. Cella D, Chang CH. A discussion of item response theory and its applications in health status assessment. Med Care. 2000;38:II66–72.
6. Cella D, et al. The patient-reported outcomes measurement information system (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005-2008. J Clin Epidemiol. 2010;63:1179–94. https://doi.org/10.1016/j.jclinepi.2010.04.011.
7. DeWalt DA, Rothrock N, Yount S, Stone AA, PROMIS Cooperative Group. Evaluation of item candidates: the PROMIS qualitative item review. Med Care. 2007;45:S12–21. https://doi.org/10.1097/01.mlr.0000254567.79743.e2.
8. Fries JF, Bruce B, Cella D. The promise of PROMIS: using item response theory to improve assessment of patient-reported outcomes. Clin Exp Rheumatol. 2005;23:S53–7.
9. Haley SM, Ni P, Hambleton RK, Slavin MD, Jette AM. Computer adaptive testing improved accuracy and precision of scores over random item selection in a physical functioning item bank. J Clin Epidemiol. 2006;59:1174–82. https://doi.org/10.1016/j.jclinepi.2006.02.010.
10. Haley SM, Ni P, Jette AM, Tao W, Moed R, Meyers D, Ludlow LH. Replenishing a computerized adaptive test of patient-reported daily activity functioning. Qual Life Res. 2009;18:461–71. https://doi.org/10.1007/s11136-009-9463-5.
11. Haley SM, Ni P, Ludlow LH, Fragala-Pinkham MA. Measurement precision and efficiency of multidimensional computer adaptive testing of physical functioning using the pediatric evaluation of disability inventory. Arch Phys Med Rehabil. 2006;87:1223–9. https://doi.org/10.1016/j.apmr.2006.05.018.
12. Haskell A, Kim T. Implementation of patient-reported outcomes measurement information system data collection in a private orthopaedic surgery practice. Foot Ankle Int. 2018;39:517–21. https://doi.org/10.1177/1071100717753967.
13. Hays RD, Morales LS, Reise SP. Item response theory and health outcomes measurement in the 21st century. Med Care. 2000;38:II28–42.
14. Jette AM, Haley SM. Contemporary measurement techniques for rehabilitation outcomes assessment. J Rehabil Med. 2005;37:339–45. https://doi.org/10.1080/16501970500302793.
15. Lee AC, Driban JB, Price LL, Harvey WF, Rodday AM, Wang C. Responsiveness and minimally important differences for 4 patient-reported outcomes measurement information system short forms: physical function, pain interference, depression, and anxiety in knee osteoarthritis. J Pain. 2017;18:1096–110. https://doi.org/10.1016/j.jpain.2017.05.001.
16. Liu H, Cella D, Gershon R, Shen J, Morales LS, Riley W, Hays RD. Representativeness of the patient-reported outcomes measurement information system internet panel. J Clin Epidemiol. 2010;63:1169–78. https://doi.org/10.1016/j.jclinepi.2009.11.021.
17. McHorney CA, Cohen AS. Equating health status measures with item response theory: illustrations with functional status items. Med Care. 2000;38:II43–59.
18. Minoughan CE, Schumaier AP, Fritch JL, Grawe BM. Correlation of PROMIS physical function upper extremity computer adaptive test with American shoulder and elbow surgeons shoulder assessment form and simple shoulder test in patients with shoulder arthritis. J Shoulder Elb Surg. 2018;27:585–91. https://doi.org/10.1016/j.jse.2017.10.036.
19. Rose M, Bjorner JB, Gandek B, Bruce B, Fries JF, Ware JE Jr. The PROMIS physical function item bank was calibrated to a standardized metric and shown to improve measurement efficiency. J Clin Epidemiol. 2014;67:516–26. https://doi.org/10.1016/j.jclinepi.2013.10.024.
20. Tao W, Haley SM, Coster WJ, Ni P, Jette AM. An exploratory analysis of functional staging using an item response theory approach. Arch Phys Med Rehabil. 2008;89:1046–53. https://doi.org/10.1016/j.apmr.2007.11.036.
21. Zdziarski-Horodyski L, et al. An integrated-delivery-of-care approach to improve patient reported physical function and mental wellbeing after orthopaedic trauma: study protocol for a randomized controlled trial. Trials. 2018;19:32. https://doi.org/10.1186/s13063-017-2430-5.
Part III
Basics in Statistics: Statistics Made Simple!
17 Common Statistical Tests
Stephan Bodkin, Joe Hart, and Brian C. Werner
jects. Due to this, no causal effect can be established. Observational studies can be either retrospective or prospective in design. Observational studies are often inexpensive and take less time to complete than intervention studies; however, as discussed later in this chapter, their results carry a lower level of evidence. These studies are commonly performed to develop hypotheses on which larger scale studies can be based.

17.1.2 Observational Research Designs

cohort of interest are compared to those of a control group.
Cross-sectional studies involve observations of a particular study cohort at one time point with no follow-up. These studies are thought of as a “snapshot” of the cohort of interest. Though not as strong as longitudinal studies, with collection of similar subjects at different time points (post-diagnosis, postsurgery, etc.), the researcher may get an idea of how the characteristic changes over time. A cross-sectional study is useful to describe the frequency of a characteristic of interest to make associations with other variables over time.
[Figure (caption partially lost): experimental research. (a) Parallel study design. (b) Crossover study design. Diagram labels: Study Sample, Randomization, Crossover, time points T1 to T4]
tions, and 99.7% of the values are within three standard deviations of the mean.
Skewness is a measure of the asymmetry of the data distribution. Data distributions can be skewed to the left (negative) or to the right (positive). An excessive outlier in the data can often cause skewness of the distribution (Fig. 17.3).
Kurtosis describes how peaked or flat the data distribution is. A high kurtosis value indicates a sharply peaked distribution, and a low kurtosis value indicates a flat one.

17.2.1 Inferential Statistics

Inferential statistical tests aim to generalize data collected from a representative sample to the entire population. These statistics can be used to test hypotheses in terms of relationships or differences among groupings of data. Inferential statistical tests are divided into parametric and nonparametric statistics. Optimally, researchers want inferences from the study sample to represent a population parameter (i.e., a characteristic of the population being studied), so default statistical tests should be parametric. However, parametric statistics are most powerful when the dataset is normally distributed. If the dataset is skewed or kurtotic, parametric tests are less robust. There are equivalent nonparametric tests that are similar in concept to parametric tests, only more appropriate for datasets that do not meet the assumptions necessary to justify parametric tests. The decision on which test to use is based on the type of data in the data set, the distribution of the data, and the research question and study design.
Parametric statistical tests use the mean and standard deviation of the distribution to compare groups or identify relationships between variables. Nonparametric statistical tests use the medians and ranks of the data and are therefore less sensitive to outliers and more robust. Nonparametric tests are also applied to categorical data and to small samples.
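As a minimal sketch of why rank- and median-based summaries are less sensitive to outliers than the mean and standard deviation, the following Python example compares both kinds of summary before and after one extreme value is added. The pain scores, the outlier value of 40, and the helper names `summarize` and `ranks` are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical pain scores for a small sample (illustrative values only).
scores = [2, 3, 3, 4, 4, 5, 5]
with_outlier = scores + [40]  # the same sample plus one extreme observation


def summarize(values):
    """Return the summaries used by parametric (mean, SD) and
    nonparametric (median) approaches."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
    }


def ranks(values):
    """Rank observations from 1 (smallest) upward, averaging tied ranks."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


before = summarize(scores)
after = summarize(with_outlier)

print("mean:  ", round(before["mean"], 2), "->", round(after["mean"], 2))
print("sd:    ", round(before["sd"], 2), "->", round(after["sd"], 2))
print("median:", before["median"], "->", after["median"])
print("ranks: ", ranks(with_outlier))
```

In this sketch the mean and standard deviation shift markedly when the extreme value is added, while the median stays the same and the extreme value simply receives the highest rank, however large it is.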
grouping variables in research designs are sex (female, male) and treatment (treatment, control). This would be analyzed through a 2 × 2 factorial design because both independent variables have two levels. Factorial designs with normally distributed data can be analyzed through ANOVAs, MANOVAs, ANCOVAs, or a repeated measures ANOVA (often called a split-plot design). A factorial repeated measures ANOVA (or split-plot ANOVA) compares multiple groups over serial time points. For example, a research study which aimed to see the effect of an intra-articular injection (treatment group, corticosteroid; control group, saline) on knee joint pain over time (1-week postinjection, 2-week postinjection, 3-week postinjection) would utilize this study design. The two factors in this study would be group and time. This design would make it possible to see the difference in pain between groups, between time points, and/or an interaction of group (i.e., treatment allocation) and time.

Clinical Vignette 2
The study Age Influences Biomechanical Changes after Participation in an Anterior Cruciate Ligament Injury Prevention Program utilized a Univariate ANOVA to assess biomechanical differences between preadolescent and adolescent athletes who did and did not participate in a training program [2].

other set alpha) is observed, the test would be deemed statistically significant. Therefore, the conclusion would be to reject the null hypothesis and to conclude that there are statistical differences between groups. If the p-value is above the established alpha, the observed differences may be due to chance rather than to any intervention or grouping, and the null hypothesis is not rejected.
A Bonferroni correction is used to adjust the p-value or level needed for statistical significance when more than one comparison is being performed between the groups (Table 17.1). As the number of comparisons you make between the groups increases, your likelihood of finding a statistically significant result by chance also increases. To prevent this, commonly referred to as “protecting your alpha,” a Bonferroni correction is applied to lower your threshold of significance as the number of comparisons grows. This is done by dividing your alpha (typically 0.05) by the number of comparisons you are making. This new threshold for defining statistical significance is an adjustment for the possibility of error from making multiple comparisons.
Statistical power is the ability to detect group differences or an association when one actually exists. A Type II error (beta) occurs when a statistical test infers that there are no differences between groups when one actually exists. A typical established beta is 0.2, or up to 20% of the time. Therefore, statistical power (1 − β) should be greater than or equal to 0.80, or able to detect differences or relationships between variables at least 80% of the time.
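The Bonferroni adjustment described in this section is simple arithmetic and can be sketched in a few lines of Python. The three p-values below are hypothetical, and the function names `bonferroni_threshold` and `familywise_error` are illustrative rather than taken from any statistical package. The familywise error calculation additionally assumes that the comparisons are independent, which is a simplification.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold: alpha divided by the
    number of comparisons being made."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def familywise_error(alpha, n_comparisons):
    """Chance of at least one false positive across the comparisons,
    assuming (for illustration) that they are independent."""
    return 1 - (1 - alpha) ** n_comparisons


# Hypothetical p-values from three pairwise group comparisons.
p_values = [0.030, 0.012, 0.200]

threshold = bonferroni_threshold(0.05, len(p_values))
significant = [p < threshold for p in p_values]

print(f"uncorrected chance of a false positive: {familywise_error(0.05, 3):.3f}")
print(f"Bonferroni threshold: {threshold:.4f}")
print("significant after correction:", significant)
```

With three comparisons the threshold drops from 0.05 to about 0.0167, so in this sketch a p-value of 0.030 would no longer be called significant even though it is below the uncorrected alpha.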
negative r value would be indicative of a high value in one variable being associated with a low value of the other variable. The Pearson classification considers correlation coefficients less than 0.33 “weak,” those between 0.33 and 0.66 “moderate,” and those greater than 0.66 “strong.”
The nonparametric equivalent to the Pearson correlation coefficient is Spearman’s rho, which should be used for non-normally distributed data or ordinal categorical data.
Regression describes the ability to predict a specific outcome variable and is expressed in a coefficient of determination (R2). In comparison to correlation, a regression analysis has an intended outcome variable that is explained by one or more predictor variables. An outcome variable is the variable singled out to be predicted by the other variables, often called the dependent variable. Predictor variables, or independent variables, are variables used to make the prediction. The use of one predictor is known as simple linear regression, and the use of multiple predictors is known as multiple regression. The coefficient of determination ranges from 0 to 1, where a higher value is indicative of a greater proportion (%) of variance explained and a better predictive value. Both simple linear and multiple regression are used when the outcome variable is continuous. When the outcome variable is categorical (often dichotomous), logistic regression should be used.

to determine the consistency of the results. Interpretation of the intraclass correlation coefficient (ICC or ρI) is defined in Table 17.3.
Cohen’s kappa coefficient (κ) is a measure of interrater agreement for categorical data. Kappa is usually used as a measure of reproducibility of qualitative outcomes between repeated assessments of the same variable. Guidelines for evaluating kappa are in Table 17.3.

Table 17.3 Interpretation of the intraclass correlation coefficient and Cohen’s kappa coefficient
Value          Intraclass correlation (ρI)      Cohen’s kappa (κ)
<0.4           Poor reproducibility             Marginal reproducibility
≥0.4, <0.75    Fair to good reproducibility     Good reproducibility
≥0.75          Excellent reproducibility        Excellent reproducibility

Clinical Vignette 3
In the study The Reliability of Assessing Radiographic Healing of Osteochondritis Dissecans of the Knee, ICC values were used to determine interrater and intrarater reliability of assessing OCD healing on plain radiographs [3]. In this study, multiple surgeons evaluated the radiographic healing of the knee at two time points, minimally 1 month apart. Healing of OCD lesions was found to have excellent interrater reliability (ICC = 0.94), indicating high agreement among the raters.

cance does not always mean clinical meaningfulness.
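Two of the coefficients discussed in this section can be computed by hand, which may help make their interpretation concrete. The sketch below calculates a Pearson correlation coefficient (whose square equals the coefficient of determination in simple linear regression) and Cohen's kappa for two raters. All data are hypothetical, and the function names are illustrative; the kappa labels follow the cut-offs in Table 17.3.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Pearson correlation coefficient between two continuous variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cohen_kappa(rater_a, rater_b):
    """Agreement between two raters on categorical ratings,
    corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Label kappa using the cut-offs given in Table 17.3."""
    if kappa < 0.4:
        return "marginal reproducibility"
    if kappa < 0.75:
        return "good reproducibility"
    return "excellent reproducibility"


# Hypothetical predictor and outcome values for five subjects.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
r_squared = r ** 2  # coefficient of determination for one predictor

# Hypothetical ratings of ten radiographs by two raters.
rater_a = ["healed"] * 6 + ["not healed"] * 4
rater_b = ["healed"] * 5 + ["not healed"] * 4 + ["healed"]
kappa = cohen_kappa(rater_a, rater_b)

print(f"r = {r:.3f}, R2 = {r_squared:.2f}")
print(f"kappa = {kappa:.3f} ({interpret_kappa(kappa)})")
```

In this example the raters agree on 8 of 10 radiographs, but because about half of that agreement would be expected by chance, kappa is noticeably lower than the raw 80% agreement.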
race. Other less common types of data are count data and censored data. Count data is made up of whole numbers that represent counts such as the number of falls in a year of follow-up or the number of heartbeats per minute. While there are analyses created specifically for count data, it is often treated as continuous data for simplicity. Censored data, such as number of years till death after having a certain procedure, occurs when the event researchers are interested in (such as death) may not occur during the study. Another type of data, longitudinal data, occurs when measurements are taken repeatedly from subjects over a period of time.
A continuous variable is one whose values can be any real number. It is meaningful to measure the distance between values, and arithmetic operations such as addition and multiplication make sense for continuous variables but not for discrete variables. Technically, all continuous variables are measured discretely since we don’t have instruments that can measure continuously. One can think of continuous data as discrete with lots of levels or categories. For example, blood pressure is typically measured to 2 mmHg because measuring with higher precision would be difficult with the instruments that are used. However, we still treat blood pressure as continuous since there would be far too many levels to treat it as discrete. Sometimes discrete variables with many levels, such as the visual analog scale (VAS) for measuring pain or variables measured with the Likert scale, are treated as continuous for ease of analysis. When treating discrete variables as continuous, you are assuming that the levels of the variable are equidistant. Another example of a continuous variable is the Lysholm scale for assessing ACL injuries, which gives a score from 0 to 100 with higher scores indicating fewer symptoms.
There are two major types of discrete data: nominal and ordinal. Nominal data, such as gender, race, and marital status, has no inherent ordering. Ordinal data, such as injury severity, level of education, and household income, can be ordered from low to high. The differences between the levels of ordinal variables are not necessarily equal. For example, the difference between the mild and moderate level of an injury severity variable may not be the same as the difference between the moderate and severe level. For both types of discrete data, each observation must belong to exactly one level of the discrete variable, and the levels should cover all possible values that exist in the data set. For example, if the discrete variable race has levels black, white, and other, then each observation in the data set must be categorized as black, white, or other. If a discrete variable has only two levels, then it is called a dichotomous variable. Examples include gender and disease status (the disease is either present or not present).

18.3 Data Description

Summarizing discrete data is simpler than summarizing continuous data. Discrete data is often described by reporting the frequency and proportion (or percent) of people belonging to each level of the discrete variable. For example, say you were reporting on disease severity in a study of 50 people. Your description of the variable “disease severity” could be 23 (46%) mild, 12 (24%) moderate, and 15 (30%) severe if 23 people in the study had mild disease, 12 had moderate, and 15 had severe. If the variable is dichotomous, then it is acceptable to report only the frequency and proportion in one level of that variable. For example, if researchers were summarizing the dichotomous variable “gender” and putting it in a table of demographic information for a study, then they could simply report the frequency and proportion of women in the study sample. Researchers would not need to include the number of men in the study since this can be deduced by subtracting the number of women in the study from the sample size. Some researchers depict the proportions of subjects in each level of a discrete variable in a bar graph. This may be appropriate if the publication has no other figures. However, when other figures are present, graphing proportions is superfluous as they are described adequately by frequencies and proportions alone.
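The disease severity example above can be reproduced with a short Python sketch that counts the observations in each level and reports the frequency and percent. The list `severity` is constructed to match the counts given in the text (23 mild, 12 moderate, 15 severe); the function name `describe_discrete` is an illustrative choice.

```python
from collections import Counter

# One entry per subject, built to match the counts in the text (n = 50).
severity = ["mild"] * 23 + ["moderate"] * 12 + ["severe"] * 15


def describe_discrete(values, levels):
    """Report 'frequency (percent) level' for each level of a discrete
    variable, in the order given by `levels`."""
    counts = Counter(values)
    n = len(values)
    return [f"{counts[level]} ({counts[level] / n:.0%}) {level}" for level in levels]


summary = describe_discrete(severity, ["mild", "moderate", "severe"])
print(", ".join(summary))
```

Passing the levels explicitly keeps an ordinal variable in its natural order (mild, moderate, severe) rather than in order of frequency.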
18 The Nature of Data 165
thus the average distance from the mean. The range is simply the maximum (largest) value minus the minimum (smallest) value. The interquartile range (IQR) is the third quartile minus the first quartile. The IQR describes the spread of the central 50% of the data. For all measures of spread, a higher value indicates that observations are more spread around the mean and smaller values indicate they are more tightly clustered about the mean.

Fact Box 18.1
\[ \text{Mean} = \frac{\sum_{i=1}^{n} x_i}{n} \]
\[ \text{Position of median for odd } n = \frac{n+1}{2} \]
\[ \text{Position of median for even } n = \frac{\frac{n}{2} + \left(\frac{n}{2} + 1\right)}{2} \]

Measures of the shape of a distribution describe the overall trend of the data. As an example, a distribution could have mostly large values with a few extreme outliers, or it could have values evenly distributed across the range. Some examples of distribution shapes are symmetric, normal, bimodal, and left or right skewed. The mean, median, and mode of a continuous variable are equal if the distribution of its values is symmetric and has a single peak. In symmetric data, the relative position of observations is the same on either side of the median. Right skewed (or positively skewed) data occurs when observations above the median are farther from it in absolute value than observations below the median. Another way of saying this is that the distribution has a long tail to the right. In right skewed data, the majority of the values are relatively small and close together, and a minority of the values are extreme or much larger in value than the rest. Left skewed (or negatively skewed) data occurs when observations below the median are farther from it in absolute value than observations above the median. Left skewed data has a long tail to the left since most of the values are large and there are a few extreme observations that are much smaller than the rest. The mean is greater than the median in right skewed data and less than the median in left skewed data. There is a skewness index that measures the degree of skewness in the data. The index is zero if the data is symmetric, greater than zero if the data is right skewed, and less than zero if the data is left skewed [3]. A normal distribution is a symmetrical hill or bell shape with the majority of the values close to the central value (the mean) and a few extreme observations on either side of the mean (i.e., in the tails of the distribution). A bimodal distribution looks like the two humps of a camel; it has two central values. When the data is symmetric, the best numerical summaries are the mean and standard deviation. When the data is skewed, it is best to use the median and interquartile range or interquartile deviation (half of the interquartile range).

Fact Box 18.2
\[ \text{Variance} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \]
\[ \text{Standard deviation} = \sqrt{\text{variance}} \]
\[ \text{IQR} = Q_3 - Q_1 \]

Kurtosis is another measure of shape that describes how flat or steep the distribution of values is compared to a bell-shaped (or normal) distribution. If there are a lot more observations in the tails of a distribution compared with a normal distribution, then the graph appears flatter than a bell shape. If there are many fewer observations in the tails of a distribution compared with a normal distribution, then the graph appears more peaked than a bell shape [2].
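The formulas in Fact Boxes 18.1 and 18.2 can be checked with a short Python sketch. It uses the 15 BMI measurements that serve as the example data in Sect. 18.4. Note that statistical programs use several different conventions for computing quartiles; the sketch relies on the default ("exclusive") method of Python's `statistics.quantiles`, which is an assumption made here and may differ slightly from the convention behind the chapter's figures. The function name `median_from_positions` is illustrative.

```python
import math
import statistics

# The 15 BMI measurements (kg/m2) used as example data in Sect. 18.4.
bmi = [18, 19, 23, 24, 24, 24, 25, 25, 26, 26, 27, 28, 30, 32, 37]


def median_from_positions(values):
    """Median located by position in the ordered data (Fact Box 18.1)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the value at position (n + 1) / 2.
        return ordered[(n + 1) // 2 - 1]
    # Even n: the average of the values at positions n/2 and n/2 + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


n = len(bmi)
mean = sum(bmi) / n
median = median_from_positions(bmi)

# Fact Box 18.2: sample variance (n - 1 denominator) and standard deviation.
variance = sum((x - mean) ** 2 for x in bmi) / (n - 1)
sd = math.sqrt(variance)

# Quartiles with the default "exclusive" convention (an assumption).
q1, _, q3 = statistics.quantiles(bmi, n=4)
iqr = q3 - q1

print(f"mean = {mean:.2f}, median = {median}")
print(f"variance = {variance:.2f}, SD = {sd:.2f}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

For these data the mean is slightly larger than the median, which is what the text leads one to expect for a distribution with a slight right skew.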
18.4 Visual Displays

It is good practice to plot the data before summarizing and performing statistical tests on it. This will give the researcher a sense of the type of data available. There are various methods for describing and analyzing data, and which method to use depends on the nature of the data.
A stem-and-leaf plot is a simple visual display of data points that shows the distribution or shape of the values of a continuous variable. An advantage of this plot is that it includes the value of each individual observation. This plot is appropriate when there are a small number of observations. As an example, suppose there is a small sample of 15 subjects’ BMI measurements that have been rounded to the nearest integer. BMI is a continuous variable and its units are kg/m2. The first step of making a stem-and-leaf plot of these values would be to list them in order:

18, 19, 23, 24, 24, 24, 25, 25, 26, 26, 27, 28, 30, 32, 37

The stem of the plot is made up of the leading number, and the leaves are made up of the trailing number. Both the numbers in the stem and in the leaves are ordered smallest to largest.

1 | 89
2 | 3444556678
3 | 027

It can be seen from this plot that the BMI measurements are distributed in roughly a hill shape: most of the values are in the middle and there are a few in either tail. If there are more observations, more than one line can be added for each digit in the stem.
A histogram is a graph that shows the shape of the distribution of values of a continuous variable. The horizontal (or x) axis has the values of the variable, and the vertical (or y) axis has the frequency or proportion of observations. The height of each rectangle represents the proportion or frequency of observations whose values fall within the range specified by the width of the rectangle. If the distribution of values of a variable is symmetric, then cutting the histogram along the median will result in each half being a mirror image of the other. A common example of a symmetric distribution is a hill or bell-shaped distribution. Figure 18.2 shows a histogram for the 15 BMI measurements in the last example.

Fig. 18.2 Histogram of a sample of 15 BMI measurements [x-axis: BMI ranges [18, 23], [23, 28], [28, 33], [33, 38]; y-axis: frequency, 0 to 10]

The width of the rectangles in this histogram is five units (kg/m2), and the vertical axis is the frequency of occurrences. The x-axis shows the range of values for each rectangle. The histogram shows the general hill-shaped trend of the data: most of the data (9 observations) fall within the range of 23–28 kg/m2. This histogram shows a similar shape to the stem-and-leaf plot turned on its side. If the width of the rectangles is too large, important information about the shape of the distribution can be lost. The smaller the width of the rectangles, the more detail about the shape of the distribution will be shown. Most statistical programs will automatically choose a width that is appropriate for the data.
Another visual display for continuous data is the box plot. The box plot shows the interquartile range, the median, and any extreme observations (i.e., observations that have values that are much larger or much smaller than the rest of the data). If there is a lot of variability in the data, then the box and whiskers will be elongated. If there is not a lot of variability, then the box and whiskers will appear squatter. Figure 18.3 shows a box plot of the BMI example data.
The first quartile of the BMI data is the bottom line of the box, the median is the middle line in the box, and the third quartile is the top line of the box. When the third quartile is farther from the median than the first quartile, the data is right skewed, and
when the first quartile is farther from the median, the data is left skewed. The histogram and box plot of the BMI data in Figs. 18.2 and 18.3 show a slight right skew in the shape of the distribution. The whiskers of the box plot are drawn to the smallest and largest observations in the sample that are not outliers. Outliers are defined as values that are greater than Q3 + 1.5*(IQR) or less than Q1 – 1.5*(IQR). The dots in the box plot are extreme outliers, which are defined to be larger than Q3 + 3*(IQR) or smaller than Q1 – 3*(IQR) [3].

Fig. 18.3 Box plot of 15 BMI measurements [vertical axis: BMI, 17 to 39]

The box plot is a good visual display to use when comparing a continuous variable between different groups of a categorical variable since the box plots can be drawn side-by-side on the same set of axes. This allows direct comparison of the distributions of the continuous variable across various levels of the categorical variable. As an example, consider the BMI data again, but suppose there is information on whether the subjects were over 25 years old or 25 years old or younger. Figure 18.4 is an example of a way to visualize the relationship between a continuous variable (BMI) and a categorical variable (age category).

Fig. 18.4 Side-by-side box plots of 15 BMI measurements [vertical axis: BMI, 10 to 40; groups: 25 or younger, Over 25]

From Fig. 18.4 it can be seen that subjects who are over 25 years old have a higher BMI than people who are 25 years old or younger.
A scatter plot is useful for understanding the relationship between two continuous variables and revealing potential outliers. Values of one variable are plotted on the horizontal axis, and values of the other variable are on the vertical axis. A scatter plot is a quick way to discover potential trends in the data. For example, if higher values of one variable tend to occur with higher values of the other, then the scatter plot will show this positive relationship. If there is no relationship between the two variables, then the scatter plot will show a random scattering of points that don’t indicate any specific pattern. If most of the points are clustered tightly together, while one or two points are clearly outside of this cluster, then these points are potential outliers and should be checked for accuracy. Figure 18.5 is an example of a scatter plot using the BMI data. A second variable, age, has been added to the vertical axis.

Fig. 18.5 Scatter plot of 15 subjects’ age and BMI [horizontal axis: BMI, 15 to 40; vertical axis: Age, 10 to 45]

Figure 18.5 shows that as age increases so does BMI. In other words, there is a positive relationship between age and BMI. An example of an outlier for this data is the point (32, 20), i.e., the point with BMI = 32 and age = 20. While
170 C. Smith
the point is medically feasible, if it were in the plot in Fig. 18.5, we would want to check on its accuracy, because it is so far away from the increasing trend of the rest of the points. Two continuous variables could also have a negative relationship if it were the case that as one increased the other decreased. If the scatter plot appears to show a positive relationship for some values and a negative relationship for others, we would say the relationship appears to change direction. A statistic called a correlation coefficient classifies the strength of the association between two continuous variables.

18.5 Conclusion

The first steps of data analysis should be to determine what types of variables are present and to describe them with appropriate summary statistics and visual displays. Different types of data have different properties, and these properties determine which statistical tests are appropriate for answering the questions of interest. Statistical tests are based on probability theory and allow the researchers to draw conclusions about a population based on a sample from that particular population. This is called statistical inference and is the overarching goal of statistical analysis.

References
1. D’Agostino RB, Sullivan LM, Beiser AS. Introductory applied biostatistics. Belmont: Thomson Brooks/Cole; 2006.
2. Daniel WW, Cross CL. Biostatistics: a foundation for analysis in the health sciences. 10th ed. Hoboken: Wiley; 2013.
3. Rosner B. Fundamentals of biostatistics. 8th ed. Boston: Cengage Learning; 2016.
19 Does No Difference Really Mean No Difference?
Carola F. van Eck, Marcio Bottene Villa Albers, Andrew J. Sheean, and Freddie H. Fu
the anatomic location of the problem [76]. This invention was soon followed by the
development of angiography which could not only visualize the location and severity of
the problem but also allowed intervention to be performed [6]. More recently, innovation
within the field of cardiology catalyzed the integration of high-resolution computed
tomography (CT) scanning to more precisely understand the anatomy of the heart’s
chambers, valves, and blood vessels [54]. This noninvasive test can be performed in 10 s
and allows detection of coronary artery disease involving plaques as small as 1.5 mm
[77]. This highly specialized application of CT technology has accelerated research
efforts focusing on treatment of coronary artery disease and other cardiac conditions.
However, within orthopedic surgery, this standard has not yet been paralleled.

19.3 Mission Accomplished in Orthopedic Surgery?
Anterior cruciate ligament (ACL) injury is one of the most widely studied topics within
the field of orthopedic sports medicine. Some of the earliest described techniques for
reconstructing the torn ligament involved a large arthrotomy [42]. However, as with all
modern surgery, minimally invasive surgical techniques were introduced in knee surgery,
leading to the development of arthroscopically assisted ACL reconstruction [53, 59, 64].
Arthroscopic ACL reconstruction was first performed using a two-incision technique, in
which the femoral bone tunnel was drilled from the outside-in [53, 64]. Over time, a
one-incision technique was adopted, where the femoral bone tunnel is drilled from
inside-out, through the tibial tunnel (transtibial technique) [17]. Both techniques were
fast and efficient; unfortunately, neither technique afforded focus on native ACL anatomy
[33]. Outcome-based research focused on these minimally invasive techniques relied mostly
on subjective patient-reported outcome measures (PRO), which suggested good results.
However, the surgeons implementing these PRO began to observe a variety of postoperative
problems, including loss of knee range of motion [16], impingement of the new ACL graft
[13, 24, 25, 52], and failure of the reconstruction requiring revision surgery [29]. In
mid- to long-term follow-up, a large number of patients were found to have developed
osteoarthritis [15]. In this respect, the limitations of early PRO partially obscured an
adequate appreciation for some of the complications of ACL reconstruction. Taken
together, early outcomes demonstrated that the state of the art in ACL reconstruction
left much to be desired.

19.4 Why Our Outcomes Are So Good
Despite surgeons observing the aforementioned loss of knee range of motion [16],
impingement [13, 24, 25, 52], graft failure, and early osteoarthritis [29], orthopedic
publications continued to report good results after transtibial ACL reconstruction. The
discrepancy between the poor observed outcomes and the good reported outcomes is likely
due to the widespread use of instruments that were not truly sensitive to actual outcomes
and the flawed interpretations that followed. As recently as 2011, a poll of orthopedic
surgeons demonstrated that 83% of respondents based their assessment of surgical outcomes
solely on whether the patient was satisfied, rather than on KT arthrometer, pivot shift
results, or long-term clinical follow-up [62]. In spite of technically imperfect
surgeries, acceptable patient satisfaction scores were observed. However, it is entirely
possible that this phenomenon is attributable to the placebo effect, which has been shown
to exist in approximately 35% of patients. Moreover, it has been shown that 50–70% of
patients will heal or cope regardless of the nature and quality of the treatment rendered
[73]. Thus, in the absence of accurate and precise measures of outcomes, a true
understanding of the actual effect of treatment can easily be obscured.
biomechanical studies, this should ideally have been evaluated (Fig. 19.1) [2].

Fig. 19.1 (a) Pressure map of the articular cartilage of a left knee after nonanatomic
ACL reconstruction showing increased pressures (red and black color) in the medial
compartment. (b) Plain flexion posterior-anterior radiograph of a left knee after ACL
reconstruction showing medial compartment osteoarthritis with joint space narrowing.
(c) Long cassette radiograph of the same patient showing resulting varus malalignment [2]

A number of outcome tools, both clinician-based and patient-reported, have been described
to characterize the nature of patients’ knee function [9, 19, 31, 51, 55, 61]. While many
of these instruments have been validated in their ability to detect knee-related
disability, there is considerable variability in the types of patients that each of these
measures was intended to assess (patients with knee osteoarthritis versus active,
athletic patients with non-arthritic knee conditions) [39]. It should be noted that
patient activity level is an important prognostic variable, which does not always
correlate to symptoms and function [39]. Consequently, studies should be designed so that
the outcome measures selected as dependent variables are specific to the type of patients
being studied. Thus far, this type of patient-specific outcome measure selection has been
lacking in certain disciplines of orthopedic research [41].
The International Knee Documentation Committee (IKDC) has developed two rating scales,
one “objective” and one “subjective” [20]. The first is clinician-based and grades
patients as normal, nearly normal, abnormal, or severely abnormal on a variety of
parameters that include effusion, motion, ligament laxity, crepitus, harvest-site
pathology, radiographic findings, and single-leg hop test. The final patient grade is
determined by the lowest grade in any given group. The subjective one is patient-reported
and inquires about symptoms, sports activities, and ability to function, including
stairs, squatting, running, and jumping. It has been demonstrated to be reliable, valid,
and responsive when applied to a range of knee conditions, including ACL tears as well as
meniscus and cartilage pathology [26, 27].
The Cincinnati Knee Rating System combines clinician-based evaluation with
patient-reported symptoms and function [4, 63]. It is composed of 6 subscales that add up
to 100 points: 20 for symptoms, 15 for daily and sports functional activities, 25 for
physical examination, 20 for knee stability testing, 10 for radiographic findings, and 10
for functional testing [3]. It is most often employed to evaluate ACL injuries before and
after reconstruction and is proven to be reliable, valid, and responsive [40, 55].
The modified Lysholm scale is a patient-related measure designed to evaluate outcomes
after knee ligament surgery [36]. It is a questionnaire with eight items scaled to a
maximum score of 100. Knee stability accounts for 25 points, pain for 25, locking for 15,
and swelling and stair climbing for 10 each. In addition, a limp, use of a support, and
squatting account for 5 points each [68]. It was originally developed in 1982 and later
modified in 1985. This Lysholm score is one of the first outcome measures to rely on
patient-related symptoms and functions. It is frequently used in clinical research
[21, 35].
The Single Assessment Numeric Evaluation (SANE) was designed specifically for college-age
patients after ACL reconstruction [75]. It is very simple and involves just one question,
asking patients to rate their knee as a percentage of normal on a 0–100% scale. Though
application is easy, it is only useful when looking at a homogeneous cohort of patients
who would interpret this one question similarly [39, 75].
The Knee Injury and Osteoarthritis Outcome Score (KOOS) is another patient-related
measure. It consists of 5 separate scores: 9 questions for pain, 7 questions for
symptoms, 17 questions for activities of daily living, 5 questions for sport and
recreational function, and 4 items for quality of life [57]. The KOOS has been employed
to evaluate ACL reconstruction but also meniscectomy, tibial osteotomy, and
post-traumatic arthritis. A further benefit is that it has been validated in multiple
languages [56, 58, 74, 78].
The quality-of-life outcome measure for chronic ACL deficiency was developed with input
from ACL-deficient patients, primary care sports medicine physicians, orthopedic
surgeons, athletic therapists, and physical therapists [39]. It consists of 31 visual
analog questions relating to 5 categories: symptoms and physical complaints, work-related
concerns, recreational activities and sports participation, lifestyle, and social and
emotional health status relating to the knee [39, 45].
There are several instruments used to characterize patients’ level of physical activity.
The Tegner is probably the most popular in orthopedic publications. It aims to place a
patient’s activity level somewhere on a 0–10 scale based on their specific sport
[68, 78]. However, although commonly used, this instrument has not been officially
validated [41]. The Marx activity level is based on function-specific, rather than
sport-specific, questions, and it incorporates the frequency of activities as well [41].
The scale consists of four questions, evaluating running, cutting, decelerating, and
pivoting. Patients are asked to score frequency on a 0–4 scale for
each element for a total of 16 points. In contrast to the Tegner scale, the Marx scale
has been validated [41].

Clinical Vignette
A 35-year-old female presents to the office complaining of right knee pain. She underwent
right knee ACL reconstruction using a transtibial single-bundle technique 15 years ago.
She has had no new injury. She states the knee feels stable. Examination in the office
appears to confirm this with a 1A Lachman, negative anterior drawer, and guarding pivot
shift test. She undergoes imaging with plain radiographs demonstrating a vertical tunnel
orientation as well as an MRI indicating the ACL graft is intact but vertically oriented
(Fig. 19.2). In addition, it reveals a degenerative medial meniscal tear. A CT scan was
also ordered which reveals nonanatomic tunnel location (Fig. 19.2). She elects to undergo
surgery for the meniscus tear. Examination under anesthesia reveals an exam very
different from that in the office with a 2A Lachman, 1+ anterior drawer, and 2+ pivot
shift test. Arthroscopy reveals a vertical graft, degenerative medial meniscal tear, and
advanced degenerative changes of the medial compartment (Fig. 19.2). This clinical case
illustrates that when insufficient or inappropriate outcome tools are used, clinical
outcome is not accurately and reliably measured.

patient’s compliance. The final grade attributed to amount of instability also results
from a subjective judgment of the examiner. During the pre-course of ISAKOS, Osaka, in
2009, five selected experts were invited to perform a pivot shift test on a cadaveric
lower body specimen. This setup was repeated at the Panther Global Summit in Pittsburgh
in 2011 with 12 expert surgeons on an actual ACL-injured patient under general anesthesia
(Fig. 19.3). During the exam, the pivot shift was quantified with an accelerometer. The
results of this experiment showed no objective agreement [34]. This is partly due to
differences among examiners, but also because the pivot shift is influenced not only by
the ACL but also by the iliotibial band, capsule, medial meniscus, lateral meniscus, and
bony morphology [48–50]. To make the pivot shift more objective, its use in conjunction
with an accelerometer has been advocated and validated (Fig. 19.4) [1, 47].
Further objective outcome measures are currently being investigated and implemented in
order to deal with variability in some of the more widely reported physical exam
measures. Dynamic stereo in vivo radiography can be used for detailed kinematic analysis
(Fig. 19.5) [65–67]. Three-dimensional computed tomography scanning is very valuable for
evaluation of ACL tunnel placement [10, 32, 33]. Magnetic resonance imaging can be used
for a variety of different purposes following ACL reconstruction (Fig. 19.6) such as
evaluations of graft integrity, graft healing, inclination angle, tunnel position, as
well as the status of the menisci and articular cartilage [7, 44].
Fig. 19.2 A 35-year-old female, 15 years after right ACL reconstruction, presents with
knee pain. Examination in the office appeared to suggest a stable knee. (a) Plain
radiographs of a right knee showing status post-ACL reconstruction with a vertical tunnel
orientation suggestive of nonanatomic tunnel placement. (b) MRI of the right knee showing
the ACL graft is intact but again vertical in orientation. (c) Three-dimensional CT scan
of the same right knee confirming nonanatomic tunnel position. (d) Arthroscopy of that
knee showing advanced degenerative changes in the medial compartment, frequently seen
with residual rotatory instability after nonanatomic ACL reconstruction

Fig. 19.3 Panther Global Summit in Pittsburgh in 2011 with 12 expert surgeons performing
the pivot shift test on a left knee of an actual ACL-injured patient under general
anesthesia. During the exam, the pivot shift was quantified with an accelerometer, iPad
app, and electromagnetic tracking system. The results of this experiment showed no
objective agreement in the pivot shift grade among the experts and significantly improved
agreement after instruction of a “standardized pivot shift test” [34]

Fig. 19.4 Examination of a left knee showing the application of an accelerometer and iPad
application to quantify the pivot shift examination by the surgeon. Markers on the skin
are used which are tracked with the iPad application [1, 47]

Fig. 19.5 In vivo dynamic stereo radiography of a left knee. (a) The patient runs on a
treadmill, while real-time radiography is performed from orthogonal angles. (b) Videos
are overlapped with a three-dimensional computed tomography scan of the knee such that it
is possible to track the distance and motion pattern between the tibia and femur as the
patient runs and loads the knee. (c) Subsequent map of the knee outlining the distance
and motion between the tibia and femur in real time [65–67]

Fig. 19.6 High-resolution magnetic resonance imaging of a left knee. (a) Sagittal
sequence outlining the medial meniscus. (b) Three-dimensional reconstruction of the
magnetic resonance scan outlining the medial and lateral meniscus. (c) Surface mapping of
the anterior horn, body, and posterior horn of the medial meniscus and articular
cartilage [7, 44]

as an attempt to dichotomize the results as normal/nearly normal versus abnormal/severely
abnormal to allow for easier statistical analysis. This fact may also be attributable to
the concern that some of the components of the IKDC grading system, including the pivot
shift, are subjective in nature. Or perhaps some may believe that “nearly normal” is good
enough when assessing clinical outcome after ACL reconstruction. However, although
traditional methods to reconstruct the ACL are not able to completely restore the knee to
normal, this should be the goal of future reconstruction techniques [28, 43]. This point
is illustrated by the case of a recent Level I
study that made headlines comparing the results of operative to nonoperative treatment of
patients with ACL tears. Overall, there were 121 patients treated with either operative
or nonoperative treatment [12]. Although randomization was performed, the crossover of
patients was allowed per the intention-to-treat analysis. The authors concluded that no
difference existed between the treatment groups, but when looking at the data in more
detail, in the rehabilitation-only group, there were more meniscus injuries at final
follow-up (13 vs. 1), more abnormal Lachman tests (75% vs. 35%), more abnormal pivot
shift examinations (53% vs. 25%), and an increased total KT-arthrometer translation
(8.3 vs. 6.6 mm). Thus, the authors’ conclusions were based on a series of dependent
variables that may not be the most relevant factors in ultimately determining success of
treatment.
The use of reliable and valid outcome measures specific to the aim of the study is highly
recommended when conducting a study. For example, for anatomic ACL reconstruction, a
scoring system was developed, which was validated and tested for reliability and
responsiveness in two separate studies [8, 69].

allograft ACL reconstruction was performed in 1996 and indicated a failure rate of 3%
[18]. However, when outcomes of the same procedure were evaluated in 2011 with more
advanced outcome measures, the failure rate was shown to approximate 15% [71].

Take-Home Message
• Finding no difference does not mean there is no difference.
• Orthopedists should be more like cardiologists.
• Cardiology is leading in medicine with regard to research and innovation.
• In conducting research, it is important to form a sound hypothesis that allows for
  determination of the proper primary outcome measure for a particular study.
• The hypothesis of a research study dictates which statistical methods should be applied
  and helps avoid statistical errors.
• It is important to select the outcome tool that is most suitable, reliable, valid, and
  sensitive in detecting the primary outcome of a study as indicated in the hypothesis.
• Failure to do so will result in drawing an incorrect conclusion.
• Study design and the quality of the follow-up are also extremely important.
• Randomized studies with large numbers of patients and at least 80% follow-up at 2 years
  or longer are necessary to critically evaluate new surgical techniques.
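The first take-home point can be made concrete with a small power calculation. The sketch below is an illustration under stated assumptions, not an analysis of any cited study. It uses the standard normal approximation for a two-sided test comparing two proportions with equal group sizes. The 53% and 25% figures are the abnormal pivot shift rates quoted above. The group sizes of 10, 20, and 60 are hypothetical, and the function name is introduced here only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for two proportions.

    Uses the normal approximation with equal group sizes and ignores the
    negligible probability of rejecting in the wrong direction.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Standard error under the null hypothesis (pooled proportion).
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    # Standard error under the alternative (true proportions).
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    return std_normal.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    # Hypothetical group sizes; the proportions are taken from the text.
    for n in (10, 20, 60):
        print(n, round(power_two_proportions(0.53, 0.25, n), 2))
```

With the same true difference, the estimated power rises as the groups grow. A small study would therefore often report "no difference" even though a difference of this size exists, which is why a non-significant result should be read together with the sample size and the sensitivity of the outcome measure.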
4. Bollen S, Seedhom BB. A comparison of the Lysholm and Cincinnati knee scoring questionnaires. Am J Sports Med. 1991;19(2):189–90.
5. Carey JL, Dunn WR, Dahm DL, Zeger SL, Spindler KP. A systematic review of anterior cruciate ligament reconstruction with autograft compared with allograft. J Bone Joint Surg Am. 2009;91(9):2242–50.
6. Chavez I, Dorbecker N, Celis A. Direct intracardiac angiocardiography; its diagnostic value. Am Heart J. 1947;33(5):560–93.
7. Chu CR, Williams AA, West RV, et al. Quantitative magnetic resonance imaging UTE-T2* mapping of cartilage and meniscus healing after anatomic anterior cruciate ligament reconstruction. Am J Sports Med. 2014;42(8):1847–56. https://doi.org/10.1177/0363546514532227.
8. Desai N, Alentorn-Geli E, van Eck CF, et al. A systematic review of single- versus double-bundle ACL reconstruction using the anatomic anterior cruciate ligament reconstruction scoring checklist. Knee Surg Sports Traumatol Arthrosc. 2016;24(3):862–72.
9. Eastlack ME, Axe MJ, Snyder-Mackler L. Laxity, instability, and functional outcome after ACL injury: copers versus noncopers. Med Sci Sports Exerc. 1999;31(2):210–5.
10. Forsythe B, Kopf S, Wong AK, et al. The location of femoral and tibial tunnels in anatomic double-bundle anterior cruciate ligament reconstruction analyzed by three-dimensional computed tomography models. J Bone Joint Surg Am. 2010;92(6):1418–26.
11. Foster TE, Wolfe BL, Ryan S, Silvestri L, Kaye EK. Does the graft source really matter in the outcome of patients undergoing anterior cruciate ligament reconstruction? An evaluation of autograft versus allograft reconstruction results: a systematic review. Am J Sports Med. 2010;38(1):189–99.
12. Frobell RB, Roos EM, Roos HP, Ranstam J, Lohmander LS. A randomized trial of treatment for acute anterior cruciate ligament tears. N Engl J Med. 2010;363(4):331–42.
13. Fujimoto E, Sumen Y, Deie M, Yasumoto M, Kobayashi K, Ochi M. Anterior cruciate ligament graft impingement against the posterior cruciate ligament: diagnosis using MRI plus three-dimensional reconstruction software. Magn Reson Imaging. 2004;22(8):1125–9.
14. Gabriel MT, Wong EK, Woo SL, Yagi M, Debski RE. Distribution of in situ forces in the anterior cruciate ligament in response to rotatory loads. J Orthop Res. 2004;22(1):85–9.
15. Gillquist J, Messner K. Anterior cruciate ligament reconstruction and the long-term incidence of gonarthrosis. Sports Med. 1999;27(3):143–56.
16. Harner CD, Irrgang JJ, Paul J, Dearwater S, Fu FH. Loss of motion after anterior cruciate ligament reconstruction. Am J Sports Med. 1992;20(5):499–506.
17. Harner CD, Marks PH, Fu FH, Irrgang JJ, Silby MB, Mengato R. Anterior cruciate ligament reconstruction: endoscopic versus two-incision technique. Arthroscopy. 1994;10(5):502–12.
18. Harner CD, Olson E, Irrgang JJ, Silverstein S, Fu FH, Silbey M. Allograft versus autograft anterior cruciate ligament reconstruction: 3- to 5-year outcome. Clin Orthop Relat Res. 1996;324:134–44.
19. Heckman JD. Are validated questionnaires valid? J Bone Joint Surg Am. 2006;88(2):446.
20. Hefti F, Muller W. [Current state of evaluation of knee ligament lesions. The new IKDC knee evaluation form]. Orthopade. 1993;22(6):351–62.
21. Hoher J, Munster A, Klein J, Eypasch E, Tiling T. Validation and application of a subjective knee questionnaire. Knee Surg Sports Traumatol Arthrosc. 1995;3(1):26–33.
22. Holm I, Oiestad BE, Risberg MA, Aune AK. No difference in knee function or prevalence of osteoarthritis after reconstruction of the anterior cruciate ligament with 4-strand hamstring autograft versus patellar tendon-bone autograft: a randomized study with 10-year follow-up. Am J Sports Med. 2010;38(3):448–54.
23. Illingworth KD, Hensler D, Working ZM, Macalena JA, Tashman S, Fu FH. A simple evaluation of anterior cruciate ligament femoral tunnel position: the inclination angle and femoral tunnel angle. Am J Sports Med. 2011;39(12):2611–8.
24. Iriuchishima T, Tajima G, Ingham SJ, et al. Intercondylar roof impingement pressure after anterior cruciate ligament reconstruction in a porcine model. Knee Surg Sports Traumatol Arthrosc. 2009;17(6):590–4.
25. Iriuchishima T, Tajima G, Ingham SJ, Shen W, Smolinski P, Fu FH. Impingement pressure in the anatomical and nonanatomical anterior cruciate ligament reconstruction: a cadaver study. Am J Sports Med. 2010;38(8):1611–7.
26. Irrgang JJ, Anderson AF. Development and validation of health-related quality of life measures for the knee. Clin Orthop Relat Res. 2002;402:95–109.
27. Irrgang JJ, Anderson AF, Boland AL, et al. Responsiveness of the International Knee Documentation Committee Subjective Knee Form. Am J Sports Med. 2006;34(10):1567–73.
28. Irrgang JJ, Bost JE, Fu FH. Re: outcome of single-bundle versus double-bundle reconstruction of the anterior cruciate ligament: a meta-analysis. Am J Sports Med. 2009;37(2):421–2; author reply 422.
29. Johnson DL, Swenson TM, Irrgang JJ, Fu FH, Harner CD. Revision anterior cruciate ligament surgery: experience from Pittsburgh. Clin Orthop Relat Res. 1996;325:100–9.
30. Keays SL, Newcombe PA, Bullock-Saxton JE, Bullock MI, Keays AC. Factors involved in the development of osteoarthritis after anterior cruciate ligament surgery. Am J Sports Med. 2010;38(3):455–63.
31. Kocher MS, Steadman JR, Briggs K, Zurakowski D, Sterett WI, Hawkins RJ. Determinants of patient satisfaction with outcome after anterior cruciate ligament reconstruction. J Bone Joint Surg Am. 2002;84-A(9):1560–72.
32. Kopf S, Forsythe B, Wong AK, et al. Nonanatomic tunnel position in traditional transtibial single-bundle anterior cruciate ligament reconstruction evaluated by three-dimensional computed tomography. J Bone Joint Surg Am. 2010;92(6):1427–31.
33. Kopf S, Forsythe B, Wong AK, Tashman S, Irrgang JJ, Fu FH. Transtibial ACL reconstruction technique fails to position drill tunnels anatomically in vivo 3D CT study. Knee Surg Sports Traumatol Arthrosc. 2012;20(11):2200–7.
34. Kopf S, Musahl V, Perka C, Kauert R, Hoburg A, Becker R. The influence of applied internal and external rotation on the pivot shift phenomenon. Knee Surg Sports Traumatol Arthrosc. 2017;25(4):1106–10.
35. Lukianov AV, Gillquist J, Grana WA, DeHaven KE. An anterior cruciate ligament (ACL) evaluation format for assessment of artificial or autologous anterior cruciate reconstruction results. Clin Orthop Relat Res. 1987;218:167–80.
36. Lysholm J, Gillquist J. Evaluation of knee ligament surgery results with special emphasis on use of a scoring scale. Am J Sports Med. 1982;10(3):150–4.
37. Markolf KL, Feeley BT, Jackson SR, McAllister DR. Biomechanical studies of double-bundle posterior cruciate ligament reconstructions. J Bone Joint Surg Am. 2006;88(8):1788–94.
38. Markolf KL, Park S, Jackson SR, McAllister DR. Simulated pivot-shift testing with single and double-bundle anterior cruciate ligament reconstructions. J Bone Joint Surg Am. 2008;90(8):1681–9.
39. Marx RG. Knee rating scales. Arthroscopy. 2003;19(10):1103–8.
40. Marx RG, Jones EC, Allen AA, et al. Reliability, validity, and responsiveness of four knee outcome scales for athletic patients. J Bone Joint Surg Am. 2001;83-A(10):1459–69.
41. Marx RG, Stump TJ, Jones EC, Wickiewicz TL, Warren RF. Development and evaluation of an activity rating scale for disorders of the knee. Am J Sports Med. 2001;29(2):213–8.
42. Mayo Robson AW. Ruptured crucial ligaments and their repair by operation. Ann Surg. 1903;37(5):716–8.
43. Meredick RB, Vance KJ, Appleby D, Lubowitz JH. Outcome of single-bundle versus double-bundle reconstruction of the anterior cruciate ligament: a meta-analysis. Am J Sports Med. 2008;36(7):1414–21.
44. Miyawaki M, Hensler D, Illingworth KD, Irrgang JJ, Fu FH. Signal intensity on magnetic resonance imaging after allograft double-bundle anterior cruciate ligament reconstruction. Knee Surg Sports Traumatol Arthrosc. 2014;22(5):1002–8.
45. Mohtadi N. Development and validation of the quality of life outcome measure (questionnaire) for chronic anterior cruciate ligament deficiency. Am J Sports Med. 1998;26(3):350–9.
46. Morimoto Y, Ferretti M, Ekdahl M, Smolinski P, Fu FH. Tibiofemoral joint contact area and pressure after single- and double-bundle anterior cruciate ligament reconstruction. Arthroscopy. 2009;25(1):62–9.
47. Musahl V, Griffith C, Irrgang JJ, et al. Validation of quantitative measures of rotatory knee laxity. Am J Sports Med. 2016;44(9):2393–8.
48. Musahl V, Hoshino Y, Ahlden M, et al. The pivot shift: a global user guide. Knee Surg Sports Traumatol Arthrosc. 2012;20(4):724–31.
49. Musahl V, Hoshino Y, Becker R, Karlsson J. Rotatory knee laxity and the pivot shift. Knee Surg Sports Traumatol Arthrosc. 2012;20(4):601–2.
50. Musahl V, Rahnemai-Azar AA, Costello J, et al. The influence of meniscal and anterolateral capsular injury on knee laxity in patients with anterior cruciate ligament injuries. Am J Sports Med. 2016;44(12):3126–31.
51. Neeb TB, Aufdemkampe G, Wagener JH, Mastenbroek L. Assessing anterior cruciate ligament injuries: the association and differential value of questionnaires, clinical tests, and functional tests. J Orthop Sports Phys Ther. 1997;26(6):324–31.
52. Nishimori M, Sumen Y, Sakaridani K, Nakamura M. An evaluation of reconstructed ACL impingement on PCL using MRI. Magn Reson Imaging. 2007;25(5):722–6.
53. Passler HH. The history of the cruciate ligaments: some forgotten (or unknown) facts from Europe. Knee Surg Sports Traumatol Arthrosc. 1993;1(1):13–6.
54. Powell WJ Jr, Wittenberg J, Dinsmore RE, Miller SW, Maturi RA. Definition of cardiac structures using computerized tomography in isolated arrested and beating canine hearts. Am J Cardiol. 1977;39(5):690–6.
55. Risberg MA, Holm I, Steen H, Beynnon BD. Sensitivity to changes over time for the IKDC form, the Lysholm score, and the Cincinnati knee score. A prospective study of 120 ACL reconstructed patients with a 2-year follow-up. Knee Surg Sports Traumatol Arthrosc. 1999;7(3):152–9.
56. Roos EM, Ostenberg A, Roos H, Ekdahl C, Lohmander LS. Long-term outcome of meniscectomy: symptoms, function, and performance tests in patients with or without radiographic osteoarthritis compared to matched controls. Osteoarthr Cartil. 2001;9(4):316–24.
57. Roos EM, Roos HP, Lohmander LS, Ekdahl C, Beynnon BD. Knee Injury and Osteoarthritis Outcome Score (KOOS)--development of a self-administered outcome measure. J Orthop Sports Phys Ther. 1998;28(2):88–96.
58. Roos EM, Roos HP, Ryd L, Lohmander LS. Substantial disability 3 months after arthroscopic partial meniscectomy: a prospective study of patient-relevant outcomes. Arthroscopy. 2000;16(6):619–26.
59. Rosenberg T. Techniques for ACL reconstruction with Multi-Trac drill guide. Mansfield: Accufex Microsurgical Inc.; 1994.
60. Samuelsson K, Andersson D, Karlsson J. Treatment of anterior cruciate ligament injuries with special reference to graft type and surgical technique: an assessment of randomized controlled trials. Arthroscopy. 2009;25(10):1139–74.
61. Sernert N, Kartus J, Kohler K, et al. Analysis of subjective, objective and functional examination tests after anterior cruciate ligament reconstruction. A follow-up of 527 patients. Knee Surg Sports Traumatol Arthrosc. 1999;7(3):160–5.
62. Sgaglione NA. Revision ACL reconstruction. Presented at Orthopedics Today Hawaii 2011. Jan 16–19 Koloa, Hawaii; 2011.
63. Sgaglione NA, Del Pizzo W, Fox JM, Friedman MJ. Critical analysis of knee ligament rating systems. Am J Sports Med. 1995;23(6):660–7.
64. Snook GA. A short history of the anterior cruciate ligament and the treatment of tears. Clin Orthop Relat Res. 1983;172:11–3.
65. Tashman S, Anderst W. In-vivo measurement of dynamic joint motion using high speed biplane radiography and CT: application to canine ACL deficiency. J Biomech Eng. 2003;125(2):238–45.
66. Tashman S, Collon D, Anderson K, Kolowich P, Anderst W. Abnormal rotational knee motion during running after anterior cruciate ligament reconstruction. Am J Sports Med. 2004;32(4):975–83.
67. Tashman S, Kolowich P, Collon D, Anderson K, Anderst W. Dynamic function of the ACL-reconstructed knee during running. Clin Orthop Relat Res. 2007;454:66–73.
68. Tegner Y, Lysholm J. Rating systems in the evaluation of knee ligament injuries. Clin Orthop Relat Res. 1985;198:43–9.
69. van Eck CF, Gravare-Silbernagel K, Samuelsson K, et al. Evidence to support the interpretation and use of the anatomic anterior cruciate ligament reconstruction checklist. J Bone Joint Surg Am. 2013;95(20):e1531–9.
70. van Eck CF, Loopik M, van den Bekerom MP, Fu FH, Kerkhoffs GM. Methods to diagnose acute anterior cruciate ligament rupture: a meta-analysis of instrumented knee laxity tests. Knee Surg Sports Traumatol Arthrosc. 2013;21(9):1989–97.
71. van Eck CF, Schkrohowsky JG, Working ZM, Irrgang JJ, Fu FH. Prospective analysis of failure rate and predictors of failure after anatomic anterior cruciate ligament reconstruction with allograft. Am J Sports Med. 2012;40(4):800–7.
72. van Eck CF, van den Bekerom MP, Fu FH, Poolman RW, Kerkhoffs GM. Methods to diagnose acute anterior cruciate ligament rupture: a meta-analysis of physical examinations with and without anaesthesia. Knee Surg Sports Traumatol Arthrosc. 2013;21(8):1895–903.
73. Vavken P. Rationale for and methods of superiority, noninferiority, or equivalence designs in orthopaedic, controlled trials. Clin Orthop Relat Res. 2011;469(9):2645–53.
74. W-Dahl A, Toksvig-Larsen S, Roos EM. A 2-year prospective study of patient-relevant outcomes in patients operated on for knee osteoarthritis with tibial osteotomy. BMC Musculoskelet Disord. 2005;6:18.
75. Williams GN, Taylor DC, Gangel TJ, Uhorchak JM, Arciero RA. Comparison of the single assessment numeric evaluation method and the Lysholm score. Clin Orthop Relat Res. 2000;373:184–92.
76. Wilson FN, Johnston FD, Hill IGW, Macleod AG, Barker PS. The significance of electrocardiograms characterized by an abnormally long QRS interval and by broad S deflections in Lead I. Am Heart J. 1934;9:459.
77. Wood EH, Ritman EL, Robb RA, Harris LD, Ruegsegger P. Noninvasive numerical vivisection of anatomic structure and function of the intact circulatory system using high temporal resolution cylindrical scanning computerized tomography. Med Instrum. 1977;11(3):153–9.
78. Wright RW. Knee injury outcomes measures. J Am Acad Orthop Surg. 2009;17(1):31–9.
20 Power and Sample Size
Stephen Lyman
20.3 Why Is Power Needed?

Statistical power is necessary anytime you want to test hypotheses: to determine if there is a statistically significant difference between groups or a statistically significant relationship between two variables [4]. Statistical power determines your ability to detect a difference if one truly exists.

This rationale is both scientific and philosophical. When conducting research, the project is only worth performing if it is possible to reject the null hypothesis. Underpowered research is less likely to be published or to contribute meaningfully to improving our understanding of the physical world. Therefore, it is unethical (philosophical argument) to perform underpowered studies, because humans are being subjected to unnecessary experimentation and risk of harm. Furthermore, a null finding from an underpowered study may incorrectly be interpreted as evidence that the medical intervention studied has no benefit.

Fact Box 20.2
It is unethical to perform underpowered studies, because humans are being subjected to unnecessary experimentation and risk of harm.

If you are not formally testing hypotheses, then statistical power is not strictly necessary. However, if you are interested in assessing the correlation between two variables, then you still need adequate sample size. For example, if you were evaluating whether ultrasound could diagnose a collateral ligament tear as well as a more expensive imaging modality, then you need a sample size large enough to calculate a reliable estimate of the ability of ultrasound to correctly diagnose the tear. A study of three subjects evaluated with both MRI and ultrasound would be inadequate to answer this question. There are only 4 possible findings: 0% accuracy (0 of 3 ultrasounds agree with MRI), 33% accuracy (1 of 3 agree), 67% accuracy, and 100% accuracy. Obviously, a much larger sample size would be necessary to establish a meaningful estimate of ultrasound diagnostic utility in this setting. An entire literature has been developed evaluating the sample size requirements of reliability studies [3, 7, 8, 10].

20.4 Properties of Power

Statistical power is customarily represented as a percentage between 0 and 100% or sometimes as a proportion between 0 and 1. It tells you what your chances are of detecting a true finding; its complement tells you what your chances are of missing one. Power is calculated as 1 − β (beta), where β is the probability of a Type II error, that is, the likelihood of failing to reject the null hypothesis when the alternative hypothesis is true. If β were 0.2, then 1 − β would be 0.80, or 80% power, to detect a difference if it truly exists. For most clinical studies, 80% power is considered the lowest acceptable power value since this means you have just a 1 in 5 chance of missing a true finding. When considering interventions or testing hypotheses for which there are serious consequences, a power of 90% (1 in 10 chance of missing a true finding) or even higher may be warranted.

Fact Box 20.3
Power tells you what your chances are of detecting a true finding (1 − power is your chance of missing it) and is determined by sample size, variability, frequency, the critical p-value, and the minimum relevant effect size. The best approach is to set your p-value and effect size and estimate your variability and/or frequency and then calculate the sample size you need for 80% (or 90%) power.

Let’s imagine an established surgical procedure that is highly effective and relatively inexpensive but has a high risk of adverse events (e.g., perioperative fracture). A new alternative surgical technique is believed to be both effective and safer than the established procedure but costs substantially more. In comparing these two surgical interventions, we would not want to miss a
true treatment effect if it exists, so we might consider powering our study to 90% or higher. If we missed a true treatment effect in the new technique’s favor, it might not be adopted into clinical practice due to the high costs despite being safer and more effective.

Power is determined by sample size, variability, frequency, the critical p-value, and the minimum relevant effect size. Adjusting any of these characteristics changes the statistical power.

Sample size is what most scientists think of first when considering statistical power [1]. The higher the sample size, the higher the power, all else remaining equal. Conversely, a smaller sample size will always have less power, all else being equal. This is also the most easily modifiable factor in calculating expected power. We can usually recruit more patients, but other components of power are often more difficult or impossible to change.

Variability is a measure of how much spread exists in the variables being considered. A variable that is highly variable (pun intended) between study subjects will result in a larger standard deviation. The larger the variation, the more subjects will be needed, because a larger variation will make any difference between the groups harder to detect. Variability only applies to power calculations for continuous or scale parameters. Frequency (see below) is used for discrete variables.

A study’s power is optimized when the frequency of either a discrete outcome or explanatory variable is balanced. A study being powered on a binary (or ordinal) variable with very low frequency will require many more subjects in order to achieve adequate statistical power than a study in which the frequency is balanced across groups. For example, if 50% of the study subjects are female and 50% are male, then an analysis comparing sex differences would have optimum power. However, if intersex (those with biological reproductive anatomy that is not fully male or fully female) were of interest as a third category of sex (an estimated 1% of live births) [2], then this low-frequency group would have profound implications for statistical power for a study considering sex differences.

The critical p-value (often called α) represents the chance you are willing to accept of declaring an effect that is not real. A critical p-value of 0.05 is usually considered acceptable for clinical research. This represents no more than a 1 in 20 chance of falsely rejecting the null hypothesis if the null hypothesis is true. A smaller p-value may be desired if 1 in 20 seems too large an uncertainty. In that case a critical p-value of 0.01 (1 in 100) or even 0.001 (1 in 1000) may be warranted. These stricter critical p-values will decrease power, all else being equal. Conversely, if a larger p-value were considered acceptable (e.g., p of 0.1 or 1 in 10), power would be increased, all else being equal. p-Values are not usually modified unless there is a strong justification to be more or less refined in what is considered significant.

When many different comparisons are being conducted within the same study, p-values will often be adjusted for multiple comparisons. One of the most well-known adjustments is the Bonferroni correction, in which the critical p-value of 0.05 is divided by the number of comparisons being made. If there were 10 hypotheses being tested, the new critical p-value would become 0.005 (0.05 ÷ 10 comparisons). The power would then be calculated based on this new effective p-value. As you can imagine, the Bonferroni correction can be extremely conservative when a large number of comparisons are planned.

Effect size refers to the magnitude of effect (difference between groups) you expect to find. Best practice is to use the smallest effect size deemed clinically relevant. If you do not have an expected effect size based on previous information (e.g., pilot data or previously published studies), then using your clinical judgment may be required but could be difficult to justify in this era of data-driven information.

Special consideration should be given to effect sizes related to patient-reported outcome measures (PROMs), which are very common outcome tools in elective orthopedic research. We often use the minimal clinically important difference
(MCID), which is also sometimes called the minimal clinically important change (MCIC) or minimal clinically important improvement (MCII). All of these are slightly different concepts, but we will use them interchangeably here as the minimum effect size we’d like to be able to detect when comparing two groups of patients using their PROM scores [5]. Conceptually, the MCIC is the smallest change in PROM score for which a subject can actually discern a difference in their state of health. This concept is particularly useful when you do not have previous data on which to estimate your effect size. A distribution-based MCIC is usually considered 0.5 × standard deviation of the PROM score, which gives you a rough estimate of the MCIC. Some recent work has suggested that this distribution-based MCIC calculation is actually closer to the concept of minimal detectable change (MDC), which is essentially the calibration variability of the PROM [9]. However, when previous anchor-based MCICs are not available, the distribution-based method is usually the only alternative.

As a rule of thumb, the MCIC would be the smallest change expected to make a difference. This difference may be in a subject’s health, quality of life, satisfaction, or a myriad of other measures considered clinically important. When using the distribution-based approach, this is often much smaller than we may expect from a treatment thought to be effective. If true, this will result in an overpowering of the study but may also allow for subgroup analyses to determine in which patients the treatment is most effective (or ineffective), a concept often called heterogeneity of treatment effect.

Adjusting any of these factors will change the power for a given study. Since most peer-reviewed journals require a p-value of 0.05 or less to be considered statistically significant, this is the power characteristic least easily modifiable for a power calculation. Although powering to 0.01 or 0.001 is not unheard of, powering to a p-value larger than 0.05 is usually not acceptable.

Variability and frequency can be adjusted through study design considerations. Variability can be reduced by having narrow inclusion criteria, which creates a more homogeneous study population. The trade-off here is that you may be limiting the generalizability of the results of the study. For example, if you restrict your study to teenage female soccer players, you cannot extend your results very easily to college-age male basketball players or other athletic populations, even if you’ve managed to reduce variability in your study population.

Investigators may be tempted to play fast and loose with effect sizes to tweak a power calculation, but this should be done with caution. If you use too large an effect size, you’ll have what appears to be an adequately powered study, but you may be hopelessly underpowered to detect the effect size you are likely to discover even if the treatment is reasonably effective. You’d still end up with a negative result even though the effect size revealed was clinically meaningful.

Therefore, adjusting sample size is usually the most practical approach to achieving sufficient statistical power, and this is why we often equate sample size with power. The best approach is to set your p-value and effect size, estimate your variability and/or frequency, and then calculate the sample size you need for 80% (or 90%) power. Another approach is to set all those parameters, choose a practical target sample size, and see if your power reaches 80% (or higher). If not, you can tweak your sample size upward until you reach your target power.

Not uncommonly, you have very few cases (e.g., infected total knee arthroplasty (TKA) cases) but a nearly unlimited number of potential control subjects (noninfected TKA cases). In this case, you can identify all of your infected TKA cases and then match multiple control subjects per case. Statistical power will be increased for each additional control per case you add. Many case-control studies are performed with a 1:1 control/case ratio to maximize efficiency, but when power is limited due to the small number of cases available, efficiency can be sacrificed by matching 2:1, 3:1, or even 4:1. Practically speaking, power gains are minimal after 6:1, so it’s really not worth going higher than that.
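
The diminishing return from each additional control can be illustrated with a rough calculation. The sketch below is not taken from the chapter: it assumes a continuous outcome compared between cases and controls with a two-sided z-test (normal approximation), and the function names, the 30 cases, and the standardized effect size of 0.5 are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def relative_efficiency(controls_per_case):
    """Efficiency of k controls per case relative to unlimited controls: k / (k + 1)."""
    return controls_per_case / (controls_per_case + 1)


def approx_power(n_cases, controls_per_case, effect_size, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means.

    effect_size is the difference in means divided by the common standard
    deviation (an assumption of this sketch, not a value from the chapter).
    """
    n_controls = n_cases * controls_per_case
    standard_error = sqrt(1 / n_cases + 1 / n_controls)
    z_critical = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # The negligible chance of a significant result in the wrong direction is ignored.
    return STD_NORMAL.cdf(effect_size / standard_error - z_critical)


# Hypothetical scenario: 30 cases, standardized effect size of 0.5.
for k in range(1, 9):
    power = approx_power(n_cases=30, controls_per_case=k, effect_size=0.5)
    print(f"{k}:1 controls per case  power = {power:.2f}  "
          f"relative efficiency = {relative_efficiency(k):.2f}")
```

Under these assumptions, power rises with every added control per case, but each step adds less than the one before it, which is consistent with the practical advice to stop adding controls once the gains become minimal.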
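
The general approach described above (set the critical p-value and the minimum relevant effect size, estimate variability, then solve for sample size) can be sketched for the simplest case. This is a minimal illustration rather than the chapter's own method: it assumes two equal-sized groups compared on a continuous outcome with a two-sided z-test (a normal approximation, which gives slightly smaller numbers than a t-test-based calculation), and the function name and the standard deviation of 20 points are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sd, min_difference, alpha=0.05, power=0.80):
    """Subjects needed per group to detect min_difference between two means.

    Uses the normal-approximation formula
        n = 2 * ((z_(1 - alpha/2) + z_(power)) * sd / min_difference) ** 2
    rounded up to the next whole subject.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = std_normal.inv_cdf(power)          # quantile matching the target power
    return ceil(2 * ((z_alpha + z_power) * sd / min_difference) ** 2)


sd = 20.0         # hypothetical standard deviation of the PROM score
mcic = 0.5 * sd   # distribution-based MCIC: 0.5 x standard deviation

print(n_per_group(sd, mcic))              # 80% power, critical p-value 0.05 -> 63
print(n_per_group(sd, mcic, power=0.90))  # 90% power -> 85

# Bonferroni correction for 10 planned comparisons: 0.05 / 10 = 0.005.
bonferroni_alpha = 0.05 / 10
print(n_per_group(sd, mcic, alpha=bonferroni_alpha))  # larger than both values above
```

Because the effect size here is defined as half a standard deviation, the result does not depend on the particular standard deviation chosen; a different effect size or variability estimate would change the required sample size.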
[Figure: "Figurative Map of the successive losses in men of the French Army in the Russian Campaign of 1812-1813," drawn by Mr. Minard, Inspector General of Bridges and Roads in retirement, Paris, 20 November 1869. The numbers of men present are represented by the widths of the colored zones at a rate of one millimeter for ten thousand men; red designates men moving into Russia, black those on retreat.]
Fig. 21.1 A modern adaptation of a map by Charles Minard portraying French losses in the Russian Invasion (1812–1813) is highlighted by Tufte as an example of graphic elegance and excellence. Commons usage: https://commons.wikimedia.org/wiki/File:Minard_map_of_napoleon.png
S. Lyman et al.
21 Visualizing Data 195
Cleveland and McGill [1] have shown that some visual elements are easier for people to see and understand than others.

Graphical elements in the order they are most accurately perceived:
1. Position along a common scale
2. Position along identical nonaligned scales
3. Length
4. Angle and slope
5. Area
6. Volume and density and color saturation
7. Color hue

Use this hierarchy to choose the appropriate visual element to represent your data. For example, if your data can be represented as either length or area, choose length, as it is more accurately perceived (Example 3). When conveying complex information, use the hierarchy to prioritize information visually. For example, use angle or slope for the most important information while adding shading or pattern to provide a second level of information (Example 2). When using color, saturation is more accurately perceived and understood than the relationship between hues (Example 4).

21.4 Visual Integrity

When individual data points and their analyses are translated from the spreadsheet to a graph or figure, care must be taken to avoid unintentional “visual lies” or distortions. To maintain visual integrity, Tufte offers the following rules [2]:
1. The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.
2. Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity.
3. Write out explanations of the data on the graphic itself. Label important events in the data.
4. Show data variation, not design variation.
5. The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
6. Graphics must not quote data out of context.

In practical terms, visual integrity often comes down to proper scaling and formatting, as well as avoiding “chartjunk.” Examples follow below.

Fact Box 21.2
In practical terms, visual integrity can often be achieved through:
1. Accurate scaling
2. Proper formatting
3. Avoiding “chartjunk”

21.4.1 Scaling

Proper scaling ensures that the numbers on the graphic are directly proportional to the numerical quantities represented. It allows you to show all the data, without distortion. By considering the hierarchy of human graphical perceptual capabilities, you can choose an appropriate scale(s) that makes data patterns easy to see at a glance (Example 2).

21.4.2 Formatting

To avoid graphical distortion and ambiguity, pay attention to formatting. In practical terms: use simple symbols and clear, thorough labels and avoid remote legends (Examples 3 and 4).

21.4.3 Avoiding “Chartjunk”

To focus viewers on data rather than design, avoid unnecessary or confusing elements. Color is often used with the intention of making a graphic more appealing but can undermine the
effectiveness of the graphic, if it is not used in a manner that conveys relevant information (Examples 2, 3, and 4). Three-dimensional graphics are also used for an appealing look, but they should only be used for three-dimensional data (Examples 2 and 3). Using extra dimensions for two-dimensional data not only encumbers graphics with “chartjunk” but can create distortions or optical illusions (“visual lies”). Clearly, these should be avoided.

21.4.4 Context

If the tables, charts, or other graphics are being prepared for a manuscript, make sure to:
1. Follow the formatting instructions of the specific journal.
2. Make the values, abbreviations, and/or terminology in the table consistent with what appears in the text.
3. Avoid redundancy. Specific values should not appear in both the text and table.
4. Use tables for details, for instance, results instead of lengthy text.
Fig. 21.2 Patient-reported outcome scores after treatment A, B, C, or D after N weeks of treatment [bar chart; y-axis 0 to 60; x-axis: treatment groups A, B, C, D]
[Figure: plot of Score (y-axis) by Treatment.Group A, B, C, D (x-axis)]
data points and show that, for treatment group B, one of the points is an outlier. Comparing Figs. 21.2 and 21.3 shows that the type of plot used in Fig. 21.2 hid the presence of a negative value (below 0) in treatment group B. The comparison also reveals that treatment group D included fewer patients (data points) than the other groups.

Example 2: Line Graph
Line graphs are common in orthopedic research. This is an example showing sample data from a study where patient-reported outcome scores were assessed as a measure of treatment efficacy in a treated and (untreated) control group. Figures 21.4, 21.5, and 21.6 show the same data,
Fig. 21.4 Patient-reported outcome score after treatment for condition X [three-dimensional line chart; y-axis 0 to 30; x-axis 1 to 7; remote legend: Treatment, Control]
Fig. 21.5 Patient-reported outcome scores are improved with treatment. Treated (n = 104) and matched control (n = 104) patients completed the [...] [line graph; y-axis: Score, 0 to 30; x-axis: Weeks, 1 to 7; line label: Treatment]
represented with various degrees of graphical excellence and visual integrity.

Figure 21.4 is a typical chart that can be generated using Excel software and is an example of a graphic that fails to follow the rules of visual integrity and graphic excellence. Figures 21.5 and 21.6 show how the chart can be revised (within Excel) to be more effective.

Visual integrity: The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
These data are two-dimensional, so they should be represented in a two-dimensional graph (Figs. 21.5 and 21.6). In Fig. 21.4 the third dimension not only distracts but confuses the viewer, creating an optical illusion in which scores that are actually different in the treatment and control groups appear to overlap at several time points.

Visual integrity: Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity.
Neither the x-axis nor the y-axis of this graphic is labeled (Fig. 21.4). The viewer would need to search the manuscript to understand what is represented in this graphic.

Visual integrity: Write out explanations of the data on the graphic itself. Label important events in the data.
The data lines for the treated and control groups are identified in a remote legend at the bottom of the graph (Fig. 21.4). Remote legends are not recommended, as they force the viewer to look back and forth between the legend and the data points to understand the graphic. Wherever possible, labels should appear next to the data points so viewers see the data point and what it represents in one glance (Fig. 21.5).
[Fig. 21.6 caption, beginning not recoverable: [...] completed the ABC survey for 7 weeks after receiving treatment X. Mean scores at each weekly time point are shown] [line graph; y-axis: Score; x-axis: Weeks, 1 to 7; line label: Control]
Graphical Excellence: Draw attention to the substance of the graph (not its production).
Extraneous visual elements or “chartjunk” distract the viewer. In addition to removing the extra dimension, removing gridlines, which do not improve the viewer’s ability to estimate the numerical value of each data point, improves this graph (Fig. 21.5). Placing the tick marks inside the axis labels (instead of outside) is sufficient for the eye to be able to see where the data points are located, relative to the axis scales (Fig. 21.5).
Color is also an unnecessary (and possibly distorting) element in Fig. 21.4.

Hierarchy of graphical perception: Line over color.
Color is far down the list in the hierarchy of human graphical perception, so representing information through the use of color should be avoided, if the information can be conveyed in black and white. Red and blue are particularly poor choices as people with the most common form of color blindness are often unable to distinguish red, purple, and blue.
Using solid and dashed lines not only avoids unnecessary color, it adds information visually. A dashed line is harder to see than a solid line, so viewers will immediately understand that it is the less important line, even before reading the label, confirming that it represents the (untreated) control group (Fig. 21.5).

Graphical Excellence: Integrate with statistical and verbal descriptions.
Visual integrity: Graphics must not quote data out of context.
Figure legend 21.4, while descriptive, does not provide sufficient information for the viewer to understand the graph without referring to the manuscript. Figure 21.5 legend interprets the data and provides enough information to understand the data without additional explanation.

Hierarchy of graphical perception: Angle and slope.
With the graph in two dimensions, it is easy to see that the line representing the scores of treated patients banks at around 45° (Fig. 21.5). Extreme angles are difficult to perceive, so, where possible, choose a scale that results in lines that bank around 45°.

Visual integrity: The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.
Figures 21.5 and 21.6 show the importance of the scale. While the data are the same, the interpretation is different, based on the scale. In Fig. 21.5, the score (represented on the y-axis) is based on a 30-point survey where the differences between treated and control groups were clinically meaningful. In Fig. 21.6, the score is based on a 100-point survey where the differences between treated and control groups were not clinically significant.

Hierarchy of graphical perception: Angle and slope.
[Figure: charts of ACLR failure types; legible category labels: Unknown, Impingement, Untreated laxity, Failure of fixation, Hyperlaxity, Infection]
Where possible, choose a scale resulting in lines that bank around 45°, avoiding extreme angles, which are difficult to perceive.

Example 3: Pie Charts
Pie charts are often used to show how different elements account for a specific portion of the whole. However, based on the hierarchy of human graphical perception, area is perceived rather inaccurately, so alternatives are preferred, when possible.

Figures 21.7, 21.8, and 21.9 are based on data, which appear in a table format in the original publication by Trojani et al. [3].

Visual integrity: The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
Making the pie chart in Fig. 21.7 three-dimensional adds an unnecessary dimension (“chartjunk”) as the thickness of each wedge does not convey any information. Furthermore, the use of a third dimension creates a “visual lie” (unintentional but deceptive optical illusion) in which the segments of the pie that are at the bottom of the pie appear bigger than those at the top, because the perspective used to represent the third dimension makes the thickness of the disk more apparent at the bottom of the graphic.

Visual integrity: Show data variation, not design variation.
[Figure: chart of ACLR failure types with direct labels; legible labels: New trauma, 30; Unknown, 15; Impingement, 12]
The colors of the pie wedges are arbitrary, adding meaningless variation that distracts, rather than informs. The colors may even imply a “visual lie,” suggesting that wedges of related colors may represent groups that are similar for some reason.

Visual integrity: Write out explanations of the data on the graphic itself. Label important events in the data.
The remote legend (Fig. 21.7) forces the viewer to look back and forth between the legend and the data (pie wedges) to understand the pie chart. Wherever possible, labels should appear next to the data points so viewers see the data point and what it represents in one glance (Figs. 21.8 and 21.9).

Hierarchy of graphical perception: Lines are perceived more accurately than areas.
Converting the information into a bar graph (Fig. 21.8) conveys the information more efficiently because length is perceived more accurately than area. However, while this type of graph makes the number of cases of each type of failure much easier to understand visually, it makes the proportion that each type of failure makes up more difficult to discern.

Graphical Excellence: Reveal the data at several levels of detail.
The circular bar chart (Fig. 21.9) shows the proportion of each type of ACLR failure, like the pie chart, but uses curved lines, rather than wedges, to represent each type of failure. Instead of using arbitrary colors, a gray scale is used to accentuate visually that each adjacent arc is longer than the next so that the types of failure are arranged in order of frequency. The labels are placed within or beside each arc, avoiding remote legends. The labels are also followed by the number of cases of that type of ACLR failure, revealing another level of detail of the data.

Graphical Excellence: Integrate with statistical and verbal descriptions. Serve a clear purpose.
Visual Integrity: Graphics must not quote data out of context.
Figure legends 21.7 and 21.8 fall short of telling the viewer the point of the graphic. Figure 21.9 legend includes the author’s interpretation of the data and also provides important contextual information.
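
The points made in this example (encode values as length, order categories by frequency, put the label and the count next to each bar) can be sketched in a few lines. This is only an illustration: the function name is hypothetical, the output is a plain-text chart rather than a publication graphic, and the data are limited to the three counts that remain legible in the figure labels (New trauma 30, Unknown 15, Impingement 12), not the full dataset of Trojani et al.

```python
def labeled_bars(counts, max_width=30):
    """Return text bars sorted by frequency, each labeled with its category and count."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    largest = ordered[0][1]
    label_width = max(len(label) for label, _ in ordered)
    lines = []
    for label, count in ordered:
        # Bar length is directly proportional to the count it represents.
        bar = "#" * round(max_width * count / largest)
        # Label and count sit beside the bar, so no remote legend is needed.
        lines.append(f"{label:<{label_width}}  {bar} {count}")
    return lines


# Partial data only: the three counts legible in the figure labels.
failure_counts = {"Impingement": 12, "New trauma": 30, "Unknown": 15}

for line in labeled_bars(failure_counts):
    print(line)
```

Sorting before drawing is the design choice that matters most here: the reader sees the ranking of failure types without having to compare bar lengths one by one.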
Example 4: Maps
Maps are very useful when comparing disease prevalence, treatment costs, or other phenomena that vary depending on geographical location. Figures 21.10, 21.11, 21.12, and 21.13 show data on obesity in the USA in 2016 [4].

Graphical Excellence: Make large datasets coherent.
The bar graph (Fig. 21.10) makes the proportion of each state’s population who is obese very clear, but it’s difficult to appreciate any information about relationships among states in an alphabetical list. A map is a visually efficient way to make geographic information easier to see. Figure 21.11 is a map where it is easy to see that obesity affects a greater proportion of adults in the southeast part of the USA and a lower proportion on the East and West coasts and that Colorado, Hawaii, Massachusetts, and the District of Columbia have the lowest proportion of obese adults.

[Fig. 21.10: horizontal bar graph with one bar per state, listed alphabetically from Alabama to Wyoming; x-axis 0 to 40]
Fig. 21.11 Percentage of adults who are obese (body mass index over 30) by state in 2016 [map legend: 20-24.9%, 25-29.9%, 30-34.9%, >35%]
Fig. 21.12 Percentage of adults who are obese (body mass index over 30) by state in 2016
Fig. 21.13 Percentage of adults who are obese (body mass index over 30) by state in 2016 [continuous scale from 20 to 40]
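
The difference between the binned legend of Fig. 21.11 and the continuous 20 to 40 scale of Fig. 21.13 can be sketched as two small mappings. This is an illustrative sketch only: the function names are hypothetical, the hex gray values are one possible encoding (lighter for lower, darker for higher), and the two percentages compared at the end are made-up inputs rather than figures for any state.

```python
def gray_level(percent, low=20.0, high=40.0):
    """Map a percentage onto a continuous gray scale: lighter = lower, darker = higher."""
    clamped = min(max(percent, low), high)
    fraction = (clamped - low) / (high - low)
    level = round(255 * (1 - fraction))  # 255 is white, 0 is black
    return f"#{level:02x}{level:02x}{level:02x}"


def legend_category(percent):
    """Assign a percentage to one of the four legend bins used in the binned map."""
    if percent < 25:
        return "20-24.9%"
    if percent < 30:
        return "25-29.9%"
    if percent < 35:
        return "30-34.9%"
    return ">35%"


# Two hypothetical, nearly identical values that straddle a bin boundary.
for value in (24.9, 25.0):
    print(value, legend_category(value), gray_level(value))
```

With binning, the two nearly identical values land in different categories and would be drawn in visibly different colors; with the continuous gray scale they receive almost the same shade, which is closer to what the numbers say.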
Hierarchy of graphical perception: Color.
The use of color in Fig. 21.11 is confusing. Although the “color wheel” has been used so that relationships between colors have a direct relationship to the data, this relationship is difficult to discern. The fact that red indicates a proportion much higher than green does make sense based on the “color wheel” but is not obvious without having to look at the remote legend.
Use of color should be limited to graphics where the color itself conveys important information, as color is low in the hierarchy of graphical perception. Assessing relationships between colors accurately is difficult even for the normally sighted and impossible for the significant proportion of people who are color blind.

Hierarchy of graphical perception: Color.
In Fig. 21.12, the multiple hues have been replaced by shades (saturation) of a single color. Saturation is still low in the hierarchy of human graphical perception, but it is perceived more accurately than hue, and the relationship between shade and number is easily understood. In Fig. 21.12, lighter represents lower and darker represents higher numerical values, a relationship that is easy to see.

Visual economy.
In Fig. 21.12, the two-letter abbreviation for each state has been removed as most people can easily identify the state in a map without the label, making the labels redundant and a distraction from the information.

Visual integrity: The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.
Using a gray scale allows the appropriate representation of the data as a continuous variable. Figures 21.11 and 21.12 represented the prevalence of obesity as if it occurred in categories (20–24%, 25–29%, 30–34%, or >35%), but these categories were arbitrary. Whether a state has a proportion of adults in a specific category (20–24%, 25–29%, 30–34%, or >35%) has no inherent meaning. Although a gradation from white to a color (e.g., teal, in Fig. 21.12) could also represent values in continuity, eliminating the unnecessary color removes a distraction, so that the viewer can focus on the data.

Example 5: Tables
These tables represent sample data comparing two surveys for the assessment of hand function.
Table 21.1 title is descriptive but redundant, including information that is repeated in the table itself. Information to help the reader understand the table should appear in a footnote (Table 21.2).
Table 21.1 Responsiveness of Survey A and Survey B, showing the means, and 95% confidence intervals for the scores at injection day, Day 30, and the change from injection day to Day 30
Survey  Time point     N    Mean          p-value  Effect size  Standard response mean
A       Injection day  109  42 (38, 46)   <0.001   1.5          1.2
        Day 30         88   72 (68, 76)
        Change         87   30 (25, 35)
B       Injection      109  20 (16, 23)   <0.001   −0.5         −0.7
        Day 30         89   12 (9, 14)
        Change         88   −9 (−11, −6)
The p-values indicate the significance of the difference between scores at Day 30 and injection day. Effect sizes and standard response means for each instrument are also displayed
Table 21.2 Responsiveness of Survey A is greater than Survey B in patients with Dupuytren’s disease
                                 Survey A (N = 87)  Survey B (N = 88)
Mean change in score (95% CI)a   30 (25, 35)        −9 (−11, −6)
Effect size                      1.5                −0.5
Standard response mean           1.2                −0.7
a Mean change in score between Day 0 (treatment with a collagenase injection) and 30 days after treatment for Survey A (87 patients) and Survey B (88 patients) are reported. The 30-day mean change in score was significant for each survey (both p < 0.001).
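The effect size and standard response mean reported in Tables 21.1 and 21.2 can be derived from paired scores. The following is a minimal Python sketch, assuming the commonly used definitions, which this chapter does not spell out: the effect size is taken as the mean change divided by the standard deviation of the baseline scores, and the standardized response mean as the mean change divided by the standard deviation of the individual changes. The function name `responsiveness` and the five pairs of scores are hypothetical and are not data from the surveys above.

```python
from statistics import mean, stdev


def responsiveness(baseline, follow_up):
    """Return (mean change, effect size, standardized response mean) for paired scores."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need at least two paired scores")
    # Change per patient: follow-up score minus baseline score
    changes = [after - before for before, after in zip(baseline, follow_up)]
    mean_change = mean(changes)
    # Assumed definition: mean change relative to the spread of baseline scores
    effect_size = mean_change / stdev(baseline)
    # Assumed definition: mean change relative to the spread of the changes
    srm = mean_change / stdev(changes)
    return mean_change, effect_size, srm


# Hypothetical scores for five patients (not data from the chapter)
baseline_scores = [40, 45, 38, 50, 42]
day_30_scores = [70, 72, 71, 80, 69]

mean_change, effect_size, srm = responsiveness(baseline_scores, day_30_scores)
print(f"Mean change: {mean_change:.1f}")
print(f"Effect size: {effect_size:.2f}")
print(f"Standardized response mean: {srm:.2f}")
```

Under these definitions, the sign of both indices follows the direction in which a survey is scored, so instruments scored in opposite directions are better compared by absolute magnitude.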
Table 21.1 is also confusing, showing patient number (N) for different time points, when the only important information is the change in mean survey scores between the day of treatment and 30 days later (change). Also, even though the effect size and standard response mean are the most important information, they are far away from the name of the survey instrument whose effectiveness they report.
The title of Table 21.2 summarizes the analysis and provides an interpretation to help readers understand the data. A footnote provides methodological information required to understand the information, so that the table stands alone and readers do not need to refer to anything else (such as the rest of the manuscript or poster).
Tables should be used when it is important for the viewer to see the specific numbers that are shown. Some of the values in Table 21.1 have been removed so that Table 21.2 includes only the essential numbers to convey the information from the analysis. Streamlining the information also makes it possible to remove unnecessary boxes and lines.
For the information in Tables 21.1 and 21.2, a table is better suited than a graph. A graph would show positive data points for Survey A versus negative ones for Survey B, and this would create a “visual lie,” in which the fact that numbers fall above or below zero appears important, whereas, due to differences in the way the two surveys are scored, it is the absolute magnitude of the value that is relevant.

Take-Home Message
• The goal is not to make tables and figures look nice but to make sure that viewers understand your research accurately and efficiently. Just as understanding grammar and syntax makes for clear writing, understanding the hierarchy of human graphical perception ability is essential for creating graphs and charts that make complex information easy to understand.
• The appropriate use of figures and tables, created with Graphical Excellence and Visual Integrity, is as important as well-written text in the dissemination of research.

References
1. Cleveland WS, McGill R. Graphical perception: theory, experimentation, and application to the development of graphical methods. J Am Stat Assoc. 1984;79(387):531–54.
2. Tufte ER. The visual display of quantitative information. 2nd ed. Cheshire: Graphics Press; 2001.
3. Trojani C, Sbihi A, Djian P, Potel JF, Hulet C, Jouve F, Bussière C, Ehkirch FP, Burdin G, Dubrana F, Beaufils P. Causes for failure of ACL reconstruction and influence of meniscectomies after revision. Knee Surg Sports Traumatol Arthrosc. 2011;19(2):196–201.
4. The state of obesity. Trust for America’s Health and Robert Wood Johnson Foundation. https://stateofobesity.org/adult-obesity/.
Part IV
Basic Toolbox for the Young Clinical Researcher

22 How to Prepare an Abstract
Elmar Herbst, Brian Forsythe, Avinesh Agarwalla, and Sebastian Kopf

E. Herbst
Department of Orthopaedic Sports Medicine, TU Munich, Munich, Germany
B. Forsythe · A. Agarwalla
Midwest Orthopaedics at Rush, Rush University Medical Center, Chicago, IL, USA
e-mail: brian.forsythe@rushortho.com
S. Kopf (*)
Center for Orthopaedics and Traumatology, Hospital Brandenburg, Medical School Theodor Fontane, Brandenburg an der Havel, Germany
e-mail: s.kopf@klinikum-brandenburg.de

22.1 Introduction

Orthopaedic research encompasses technical notes, case reports, systematic reviews, meta-analyses, retrospective studies, and prospective studies. While there is a continuum of strength associated with each type of research, the commonality in the dissemination of all orthopaedic research is the abstract, which serves as a snapshot of the work that was completed [8, 9, 13, 14]. Abstracts are submitted for review to conferences and, if accepted, are indexed as part of the proceedings. When fellow researchers then read the published paper, the manuscript is rarely examined in its entirety on the first pass. Rather, researchers will carefully review the abstract to understand the study’s purpose, methodology, results, and conclusion. From this brief overview, readers determine if the study fits their needs. If the study fits, then they will generally examine the introduction and discussion of the paper, and only readers who have a particular interest in the topic will read the entirety of the paper. In many ways, the abstract functions as the most important part of the manuscript because it “sells” the publication to readers. If readers cannot understand the study during a brief overview, the manuscript is unlikely to undergo further critical review regardless of whether the study has an impeccable design or impactful results.

22.2 Types of Abstracts

There are two general types of abstracts: descriptive and informative. A descriptive abstract is approximately 100 words detailing the purpose, goals, and methods of the article. In a descriptive abstract, a description of the study’s results is not provided; thus, it is necessary to read the article in its entirety to view the results. The purpose of this type of abstract is to introduce the subject to readers, who must then read the entire paper to learn about the results and conclusions of the investigation (Example 1). Informative abstracts, on the other hand, are approximately 350 words and serve as an overview of the entire paper detailing the purpose, background, methods, results, and conclusions. Informative abstracts provide data on the content of the work and highlight salient points of the study in its entirety so
that readers can understand the study and its implications in its entirety without having to read the complete text (Example 2). These types of abstracts generally follow a format set forth by the publishing journal.
It should be noted, however, that abstracts that are submitted to conferences are generally longer (approximately one page in its entirety) and a figure and/or table can be provided. Abstracts that are submitted to conferences follow the same format as an abstract that describes a manuscript, but generally more emphasis can be placed on presenting the results or discussing the findings of the investigation.

Example 1: Descriptive Abstract [6]
Posterior cruciate ligament (PCL) reconstruction generally uses an Achilles tendon allograft, although recently, quadriceps tendon has been used as an alternative option due to its size and high bone density. In this investigation, we compared the biomechanical strength of quadriceps tendon versus an Achilles tendon allograft during PCL reconstruction. Thirty fresh-frozen cadaveric knees were assigned to one of the three groups: (1) intact PCL, (2) PCL reconstruction with quadriceps tendon allograft, or (3) PCL reconstruction with Achilles tendon allograft. Posterior tibial translation was measured at neutral and 20° external rotation, and then each specimen underwent a preload, cyclic loading protocol of 250 cycles, and load to failure.

Example 2: Informative Abstract [6]
Background: Previous investigations of posterior cruciate ligament (PCL) reconstruction suggest that normal stability is not restored in many patients. The Achilles tendon allograft is frequently used, although recently, the quadriceps tendon has been utilized due to its size and high bone density.
Purpose: The purpose of this investigation was to compare the biomechanical strength of a quadriceps versus an Achilles allograft during PCL reconstruction. We hypothesize that quadriceps allografts have comparable mechanical properties to those of Achilles allografts.
Methods: Thirty fresh-frozen cadaveric knees were assigned to one of the three groups: (1) intact PCL, (2) reconstruction with quadriceps tendon allograft, or (3) reconstruction with Achilles tendon allograft. Posterior tibial translation was measured at neutral and 20° external rotation, and then each specimen underwent a preload, cyclic loading protocol of 250 cycles, and, lastly, load to failure.
Results: The intact specimens achieved the greatest failure load compared to both reconstructions (2048 ± 969 N, p = 0.001). Quadriceps tendon allografts had a higher maximum force during failure testing than the Achilles allograft (2017, 1837 N, respectively, p = 0.007). No significant differences were noted between quadriceps and Achilles allograft for differences in displacement of the graft, creep deformation, or stiffness. Construct stiffness during failure testing was greatest in the intact group (169 ± 9 N/mm, p = 0.0005) compared with the Achilles (56 ± 13 N/mm) and quadriceps (47 ± 3 N/mm) groups.
Conclusion: Quadriceps and Achilles tendon allografts had similar biomechanical properties when used for a PCL reconstruction, but both were inferior to the native PCL. However, quadriceps tendon allografts displayed a stronger construct with failure load and stiffness in comparison to Achilles tendon allografts.
Clinical relevance: The quadriceps tendon is a viable graft option in PCL reconstruction as it exhibits a greater maximum force but is otherwise comparable to the Achilles allograft. The findings of this investigation expand allograft availability in PCL reconstruction.

Apart from descriptive and informative abstracts, structured abstracts must be distinguished from unstructured abstracts. Structured abstracts follow a clear structure as mentioned below (e.g., background, methods, results, and conclusion; Example 2) [12]. In contrast, unstructured abstracts are not divided into distinct paragraphs but form a single running paragraph, as shown in Example 3.

Example 3: Unstructured Abstract of a Narrative Review [2]
Meniscectomy is one of the most popular orthopaedic procedures, but long-term results are not entirely satisfactory, and the concept of meniscal preservation has therefore progressed over the years. However, the meniscectomy rate remains too high even though robust scientific publications indicate the value of meniscal repair or non-removal in traumatic tears and nonoperative treatment rather than meniscectomy in degenerative meniscal lesions. In traumatic tears, the first-line choice is repair or non-removal. Longitudinal vertical tears are a proper indication for repair, especially in the red-white or red-red zones. Success rate is high and cartilage preservation has been proven. Non-removal can be discussed for stable asymptomatic lateral meniscal tears in conjunction with anterior cruciate ligament (ACL) reconstruction. Extended indications are now recommended for some specific conditions: horizontal cleavage tears in young athletes, hidden posterior capsulo-meniscal tears in ACL injuries, radial tears, and root tears. Degenerative meniscal lesions are very common findings which can be considered as an early stage of osteoarthritis in middle-aged patients. Recent randomized studies found that arthroscopic partial meniscectomy (APM) has no superiority over nonoperative treatment. Thus, nonoperative treatment should be the first-line choice, and APM should be considered in case of failure: 3 months has been accepted as a threshold in the ESSKA Meniscus Consensus Project presented in 2016. Earlier indications may be proposed in cases with considerable mechanical symptoms. The main message remains: save the meniscus!

Commonly, unstructured abstracts belong to narrative reviews of current literature, where a structured design would not be appropriate because such reviews lack Results or Methods sections. However, some basic research journals also ask for unstructured abstracts (Example 4). Regardless of the type of abstract, each abstract should provide a brief background or introduction to the topic, a summary of the methods, and the results followed by a concluding remark, as described in more detail in the following sections.

Example 4: Unstructured Abstract of an Experimental Study [3]
Ligament and tendon repair is an important topic in orthopaedic tissue engineering; however, the cell source for tissue regeneration has been a controversial issue. Until now, scientists have been split between the use of primary ligament fibroblasts and bone marrow-derived mesenchymal stem cells (MSCs). The objective of this study was to show that a coculture of anterior cruciate ligament (ACL) cells and MSCs has a beneficial effect on ligament regeneration that is not observed when utilizing either cell source independently. Autologous ACL cells (ACLcs) and MSCs were isolated from Yorkshire pigs, expanded in vitro, and cultured in multiwell plates in varying %ACLc/%MSC ratios (100/0, 75/25, 50/50, 25/75, and 0/100) for 2 and 4 weeks. Quantitative mRNA expression analysis and immunofluorescent staining for ligament markers collagen type I (collagen-I), collagen type III (collagen-III), and tenascin C were performed. We show that collagen-I and tenascin C expression is significantly enhanced over time in 50/50 cocultures of ACLcs and MSCs (p ≤ 0.03), but not in other groups. In addition, collagen-III expression was significantly greater in MSC-only cultures (p ≤ 0.03), but the collagen-I-to-collagen-III ratio in 50% coculture was closest to native ligament levels. Finally, tenascin C expression at 4 weeks was significantly higher (p ≤ 0.02) in ACLcs and 50% coculture groups compared to all others. Immunofluorescent staining results support our mRNA expression data. Overall, 50/50 cocultures had the highest collagen-I and tenascin C expression and the highest collagen-I-to-collagen-III ratio. Thus, we conclude that using a 50% coculture of ACLcs and MSCs, instead of either cell population alone, may better maintain or even enhance ligament marker expression and improve healing.

22.3 Components of an Abstract

Depending on the journal, an abstract can be written as a free-flowing paragraph, or each component of the abstract may need to be separated and formatted into a formal structure. Some journals may require additional sections that discuss the clinical relevance, limitations, or a description of the study design.

22.3.1 Background

The background of an abstract is generally one to two sentences that present the problem that the investigator aims to address. In this short space,
the questions that need to be answered are “why is this study important?” and “what is the impact of this study?”. A good statement in the background section tells the reader what is already known about the topic and what needs to be investigated. A common mistake in this section is to discuss what has already been studied about the topic but fail to discuss the gap that remains in the literature on that topic. A statement of what needs to be further investigated does not need to be explicitly stated; rather the reader should be able to infer what further research is warranted. Examples of a good background section and the most common pitfall are provided in Table 22.1. The background section sets the stage for the reader to understand the research gap that this study aims to fill without the reader having to conduct an extensive literature search. It is important to keep the background of the abstract short. While an extensive background allows the reader to be well-informed of previously established knowledge regarding the topic, it prohibits a more detailed discussion of the present investigation [1, 4, 5].

Table 22.1 Examples of appropriate and insufficient background statements in an abstract with comments on the differences between the statements (bold)
Appropriate background statements
• Previous investigations of posterior cruciate ligament (PCL) reconstruction suggest that normal stability is not restored in many patients. (Why is the study needed?) The Achilles tendon allograft is frequently used, although recently, the quadriceps tendon has been utilized due to its size and high patellar bone density [6] (What is known about the subject?)
• During arthroscopic ACL reconstruction, transtibial drilling techniques are widely used because they simplify femoral tunnel placement and reduce surgical time (What is known about the subject?) However, there has been concern that this technique results in non-anatomically positioned bone tunnels, which may cause abnormal knee functionality [10]. (Why is the study needed?)
Insufficient background statements
• PCL reconstruction outcomes are variable and have not had the same success as ACL reconstructions (research question not evident and not specific enough). Numerous surgical techniques have been described, including open and arthroscopic tibial inlay and transtibial techniques. Currently, the most frequently utilized technique is an open transtibial technique with an Achilles tendon allograft [6] (insufficient background information related to the research question)
• During arthroscopic ACL reconstruction, transtibial drilling techniques are widely used because they simplify femoral tunnel placement and reduce trauma and surgical time by employing a single-incision approach. Thus, the transtibial approach was seen as a major innovation and quickly became a popular technique for arthroscopic ACL reconstruction [10] (research question not provided; the reviewer is not informed why the study is needed)

22.3.2 Purpose

The purpose is another one to two sentences illustrating the problem statement of the investigation. This portion of the abstract needs to address the question “What is this study investigating?” The statement needs to be a concise explanation of what the study specifically aims to investigate. Within this section, the scope should be identifiable, that is, whether the study is addressing a specific problem or an overarching generic issue. In other words, the scope should identify the applicability of the investigation. For example, “The purpose of this investigation is to compare the biomechanical strength of a quadriceps tendon versus an Achilles tendon allograft during a transtibial PCL reconstruction” [6]. In this example, the purpose of the investigation (biomechanical strength of different grafts) and the scope of the investigation (PCL reconstruction using a transtibial approach) are clearly stated such that the reader understands what the investigators plan to conduct. It is common to see abstracts combine the purpose with the background into a single section. Together, the background and purpose serve to tell the reader what has already been established in the literature and what this current study aims to investigate.

Fact Box 22.1: Background and Purpose
– Highlight the rationale and importance of the study
– Do not summarize what has already been studied
– The purpose should be brief and follow the background sentence
– The purpose should be a clear statement on what the study is investigating
Table 22.3 Examples of an improper and a proper description of the results of an investigation in an abstract

Improper description of results
At 30°, there was a significantly larger dial test result in the affected knee prior to ACL reconstruction compared to after ACL reconstruction (29.6° vs. 19.0°, p < 0.0001) and compared with the unaffected knee (29.6° vs. 22.5°, p < 0.0001), but this difference was eliminated after reconstruction (14.0° vs. 13.5°, p = 0.69). At 90°, there was a significantly larger dial test result in the affected knee before ACL reconstruction compared with after ACL reconstruction (31.6° vs. 21.1°, p < 0.0003) and compared with the unaffected knee (31.6° vs. 21.9°, p < 0.0001); this difference was eliminated after reconstruction (12.1° vs. 11.9°, p = 0.3189) [7]. (Even though the results of the dial test are presented, no information on the study cohort is given. Data presentation lacks standard deviations and/or confidence intervals, making proper understanding of the results difficult. Furthermore, the authors did not present the data as described in the methods section (Table 22.2))

Proper description of results
Thirty-eight consecutive patients who underwent ACL reconstruction were prospectively evaluated in the 6-month study period between April 2017 and May 2017. The mean age of the included patients was 32.0 ± 12.6 years, with mean BMI of 26.3 ± 7.3 kg/m². Most patients were male (58.6%) with sports-related injuries (66.0%) and a median time between injury and surgical intervention of 31 days. The ICC between both examiners was 0.969, indicating a high reliability of the gathered measurements. At 30°, there was a significantly larger dial test result in the affected knee prior to ACL reconstruction compared to after ACL reconstruction (29.6° vs. 19.0°; 95% CI [−4.9, −6.6]; p < 0.0001) and compared with the unaffected knee (29.6° vs. 22.5°; 95% CI [5.8, 7.4]; p < 0.0001), but this difference was eliminated after reconstruction (14.0° vs. 13.5°; 95% CI [0.7, −1.2]; p = 0.69). At 90°, there was a significantly larger dial test result in the affected knee before ACL reconstruction compared with after ACL reconstruction (31.6° vs. 21.1°; 95% CI [−4.9, −6.9]; p < 0.0002) and compared with the unaffected knee (31.6° vs. 21.9°; 95% CI [−5.2, −9.4]; p < 0.0001); this difference was eliminated after reconstruction (12.1° vs. 11.9°; 95% CI [1.31, −1.7]; p = 0.41). The PLC gap was measured at less than the critical 12 mm both pre-ACL reconstruction (6.9 mm) and post-ACL reconstruction (6.3 mm). The PLC gap decreased significantly (95% CI [−0.1, −0.7]; p = 0.0009) [7]. (This results section follows a clear structure with demographics first, followed by inter-rater reliability data and the primary outcome variables. The authors provide confidence intervals as well as data on the aforementioned PLC gap in accordance with the methods section in Table 22.2)
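Table 22.3 faults the improper example for omitting standard deviations and confidence intervals. As an illustration of how such values might be produced for a paired comparison, here is a minimal Python sketch. It rests on several assumptions that are not part of the cited study: the helper name `paired_summary` and the five pairs of measurements are hypothetical, and the interval uses a normal approximation (z critical value) rather than the t distribution, which would give a somewhat wider interval in small samples.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_summary(before, after, confidence=0.95):
    """Return mean difference, SD of differences, and a confidence interval."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation of the differences
    # Assumption: normal approximation, so a z critical value is used
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd_diff / sqrt(len(diffs))
    return mean_diff, sd_diff, (mean_diff - half_width, mean_diff + half_width)


# Hypothetical dial test results in degrees (not data from the cited study)
before = [30.0, 28.0, 31.0, 29.0, 32.0]
after = [20.0, 19.0, 22.0, 18.0, 21.0]

mean_diff, sd_diff, (low, high) = paired_summary(before, after)
print(f"Mean change {mean_diff:.1f} ± {sd_diff:.1f}°; 95% CI [{low:.1f}, {high:.1f}]")
```

Writing the interval with the lower bound first keeps the notation unambiguous for readers who see only the abstract.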
tion; however, they fail to discuss the importance of those findings.

Fact Box 22.4: Conclusion
– The conclusion must be supported by the results.
– Make concise and accurate statements.
– Provide a sentence on the clinical or scientific merit of the study.

22.4 General Guidelines

An abstract is a 100–350-word snapshot of the manuscript, and it should not contain any information that is not further supported within the manuscript. While the abstract is the first, and in many cases, the only, portion of a research project that is examined by readers, it should be the last thing that is completed. Some may believe that an abstract should be written first since it is a short overview of the paper and immediately precedes the manuscript, but it is much easier to summarize a manuscript that has already been completed. Additionally, it is more efficient to write an abstract without concern for the word count and then pare it down to the specified word limit.
Within a limited space, it is easy for writers to misconstrue or bias the message of a research publication; thus, the writer must ensure that readers cannot misinterpret the abstract. Each component of the abstract should be able to stand alone such that the reader can clearly understand the message of each aspect without having to refer to other sections of the abstract for clarification. The abstract, and manuscript for that matter, should be written in the past tense and in the third person. For example, the sentence “The surgeon fixed the anterior cruciate ligament at the midpoint of the anteromedial and posterolateral bundle” should read “The anterior cruciate ligament was fixed at the midpoint of the anteromedial and posterolateral bundle.” Instead of phrases that incorporate “I” or “we,” phrases such as “the investigation demonstrates,” “the results illustrate,” or “this study explains” should be included in the manuscript. While it is appropriate to include the aforementioned phrases in the abstract, they may be removed to save space for a sentence or phrase that helps convey the message of the abstract.
Abstracts do not include references, and it is also important to ensure that the abstract does not have any undefined abbreviations. Lastly, each abstract must contain a list of 4–6 “keywords” that will help guide the search indexes in finding this article.
Some authors recommend writing the abstract from scratch after finishing a manuscript, while some authors believe that it is best to copy phrases directly from the manuscript into the abstract. Those who favor the former approach believe that copying information directly from the manuscript leads to an abstract that is disjointed or a summary that contains too much or too little information. In this method, it is best to reread the manuscript in its entirety and then summarize the information in a new way that is distinct from the original manuscript. Copying portions of the manuscript is an efficient method for creating the abstract since every piece of information lies within the manuscript itself. Ultimately, neither method is considered more effective than the other; the way in which the abstract is written depends on the writer’s preference and comfort level.

Fact Box 22.5: Abstracts for Scientific Articles
– Follow the instructions for authors
– Tell the same story as in the manuscript
– Focus on the data related to the purpose and hypothesis of the study
– The conclusion must be supported by the data and highlight the scientific merit

Take-Home Message
• An abstract is a snapshot of the manuscript and is a crucial part of each manuscript as it is freely available and frequently read.
• Abstracts should not contain any information that is not further supported within the manuscript.
• Regardless of the purpose of the abstract, it should contain the aim of the study, a brief overview of the methods, the most important results, and a conclusion addressing the clinical or scientific merit of the study.

References
1. Andrade C. How to write a good abstract for a scientific paper or conference presentation. Indian J Psychiatry. 2011;53:172–5.
2. Beaufils P, Becker R, Kopf S, Matthieu O, Pujol N. The knee meniscus: management of traumatic tears and degenerative lesions. EFORT Open Rev. 2017;2:195–203.
3. Canseco JA, Kojima K, Penvose AR, Ross JD, Obokata H, Gomoll AH, et al. Effect on ligament marker expression by direct-contact co-culture of mesenchymal stem cells and anterior cruciate ligament cells. Tissue Eng Part A. 2012;18:2549–58.
4. Cartwright R, Tikkinen KA, Vierhout ME, Koelbl H. How to write an ICS/IUGA conference abstract. Int Urogynecol J. 2010;21:509–13.
5. De Smet AA, Manaster BJ, Murphy WA Jr. How to write a successful abstract. Radiology. 1994;190:571–2.
6. Forsythe B, Haro MS, Bogunovic L, Collins MJ, Arns TA, Trella KJ, et al. Biomechanical evaluation of posterior cruciate ligament reconstruction with quadriceps versus Achilles tendon bone block allograft. Orthop J Sports Med. 2016;4:2325967116660068.
7. Forsythe B, Saltzman BM, Cvetanovich GL, Collins MJ, Arns TA, Verma NN, et al. Dial test: unrecognized predictor of anterior cruciate ligament deficiency. Arthroscopy. 2017;33:1375–81.
8. Frazer A. How to write an effective conference abstract. Emerg Nurse. 2012;20:30–1.
9. Ickes MJ, Gambescia SF. Abstract art: how to write competitive conference and journal abstracts. Health Promot Pract. 2011;12:493–6.
10. Kopf S, Forsythe B, Wong AK, Tashman S, Anderst W, Irrgang JJ, et al. Nonanatomic tunnel position in traditional transtibial single-bundle anterior cruciate ligament reconstruction evaluated by three-dimensional computed tomography. J Bone Joint Surg Am. 2010;92:1427–31.
11. Li G, Abbade LPF, Nwosu I, Jin Y, Leenus A, Maaz M, et al. A scoping review of comparisons between abstracts and full reports in primary biomedical research. BMC Med Res Methodol. 2017;17:181.
12. Pierce LL. How to make a beef stew...or write a structured abstract. Rehabil Nurs. 2017;42:243–4.
13. Turbek SP, Chock TM, Donahue K, Havrilla CA, Oliverio AM, Polutchko SK, et al. Scientific writing made easy: a step-by-step guide to undergraduate writing in the biological sciences. Bull Ecol Soc Am. 2016;97:417–26.
14. Weinstein R. How to write an abstract and present it at the annual meeting. J Clin Apher. 1999;14:195–9.

23 How to Make a Good Poster Presentation
Baris Kocaoglu, Paulo Henrique Araujo, and Carola Francisca van Eck
Fig. 23.1 Example of a conventional, printed poster displayed in an exhibit hall on a poster board
viewer by generating visual interest [2–4, 7, 10, 11, 13, 15, 21, 22]. However, when it comes to presentation of the data, this must be done on a stand-alone basis and be self-explanatory. This means that the readers of the poster should be able to understand the study aim, methods, results, conclusion, and significance, even when the presenting author is not there to explain anything. In addition, the figures must have clear figure legends and labeling where appropriate to facilitate this.
Generally, there should be a title portion, followed by objectives, methods, results, and discussion/conclusion section. Tables and/or figures can be used to allow for easier/more interesting presentation of the data. Conflicts of interest should be disclosed, and contact information for the corresponding authors should be provided [14]. Detailed instructions on how to prepare these sections are discussed below.

Fact Box 23.2
A poster should be concise enough to attract the readers’ attention and also complete enough to allow interpretation without verbal presentation.

The title should be concise and attract the attention of people passing by. Oftentimes titles are too long. Phrasing the title as a strong statement or a question is generally better to spark the interest of the readers. All authors with their credentials and the affiliated institution(s) should follow the title. If there are any conflicts of interest to disclose, this should also be done. The person in charge of making the poster should check the guidelines pertaining to how to disclose potential conflicts of interest from the specific meeting/organization.

Fig. 23.2 Example of an e-poster. An e-poster is essentially a slide presentation in which the slides advance automatically; it is available on computers distributed across the meeting

The first text box should discuss the objective/hypothesis of the study. A short background may be provided if relevant for understanding the goal of the study. However, care must be taken to avoid making the poster too wordy and losing the attention of the reader. References are considered optional but again can be used sparingly if this is felt to be fundamental in understanding the rationale of the study. The methods section should be brief but present enough detail to understand the study design, nature, population, data collection, and statistical analysis. If tables or figures can be used instead of text, this should be strongly considered (Fig. 23.3).
Similar to the methods, the results are often better presented in table or figure format to catch and keep the reader’s attention. Unlike in a manuscript, only the most important findings of the study should be described, keeping this section short and concise (Fig. 23.4). The discussion/conclusion section should state the summary of the study. A brief discussion follows. The focus of the discussion should be on clinical relevance of the work presented, limitations, and future implications. Similar to the introduction section, the use of references is generally considered optional, and references should be used sparingly to reduce unnecessary wordiness of the poster and distraction from the message of the research.

23.3 Technical Aspects

Perhaps equally important to the content of the poster is how well this content is presented [2–4, 7, 10, 11, 13, 15, 21, 22]. To start, the preparer of
Fig. 23.3 With regard to the methods section of the poster, use tables or figures instead of text where possible. The example on the left shows how figures are used to gain the readers’ interest as well as to reduce the text. In the example on the right, the same information is conveyed using text only. Text is less appealing and harder to interpret
the poster should check the specific meeting to find out which poster dimensions are allowed and recommended. Most posters are created using PowerPoint [Microsoft, Redmond, WA, USA] or something comparable. If you are part of a larger academic institution, hospital system, or research group, it may be worth checking whether poster templates already exist at your institution, which you can then utilize.
Fact Box 23.3
Optimal visual poster presentation includes a calm background color and a neutral font which stands out from the background and is large enough to read from a distance.
The optimal format to present and promote your organization is to have a uniform format that is easily recognized by others as belonging to your institution. This can include the organization logo, picture, or slogan (Fig. 23.5). In turn, this may benefit you as much as it benefits your institution, as the reputation of your institution alone may attract viewers to your poster. If no such template exists and you are the first person making it, try to pick a calm background color, perhaps matching your institution’s logo colors, combined with a text color which stands out from the background [4]. For example, avoid yellow text on a white background. Refrain from using colors that may be unconsciously perceived as offensive, such as red text. There is a fine line between attracting attention and the poster being a visual overload. The size and font of the text are also very important. It is best to use a neutral font that is easy to read from a distance, such as Arial or another sans serif font. Generally, the letter size is largest for the title and section headings (larger than 62 points), medium sized for the text (larger than 44 points), and smallest for the references and corresponding authors’ contact information (larger than 36 points) (Fig. 23.5) [2–4, 7, 10, 11, 13, 15, 21, 22].
Fig. 23.5 The official template for the ESSKA (European Society of Sports Traumatology, Knee Surgery and Arthroscopy) congress. The optimal way to present and promote yourself, your poster, and your organization is to have a uniform format, which includes your logo. Use a calm background color and a neutral font with a color that stands out from the background
Lastly, if your poster is a printed poster, it will need to be printed ahead of the conference. Several companies are available online which perform scientific poster printing. You will need to upload your presentation file to their website, and generally you can select if you would like to see proofs before printing. This may be worthwhile when the poster contains pictures or graphs, to ensure the resolution is high enough to provide a good-quality poster at the size at which it needs to be printed. Take into account that receiving the printed poster may take several weeks [3]. Allow extra time to reprint the poster if upon its arrival misprints or mistakes are identified and a new copy needs to be printed. Many large meetings now offer poster-printing services, which will have your poster ready for you at the meeting site. The major benefit is that this avoids the burden of traveling with several posters. However, the major limitation is that if you arrive to find there is something wrong with your poster, there will likely not be sufficient time for revisions. These pros and cons will have to be considered.
usually based on the overall presentation, including the abstract, the poster itself, and the attentiveness and presence of the presenting author. The latter includes proper attire following the established dress code for the meeting. If unsure about the dress code, ask someone who has been to the meeting before or contact the meeting organizers directly. It is always better to err on the side of being overdressed. Remember, you are representing yourself, your research, and your institution.
Fact Box 23.4
Proper dress code is important when presenting a poster as it reflects on yourself, your research, and your institution.
Meeting guidelines should be checked, as poster presenters can be disqualified from submitting to future meetings if rules are not followed.
• The person in charge of making the poster should check the guidelines for the meeting at which the poster is being presented.
• Although the content of the poster is important, so is the quality of the visual presentation.
• Choose previously used templates from your institution to ensure uniformity and easy identification of presentations related to the institution.
• Ensure that the presenter of the poster is available to stand with the poster during the mandatory time slots.
• Reviewing the poster with colleagues prior to the scientific meeting may help generate comments and improve the poster (presentation).
• The presenter should be dressed in proper attire.
• It is always better to err on the side of being overdressed.
• You are representing yourself, your research, and your institution [7].
References
1. Abicht BP, Donnenwerth MP, Borkosky SL, Plovanich EJ, Roukis TS. Publication rates of poster presentations at the American College of Foot and Ankle Surgeons annual scientific conference between 1999 and 2008. J Foot Ankle Surg. 2012;51:45–9.
2. Beal JA. Preparing for a poster session - some practical suggestions. Mass Nurse. 1986;56:5.
3. Boullata JI, Mancuso CE. A “how-to” guide in preparing abstracts and poster presentations. Nutr Clin Pract. 2007;22:641–6.
4. Briscoe MH. Preparing scientific illustrations: a guide to better posters, presentations, and publications. New York: Springer; 1996.
5. Daruwalla ZJ, Huq SS, Wong KL, Nee PY, Murphy DP. “Publish or perish”-presentations at annual national orthopaedic meetings and their correlation with subsequent publication. J Orthop Surg Res. 2015;10:58.
6. Donegan DJ, Kim TW, Lee GC. Publication rates of presentations at an annual meeting of the American Academy of Orthopaedic Surgeons. Clin Orthop Relat Res. 2010;468:1428–35.
7. Erren TC, Bourne PE. Ten simple rules for a good poster presentation. PLoS Comput Biol. 2007;3:e102.
8. Frank RM, Cvetanovich GL, Collins MJ, Arns TA, Black A, Verma NN, Cole BJ, Forsythe B. Publication rates of podium versus poster presentations at the Arthroscopy Association of North America Meetings 2008-2012. Arthroscopy. 2017;33:6–11.
9. Gundogan B, Koshy K, Kurar L, Whitehurst K. How to make an academic poster. Ann Med Surg (Lond). 2016;11:69–71.
10. Hamilton CW. At a glance: a stepwise approach to successful poster presentations. Chest. 2008;134:457–9.
11. Hand H. Reflections on preparing a poster for an RCN conference. Nurse Res. 2010;17:52–9.
12. Kleine-Konig MT, Schulte TL, Gosheger G, Rodl R, Schiedel FM. Publication rate of abstracts presented at European Paediatric Orthopaedic Society Annual Meetings, 2006 to 2008. J Pediatr Orthop. 2014;34:e33–8.
13. Lourie RJ. Preparing a poster presentation. Nurse Educ. 1989;14:10, 18, 23.
14. Matsen FA 3rd, Jette JL, Neradilek MB. Demographics of disclosure of conflicts of interest at the 2011 annual meeting of the American Academy of Orthopaedic Surgeons. J Bone Joint Surg Am. 2013;95:e29.
15. Miller JE. Preparing and presenting effective research posters. Health Serv Res. 2007;42:311–28.
16. Naziri Q, Mixa PJ, Murray DP, Grieco PW, Illical EM, Maheshwari AV, Khanuja HS. Adult reconstruction studies presented at AAOS and AAHKS 2011–2015 Annual Meetings. Is there a difference in future publication? J Arthroplasty. 2018;33(5):1594–7.
17. Ohtori S, Orita S, Eguchi Y, Aoki Y, Suzuki M, Kubota G, Inage K, Shiga Y, Abe K, Kinoshita H, Inoue M, Kanamoto H, Norimoto M, Umimura T, Furuya T, Masao K, Maki S, Akazawa T, Takahashi K. Oral presentations have a significantly higher publication rate, but not impact factors, than poster presentations at the International Society for Study of Lumbar Spine meeting: review of 1126 abstracts from 2010 to 2012 meetings. Spine (Phila Pa 1976). 2018;5:1347–54.
18. Preston CF, Bhandari M, Fulkerson E, Ginat D, Koval KJ, Egol KA. Podium versus poster publication rates at the Orthopaedic Trauma Association. Clin Orthop Relat Res. 2005;(437):260–4.
19. Schulte TL, Trost M, Osada N, Huck K, Lange T, Gosheger G, Holl S, Bullmann V. Publication rate of abstracts presented at the Annual Congress of the German Society of Orthopaedics and Trauma Surgery. Arch Orthop Trauma Surg. 2012;132:271–80.
20. Voleti PB, Donegan DJ, Baldwin KD, Lee GC. Level of evidence of presentations at American Academy of Orthopaedic Surgeons annual meetings. J Bone Joint Surg Am. 2012;94:e50.
21. White A, White L. Preparing a poster. Acupunct Med. 2003;21:23–7.
22. Wipke-Tevis DD, Williams DA. Preparing and presenting a research poster. J Vasc Nurs. 2002;20:138–42.
23. Zelle BA, Zlowodzki M, Bhandari M. Discrepancies between proceedings abstracts and posters at a scientific meeting. Clin Orthop Relat Res. 2005;(435):245–9.
24 How to Prepare a Paper Presentation?
Timothy Lording and Jacques Menetrey
T. Lording (*)
Melbourne Orthopaedic Group, Windsor, VIC, Australia
The Alfred Hospital, Melbourne, VIC, Australia
e-mail: tlording@iinet.net.au
J. Menetrey
Centre de Médecine du Sport et de l’Exercice, Hirslanden Clinique la Colline, Geneva, Switzerland
Service de Chirurgie Orthopédique et Traumatologie de l’Appareil Moteur, University Hospital of Geneva, Geneva, Switzerland
24.1 Abstract Submission
The first step in presenting your research at a meeting is submitting an abstract. The call for abstracts often occurs quite early, especially for larger meetings, and can close a considerable time before the conference is to take place. For example, the call for abstracts for the 2019 ISAKOS congress closes in September 2018, a full 9 months before the meeting takes place.
Submission is usually online. Required information will include the authors’ details and affiliations, the title of your project, the abstract itself, as well as any financial disclosures.
The word limit for the abstract text can vary widely, from as few as 300 to as many as 800 words. Some meetings will dictate abstract subheadings, but most are variations of the standard scientific structure: background, aims, methods, results, and conclusions. You may have already prepared a manuscript for your research, and if so the abstract may be edited and repurposed for your submission. Use the word limit as much as possible, as it can be difficult to distil complex work into only a few hundred words. For each section of your abstract, determine the crucial points you want to convey. From this core information, expand your text to the word limit in a comprehensive manner. The submission process is competitive, and if English is not your first language, it may be worth asking a colleague to review your abstract prior to submission, to make sure your message is clear.
24.2 Presentation Structure and Preparation of Slides
Most free paper presentations are 6 min in length. Careful preparation is important, to ensure that the premise, findings, and relevance of your work are successfully conveyed in this short timeframe.
Preparing your talk and preparing your slides go hand in hand and for simplicity are considered here together. A paper examining the importance of tibial bony and meniscal slopes in anterior cruciate ligament injury is used as an example [1].
Similar to your abstract, most paper presentations will follow a standard structure, including an introduction, methods, results, discussion, and conclusion. Due to the time constraints inherent in a standard 6-min conference presentation, it is important to convey the most important
information clearly and concisely. Again, if you have already prepared a manuscript for your work, this will simplify the process of preparing your talk. Usually, we would estimate three slides per minute to be an appropriate pace for a scientific presentation.
Many institutions have slide templates available for presentations. If not, it is important to choose a template, colour scheme, and font that is easy for your audience to read. As a general rule, try to keep slides uncluttered, with a few main points per slide and clear, illustrative diagrams. Many presenters use slides with huge amounts of text or statistical information, often accompanied by an apology for using such a “busy slide”. The message is clearer if just the relevant information is presented. Try not to have more than eight lines of text on any slide and avoid small font sizes.
24.2.1 Title Slide
The title slide should include, at a minimum, the title of your work and the authors’ full names. Most would include the details of the conference, as well as the institutions from which the work arose (Fig. 24.1). Remember to acknowledge your co-authors when you introduce your work, and it is courteous to thank the conference organisation for the opportunity to present.
Most meetings mandate the second slide to be a disclosure slide, outlining any potential conflicts of interest in the work.
24.2.2 Introduction
The introduction is critical in setting the context of your research. In two or three slides, or roughly 90 s, you need to convey to the audience the background to and purpose of your work, as well as your aim and hypothesis. More than any other section of your presentation, this should be tailored to your audience. For example, the audience at a general meeting may require more context to understand a paper about tibial slope in anterior cruciate ligament (ACL) reconstruction than would be required at an ACL-specific subspeciality meeting. Briefly, the current state of the science and the context of your research should be presented.
The scientific method involves generating a hypothesis and working to prove or disprove this hypothesis. It is important to clearly state the aim of your work as well as the hypothesis at the end of your introduction.
24.2.3 Methods
In this section, you should outline the materials and methods of your project. Due to the inherent time constraints, this section must necessarily be a summary of rather than a fully detailed description of the methods used. At the same time, important points, such as demographic differences between treatment groups that might influence your results, should be highlighted and not glossed over. Photographs of testing rigs for biomechanical studies, or illustrations of radiological measurements, are particularly helpful in aiding the audience to understand your work (Fig. 24.2). Finally, any illustrations may help you to present the methods used in a succinct manner.
24.2.4 Results
The results section is often the shortest in a paper presentation. Frequently two or three slides are all that is required to present the main findings. Graphical representation is often helpful for your audience and more intuitive to understand than tables of numbers. Consider the data carefully when choosing the most appropriate illustrative style. For the tibial slope paper, a bar graph highlights the differences between groups well (Fig. 24.3). Don’t present busy tables with dozens of numbers. Focus on the most important and pertinent findings of your results and present them in a way that leads on to your discussion. To add clarity to your slides and to avoid detailed explanations, do not hesitate to add landmarks if you present anatomical or histological pictures. It is reasonable to offer to discuss your results further with interested parties after the presentation.
24.2.5 Discussion
The discussion section gives the greatest licence to the presenter to get the message of their work across. How does your work fit in with the findings of others? What do you think the main implications of your work are? Does your work help to explain the questions raised by other authors? Does your work raise new and interesting questions? These are the themes that may be explored in the time available and will vary from paper to paper. Briefly, your work should be compared to the current body of knowledge, drawing on previous works and comparable investigations, followed by its originality, novelty, and clinical relevance. Your findings should be critically addressed point by point and hypothetical explanations explored. One trick is to write on post-it notes the different findings you may have to discuss and stick them on your computer or a board. It gives you a good overview and allows you to properly organise your discussion.
Fig. 24.2 Methods slide demonstrating measurement technique in the tibial slope study. Includes material from
Elmansori et al. [1] (Reproduced with permission from Knee Surgery, Sports Traumatology and Arthroscopy)
[Figure: bar chart; categories LTS, MTS, LMS, MMS; group labels Lateral Plateau and Medial Plateau]
For example, in the tibial slope presentation, one slide was dedicated to the results of similar studies in the literature and highlighted the apparent importance of the lateral tibial slope, one slide discussed a postulated mechanism by which lateral compartment slope may cause ACL injury, one slide examined the emerging evidence linking lateral meniscal injury to rotational instability, and one slide discussed the potential role for and current evidence for osteotomy to modify tibial slope in the ACL-injured knee (Figs. 24.4, 24.5, 24.6, and 24.7).
Fig. 24.8 Limitations slide
Limitations
• Recumbent scanning
• Effect on meniscal slope
• Control group with PFJ pain
• May have different slope characteristics to true asymptomatic population
The discussion should conclude with a slide briefly mentioning the limitations of your study and how these might influence your results (Fig. 24.8).
24.2.6 Conclusions
Clearly state the conclusions of your work in at most two slides. No new information or ideas should be introduced at this stage, and ensure your conclusions are supported by your results (Fig. 24.9). Do not overstate findings you were expecting but have not found. You have to find the right balance: advertise your results without overdoing it.
In a basic science presentation, a slide on which you report on future and ongoing works may prevent questions with respect to further development of your research.
Fig. 24.9 Conclusion slide with three clear points
24.4 On the Day of the Presentation
24.4.1 Dress
It is important to dress appropriately for your presentation. In most circumstances, this would mean suit and tie for males and equivalent for females. Some meetings, such as smaller, subspeciality meetings or meetings held at summer or winter resort locations, may have a more relaxed dress code. If in doubt, always err towards dressing conservatively.
24.4.2 Audio-Visual
You will need to find the speaker’s room or audio-visual personnel to provide and upload your slides. In smaller meetings this may be as simple as going to the desk at the back of the room at the beginning of the session. Larger meetings often have a central speaker’s room you need to locate. This is not always easy, especially at large meetings with many concurrent sessions, so make sure to arrive early to leave yourself ample time for this task.
Once your slides have been uploaded, review them to ensure that you have uploaded the correct version and that no formatting or layout errors have occurred. Always check that your videos are properly displayed on the organisers’ system and find out if they are automatically or manually played. Be prepared to comment on the content of a video if there are playback issues at the time of the presentation, which is not infrequent.
24.4.3 Delivery
Take a moment to stand at the podium before the start of your session to get your bearings. As a rule, on the podium you will have a view of your slides, as well as a clock or countdown for the time limit. Usually you will not be able to use a presenter’s view showing the next slide and prompts, so if you feel written prompts will aid your presentation then prepare some small cards. Familiarise yourself with how to advance slides. If you need a laser pointer, it is a good idea to bring one as one is not always provided. Adjust the microphone so it is close enough to pick up your voice but not so close that your speech is muffled. Inexperienced presenters tend to rush and speak quietly, so be conscious of speaking up, slowly and clearly. Similarly, when answering questions take a moment to consider before responding.
Take-Home Message
• Presenting a conference paper is an excellent way to share your research and ideas and to generate discussion with your colleagues.
• Presenting is intimidating, particularly for inexperienced authors and at large meetings, but experience is the best teacher and it gets easier with time.
• Preparation, rehearsal, and anticipation are essential to get your message across effectively and to portray your work in the best light.
References
1. Elmansori A, Lording T, Dumas R, Elmajri K, Neyret P, Lustig S. Proximal tibial bony and meniscal slopes are higher in ACL injured subjects than controls: a comparative MRI study. Knee Surg Sports Traumatol Arthrosc. 2017;25:1598–605.
2. Lording T, Corbo G, Bryant D, Burkhart TA, Getgood A. Rotational laxity control by the anterolateral ligament and the lateral meniscus is dependent on knee flexion angle: a cadaveric biomechanical study. Clin Orthop Relat Res. 2017;90:1922–8.
3. Shybut TB, Vega CE, Haddad J, Alexander JW, Gold JE, Noble PC, Lowe WR. Effect of lateral meniscal root tear on the stability of the anterior cruciate ligament-deficient knee. Am J Sports Med. 2015;43:905–11.
4. Simon RA, Everhart JS, Nagaraja HN, Chaudhari AM. A case-control study of anterior cruciate ligament volume, tibial plateau slopes and intercondylar notch dimensions in ACL-injured knees. J Biomech. 2010;43:1702–7.
5. Sonnery-Cottet B, Mogos S, Thaunat M, Archbold P, Fayard JM, Freychet B, Clechet J, Chambat P. Proximal tibial anterior closing wedge osteotomy in repeat revision of anterior cruciate ligament reconstruction. Am J Sports Med. 2014;42:1873–80.
6. Stijak L, Herzog RF, Schai P. Is there an influence of the tibial slope of the lateral condyle on the ACL lesion? Knee Surg Sports Traumatol Arthrosc. 2007;16:112–7.
25 How to Write a Clinical Paper
Brendan Coleman
B. Coleman (*)
Department of Orthopaedic Surgery, Middlemore Hospital, Auckland, New Zealand
e-mail: Brendan.Coleman@middlemore.co.nz
25.1 Why Publish?
There are many reasons why surgeons want to publish their research. For some, it is a requirement of their position, or they wish to enhance their career and gain promotion. For others, there is a moral obligation to the participants of the study to disseminate the knowledge gained and improve the knowledge of the orthopaedic community. There is an expectation for surgical trainees to undertake research. Publishing research can be helpful to learn the structure of scientific papers and enable surgeons to critique and identify the main points of other research. Although the prospect of writing a manuscript for publication can be daunting, the satisfaction of seeing your research published is worth the effort.
25.2 Before You Start
The best approach to research is to choose a topic that you are passionate about and that has relevance to your clinical practice. Writing a manuscript is hard, and it is the last 10% of effort that makes the difference in producing a publishable paper. If the topic does not interest you, it is easy to let the manuscript preparation slide when you reach the final difficult stages. Don’t be afraid of controversial topics as they can produce some of the most interesting papers [7]. Once you have chosen your topic, you need to formulate your research question and the hypothesis. The hypothesis is what you think the answer to your question will be. It does not matter if this is right or wrong, as the study is going to determine the answer.
Once a good research project has been identified, a thorough literature review should be undertaken. It is important to read the classic papers on the topic, but knowledge moves rapidly, and the relevant articles published over the last few years should also be reviewed. You need to determine if the research question has been investigated previously and what you are adding to the knowledge base on the topic.
It is helpful to seek the guidance of a colleague who has published scientific research in the early stages of planning your research. This will help in avoiding the common mistakes that are made in undertaking and publishing scientific research. Guidance could include identifying the right journal to submit your manuscript to. Once you have identified the journal, it is worthwhile reading the journal to understand the style of research and papers published [7].
challenging element of the study, and it can be easy not to complete the manuscript and publish the results.
The goal of the manuscript is to present the importance of your research and your findings. Writing needs to be precise and compact, utilizing short, simple sentences whilst avoiding complex scientific jargon [3]. You should be explicit and clear in describing the benefit of your paper. The title of the paper should convey the impact of the results in a succinct manner. It should avoid reporting the results within the title. Randomized controlled trials, meta-analyses and systematic reviews require the study design to be stated in the title. “Five-year follow-up of unicompartmental knee replacement” is not very informative compared to “Increased risk of early revision in unicompartmental knee replacement”.
Writing a scientific manuscript is not like writing a novel. It requires a standardized manner of construction: introduction, methods, results and discussion (IMRAD) [3]. Although the majority of scientific manuscripts are written in English, many readers will not have English as their primary language [3]. It is imperative that the manuscript is easy to read and well organized and the language is simple and clear.
Fact Box 25.2
IMRAD method for writing manuscripts: introduction, methods, results and discussion
Writing the manuscript is hard and requires a rigid writing schedule to be put in place to ensure progress and completion. Follow the advice of George M. Whitesides: start with a blank piece of paper, and write down, in any order, all the important ideas that occur to you concerning the paper [10]. Important elements may include the research topic and why it is important, the hypothesis, the major finding and other important findings.
When you create your first draft, let your ideas flow, leaving any revision and editing for future drafts. As Paul Silvia advises, “your first drafts should sound like they were hastily translated from Icelandic by a non-native speaker” [8].
Revising is a difficult skill, but the success of your paper is dependent on your ability to revise and edit. Experienced researchers make three times as many changes to the paper as novice writers [5]. One revision strategy is to learn your common errors and do a targeted search for them [2]. All writers have idiosyncrasies, and using the search function in your writing application may eliminate issues and improve the writing style [2]. Performing numerous drafts is typical when preparing a manuscript. Feedback from experienced researchers both within and outside your specialty field is crucial to improving your paper. Once feedback has been received, you can decide to adjust your paper as necessary.
25.5 Abstract
The abstract needs to be a stand-alone record of your research to entice potential readers into reading the full paper. With the prevalence of online publication databases, the abstract is more important than ever as a means of selling your paper. For many readers, the paper does not exist beyond the abstract [1]. Most journals require abstracts to conform to a formal structure with a defined word count, typically 250 words. A structured abstract usually has sections on background, methods, results and conclusions.
The background section gives a very brief outline of why the subject is important and why the reader should care about the problems and the results. This section is typically two to three sentences.
The methods section should provide the reader with information to understand what was done. This should include the study design, study population, treatment groups, a brief description of the treatment if needed, and the primary outcome measure. The methods section does not have to be in great detail.
The results section is the key element of the abstract, because readers of the abstract want to learn the findings of the study. The results section should contain as much detail as the word count
permits. It should include results of statistical analysis and secondary measures.
The conclusion section should describe the most important take-home message of the study. Other findings of importance can also be described. It is important that the conclusions match the findings of the paper and that the findings are not overstated.
Further information about writing of an abstract is contained in Chap. 22 (Table 25.1).

Table 25.1 Author checklist
Introduction
• Are the objectives clear?
• Is the importance of the study adequately emphasized?
• Is the subject matter of the study new?
• Is previous work on the subject adequately cited?
Materials and methods
• Is the study population detailed adequately?
• Are the methods described well enough to reproduce the experiment?
• Is the study design clear?
• Are statistical methods included?
• Are ethical considerations provided?
Results
• Can the reader assess the results based on the data provided?
• Is the information straightforward and not confusing?
• Are there adequate controls?
• Are statistical methods appropriate?
Discussion
• Have you commented adequately on all your results?
• Have you explained why and how your study differs from others already published?
• Have you discussed the potential problems and limitations with your study?
• Are the conclusions supported by the results?

25.6 Introduction

The aim of the introduction is to present your research question and the importance of your research to the current literature. It should begin with a brief thought-provoking review of previous research on the subject, demonstrating why the topic is important, interesting or problematic in some way. It should create a niche for your study by indicating a gap in the current knowledge or how you are extending the knowledge on the subject. Once you have created the niche, outline the research question or hypothesis [9]. This needs to be explicit and clear in describing the hypothesis and the potential benefit of the paper to the reader. Although a relatively brief section, the introduction is vital to set the scene for the paper. Stay focused and concise: you do not want to answer questions of controversy in the introduction; leave that for the discussion.

25.7 Methods

The methods section of the paper should be the easiest part of the manuscript to write as it records what you have done during the study. The details of the study should have all been known prior to the commencement of the study. This section should include the method of analysis and statistics but should not include the results. The methods section should be like a cookbook providing step-by-step instructions that can be repeated by others [7]. It requires enough accuracy and clarity to enable another researcher to repeat the study. Previously published research methods should be referenced if relevant [6].
Unless you have expertise in statistical analysis, advice should be sought from a statistician prior to commencing the study [3]. The most common error in research is to recruit too few patients, leading to a beta (type II) error in which a true difference goes undetected [7]. It is critical to perform a power calculation prior to starting the study, ensuring you recruit adequate participants to demonstrate a difference in the treatment groups. Once the number of participants has been determined, an additional number of patients should be recruited to allow for loss to follow-up (transfer bias). The arbitrary maximum acceptable transfer bias set by high-impact journals is 20% at 2 years [7].
Statistical methods are a necessary subheading within the methods section. You need to explain the statistical analysis used and provide information regarding the power calculation.
The study population needs to be clearly defined for the reader. It should be defined by providing clear inclusion and exclusion criteria.
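The power calculation and the allowance for loss to follow-up described in this section can be illustrated with a short calculation. The sketch below is only an illustration, not the method of any cited source: it assumes a two-group comparison of means with a two-sided test and uses the standard normal-approximation formula; the function names and the input values (a difference of 5 points, a standard deviation of 10 points) are hypothetical. A statistician should confirm the appropriate method for a real study.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate participants per group for comparing two means.

    Assumption: two-sided test, equal group sizes, normal approximation
    n = 2 * ((z_(1-alpha/2) + z_(power)) * sigma / delta) ** 2.
    delta: smallest difference worth detecting (hypothetical input)
    sigma: expected standard deviation of the outcome (hypothetical input)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


def inflate_for_attrition(n, expected_loss=0.20):
    """Recruit extra participants so that n remain after loss to follow-up."""
    return math.ceil(n / (1 - expected_loss))


if __name__ == "__main__":
    needed = n_per_group(delta=5, sigma=10)
    print("Per group for analysis:", needed)
    print("Per group to recruit:", inflate_for_attrition(needed, 0.20))
```

With these illustrative inputs the sketch gives 63 participants per group for analysis, or 79 per group to recruit if 20% loss to follow-up is allowed for.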
25 How to Write a Clinical Paper 239
The definition should include the time frame during which the study was undertaken. This should begin with the total number of patients eligible for the study followed by the number who met the inclusion/exclusion criteria. The number of patients followed up and the period of follow-up need to be described. Ideally, a maximum of 20% of patients should be lost to follow-up to make the data robust [7].
If there are diagnostic criteria for inclusion in the study, these should be defined, e.g. patients were identified with rheumatoid arthritis, meeting the diagnostic criteria established by the American College of Rheumatology. Four of these seven criteria must be met for a patient to be classified as having rheumatoid arthritis: (1) morning stiffness, (2) arthritis of three or more joint areas, (3) arthritis of hand joints, (4) symmetric arthritis, (5) rheumatoid nodules, (6) serum rheumatoid factor and (7) radiographic changes.
Common problems with the method section include being too long, too vague or not descriptive enough. Fluency of the text can be aided through consistency in the point of view in which it is written: first person “we” or passive voice. If writing in the first person, it can be easy to become repetitive in beginning sentences with “we did…” [13]. When revising the draft, ensure that there is variation in the beginnings of the sentences [13]. In writing the paper, the methods are often presented directly from the research protocol. This results in the methods being presented in the future tense, yet the paper is being written on completion of the study and needs to be written in the past tense.
Approval from the appropriate institutional review board or ethical committee is an important check on the validity of the research design. In the methods section, ethical board approval should be documented.

25.8 Results

outcomes follow, presenting early then late outcomes and complications.
In writing the results, the presentation of data should be mixed with text, tables and figures. “A picture is worth a thousand words” also applies to scientific papers, and a well-constructed figure or table is an excellent method to present multiple data points in a simple and effective manner. A table or figure should be a stand-alone representation of data. A succinct summary of the meaningful data presented in figures or tables should be included in the text, but only to highlight the important points rather than to repeat the data already presented.
During research, excessive amounts of data are often collected. It is vital that the author is selective in the data that is presented. It must show the major findings of the study but should discard any excessive data that does not add to the quality of the paper. Excessive data can be confusing and distracting for the reader, taking away from the important findings. Do not mistake this for falsification or manipulation of data, which is unethical [6]. Rather, you do not wish to present results that will not add to current knowledge or guide clinical practice, or that are inconsequential. Avoid repetition in reporting results.
The perception of complications is different amongst surgeons, in part due to the impression that the treatment has failed but also due to the legal consequences in some jurisdictions. Good clinical practice means that complications or adverse events during a study should be reported. This provides information upon which shared decision-making can occur between patient and surgeon. Reporting of complications can be a vital element in altering techniques or implant choices. It is imperative that complications are reported in an unbiased manner.
In accordance with the good clinical practice guidelines of the International Conference on Harmonization, a serious adverse event is clearly defined as any untoward medical occurrence that:
• Results in persistent or significant disability/incapacity
• Necessitates medical or surgical intervention to prevent permanent impairment to a body structure or a body function
• Leads to foetal distress, foetal death or congenital abnormality or birth defect

Complications can be classified into treatment related or patient related. Treatment-related complications can be due to surgical technique or a device or treatment. Patient-related complications can be due to a local tissue condition or to the overall patient condition.
A statistically significant difference between treatment groups shows your study was powered sufficiently to determine a difference. However, a statistical difference does not necessarily correlate to a clinical difference. For example, a statistically significant difference of three points on the Constant Shoulder score does not equate to a clinically relevant difference to the patients. Overlapping confidence intervals between groups indicate there is no clinically significant difference.

25.9 Discussion

The purpose of the discussion is to place the findings of your research into context with the wider knowledge of the subject. Due to the differing nature of studies, the discussion section will vary in its length and composition although the general structure remains similar. The structure of the discussion should start by recording the major findings of the study. The major findings should include the results of the hypothesis outlined in the introduction. The findings should be the message that you want the readers to take away from the paper; do not repeat every finding of the results section.
The second part presents a summary of the current literature. This mirrors the introduction but should not be repetitive. In the discussion section, the current literature expands on the major findings of the current study and helps place it within the wider body of knowledge on the topic. It is not a complete review of the literature but should compare and contrast your findings with those of other published studies, explaining any differences from previous research [4]. An explanation of the meaning and importance of the findings of your study should be presented. Consideration should also be given to alternative explanations for the results, particularly in relation to any unexpected or different findings. Although the importance of the major findings of the study may be clear to you, it may not be obvious to the reader [6]. Providing context of your findings in the wider knowledge of the subject gives direction to future research.
The third part of the discussion is to address the strengths and weaknesses of your study. It is especially important to address the weaknesses of your study as the reviewer and readers will be aware of the limitations. You should provide an honest critique of your paper’s limitations, addressing the potential impact this has on your study’s results and the relevance of the findings. Addressing potential issues with your paper will pre-empt comments from reviewers and avoid leaving a negative impression of your paper with the reviewer or editor. It presents you as a thoughtful scientist. Where weaknesses exist, provide solutions or alternative explanations if possible. Discussing the limitations can create interest and pose further questions pointing to opportunities for future research.
In discussing the limitations of your study, it is important to recognize any bias within the study. Bias occurs in all scientific papers to some degree, and it is difficult to eliminate all bias. Selection bias occurs when the treatment groups are dissimilar. Prospective randomization minimizes selection bias as do strict inclusion and exclusion criteria. Performance bias occurs due to the person performing the study or treatment. For example, a single-surgeon study introduces performance bias through the subtleties of the surgeon’s technique and experience, which may not be reproducible by other surgeons. It is impossible to eliminate performance bias, as someone has to perform the research. Recording bias occurs if the data collector does not exhibit objectivity. This can be
clear, focused and organized to allow the reader to gather the key points of your research that are relevant to the reader. The IMRAD style provides a format for the writer to follow. Ensure that discussion around the topic is relevant to the hypothesis and findings of your study. Be sure that the conclusions of your paper match the results of your study. Stay disciplined, work hard and you will be able to write a winning paper.

References

1. Andrade C. How to write a good abstract for a scientific paper or conference presentation. Indian J Psychiatry. 2011;53(2):172–5.
2. Belcher WL. Writing your journal article in 12 weeks: a guide to academic publishing success. Thousand Oaks: SAGE Publications; 2009.
3. Earnshaw JJ. How to write a clinical paper for publication. Surgery. 2012;30(9):437–41.
4. Faber J. Writing scientific manuscripts: most common mistakes. Dental Press J Orthod. 2017;22(5):113–7.
5. Faigley L, Witte SP. Analyzing revision. Coll Compos Commun. 1981;32(4):400–14.
6. Kallestinova ED. How to write your first research paper. Yale J Biol Med. 2011;84(3):181–90.
7. Karlsson J, Marx RG, Nakamura N, Bhandari M, editors. A practical guide to research: design, execution and publication. Art Ther. 2011;27(Suppl 2):1–112.
8. Silvia PJ. How to write a lot. Washington, DC: American Psychological Association; 2007.
9. Swales JM, Feak CB. Academic writing for graduate students. 2nd ed. Ann Arbor: University of Michigan Press; 2004.
10. Whitesides GM. Writing a paper. Adv Mater. 2004;16(15):1375–7.
11. www.prisma-statement.org.
12. www.strobe-statement.org.
13. Zeiger M. Essentials of writing biomedical research papers. 2nd ed. San Francisco: McGraw-Hill Companies, Inc; 2000.
How to Write a Book Chapter
26
Thomas R. Pfeiffer and Daniel Guenther
Table 26.1 Important information to consider prior to writing of a book chapter
Target readership
• Students
• Residents
• Fellows
• Consultants
• Researchers
• Patients
Book style
• Primer/student teaching textbook
• Specialist book
• Surgical atlas
Book outline
• Chapter with subheadings to represent a group of themes (e.g., The Anterior Cruciate Ligament (ACL))
• Chapter about a special aspect of a topic (e.g., the double bundle technique of ACL reconstruction)

styles in the field of orthopedic surgery and sports medicine include, but are not limited to:

1. Primer/student teaching textbook: A compendium of the most important facts and related clinical knowledge on the teaching topic is requested. The book should focus on the most important essentials. A short repetition of relevant anatomy and physiology is helpful.
2. Specialist book: Specialist books provide in-depth information on a certain topic. Based on the most important current literature, state-of-the-art diagnostic tools or treatment methods are presented. A trend of increasing specialization can be observed over the last decade. While years ago, books with titles like Sports Medicine were published, nowadays more and more books with titles like The Pediatric Anterior Cruciate Ligament are published.
3. Surgical atlas: In the field of clinical orthopedics, surgical textbooks with step-by-step instructions are a popular format. High-quality images with virtual explanations, pitfalls, tips, tricks, and solutions from daily practice are requested.

It is helpful to know the book’s outline and the other chapters of the book to prevent providing the same information twice.
As soon as you have informed yourself about the framework conditions, contact your co-authors. Identify the leading author of the chapter and the role of the co-authors. The leading author should guide the group through the writing process. Summarize the expectations of your co-authors with appropriate timelines. All authors should be on the same page and should be updated regularly through the writing process. At this point two approaches are feasible:

1. “The chain letter approach”: One author writes a first draft of the chapter. This draft is forwarded to the next author, who adds content, revises the draft, and forwards the draft to another author. The order of authors can be adapted to experience and seniority. This reduces the number of turnarounds and saves valuable time.
2. “The melting pot approach”: The work of writing is equally divided among the authors. The leading author should incorporate the different parts into the chapter and adapt the style to ensure the reading flow.

No matter which approach you choose, the final version should be double-checked and approved by all authors.
Talk about authorship early on in the process. Define who should be the first, second, or senior author. Sometimes, selecting the order can be uncomfortable. However, it will be even more uncomfortable if discussion about authorship starts after all the work is done.
Define deadlines for yourself and your co-authors. Missing deadlines will result in a rush prior to submission, which can ultimately lead to decreased quality of the chapter.
The author has to keep in mind that the book will compete on the market against comparable titles. Knowledge about what has already been published and what is lacking in current literature is essential. This knowledge can be achieved by reviewing published articles in journals and textbooks, scientific exhibits at conferences, and online versions. A good screening tool is PubMed (www.pubmed.gov) provided by the US National Library of Medicine and the National Institutes of Health.
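For authors who prefer to script their literature screening, the PubMed search mentioned above can also be reached through the NCBI E-utilities `esearch` endpoint. The following is a minimal sketch that only builds the query URL and does not contact the server; the function name, the search term, and the year range are hypothetical choices made for illustration, and NCBI's usage guidelines should be checked before sending automated requests.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term, first_year, last_year, max_results=20):
    """Build an esearch URL for PubMed limited to a publication-year range.

    Hypothetical helper: it only assembles the URL; fetching and parsing
    the JSON response is left to the reader.
    """
    params = {
        "db": "pubmed",            # search the PubMed database
        "term": term,              # free-text query
        "datetype": "pdat",        # filter on publication date
        "mindate": str(first_year),
        "maxdate": str(last_year),
        "retmax": max_results,     # number of record IDs to return
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(pubmed_search_url("pediatric anterior cruciate ligament", 2009, 2019))
```

Opening the printed URL in a browser returns the matching PubMed identifiers, which gives a quick impression of how much has already been published on the planned chapter topic.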
26.3 How to Fill the First Page(s) or “The First Step Is Always the Hardest”

The beginning of the writing process often poses a challenge and contains sticking points that young writers should be aware of. Frequently, one is tempted to write the first lines of a book chapter intuitively or based on personal strengths and interests. However, this approach involves the risk of losing the common thread. It is essential to come up with an outline of the book chapter before starting the writing process. Structuring should not be limited to subchapters and paragraphs, but the structure of each paragraph and the line of arguments should be drafted at the beginning. Based on personal preferences, the outline can be elaborated as bullet points, catch phrases, or flowcharts. It is helpful to have all authors involved in the process of tailoring the line of arguments in order to maximize the clinical experience and to include a broader spectrum of opinions. All authors will benefit from brainstorming, and everybody will be on the same page. This way, the structure of the book chapter will be maintained even if sections are written by different authors. If you do not approach your book chapter as a team, it is useful to involve other clinicians to make sure you do not get off track.
Once the outline of the book chapter is set, each aspect of the outline can be addressed by adding the corresponding information in the desired number of sentences.

26.4 The Writing Process or “How to Get the Job Done”

A good balance between theory and practice is key to an interesting and successful book chapter. Chapters consisting of mostly theoretical topics should contain implications for daily practice, tips, tricks, or annotations. In contrast to a descriptive review, a book chapter does not necessarily follow the regular outline of a journal article. Usually, there is no material and methods or results section [2]. Further, extensive statistical analysis to support the presented message is not needed. In general, at the end of a chapter, main findings, conclusions, and consequences for clinical practice should be presented.
The production of a book may take 2 or more years, and the reader wants a long-lasting “product”; therefore, only a selection of the most significant research performed in the past decade should be included. In contrast to a scientific journal article, expert opinions and clinical-based practices, as opposed to scientific or evidence-based practices, should be discussed. A book chapter does not require a full and complete representation of the existing literature [3].
Keep the terminology consistent to increase readability. In general, present tense and third person are used but can be altered where needed. There is no need for alteration or use of synonyms for medical terms. A book chapter is still a scientific manuscript. Authors should avoid wordy sentences and relative clauses.
Usually, write the conclusion at the very end. It should include the most important take-home message of the chapter. Then, write the abstract. Afterward, double-check the title of the chapter. Does it still match the content of the chapter? Sometimes during the writing process, the central theme of the chapter changes stepwise, and at the very end, rewording of some initial parts is necessary.
Keep track of your references during writing. Sample reference programs are EndNote (Clarivate Analytics, Philadelphia, USA), Papers (ReadCube, Labtiva Inc., Cambridge, USA), or RefWorks (ProQuest LLC, Ann Arbor, USA). Travel libraries can enable the different authors to write on the same reference list.
Do not underestimate the impact of well-chosen figures. A good text is accentuated by thematically suitable figures. High-quality figures attract the reader’s attention. In addition, they are a valuable tool to emphasize the take-home message. Especially for the purpose of presenting diagnostic and therapeutic algorithms, flowcharts and diagrams are useful to transfer theoretical knowledge to clinical work. Make sure that you provide figures with sufficient resolution and appropriate text reference. The related legend must explain the figure sufficiently. The reader should not have to return to the main text to find explanations. All abbreviations used in the figure must be defined in the legend.
Clear and consistent naming is needed to keep track of drafts. The naming should include year, month, day, running title, name, and version (e.g., 2017-11-11-ISAKOSResearchBook-PfeifferV1.docx). Each new draft should be saved. This enables authors to backtrack the development of the manuscript. In addition, make sure to save your work regularly during the writing process to prevent losing data.
Dropbox (Dropbox Inc., San Francisco, USA) can be used to share your drafts. Another option is to work in a cloud. Google Cloud (Google LLC, Mountain View, USA) or iCloud (Apple Inc., Cupertino, USA) provide options to work online on one version with multiple authors. However, keep in mind that you as the corresponding author are responsible for keeping track of all changes and staying organized.
Read the final version of the book chapter multiple times before finalizing it. If the book chapter is not in your first language, a revision by a native speaker is highly recommended to ensure professional, scientific writing.

Fact Box 26.3
1. Chapters consisting of mostly theoretical topics should contain implications for daily practice, tips, tricks, or annotations.
2. A selection of the most significant research performed in the past decade should be included.
3. Keep the terminology consistent.
4. At the very end, rewording of some initial parts is necessary.
5. Keep track of your references during writing.
6. High-quality figures attract the reader’s attention.

26.5 Technical Considerations

Important technical points to consider are:

• Word count
• Number and requirements of tables, figures, and references
• If applicable, online version/videos
• Author instructions

Most book chapters contain approximately 10–15 manuscript pages. A manuscript page contains about 4500 characters. Editor and publisher plan the total number of pages and assign different numbers of pages to the authors. This needs to be strictly adhered to. Double spacing, font size 12, and Times New Roman or Arial are regularly requested styles. Do not spend time on excessive formatting as this will be performed by the publisher [4].
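Two of the practical points in this section lend themselves to a small helper script: the draft naming convention (year, month, day, running title, name, version) and the length budget of about 4500 characters per manuscript page. The sketch below is one possible way to automate both; the function names are hypothetical, and the character count per page is the approximate figure quoted in the text, not a publisher specification.

```python
from datetime import date

# Approximate characters per manuscript page, as quoted in the text.
CHARS_PER_PAGE = 4500


def draft_filename(day, running_title, author, version, extension="docx"):
    """Compose a draft name: YYYY-MM-DD-RunningTitle-AuthorV<version>.<ext>."""
    return f"{day.isoformat()}-{running_title}-{author}V{version}.{extension}"


def character_budget(pages):
    """Approximate number of characters available for a page allowance."""
    return pages * CHARS_PER_PAGE


if __name__ == "__main__":
    print(draft_filename(date(2017, 11, 11), "ISAKOSResearchBook", "Pfeiffer", 1))
    print(character_budget(10), "to", character_budget(15), "characters")
```

Because the date comes first in ISO format, drafts sort chronologically in any file browser, which makes it easier to backtrack the development of the manuscript.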
Tables and figures are usually saved as separate files. The technical requirements of figures differ distinctly between different publishers. Generally, a resolution greater than 300 dpi for figures is requested. As aforementioned, high-quality figures enhance the quality of a manuscript.
Before final submission, all technical requirements should be double-checked. When the editor requests revisions and makes suggestions to improve the draft, all changes should be highlighted. Proofreading is the final step of the writing process. The manuscript should be checked for misspellings and proper grammar. Additionally, particular attention should be turned to authors’ names and institutions.

• Keep in mind that a book should be a long-lasting product and competes on the market against comparable products.
• In general, a reader is attracted by scientifically written text with consistent terminology that is supported by well-chosen, meaningful figures.
• A coherent train of thoughts is key to a successful book chapter.
• Chapters consisting of mostly theoretical topics should contain implications for daily practice, tips, tricks, or annotations.
• Finally, a printed version of the book will reward authors for the sometimes intense, but always interesting, work of chapter preparation.
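The 300 dpi guideline for figures can be checked before submission with simple arithmetic: the required pixel count is the printed size in inches multiplied by the resolution. The sketch below assumes the printed size is known in inches and treats 300 dpi as the minimum; the function names and example dimensions are hypothetical, and the publisher's author instructions remain the authoritative source.

```python
import math


def required_pixels(width_in, height_in, dpi=300):
    """Smallest pixel dimensions that reach `dpi` at the printed size."""
    return math.ceil(width_in * dpi), math.ceil(height_in * dpi)


def meets_resolution(pixel_w, pixel_h, width_in, height_in, dpi=300):
    """True if an image of pixel_w x pixel_h is large enough for print."""
    need_w, need_h = required_pixels(width_in, height_in, dpi)
    return pixel_w >= need_w and pixel_h >= need_h


if __name__ == "__main__":
    # Hypothetical figure printed 3.5 in wide and 2.5 in high.
    print(required_pixels(3.5, 2.5))
    print(meets_resolution(1200, 900, 3.5, 2.5))
    print(meets_resolution(800, 600, 3.5, 2.5))
```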
clinical realm; the key word is “translation”. This word should be identified and clearly emphasized in the research description.)

• The science is meritorious; the story, however, needs to be flexible.

The scientific question that you are asking may not be completely aligned with the question that is being asked by the RFP. However, most research proposals ask more than one scientific question, i.e. there is more than one specific aim. If an RFP asks for “injury prevention techniques in female football players” and your research proposal is targeting all football players, it is easy to include a specific aim that specifically identifies factors for female football players rather than the entire population. Data collection may still be carried out on the entire population, but the proposal will seek funding for just a subgroup of your overall scientific aim and is focused on the RFP. This flexibility may help you to broaden your relevance and impact.

• Read the RFP carefully and with attention to detail.

Every RFP is written differently and asks for different elements, page lengths, addenda and prerequisites. Some require ethics committee proposals to be submitted prior to submission; some require investigators to be tied to a university; some require “citizenship” (in the USA); some require certain membership status (foundation grants); and some do not. It is important to read these descriptions carefully prior to embarking on a proposal. Writing a successful grant proposal is hard work, and the last thing you want is to find out in the 11th hour that you are not eligible.

• Look at the deadlines and be realistic, and create a master timetable.

Generally, smaller grant funding mechanisms will ask for short, brief proposals, 1–2 pages, sometimes less, and sometimes they can be a quad chart only. While the actual writing process for those kinds of proposals may look easy, you may find out quickly that a shorter proposal is not necessarily easier to write. In fact, the old saying by Blaise Pascal, “if I had more time, I would have written a shorter letter”, holds true also for grant proposals [1].
It is hard to suggest an ideal amount of time for a proposal, as this always depends on how much previous work you have done and how much pre-existing language and sections you already may have in your portfolio. If you are writing your first proposal in your specialty, give yourself a minimum of 2–3 months for a small grant funding mechanism. Larger foundation grants and NIH-style grant mechanisms or cooperative grant mechanisms asking for six- or seven-figure grant support over multiple years sometimes require several years for a team of writers to prepare a competitive proposal. A good rule of thumb is to identify the shortest amount of time you think it will take you to write a complete first draft and double that number. It should also be kept in mind that a good and winning proposal requires review by yourself as well as by peers. This process may take 4–6 weeks. One strategy may be to push as hard as possible towards a deadline, intentionally miss the deadline at the last minute and then spend the next funding period fine-tuning and polishing the proposal for a submission to the next RFP cycle. There is no shame in not submitting a proposal to a first deadline. Resubmission into the next cycle is usually possible, and a well-written proposal that has been reviewed by peers and polished is a win-win for everybody even if it gets rejected. It can serve as a starting point for a new proposal or may be adapted into a paper or presentation. An unfinished proposal is a waste of time, as it will not likely be funded or serve future proposals.
To track your timelines, it is advisable to create a master timetable with soft deadlines for certain sections.
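The timing advice above (double your shortest estimate for a first draft, allow 4–6 weeks for review, and set soft deadlines for individual sections) can be turned into a master timetable by working backwards from the submission deadline. The sketch below is one hypothetical way to do this: the function name, the section names, the even spacing of soft deadlines, and the choice of 6 weeks for review are all assumptions made for illustration.

```python
from datetime import date, timedelta


def master_timetable(deadline, shortest_draft_weeks, sections, review_weeks=6):
    """Work backwards from the deadline to a start date and soft deadlines.

    Assumptions: drafting time is double the shortest estimate, review
    takes `review_weeks`, and sections are spaced evenly over drafting.
    """
    draft_weeks = 2 * shortest_draft_weeks          # rule of thumb: double it
    review_start = deadline - timedelta(weeks=review_weeks)
    draft_start = review_start - timedelta(weeks=draft_weeks)
    step = timedelta(weeks=draft_weeks) / len(sections)

    timetable = [("Start writing", draft_start)]
    for i, name in enumerate(sections, start=1):
        timetable.append((f"Soft deadline: {name}", draft_start + i * step))
    timetable.append(("Submission deadline", deadline))
    return timetable


if __name__ == "__main__":
    # Hypothetical proposal: 4-week shortest estimate, four sections.
    plan = master_timetable(
        date(2025, 6, 30), 4,
        ["Specific aims", "Background", "Methods", "Budget"],
    )
    for label, day in plan:
        print(day.isoformat(), label)
```

In this layout the last soft deadline coincides with the start of the review period, so the complete first draft is ready when self-review and peer review begin.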
27 How to Write a Winning Clinical Research Proposal? 251
A few bullet points about this topic are as follows:

1. Use reference management software.
2. Be current (i.e. last 3–5 years unless citing a landmark paper that is relevant).
3. Do not inundate the reviewer with unnecessary references.

27.3.6 Not Funded and What Now?

Finally, a word about how to deal with the decision on your proposal. There are two forms of rejection:

1. Technical rejection for failure to comply with submission guidelines, etc.
2. Failure to convince the reviewers of the value of your work

the topic or your proposal, that they did not like you or your team and that the critique is not fair. A seasoned researcher will step away for a few days and then come back to the letter and dissect the critique carefully. It is important to understand that this critique has been written for two reasons. First, it is to point out scientific weaknesses in the proposal. These are usually fixable problems and give you an opportunity to substantially improve your proposal. Second, the reviewers are trying to guide you towards what they expected and would like to see return to the study section or grant review committee. This insight is extremely valuable and allows the researcher to adjust a future proposal according to those recommendations. In certain cases, the proposal was identified as simply misguided for the mechanism, in which case this is good advice, as a resubmission will not likely be successful.
literature. The reviewer should critically check the following points:

• Does the introduction provide the necessary background; in other words, why is it necessary to do this study?
• What is known/unknown about the topic?
• Is the research question of interest? Is it original or merely a repetition of existing knowledge? Does the manuscript itself have a new message?
• Is the purpose clear? Is the hypothesis given? Is there any clinical relevance in a basic science study?
• Can the introduction be shortened? More often than not the introduction can be shortened without losing the main message.

If you are not familiar with the topic of the paper, you should search the previous literature. This will provide you with information about what is known/unknown and what is controversial about the topic of the paper. As a reviewer, it is important to ensure that pertinent and up-to-date literature is cited while at the same time respecting some of the classic literature. Additionally, it is important here to ensure that the current study is truly novel and not a duplication of a previously published study.

28.3 Reviewing the Methods of a Clinical Research Paper

The methods section is perhaps one of the most important sections to assess when reviewing a clinical research paper. Poor methodology can lead to suspect or flawed results and conclusions. The fundamental principles of critically appraising this section are ensuring transparency, clarity, and reproducibility of the methods and maintaining internal validity by minimizing any sources of bias in the study.
Authors should describe in detail and in a clear and logical sequence how the study was designed, executed, and analyzed. A reader should be able to accurately reproduce the study based on the author’s description within the study’s methods. The design of the study should be clearly described and justified: Was the trial carried out prospectively or retrospectively? Was the design a randomized trial, a case-control design, or a case series? Furthermore, any obtained institutional review board or ethics committee approvals should be declared if applicable. Similarly, any study registries such as clinical trial registrations or systematic review registration should be stated.
The eligibility criteria of the patients or subjects in the study should be clearly delineated. All inclusion criteria such as patient demographic factors, diagnoses, etc. should be specified. Inclusion criteria based on diagnoses should be defined by specific symptoms, objective clinical findings, or radiographic parameters. In a study including patients with femoroacetabular impingement, the author’s definition for a positive diagnosis should be stated; for example, patients were diagnosed with cam-type impingement in the presence of hip/groin pain, positive flexion, adduction, internal rotation impingement testing, and positive radiographic parameters such as alpha angle greater than 50°. Similarly, all relevant exclusion criteria such as age, comorbidities, and concurrent diagnoses should be determined a priori and clearly stated.
A study’s interventions must also be well described to allow reproducibility. In the case of a surgical study, a detailed, step-by-step description of the surgical technique similar to a comprehensive dictated operative report should be provided. Furthermore, prior conservative management, surgical indications and decision-making, postoperative rehabilitation, and restrictions should be reported. Similarly, studies with medical interventions must include reporting of all dosages, frequency of therapy, and duration of therapy.
The outcome measures used in the study should be described by the authors, with clear delineation of the primary outcomes that the study seeks to assess versus secondary outcomes. Authors must not only explain what the outcomes are but how they are assessed, as well as the timing at which they are assessed. If clinical outcome measures or scores are used, these measures should
28 How to Review a Clinical Research Paper? 257
be validated for the corresponding diagnoses and patient population. For
example, in early hip arthroscopy research, some patient-reported outcome (PRO)
scores initially designed and validated for hip arthroplasty were used to assess
surgical outcomes in younger hip arthroscopy patients. With the development of
novel patient-reported outcome scores such as HAGOS (Hip and Groin Outcome
Score) or IHOT (International Hip Outcome Tool), these newer measures validated
for use in young to middle-aged adults with hip and groin pain may be more
appropriate for use in the femoroacetabular impingement and hip arthroscopy
demographic [4, 7, 8].

28.3.1 Statistical Analyses

The assessment of appropriate statistical analyses can often be overwhelming.
The reporting of descriptive statistics is important to consider when reviewing
a paper. Typically, in studies that have a large sample size without outliers,
the mean and standard deviation should be reported. On the other hand, if the
study sample size is small and there are outliers within the sample, a report of
the median and range of results is preferred to provide the reader with a better
understanding of the sample. Calculations to determine sample size should be
included in relevant clinical trials to ensure the study is adequately powered.
The authors should disclose the statistical software that was used for analyses.
The type of tests used for statistical analyses for each comparison should be
discussed, and the reviewer should determine if these are appropriate for the
type of variables analyzed in the study. Analyses of continuous variables should
be performed using the appropriate tests such as Student’s t-test, whereas
analyses of categorical variables should be performed using tests such as the
chi-squared (χ²) test or Fisher’s exact test. Studies comparing the means of
more than two groups should use analysis of variance (ANOVA) testing if
comparing one independent variable and two-way (factorial) ANOVA testing for
multiple independent variables; multivariate ANOVA (MANOVA) applies when several
dependent variables are analyzed together. If multiple secondary comparisons are
made, appropriate Bonferroni corrections should be performed to adjust for the
multiple comparisons.

28.3.2 Study Design Specific Methodology

Core principles in assessing study methodology have been discussed above;
however, there are different salient points to assess depending on the type and
design of the study being critically appraised. For example, when analyzing a
randomized controlled trial, it is important to scrutinize allocation
concealment and blinding, whereas in a retrospective cohort study, this may not
be relevant. The following sections discuss important methodological
considerations specific to different study designs.

28.3.3 Randomized Controlled Trials

The key feature of randomized controlled trials is that they are designed to
control bias through randomization and blinding. The difference between an
effective randomized controlled trial and a poorly executed one can be
attributed to its methodology in implementing these features. The execution of
the randomization process should be clearly defined. The specific method used to
generate the randomization sequence should be discussed, as well as the type of
randomization and relevant methodological details such as blocking.

The method of allocation concealment is an important factor to consider when
appraising the methods of a randomized controlled trial. Many strategies of
allocation concealment can be fraught with potential for selection bias. Sealed
envelopes containing the randomization assignment can be subverted by
investigators who hold them up to the light to look through them or even
discreetly open them. Using seemingly unbiased determinants such as chart
numbers to assign treatment can cause bias if investigators, knowing the
upcoming treatment allocation, choose to include or exclude patients based on
their perceived outcomes. The literature cites many ways investigators can
decipher the randomization results if
258 N. K. Patel et al.
inadequate allocation concealment strategies are used [5]. Centralized
randomization is the gold standard method of allocation concealment. In this
method, the central office is called to establish study eligibility, at which
time the randomization allocation for the included patient is given to the
investigator.

Blinding is an important process to protect randomization after the assignments
to interventions have been made and is another consideration in assessing a
randomized controlled trial. Blinding of subjects to their assigned intervention
can remove any bias of preconceived notions or subject behavior that could
affect the results of the study. Blinding can be extended to the care providers,
outcome assessors, and other study personnel to further reduce bias. The paper
should explain which parties were blinded following assignment to intervention
and how this was managed and maintained. However, one must consider that
sometimes it is nearly impossible to blind certain parties, such as blinding a
surgeon to the intervention they are performing, a characteristic that is unique
to surgical studies.

The CONSORT (Consolidated Standards of Reporting Trials) guidelines and
checklist are an invaluable resource to direct critical appraisal of randomized
controlled trials (http://www.consort-statement.org/consort-2010) [6].

28.3.4 Systematic Reviews

The search strategy is an important key to reproducibility of the systematic
review. The strategy can be quite complex in order for the authors to thoroughly
yet efficiently identify all relevant studies to include in the systematic
review. The authors’ search strategy, including all search terms and search
language, should be outlined in a way that it can be repeated. Furthermore, all
information sources (databases, etc.) used should be described, including the
date that the search was last performed. With regard to study selection, clear
inclusion and exclusion criteria should be delineated. The Preferred Reporting
Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines also
recommend a flow diagram to summarize the inclusion and exclusion process,
including the number of studies included and excluded at each step of the review
and the reasons for such. The review process to screen studies for inclusion can
include a title and abstract review, followed by a mandatory full-text
screening. This process should be done in duplicate (by two independent
reviewers) in order to reduce errors of excluding studies that in fact meet the
study criteria. Inter-reviewer agreement statistics should be divulged to the
reader to disclose how often consensus or disagreements occurred, and
subsequently, methods of resolving any conflict surrounding inclusion/exclusion
should be reported. Any algorithms or attempts to identify potentially
overlapping study data should be discussed.

Data extraction preferably should be performed in duplicate to minimize error.
Any attempts to obtain data from original authors not published in the included
studies should be reported. All data points or variables that authors sought to
extract from the included studies should be stated, even if they were not
available or present in all studies.

Risk of bias and study quality of the included studies should be assessed by the
authors. This can be performed using various tools, scales, and checklists. A
commonly used tool is the Cochrane Risk of Bias tool, which assesses various
domains of possible bias in the included studies, including selection bias,
performance bias, detection bias, attrition bias, reporting bias, and other
biases [2].

If a meta-analysis is performed, the choice of the summary measure (e.g.,
relative risk) should be stated and justified. The statistical method should be
discussed, including the decision to use random-effects versus fixed-effects
models, with authors providing rationale for their choices. Heterogeneity or
consistency of the results between studies should be analyzed.

The PRISMA statement and checklist are an invaluable resource to direct critical
appraisal of systematic reviews (http://www.prisma-statement.org) [3].
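
To make the choice between fixed-effect and random-effects models discussed in this section more concrete, the following Python sketch pools study-level effect estimates (for example, log relative risks) with inverse-variance weights, computes Cochran’s Q and I² as heterogeneity measures, and applies a DerSimonian-Laird random-effects adjustment. The chapter does not prescribe a particular estimator; DerSimonian-Laird is an assumption made here because it is widely used, and the function names and input numbers are hypothetical.

```python
import math


def pool_fixed(effects, variances):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)


def heterogeneity(effects, variances):
    """Return Cochran's Q, I^2 (percent), and the DerSimonian-Laird tau^2."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled, _ = pool_fixed(effects, variances)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to between-study differences.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # DerSimonian-Laird between-study variance, truncated at zero.
    c = total - sum(w * w for w in weights) / total
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, i2, tau2


def pool_random(effects, variances):
    """Random-effects pooled estimate: tau^2 is added to every study variance."""
    _, _, tau2 = heterogeneity(effects, variances)
    return pool_fixed(effects, [v + tau2 for v in variances])


if __name__ == "__main__":
    # Hypothetical log relative risks and their variances from three studies.
    log_rr = [-0.35, -0.10, -0.60]
    var = [0.04, 0.02, 0.09]
    q, i2, tau2 = heterogeneity(log_rr, var)
    for label, (est, se) in (("fixed", pool_fixed(log_rr, var)),
                             ("random", pool_random(log_rr, var))):
        low, high = est - 1.96 * se, est + 1.96 * se
        print(f"{label}: RR {math.exp(est):.2f} "
              f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
    print(f"Q = {q:.2f}, I^2 = {i2:.0f}%, tau^2 = {tau2:.3f}")
```

When the estimated between-study variance is zero, the two models give the same answer; when it is positive, the random-effects interval widens. This is one reason a reviewer may expect authors to justify their choice of model.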
Please ensure that the authors do not report “trends” in cases where statistical
significance could not be reached. Doing so usually leads to confusion; the
issue is better addressed by increasing the sample size or by not reporting the
trend. In summary, we should scrutinize the results closely and interpret them
carefully by checking the abovementioned points.

28.5 Important Points to Evaluate in the Discussion

In general, the discussion of a paper serves as a section to interpret the
results and present the meaning of the findings. This section typically includes
the following:

• Summary of the main findings of the paper.
• Focus on the main findings of the paper, and avoid generalization to a point
  where the data presented in the current study cannot support the discussion.
• Correlation of the main findings with previous literature.
• Strengths and weaknesses of the study design.
• Future directions.
• Conclusion/take-home message.

Each one of these components of the discussion should be evaluated to determine
the potential clinical impact of the findings presented in the paper.

First, the summary of the main findings of the study should be comprehensive and
set the stage for the discussion about these findings. There should be a balance
between simply restating all of the results and omitting findings that might be
important for interpretation of the results in a clinical context. It is
important for the reviewer to ask whether this balance was achieved and to make
sure that findings from the results that could possibly add to or detract from
the author’s interpretation are not omitted.

The main findings that are presented in the beginning of the discussion are
typically correlated with previous findings in the literature. The findings of
the current study can agree or disagree with those in the literature, but the
reviewer must assess if this agreement/disagreement favors the author’s
interpretation of the results. As a reviewer, it is important to have some
baseline knowledge about other studies done on the same topic in order to
determine if the author’s line of thought makes sense. It is difficult to have a
depth of knowledge of certain topics, especially as a young investigator. Thus,
in those cases it can be useful to read the papers that are being referenced by
the author in the introduction and discussion to gain a better understanding of
the literature. With better knowledge of the topic, it will become easier to
follow the author’s logic in the discussion and will allow the reviewer to
properly assess if that logic is appropriate.

The strengths and limitations of a study design play an important role when
determining the clinical relevance of the paper and also the potential impact
the paper can have on clinical practice. Many of the points that will be
mentioned in the strengths and limitations section of the discussion are already
mentioned earlier in the “Reviewing the Methods of a Clinical Research Paper”
section of the chapter. However, it is important to highlight the factors that
will most directly influence the potential impact of the findings of the study.
The number of patients in a study is an important factor since it directly
affects the power of the conclusions made in the study. For example, if
treatment A was shown to have a decrease in length of hospital stay by 3 days
compared to treatment B, but this was based on a trial of only ten patients, it
should not influence clinical practice until a larger study is able to show the
same effect. Additionally, mean follow-up time and loss to follow-up are also
important factors to consider when evaluating the impact of the findings of a
clinical research paper.

The future directions of a study provide information regarding the potential for
the findings of the study to influence additional research on the topic. This
may be used further in the assessment of the impact of the findings of a paper.
If the findings will serve as the basis for many future studies, the influence
of the paper may be greater than if it was just an isolated finding with no
clear path of additional research that leads to clinical relevance.
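
The sample-size point made in the strengths-and-limitations discussion above (a 3-day difference in length of stay observed in only ten patients) can be illustrated with a rough power calculation. The sketch below uses a normal approximation for a two-sided comparison of two means with equal group sizes. The standard deviation of 4 days, the split into five patients per arm, and the function name are assumptions for illustration only, not values from the chapter. For very small samples, a t-based calculation would give somewhat lower power than this approximation.

```python
import math
from statistics import NormalDist


def approx_power(mean_difference, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (equal group sizes)."""
    standard_normal = NormalDist()
    # Standard error of the difference between two independent group means.
    se = sd * math.sqrt(2.0 / n_per_group)
    z_crit = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(mean_difference) / se
    # Probability that the test statistic falls in either rejection region.
    return (standard_normal.cdf(shift - z_crit)
            + standard_normal.cdf(-shift - z_crit))


if __name__ == "__main__":
    # Hypothetical inputs: 3-day difference, SD of 4 days (assumed).
    for n in (5, 20, 50):
        print(f"n per group = {n}: power is about {approx_power(3.0, 4.0, n):.2f}")
```

Under these assumed inputs, the ten-patient scenario has power well below the conventional 80% target, which is consistent with the advice to wait for a larger confirmatory study.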
3. Moher D, Shamseer L, Clarke M, et al. Preferred reporting items for
   systematic review and meta-analysis protocols (PRISMA-P) 2015 statement.
   Syst Rev. 2015;4:1. https://doi.org/10.1186/2046-4053-4-1.
4. Ramisetty N, Kwon Y, Mohtadi N. Patient-reported outcome measures for hip
   preservation surgery – a systematic review of the literature. J Hip Preserv
   Surg. 2015;2(1):15–27. https://doi.org/10.1093/jhps/hnv002.
5. Schulz KF. Subverting randomization in controlled trials. JAMA.
   1995;274(18):1456–8. https://doi.org/10.1001/jama.1995.03530180050029.
6. Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for
   reporting parallel group randomised trials. BMC Med. 2010;8(1):18.
   https://doi.org/10.1186/1741-7015-8-18.
7. Stone AV, Jacobs CA, Luo TD, et al. High degree of variability in reporting
   of clinical and patient-reported outcomes after hip arthroscopy. Am J Sports
   Med. 2017;46(12):3040–6. https://doi.org/10.1177/0363546517724743.
8. Thorborg K, Tijssen M, Habets B, et al. Patient-Reported Outcome (PRO)
   questionnaires for young to middle-aged adults with hip and groin disability:
   a systematic review of the clinimetric evidence. Br J Sports Med.
   2015;49(12):812. https://doi.org/10.1136/bjsports-2014-094224.
Part V
How to Perform a Clinical Study: A Case Based Approach

29 Level 1 Evidence: A Prospective Randomized Controlled Study

Seper Ekhtiari, Raman Mundi, Vickas Khanna, and Mohit Bhandari
solution (soap vs. normal saline) and their effects on reoperation rate in open
fractures. The study randomized a total of 2551 patients, of whom 2447 were
included in the final analysis. The investigators found that reoperation rate
was similar across irrigation pressures but was significantly lower for normal
saline compared to soap [61].

29.2 Planning for a Randomized Controlled Trial

The adage “if you fail to plan, you are planning to fail” certainly applies when
it comes to conducting RCTs. There are several crucial steps that need to be
taken before embarking on a full-scale RCT. These preliminary steps serve many
functions, including setting the foundation for the project, revealing technical
and logistical challenges with conducting the study, and verifying that the
research question is indeed feasible, novel, and interesting. These steps
include defining the research question(s), performing literature reviews,
conducting surveys, executing pilot studies, calculating required sample sizes,
and assembling the necessary support structures.

29.2.1 Defining the Research Question

An appropriate, clearly defined research question is an indispensable part of
conducting research at all levels of evidence. As the foundation of the entire
project, this is arguably the most important step of the research process. As
Kumar (2005) puts it, “it is like the identification of a destination before
undertaking a journey…in the absence of a destination, it is impossible to
identify the shortest – or indeed any – route”. There are certain requirements
that should be considered in the development of a good research question. Hulley
et al. have suggested the use of the acronym FINER, which has become widely used
and accepted as important criteria for a good research question [29]. The
acronym stands for: feasible, interesting, novel, ethical, and relevant.

Feasibility refers to the practical and logistical aspects of conducting a
study. There are various considerations within this component. Sample size is an
important consideration: a sample size calculation (discussed later in this
chapter) can help to estimate the number of subjects required for adequate
statistical power. A review of hospital data can provide a rough estimate of the
potential number of subjects that can be expected to be available for
recruitment. These numbers should be treated cautiously, and factors such as
declining participation, loss to follow-up, ineligibility for inclusion, and
random fluctuations in patient volumes should be considered. The number of
subjects required/available will also inform the timeframe of the study. It is
important to ensure that adequate technical expertise is available for the
completion of the project. Multicentre collaboration may be necessary if the
subject matter or outcome is highly subspecialized and relatively rare. Funding
is an important issue, and a preliminary budget outline can be helpful in
projecting the expected costs of all aspects of the study. If external funding
(e.g. government research grants) is required, potential funding bodies should
be identified early to ensure the project is designed in a way that meets their
requirements and mandates. Finally, the scope of the project should be broad
enough to answer the research question, but not so broad as to cause confusion
and risk losing focus on the underlying question [29].

The research question should be interesting. This may sound self-explanatory but
can be difficult to accomplish in actual practice. What may seem interesting to
a group of surgeons at one institution may be of limited or no relevance to most
other surgeons. This does not preclude the execution of the study, but does
impact its applicability and generalizability. Discussions with experts in the
field and potential funding bodies are good starting points to assess interest
in the idea [29]. Ultimately, a large survey of surgeons in the relevant field
is an effective way to objectively gauge interest in the research question and
assess its potential to change practice [46].

Novelty is another important component of a good research question. This is
perhaps the most
nuanced and least universal of the FINER criteria. Hulley et al. (2007) state
that “good clinical research contributes new information” [29]. While this is
certainly true, the concept of “new information” can be considered in various
ways. The research question does not need to be entirely new. A previous
research question can be applied to a new population, a new tool or measurement
can be used to assess a previous research question, and so on. As well, the
importance of replication as part of the scientific method cannot be overstated.
A recent analysis of education research found that only 0.13% of publications in
the top 100 education journals were dedicated to replicating previous research
[37]. Thus, while novelty is important, it can be interpreted in many different
ways and should not be taken to mean that every research project has to have an
entirely original research question.

While it may seem rather obvious that a research question should be ethical, the
history of clinical research contains many cautionary tales of unethical
research, including some relatively recent examples. The turning point for
ethics as a cornerstone of human research occurred following World War II, with
the estab-

Whereas the FINER criteria provide useful general requirements for a good
research question, the PICOT format presents a practical way for defining and
communicating the research question. This format was first outlined by
Richardson et al. as “PICO”: population, intervention, comparator, and outcome
[48]. It was later expanded to PICOT to include the timeframe over which the
study would be conducted [24]. Like the FINER criteria, the PICOT format is
applicable to all levels of evidence. The individual components of the PICOT
format are outlined below.

The population of interest should be well defined. This is mostly accomplished
through clear and well-defined inclusion and exclusion criteria [60]. Important
components include basic demographic information (age range, sex, etc.), the
nature of the injury/disease (e.g. acute vs. subacute/chronic), and special
populations (e.g. paediatric or pregnant patients, etc.). Some components of the
target population may need to be considered even though they are not formally
part of the inclusion and exclusion criteria. For example, two hypothetical
studies conducted in Brazil and China may have identical inclusion/
exclusion criteria, but will inevitably have some inherent differences in the
populations they recruit.

The intervention of interest is often a new technique, implant, or adjunct that
is the primary focus of the study. The opportunity should be taken at this early
stage to define the intervention clearly and in detail. All pertinent and
applicable details should be recorded. If a new surgical technique is being
studied, a detailed description of the operation, often published separately as
a technique article, is important. The comparator is essentially the control
group(s). Again, as much detail as possible should be included. The type of
comparator (e.g. no treatment vs. placebo vs. alternate treatment) should be
explicitly stated and clearly defined [62].

The outcome is at the heart of why the study is being conducted. What is
expected to improve (or not) with this new or different intervention? Generally,
in orthopaedics, there are three broad categories of outcomes that are important
to evaluate: generic outcomes, condition-specific outcomes, and utility
outcomes.

Case-Based Example 2: Excerpt from FLOW Protocol
The FLOW investigators described all intervention arms in great detail in their
original study protocol, which was published [62]. For example, “…will use
sterile technique to inject 80 mL of the clear liquid soap (Castile Soap, Triad
Medical Inc. Franklin, Wisconsin – 17% concentration in de-ionized water
preserved in 90 mL bottles)”. Irrigation pressures were also defined objectively
with PSI cut-offs. This level of detail is important for multiple reasons: it
allowed the study to be conducted consistently across 41 sites on four different
continents, it simplifies any future repetition of the study, it made for clear
and consistent communication in subsequent publications, and it minimized the
effects of varying protocols and practice patterns.

Finally, the planned timeframe is important to identify. Again, as with costs,
this will not be an exact projection, so some buffer should be built in case
some steps take longer than expected. The timeframe also has significant
logistical implications: budgeting for research and administrative staff can
vary significantly depending on the timeframe, and the graduation and potential
relocation of learners may need to be accounted for.

The FINER and PICOT criteria can be applied to the development of primary and
secondary research questions. The primary question should be a single, clearly
stated question driven by a hypothesis that forms the basis for the design of
the study and the choice of data collection instruments [21]. Secondary research
questions address other outcomes related to the intervention. Any results or
answers to secondary questions should be considered preliminary rather than
definitive [64]. As well, secondary questions should be limited in number to
avoid the trap of many comparisons. When many comparisons (i.e. statistical
tests) are made, the chances of finding a spurious result increase, unless the
appropriate post hoc statistical corrections are performed to account for the
multiple comparisons [21].

29.2.2 Literature Review

A detailed description of how to conduct literature reviews, including
systematic reviews, can be found later in this book. Briefly, a thorough review
of the literature is important to ensure that the research question is novel and
that clinical equipoise exists. Clinical equipoise refers to the fact that for a
given comparison of treatment arms (e.g. operative vs. nonoperative, drug vs.
placebo, operation X vs. operation Y, etc.), a consensus of experts would not
consider one arm clearly superior to the other [25]. Clinical equipoise is an
important requirement of an RCT, both from an ethical standpoint and in terms of
potential clinical applicability. In addition to published literature, it is
important to consider sources of unpublished and in-progress literature,
29.3.1 REB Approval and Trial Registration

Prior to conducting an RCT, institutional research ethics board (REB) approval
should be obtained. In addition, any necessary funding should be applied for and
secured. Both processes will generally require a detailed protocol of the study,
including the literature review, objectives, methods, hypotheses, timelines, and
detailed budget. Chapter 8 in this book provides a detailed overview of how to
prepare a study protocol. Once the protocol is complete, it should be registered
with the appropriate trial registry. In North America, this is most commonly
www.clinicaltrials.gov and in the European Union www.clinicaltrialsregister.eu.
Prospective registration of all clinical trials is important because it ensures
transparency, prevents duplication, reduces publication bias, and lessens the
likelihood of selective reporting [6].

29.3.2 Patient Recruitment

A useful way to think about patient recruitment is to think about the “who,
what, where, when, why, and how”. A detailed recruitment plan is important to
have at the protocol stage, as adequate sample size is integral to RCT success.
An unsuccessful recruitment strategy can serve as a bottleneck that slows down
or altogether halts an otherwise well-planned trial.

Who will recruit the subjects? It is important to identify and train the
individuals who will be recruiting patients. Care should be taken to ensure that
these individuals are not those directly responsible for the care of the
patient. Involving the patient’s care team (physicians, nurses, etc.) can be an
important starting point as there will already be an established rapport.
However, it is preferable that the role of the patient’s care team be limited to
informing the patient about the research project and evaluating whether the
patient would be willing to meet with a member of the research team [55]. An
independent member of the research team should then obtain informed consent, and
it should be made clear that the decision to not participate in the trial will
not adversely affect the patient’s care. Resident physicians are often the first
members of the orthopaedic surgery team to meet a patient and thus can play an
important role in maximizing recruitment by identifying eligible patients. If a
trial is time-sensitive (e.g. time from presentation to operating room is a
primary outcome or an independent variable), it may be necessary to train triage
and other emergency department staff to identify and flag eligible patients for
recruitment.

What will the recruitment process entail? A detailed informed consent form,
which includes a patient copy, is necessary but not sufficient in patient
recruitment. It is important that the individuals obtaining informed consent are
well informed about the purpose of the study, the various treatment arms, and
the potential risks and benefits of each. Copies of the informed consent form
should be readily accessible and available to members of the research team at
key patient encounter locations (e.g. emergency department, clinic, etc.). As
well, if any incentives are being provided for participation (e.g. gift cards,
etc.), these should also be available in advance in a secure location.

Where and when will the recruitment take place? Depending on the details of the
RCT, recruitment may take place in the emergency department, on inpatient wards,
or in outpatient clinics. In the case of multicentre studies, the ideal
recruitment location may vary between the different sites. Thus, consultation
with staff at each site is important in the planning and pilot stages to ensure
that the protocol is consistent but adaptable to each site. It is also important
to define a priori the “hours” of the trial. This is particularly applicable if
the trial is focused on emergent and/or traumatic conditions, as they may
present at any hour. Realistic goals should be set depending on the availability
of research support staff. For example, the HIP ATTACK trial is an ongoing,
international RCT evaluating accelerated (within 6 h of diagnosis) vs. standard
care for hip fracture patients. For the enrolment of each patient, a research
staff member is required to enrol and randomize the patient in a timely manner
to achieve the target time for patients randomized to accelerated treatment.
Thus, in this trial, subjects are only recruited during daytime
working hours, although each recruitment site may have differing definitions of
working hours depending on staff availability [28].

Though patient recruitment in surgical trials most often occurs in person, there
are multiple other methods for patient recruitment. These include media outlets,
physician referrals, cold calls/mailings, and online recruitment [63]. Previous
literature has identified physician referrals and online recruitment as the most
cost-effective strategies [63]. Local and institutional regulations should be
consulted before public recruitment strategies are considered.

How will the recruitment be performed and why? There are multiple different
recruitment strategies, and the most appropriate method should be chosen to fit
the study design. These strategies include [63]:

1. All patients are recruited simultaneously and begin the trial at the same
   time.
2. Patients enter the trial in a “batched” fashion.
3. Continuous recruitment until the target sample size is reached.
4. Continuous recruitment until a target end date is reached.

recruitment issues. Thoma et al. provide a detailed guide to troubleshooting
each type of recruitment challenge [63].

29.3.3 Randomization

Randomization, as defined above, is essentially the process by which subjects
are allocated to the various groups of an RCT. It should be noted that methods
such as the use of chart numbers, birth dates, or alternating allocation are not
truly random and are thus considered “quasi-randomization”; they should be
avoided if possible [17]. Even truly random methods that are manually performed
(e.g. coin flips, dice throwing, etc.) are vulnerable to user error and/or
interference. The use of secure, computer-generated randomization software is
usually considered the best method for random allocation [17].

There are at least four different types of random allocation, all of which may
be acceptable for an RCT depending on the specific scenario:

• Simple randomization is most commonly performed using a random number table or
  computer-generated randomization software.
Each strategy is better suited for certain study This is the easiest randomization technique to
designs than others. For example, recruiting all employ, but should generally be reserved for
patients and beginning a trial simultaneously large clinical trials. In smaller clinical trials,
works best for nonoperative trials, where many this method could result in groups that are
patients can be instructed to begin a therapy on or unbalanced in terms of total numbers and/or
about the same date. Batch enrolment can work baseline characteristics [59].
well for common, elective procedures, such as • Block randomization is useful in smaller tri-
total joint arthroplasty. For example, all eligible als to ensure similar allocation between
patients undergoing a total joint arthroplasty in groups. The “block size” refers to the number
each week can be recruited and enrolled in the of patients that are randomized at a time and
trial, and this process can be repeated as neces- should be a multiple of the number of treat-
sary. Continuous recruitment is the most common ment groups (e.g. if two treatment groups
recruitment strategy in surgical trials, particularly exist, block size can be 4, 6, 8, etc.). Subjects
when it comes to trauma. For most RCTs, recruit- in these small blocks are then randomized in a
ment is continued until the target sample size is balanced fashion to the treatment groups. For
reached. If there is a clear rationale to stop recruit- example, in a study with two treatment groups
ment on a specific date (e.g. seasonal conditions/ (A and B) and a block size of four, the first
injuries), this strategy may be considered. four subjects can be randomized in any of the
Patient recruitment difficulties may be encoun- following combinations: AABB, ABAB,
tered with various aspects of the RCT. These BBAA, ABBA, BAAB, and BABA. One of
include the protocol itself, staff- or site-specific these six combinations is then selected at ran-
issues, surgeon-related issues, and patient-related dom and applied to the first four patients. The
274 S. Ekhtiari et al.
process is then repeated for each subsequent group of four subjects [4, 59]. The block sizes can be constant or variable for each allocation.
• Stratified randomization is a strategy that attempts to minimize the likelihood of having significantly different covariates between patient groups. In this type of randomization, patients are grouped based on select, predetermined prognostic factors before being randomized. For example, in a study on the wound complications of total joint arthroplasty, the presence of diabetes mellitus (DM) is an important prognostic factor. Thus, patients could be stratified based on DM status before being randomized to a treatment arm. Though this type of randomization is appealing due to its potential to control for covariates, it can become complicated if many covariates need to be controlled. As well, this type of randomization requires that all participants be identified before randomization occurs [59]. While this may be well suited for some elective procedures (e.g. by identifying patients on a waiting list for the same operation), it would not be applicable for any emergent or trauma-related RCT.
• Covariate-adaptive randomization is a strategy that attempts to address some of the limitations of stratified randomization. Essentially, subjects are enrolled sequentially in real time, and the covariates of interest (e.g. age, comorbidities, etc.) are entered for each patient. Covariate-adaptive randomization attempts to randomize new subjects into groups by considering the previously randomized subjects and correcting for any imbalances between groups [59].

29.3.4 Allocation Concealment

Allocation concealment refers to a situation in which the individual enrolling the patients and performing the randomization does not know which group the next subject will be randomized to. Previous literature has shown that effect sizes are overestimated by 41% in trials that do not perform allocation concealment [52]. Traditionally, a common technique for allocation concealment was the use of opaque, sealed envelopes, which a member of the research team would open for each patient at the time of randomization. The Cochrane Handbook for Systematic Reviews of Interventions designates the use of sealed opaque envelopes as carrying a “low risk of bias” [27]. Empirical data, however, suggest that trials utilizing sealed opaque envelopes are prone to tampering [33] and are more likely to demonstrate statistically significant results compared to those using distance randomization [26]. Distance randomization occurs when the randomization process is completely removed from the control of the individual(s) enrolling the subjects. Usually, this occurs either via telephone or the Internet, whereby the research team member calls or logs into a secure, centralized service that then randomizes the subject and records their allocation. Distance randomization, particularly through third-party, secure websites, is the preferred method for randomizing subjects in today’s RCTs.

29.3.5 Blinding

Blinding is a distinct concept from allocation concealment; it refers to the process by which one or more of the following groups are kept unaware of the subject’s assigned treatment group: (1) the subjects, (2) the caregivers, (3) the outcome adjudicators, (4) the data collectors, and (5) the data analysts. Depending on the study design, any or all of these five groups may be blinded [31]. According to the Consolidated Standards of Reporting Trials (CONSORT) statement, the commonly used terms “single blinded”, “double blinded”, and “triple blinded” should be avoided, as they are ambiguous and uninformative. Rather, investigators should specifically state which, if any, of the five groups outlined above were blinded [43]. As well, it is important to clarify if the same person or group fulfilled more than one of the roles. For example, in a surgical trial, the primary surgeon can potentially be
the caregiver, the outcome adjudicator, and the data collector.

As a rule, as many groups as possible should be blinded to the group allocation. In practice, particularly in the context of surgical trials, this is not always possible. For example, it is impossible to blind the surgeon to an operative intervention [44]. Inability to blind one or more groups, however, should not automatically result in a complete lack of blinding. A systematic review of orthopaedic trauma RCTs found that less than 10% of RCTs reported blinding outcome adjudicators. Interestingly, up to 96% of outcome adjudicators could have been blinded with simple methods not requiring major changes to the RCT protocol [30]. A more recent review of all surgical literature found that 52% of RCTs blinded the outcome adjudicators, compared with the 67% of RCTs that could have done so [56]. This review, however, was not systematically performed and only examined the ten highest-impact journals in medicine and surgery and thus may paint an overly optimistic picture.

Blinding is particularly important in orthopaedic RCTs, as many outcome measures are focused on patient-reported levels of function, pain, and quality of life. Thus, creative strategies may be required to blind each group of individuals. Blinding subjects is a simple task in some medical and procedural trials. For example, a trial of intra-articular knee injections comparing corticosteroids to placebo was performed with the subjects blinded [47]. By having the syringes prepared by an independent healthcare provider, the caregivers could have also been blinded in this trial. Similar strategies can be used for other nonoperative trials that do not require a visible construct such as a specific sling or brace type. Blinding of subjects can also be relatively straightforward for trials comparing two different types of operations, particularly if both operations use similar surgical approaches. In trials comparing operative to nonoperative management, however, blinding becomes more challenging. Blinding of subjects can be performed through sham surgery, which also provides an excellent control for the placebo effect of having undergone an operation [44]. Sham surgery is discussed in greater detail below, but this is essentially the only way to truly blind subjects in an operative vs. nonoperative trial.

Blinding surgeons in an operative trial is almost always impossible or highly impractical unless the intervention is an adjunct that can be easily masked (e.g. intraoperative local anaesthetic injection). Blinding of the remaining three groups, however, can and should be done in many cases. Blinding of outcome adjudicators and data collectors can be performed in several different ways. Surgical scars can be concealed with patient clothing or large dressings [38]. Even radiographs showing different implants can sometimes be masked using creative digital alteration, as demonstrated by Karanicolas et al. [31]. Radiographs and other imaging modalities can be de-identified prior to being provided to the outcome adjudicator. In the case of patient-reported outcomes, an extra, independent individual may be required to collect data from the patient, de-identify it, and then provide it to the data assessment team. Blinding of the data assessment team is perhaps the simplest form of blinding to achieve, only requiring that data is coded in such a way that allows comparison between groups but does not allow the data analyst to decipher each subject’s allocation. Despite this, no orthopaedic RCTs published between 1988 and 2000 specified their use of this technique [8].

Case-Based Example 6: Randomization, Allocation Concealment, and Blinding in the FLOW Study
The FLOW study randomized subjects using a custom-built web-based randomization system with variable block randomization. Given that this was a ‘distance randomization’ technique, allocation concealment was ensured. Patients, outcome adjudicators, and data analysts were all blinded to subject allocation. In addition, a central adjudication committee, also blinded to group allocation, assessed all subjects for eligibility before or shortly after randomization.
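The variable block randomization mentioned for the FLOW study, together with the block and stratified schemes described in Sect. 29.3.3, can be illustrated with a short Python sketch. This is not the FLOW system or any published software: the function names, the arm labels (A, B), the candidate block sizes (4, 6, 8), the strata labels, and the seeds are all assumptions chosen for illustration. In a real trial the sequence would be generated and held by a secure, centralized service so that the allocation stays concealed from those enrolling patients.

```python
import random


def permuted_block(arms, block_size, rng):
    """Return one shuffled block containing each arm equally often."""
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    block = list(arms) * (block_size // len(arms))
    rng.shuffle(block)  # each balanced ordering is equally likely
    return block


def allocation_sequence(n_subjects, arms=("A", "B"), block_sizes=(4, 6, 8), seed=None):
    """Allocate n_subjects using permuted blocks whose size varies at random."""
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_subjects:
        size = rng.choice(block_sizes)  # hypothetical set of block sizes
        sequence.extend(permuted_block(arms, size, rng))
    return sequence[:n_subjects]  # the final block may be cut short


def stratified_sequences(strata, n_per_stratum, seed=None):
    """Build an independent blocked sequence per stratum (e.g. DM status)."""
    rng = random.Random(seed)
    return {
        stratum: allocation_sequence(n_per_stratum, seed=rng.randrange(2**32))
        for stratum in strata
    }


if __name__ == "__main__":
    seq = allocation_sequence(20, seed=2019)
    print("".join(seq), {arm: seq.count(arm) for arm in ("A", "B")})
    print(stratified_sequences(["DM", "no DM"], 8, seed=2019))
```

With two arms and a fixed block size of four, each block is one of the six orderings listed in Sect. 29.3.3. Letting the block size vary makes the next allocation harder to guess near the end of a block, at the cost of a slightly larger possible imbalance if recruitment stops partway through a block.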
29.3.6 Control Groups

An appropriate choice of control group(s) is of paramount importance in the success and impact of an RCT. There are many different types of control groups that can be used in surgical trials. In general terms, these include no treatment controls, placebo controls, and active treatment controls. From a strictly scientific standpoint, the best control group is a placebo group, i.e. a group that receives no active intervention, but is blinded to this fact. This type of control can be implemented with relative ease in trials of oral or injected medications. As discussed earlier, placebo groups in the context of operative trials essentially amount to sham surgery. A recent systematic review found six orthopaedic RCTs that utilized sham surgery. Interestingly, all six studies found that sham surgery was as effective as therapeutic surgery. The most recent of these six trials was published in 2012 [36].

Thus, sham surgery is clearly possible in orthopaedic RCTs and has the potential to provide very useful information. There are, however, significant ethical concerns with randomizing patients to receive sham surgery. Patients are exposed to risk, with no therapeutic intervention, which may violate the ethical principles of non-maleficence and beneficence [44]. On the other hand, the argument could be made that by revealing some procedures to be no better than placebo, future patients can avoid undertaking unnecessary risks. This does not, however, account for the fact that performing sham surgery also threatens the trust in the doctor-patient relationship because sham surgery status must be kept blinded even at follow-up. Overall, sham surgery may be appropriate in select cases, particularly when there is true clinical equipoise about the utility of a surgical intervention and when that intervention is minimally invasive, low risk, and common enough to confer the results with significant potential impact [44]. Extra attention should be paid to the informed consent process to ensure that patients truly understand the rationale, logistics, and implications of sham surgery.

For most orthopaedic trials, active treatment and/or no treatment are the most commonly used and most appropriate choices for control groups. The decision between the inclusion of one or both groups depends on the pre-existing evidence on the current standard of care. If there exists true clinical equipoise about the effectiveness of a given operation, for example, it would be reasonable to randomize patients to either operative or conservative treatment and compare their outcomes. However, if there is a well-established treatment that is known to be effective, it would be unethical to withhold that treatment altogether. Thus, the new intervention should be compared to the standard of care [29]. It can sometimes be difficult to determine whether it would be ethical to withhold the current standard of care. A treatment may have become established as the “standard of care” based on little or no high-quality evidence. Legally, a determination of “standard of care” is often made based on the practice patterns of similar physicians. Thus, even if scientific evidence is lacking, it may be legally perilous to withhold a commonly practiced standard of care [40].

29.3.7 Follow-Up

The intended length of follow-up should be decided a priori (see the discussion on PICOT format, Sect. 29.2). Of course, follow-up can then be extended beyond that timeframe, but the target follow-up is a decision made based on the known or expected natural history of the disease or injury. Arrangements should be made in advance to minimize loss to follow-up, such as asking patients if they plan to relocate within the study time period and obtaining multiple alternate contacts for each subject. All loss to follow-up should be carefully recorded, including the reason for loss (e.g. loss of contact, death, withdrawal of consent, etc.). As well, all adverse events, even those fully unrelated to the intervention, should be carefully tracked.

29.3.8 Statistical Analysis

Part III of this book contains a basic but thorough approach to statistics. Therefore, only concepts
controlled trial design. Trials. 2015;16(1):241. https://doi.org/10.1186/s13063-015-0739-5.
17. Dettori J. The random allocation process: two things you need to know. Evid Based Spine Care J. 2010;1(3):7–9. https://doi.org/10.1055/s-0030-1267062.
18. Eagleman DM. Why public dissemination of science matters: a manifesto. J Neurosci. 2013;33(30):12147–9. https://doi.org/10.1523/JNEUROSCI.2556-13.2013.
19. Euser AM, Zoccali C, Jager KJ, Dekker FW. Cohort studies: prospective versus retrospective. Nephron Clin Pract. 2009;113(3):c214–7. https://doi.org/10.1159/000235241.
20. Eysenbach G. Can tweets predict citations? Metrics of social impact based on twitter and correlation with traditional metrics of scientific impact. J Med Internet Res. 2011;13(4):e123. https://doi.org/10.2196/jmir.2041.
21. Farrugia P, Petrisor BA, Farrokhyar F, Bhandari M. Practical tips for surgical research: research questions, hypotheses and objectives. Can J Surg. 2009;53(4):278–81. https://doi.org/10.1503/cjs.036311.
22. Freedman KB, Back S, Bernstein J. Sample size and statistical power of randomised, controlled trials in orthopaedics. J Bone Joint Surg Br. 2001;83(3):397–402.
23. Gofton WT, Solomon M, Gofton T, et al. What do reported learning curves mean for orthopaedic surgeons? Instr Course Lect. 2016;65:633–43.
24. Haynes R. Forming research questions. In: Haynes R, Sackett D, Guyatt G, Tugwell P, editors. Clinical epidemiology: how to do clinical practice research. Philadelphia: Lippincott Williams & Wilkins; 2006. p. 3–14.
25. Helmy A, Timofeev I, Santarius T, Hutchinson P. What constitutes clinical equipoise. Br J Neurosurg. 2009;23(5):564–5. https://doi.org/10.1080/02688690903029760.
26. Hewitt C. Adequacy and reporting of allocation concealment: review of recent trials published in four general medical journals. BMJ. 2005;330(7499):1057–8. https://doi.org/10.1136/bmj.38413.576713.AE.
27. Higgins J, Green S. Cochrane handbook for systematic reviews of interventions version 5.1.0 [updated March 2011]. Cochrane Collab. 2011;0(March):10–11. http://www.cochrane.org/training/cochrane-handbook.
28. Hip Fracture Accelerated Surgical Treatment and Care Track (HIP ATTACK) Investigators. Accelerated care versus standard care among patients with hip fracture: the HIP ATTACK pilot trial. CMAJ. 2014;186(1):E52–60. https://doi.org/10.1503/cmaj.130901.
29. Hulley SB, Cummings SR, Browner WS, Grady DG, Newman TB. Designing clinical research, vol. 78. Philadelphia: Lippincott Williams & Wilkins; 2007.
30. Karanicolas PJ, Bhandari M, Taromi B, et al. Blinding of outcomes in trials of orthopaedic trauma: an opportunity to enhance the validity of clinical trials. J Bone Joint Surg Am. 2008;90(5):1026–33. https://doi.org/10.2106/JBJS.G.00963.
31. Karanicolas PJ, Farrokhyar F, Bhandari M. Blinding: who, what, when, why, how? Can J Surg. 2010;53(5):345–8. https://doi.org/10.1007/s10269-005-0145-9.
32. Keating JF, Grant A, Masson M, Scott NW, Forbes JF. Displaced intracapsular hip fractures in fit, older people: a randomised comparison of reduction and fixation, bipolar hemiarthroplasty and total hip arthroplasty. Health Technol Assess. 2005;9(41):iii–iv, ix–x, 1–65.
33. Kennedy A, Grant A. Subversion of allocation in a randomised controlled trial. Control Clin Trials. 1997;18(Suppl 3):S77–88.
34. Leon AC, Davis LL, Kraemer HC. The role and interpretation of pilot studies in clinical research. J Psychiatr Res. 2011;45(5):626–9. https://doi.org/10.1016/j.jpsychires.2010.10.008.
35. Lochner HV, Bhandari M, Tornetta P. Type-II error rates (beta errors) of randomized trials in orthopaedic trauma. J Bone Joint Surg Am. 2001;83(11):1650–5. https://doi.org/10.2106/00004623-200111000-00005.
36. Louw A, Diener I, Fernández-de-las-Peñas C, Puentedura EJ. Sham surgery in orthopedics: a systematic review of the literature. Pain Med. 2017;18(4):736–50. https://doi.org/10.1093/pm/pnw164.
37. Makel MC, Plucker JA. Facts are more important than novelty: replication in the education sciences. Educ Res. 2014;43(6):304–16. https://doi.org/10.3102/0013189x14545513.
38. Malavolta EA, Demange MK, Gobbi RG, Imamura M, Fregni F. Randomized controlled clinical trials in orthopedics: difficulties and limitations. Rev Bras Ortop. 2011;46(4):452–9. https://doi.org/10.1016/s2255-4971(15)30261-5.
39. Mandal J, Acharaya S, Parija SC. Ethics in human research. Trop Parasitol. 2011;1(1):2–3.
40. Marchant GE, Scheckel K, Campos-outcalt D. Contrasting medical and legal standards of evidence: a precision medicine case study. J Law Med Ethics. 2016;44(1):194–204. https://doi.org/10.1177/1073110516644210.
41. Mehta S, Myers T, Lonner J, Huffma R, Sennett BJ. The ethics of sham surgery in clinical orthopaedic research. J Bone Joint Surg Am. 2007;89(7):1650–3.
42. Misra S. Randomized double blind placebo control studies, the “gold standard” in intervention based studies. Indian J Sex Transm Dis AIDS. 2012;33(2):131–4. https://doi.org/10.4103/0253-7184.102130.
43. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. Int J Surg. 2012;10(1):28–55. https://doi.org/10.1016/j.ijsu.2011.10.001.
44. Mundi R, Chaudhry H, Mundi S, Godin K, Bhandari M. Design and execution of clinical trials in orthopaedic surgery. Bone Joint Res. 2014;3(5):161–8. https://doi.org/10.1302/2046-3758.35.2000280.
45. Nissen T, Wynn R. The history of the case report: a selective review. JRSM Open. 2014;5(4):5. https://doi.org/10.1177/2054270414523410.
46. Petrisor B, Jeray K, Schemitsch E, et al. Fluid lavage in patients with open fracture wounds (FLOW): an international survey of 984 surgeons. BMC Musculoskelet Disord. 2008;9(1):7. https://doi.org/10.1186/1471-2474-9-7.
47. Raynauld J-P, Buckland-Wright C, Ward R, et al. Safety and efficacy of long-term intraarticular steroid injections in osteoarthritis of the knee: a randomized, double-blind, placebo-controlled trial. Arthritis Rheum. 2003;48(2):370–7. https://doi.org/10.1002/art.10777.
48. Richardson WS, Wilson MC, Nishikawa J, Hayward RS. The well-built clinical question: a key to evidence-based decisions. ACP J Club. 1995;123(3):A12–3. https://doi.org/10.7326/ACPJC-1995-123-3-A12.
49. Röhrig B, du Prel J-B, Wachtlin D, Kwiecien R, Blettner M. Sample size calculation in clinical trials. Dtsch Arztebl Int. 2010;107(31–32):552–6. https://doi.org/10.3238/arztebl.2010.0552.
50. Roman H, Marpeau L, Hulsey TC. Surgeons’ experience and interaction effect in randomized controlled trials regarding new surgical procedures. Am J Obstet Gynecol. 2008;199(2):108.e1–6. https://doi.org/10.1016/j.ajog.2008.03.002.
51. Rothschild PC. Social media use in sports and entertainment venues. Int J Event Festival Manag. 2011;2(2):139–50. https://doi.org/10.1108/17582951111136568.
52. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA. 1995;273(5):408–12. https://doi.org/10.1001/jama.273.5.408.
53. Schwitzer G. How do US journalists cover treatments, tests, products, and procedures? An evaluation of 500 stories. PLoS Med. 2008;5(5):e95. https://doi.org/10.1371/journal.pmed.0050095.
54. Selvaraj S, Borkar DS, Prasad V. Media coverage of medical journals: do the best articles make the news? PLoS One. 2014;9(1):e85355. https://doi.org/10.1371/journal.pone.0085355.
55. Smith OM, McDonald E, Zytaruk N, et al. Enhancing the informed consent process for critical care research: strategies from a thromboprophylaxis trial. Intensive Crit Care Nurs. 2013;29(6):300–9. https://doi.org/10.1016/j.iccn.2013.04.006.
56. Speich B. Blinding in surgical randomized clinical trials in 2015. Ann Surg. 2017;266(1):21–2. https://doi.org/10.1097/SLA.0000000000002242.
57. Sprague S, Quigley L, Bhandari M. Survey design in orthopaedic surgery: getting surgeons to respond. J Bone Joint Surg Am. 2009;91(Suppl 3):27–34. https://doi.org/10.2106/jbjs.h.01574.
58. Stieglitz S, Dang-Xuan L. Social media and political communication: a social media analytics framework. Soc Netw Anal Min. 2013;3(4):1277–91. https://doi.org/10.1007/s13278-012-0079-3.
59. Suresh K. An overview of randomization techniques: an unbiased assessment of outcome in clinical research. J Hum Reprod Sci. 2011;4(1):8. https://doi.org/10.4103/0974-1208.82352.
60. Thabane L, Thomas T, Ye C, Paul J. Posing the research question: not so simple. Can J Anesth. 2009;56(1):71–9. https://doi.org/10.1007/s12630-008-9007-4.
61. The FLOW Investigators. A trial of wound irrigation in the initial management of open fracture wounds. N Engl J Med. 2015;373:2629–41.
62. The FLOW Investigators. Fluid lavage of open wounds (FLOW): design and rationale for a large, multicenter collaborative 2 × 3 factorial trial of irrigating pressures and solutions in patients with open fractures. BMC Musculoskelet Disord. 2011;11:85.
63. Thoma A, Farrokhyar F, Mcknight L, Bhandari M. How to optimize patient recruitment. Can J Surg. 2010;53(3):205–10. https://doi.org/10.1503/cjs.018012.
64. Thoma A, McKnight L, McKay P, Haines T. Forming the research question. Clin Plast Surg. 2008;35(2):189–93. https://doi.org/10.1016/j.cps.2007.10.009.
65. Vavken P. Rationale for and methods of superiority, noninferiority, or equivalence designs in orthopaedic, controlled trials. Clin Orthop Relat Res. 2011;469(9):2645–53. https://doi.org/10.1007/s11999-011-1773-6.
66. Viswanath K, Blake KD, Meissner HI, et al. Occupational practices and the making of health news: a national survey of U.S. health and medical science journalists. J Health Commun. 2008;13(8):759–77. https://doi.org/10.1080/10810730802487430.
67. WMA. Declaration of Helsinki. Vol. 353. 1974. https://doi.org/10.2471/BLT.08.057737.
30 Level 1 Evidence: Long-Term Clinical Results
Daisuke Araki and Ryosuke Kuroda
this context, we will focus on level 1 studies of the long-term clinical results.

Clinical Vignette 1
Study title (e.g.): 20 years’ clinical follow-up and osteoarthritic change after anterior cruciate ligament (ACL) reconstruction.
The title should convey that the topic is important, relevant, and innovative. It is a summary of the abstract and should be short, descriptive, and interesting.

First, it would be unethical to design a study using placebo and randomization on this topic. Patients assigned to the placebo arm of a clinical trial must be made to believe they are receiving a working treatment, even though they are not, for the placebo effect to play a role at all. However, a more serious issue with the use of placebo remains the possibility that participants are harmed by receiving a placebo instead of an active treatment [6]. If the patients are divided into two groups, (1) arthroscopic inspection and ACL reconstruction and (2) arthroscopic inspection only, the placebo may expose the patients to higher levels of pain and aggravate their condition.

Second, concepts of study methodology are crucial to consider when placing a study into the levels of evidence. There are some that advocate dividing the hierarchy levels into sub-levels based, in part, on study methodology. Others suggest that poor methodology will take a study down a level [1]. Therefore, in starting clinical research, researchers first create a “clinical research protocol.”

Clinical Vignette 2
Is the patient cohort adequately reported, e.g., age, sex distribution, concomitant injuries, surgery, and sufficient power?

30.1.1 Study Design

The research protocol includes (1) the principal investigator who has the responsibility for the research, (2) the clinical hypothesis, (3) the research design, (4) assessment of progress status, (5) evaluation of adverse event/effectiveness monitoring, and (6) statistical analysis. In particular, the research design should fully consider the ethical aspects of patient care, as well as the scientific aspects, to accomplish the research and to properly verify the clinical hypothesis. Investigators should formulate the study question before the first patient is enrolled. Next, the optimal number of cases should be identified. If the optimal number of cases is not determined, the study may fail to obtain significant results (i.e., it may be underpowered).

Clinical Vignette 3
Has informed consent been obtained? Is the study approved by the institutional review board or ethical committee?

30.1.2 Informed Consent

One of the most important ethical constructs of clinical ethics, informed consent, is an essential condition both for therapy and research. Information that makes consent valid is generally thought to include the understanding of the risks and benefits of the treatment(s) that patients may receive; understanding of the procedures that the participant may undergo, including, in the case of RCTs, blinding and randomization; understanding that participation in research is voluntary; and, finally, understanding the purpose of the research [6].

Clinical Vignette 4
Who performed the treatment? Who evaluated the patient outcome? Who performed the statistical analysis?

30.1.3 Blinding

Blinding is most often performed in prospective studies. Personnel who evaluate the outcomes of interest may have a belief or suspicion of which treatment offers the best outcome. They may interpret marginal results in a way that favors their presupposition if they are privy to the treatment administered. It is important to understand who is performing the data collection. Therefore,
patient and intervention from the beginning of the study. Outcomes that occur over the course of the study period are assessed [3].

Prospective cohort studies also allow investigators to examine multiple outcomes simultaneously [16].
cal, as in classes of obesity; or based on a genetic trait. Prognostic studies answer the question “What is the effect of a patient characteristic on the natural history of the condition?” [9].

outcome of interest. Pre-existing injuries sustained by participating athletes were disregarded in the injury rate calculation. The presence of varying levels of exposure to injury at the beginning of the study indicates level two evidence [9].
10. Kapadia BH, Boylan MR, Elmallah RK, Krebs VE, Paulino CB, Mont MA. Does hemophilia increase the risk of postoperative blood transfusion after lower extremity total joint arthroplasty? J Arthroplast. 2016;31:1578–82. https://doi.org/10.1016/j.arth.2016.01.012.
11. Koh KH, Ahn JH, Kim SM, Yoo JC. Treatment of biceps tendon lesions in the setting of rotator cuff tears: prospective cohort study of tenotomy versus tenodesis. Am J Sports Med. 2010;38(8):1584–90. https://doi.org/10.1177/0363546510364053.
12. Sedgwick P. Prospective cohort studies: advantages and disadvantages. BMJ. 2013;347:f6726. https://doi.org/10.1136/bmj.f6726.
13. Sedgwick P. What is recall bias? BMJ. 2012;344:e3519.
14. Skolasky RL, Maggard AM, Wegener ST, Riley LH III. Telephone-based intervention to improve rehabilitation engagement after spinal stenosis surgery: a prospective lagged controlled trial. J Bone Joint Surg. 2018;100:21–30. https://doi.org/10.2106/JBJS.17.00418.
15. Slobogean G, Bhandari M. Introducing levels of evidence to the journal of orthopaedic trauma: implementation and future directions. J Orthop Traumatol. 2012;26(3):127–8.
16. Song JW, Chung KC. Observational studies: cohort and case-control studies. Plast Reconstr Surg. 2010;126(6):2234–42. https://doi.org/10.1097/PRS.0b013e3181f44abc.
17. Wright JG, Swiontkowski MF, Heckman JD. Introducing levels of evidence to the journal. J Bone Joint Surg. 2003;85(1):1–3.
32 Level III Evidence: A Case-Control Study
Andrew D. Lynch, Adam J. Popchak, and James J. Irrgang
the provider. Medical records are not exhaustive records that are monitored for completeness in the same fashion that prospective study case-report forms are monitored. Therefore, there may be information bias associated with the reliance on the medical record as the source of data for a study: more information may be available for a patient who is having negative repercussions associated with treatment [11].
Additionally, anything that happens to the patient outside of the medical encounter will not be documented in the medical record. For this reason, participants are frequently asked to complete surveys about their experiences and exposures. The recall of the cases and controls may be biased differently: a case participant may have thought intensely about the scenarios that contributed to the development of the condition, while the control participant may not, leading the case subject to provide more detailed and accurate information or information that has been overanalyzed by the case participant [2, 9, 11, 12]. Additionally, survey data relies on the participants being willing to accurately report their experiences. In both cases and controls, recall bias could introduce potential error into the data set [7, 11].
There is also the potential for selection bias on the part of the researcher. The simplest design to execute involves cases and controls who have received care from a single surgeon or group practice. However, in selecting cases from a single surgeon or group practice, the researcher only has access to potential cases who chose to follow up with that surgeon or group. The researcher may not have access to a case who chose to go elsewhere for further care. In the case of a practice that specializes in the rare outcome of interest, it is unlikely that the practice will have performed the original care and may not have complete access to the medical record, leading to an incomplete data set. Cases and controls must be selected from a similar population and not in a way that introduces unwanted bias into the data set.
The case-control design clearly lacks the rigorous attention to detail in data collection of a prospective cohort or randomized controlled trial, but this is balanced by the inexpensive and potentially rapid completion of the study. These should not be considered “fatal flaws” but should be factored into the analysis and interpretation of the data and results, as well as the overall discussion [12, 13]. These can be accounted for with good research practice, including the creation of detailed medical chart abstraction forms, structured design of retrospective surveys, adhering to strict inclusion and exclusion criteria for selecting cases and controls, and ensuring that data collectors are blinded to case vs. control status whenever possible [2, 11, 12].
Fact Box 32.3: Sources and Types of Bias
Information/observation bias occurs when information is gathered differently for the case participants and control participants. This may be due to differences in interview techniques, chart review methods, or survey design that introduces bias in support of the research hypothesis.
Selection bias occurs when case participants and control participants are chosen via different selection criteria. Specifically, case participants and control participants must be selected without consideration of their exposure history or any variable associated with the exposure.
Recall bias occurs when something particular to the cases or controls influences the recall such that more detailed information is gathered from one group.
32.4 Sample Selection
When selecting cases and controls, the researcher must select participants who are comparable in terms of both baseline risk of developing the outcome and the potential to have a complete data set related to that individual. It is important that they come from the same general population and have the same general characteristics.
32.5 Statistical Analysis
A case-control study seeks to understand whether some exposure (disease, procedure, condition, or patient characteristic) has any effect on the probability of developing an outcome of interest [7, 9]. When reviewing the history of the cases and controls, the presence or absence of an exposure should be obtained from the medical record or via participant survey/interview. Then, each case (outcome positive) and control (outcome negative) can be classified as having been exposed (exposure positive) or not (exposure negative). This allows us to create a simple 2 × 2 table or contingency table to illustrate the difference between the exposed and unexposed (Fig. 32.1). We will discuss some simple statistics that can be used with a 2 × 2 table but recommend the researcher follow-up with a biostatistician for more complex, multivariable analysis. Additionally, methods for sensitivity analysis and issues related to sample size estimation are available but are beyond the scope of this chapter [3, 4].
Typically, contingency tables are used to calculate a risk ratio or relative risk (RR) to convey how exposure to a predictor puts someone at risk for developing the outcome of interest. This is done by dividing the population incidence of the condition of interest in those exposed [i.e., the proportion of exposed individuals who developed the condition (e.g., # of cases/# of exposed or a/(a + b))] by the population incidence of the condition of interest in nonexposed individuals [the proportion of unexposed individuals who developed the condition (e.g., # of cases/# of nonexposed or c/(c + d))]. However, the relative risk is not appropriate in a case-control study because the selection of cases and controls does not accurately reflect the true population incidence [11].
In the case-control study, an odds ratio (OR) can be used. Note that this is a less precise method used in smaller samples where the true incidence of the condition of interest is rare (generally less than 5%) but still produces an estimate of risk [4, 11]. An odds ratio is determined with the following formula:
\[ \mathrm{OR} = \frac{a/c}{b/d} = \frac{ad}{bc} \]
Because the formula reduces down to a multiplication of opposite corners, it is sometimes referred to as a cross product. An odds ratio or relative risk greater than 1 would indicate that exposure increases the risk of developing the outcome of interest, while an odds ratio or relative risk less than 1 would indicate that exposure decreases the risk of developing the outcome of interest (i.e., the exposure is protective).
32.6 Reporting of Case-Control Studies
When reporting results of a case-control study, and even in the planning of a case-control study, researchers are recommended to consider the checklist from the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) consortium, a standardized guideline for the reporting of epidemiological studies in medicine [12, 13]. The STROBE statement has guidelines for report titles, abstracts, introduction, methods, results, and discussion sections.
32.7 Summary
Case-control studies should be used to retrospectively determine the role of an exposure in the etiology of an outcome or condition of interest
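As an illustrative aside to Sect. 32.5, the cross-product odds ratio can be sketched in a few lines of Python. This is a minimal sketch and not part of the original chapter: the function names and the counts are hypothetical, and the cell layout is an assumption (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls), chosen to agree with the a/(a + b) and c/(c + d) proportions described in the text.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2 x 2 table: OR = (a/c) / (b/d) = ad / bc.

    Assumed layout (hypothetical labels, not from the chapter):
      a = exposed cases,    b = exposed controls,
      c = unexposed cases,  d = unexposed controls.
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk ratio: RR = [a / (a + b)] / [c / (c + d)].

    Shown for comparison only; the text notes that RR is not
    appropriate for case-control sampling.
    """
    if (a + b) == 0 or (c + d) == 0 or c == 0:
        raise ValueError("relative risk is undefined for these counts")
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts, invented purely for illustration.
a, b, c, d = 30, 70, 10, 90
print(f"OR = {odds_ratio(a, b, c, d):.2f}")
print(f"RR = {relative_risk(a, b, c, d):.2f}")
```

With these invented counts both ratios exceed 1, which, following the interpretation in the text, would suggest that the exposure increases the risk of the outcome. Confidence intervals and multivariable adjustment are deliberately left out and are better handled with a biostatistician.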
consistent of treatment safety and diagnostic accuracy [10]. Case series are descriptive, and therefore findings should be exclusively presented by descriptive statistics, because comparative tests supported by p-values are irrelevant in the matter and should be avoided, along with conclusive reports; the treatment of interest and its relative efficacy would be supported by an immaterial hypothesis. The variable potential for bias is also important to explicitly state for future prospective studies to ascertain validity of the treatment [10].
The most important outcomes that can be determined by clinical research are measures of physical function and well-being. These are most often deemed significant by their measures of validity and reliability. Validity and reliability refer to the ability of the study to measure the variable of interest and the extent that repeated measures display similar results, respectively. The Western Ontario and McMaster Universities (WOMAC) Osteoarthritis index score is an example of an outcome that correlates well with previously described instruments of pain, stiffness, and physical function [2].
A classification scheme for variable measures of outcomes has previously been proposed by Wilson and Cleary, in which the higher-level outcomes were increasingly influenced by outside, uncontrollable factors of individuals, environments, and/or nonmedical factors that increase the difficulty in measurement and ultimately the definition for the patient population [16]. These five levels of outcomes were suggested as biological and physiological variables (Level I), symptom status (Level II), functional status (Level III), general health perceptions (Level IV), and overall quality of life (Level V) [16]. Instruments for reporting outcomes among orthopedic research can be utilized in many forms including mixed clinician-based and functional outcome; system-specific, disease-specific, and general health-related quality of life; and overall health-related quality of life.
Mixed outcomes occasionally yield clouded findings due to the obstacle of interobserver reliability across physical examinations [14]. Additionally, the broad report of measures into a summarized score deters from the specificity of patient expectations by range of motion, stiffness, and/or pain, thereby failing to discriminate between improved and worsened patients [13].
System-specific outcomes are localized by a specific body region, while disease-specific outcomes are localized by the specific disease encountered by the patient [13]. Frequently, patients can be evaluated by both approaches. Patients suffering from osteoarthritis of the knee can be assessed by instruments related to well-being of the knee and/or instruments related to well-being of osteoarthritis, increasing the available comparative measures and sensitivity to change [13].
General and overall health-related quality-of-life instruments mainly pertain to the health and happiness, respectively, of a patient to encounter daily activities. The former is a multifactorial concept comprising physical, mental, and social factors that deal with a broad range of activities including work, hobbies, and social interactions; these factors thus measure the patient’s ability to carry out everyday life [13]. The latter instrument is similar but primarily concerns the patient’s satisfaction with their ability to participate in daily activities [13].
33.4 Level IV Case Series Example
A great example of case series research is a study performed by Geeslin and LaPrade, in which they aimed to report objective stability and subjective outcomes for a prospective series of patients with an acute grade III posterolateral corner (PLC) knee injury treated with anatomic repair and/or reconstruction of all injured structures [6]. At this time, there was a lack of current reports regarding surgical treatment and outcomes of acute PLC injuries, with most of the literature having been published a decade prior [5, 15].
Upon completion of this case series, the authors were able to report significantly improved objective stability, relative to preoperative conditions, that resulted from treatment of grade III PLC injuries by acute repair of avulsed fractures, reconstruction of midsubstance tears, and concurrent reconstruction of any cruciate ligament
304 M. I. Kennedy and R. F. LaPrade
tears [6]. This case series adds significant value to the literature in showing the high yield of outcomes by acute repair of the PLC structures.
33.5 Strengths and Pitfalls
Case series, and all observational studies, are uncontrolled, meaning the physician does not choose the treatment for the research participant. This method carries both a positive and negative aspect in the research process. Results of uncontrolled studies more often closely resemble routine clinical practice than randomized trials and can often be better applied to clinical practice [1, 8]. With a diverse range of patients, a high external validity brings greater clinical relevance to differing medical centers and can better represent the population of interest; strict inclusion criteria established by randomized controlled trials severely reduce the extent that findings can be epitomized with common practice [4, 11]. Furthermore, from the financial and ethical standpoint, the lack of comparison and randomization yields a cost-effective study design, and choice treatment by the patient and physician maintains a consistency for common standards of orthopedic practice [10].
However, an absent comparison group does provide limitations. Causal inferences cannot be made, restricting findings to apparent relationships, because the study design lacks an independent variable to differentiate outcomes by treatment protocol rather than to patient characteristics [10]. Analyzing a condition of interest to a control group provides possible correlations for either the presence or degree of exposure to a specific risk factor and collectors may take measurements unaware of the patient characteristics, reducing the risk of bias [9].
Study design for case series studies that lack a comparison group eliminates the potential for confounding bias but increases the potential for selection and measurement bias which can be dependent on the approach of a prospective versus retrospective study. Prospective studies more regularly embody a consistent manner of study design ranging from inclusion and data collection to patient follow-up measures, and measurement bias may exist with the absence of a standardized protocol [10]. Patients are expected to display worse outcomes when they have become deceased or switched to another hospital, which creates an instance of selection bias [10]. Measurement bias also becomes prevalent, as differing methods of outcome measurements are used in a study [10].
Take-Home Message
• Case series provide a basis for designing further research studies.
• In the instance of unusual occurrences of a disease, case series are effective for designing hypotheses for future prospective studies, although no hypothesis is tested within the case series itself.
• It is important to remember that case series are unable to make comparisons; rather they exclusively report on the outcome findings relative to the treatment of the patients involved in the series through descriptive statistics.
References
1. Audige L, Hanson B, Kopjar B. Issues in the planning and conduct of non-randomised studies. Injury. 2006;37(4):340–8. https://doi.org/10.1016/j.injury.2006.01.026.
2. Bellamy N. Pain assessment in osteoarthritis: experience with the WOMAC osteoarthritis index. Semin Arthritis Rheum. 1989;18(4 Suppl 2):14–7.
3. Carey TS, Boden SD. A critical guide to case series reports. Spine (Phila Pa 1976). 2003;28(15):1631–4. https://doi.org/10.1097/01.BRS.0000083174.84050.E5.
4. Dalziel K, Round A, Stein K, Garside R, Castelnuovo E, Payne L. Do the findings of case series studies vary significantly according to methodological characteristics? Health Technol Assess. 2005;9(2):iii–v, 1–146.
5. DeLee JC, Riley MB, Rockwood CA Jr. Acute posterolateral rotatory instability of the knee. Am J Sports Med. 1983;11(4):199–207. https://doi.org/10.1177/036354658301100403.
6. Geeslin AG, LaPrade RF. Outcomes of treatment of acute grade-III isolated and combined posterolateral knee injuries: a prospective case series and surgical technique. J Bone Joint Surg Am. 2011;93(18):1672–83. https://doi.org/10.2106/JBJS.J.01639.
33 Level 4 Evidence: Clinical Case Series 305
7. Hanson BP. Designing, conducting and reporting clinical research. A step by step approach. Injury. 2006;37(7):583–94. https://doi.org/10.1016/j.injury.2005.06.051.
8. Hartz A, Marsh JL. Methodologic issues in observational studies. Clin Orthop Relat Res. 2003;(413):33–42. https://doi.org/10.1097/01.blo.0000079325.41006.95.
9. Hess DR. Retrospective studies and chart reviews. Respir Care. 2004;49(10):1171–4.
10. Kooistra B, Dijkman B, Einhorn TA, Bhandari M. How to design a good case series. J Bone Joint Surg Am. 2009;91(Suppl 3):21–6. https://doi.org/10.2106/JBJS.H.01573.
11. Lloyd-Williams F, Mair F, Shiels C, Hanratty B, Goldstein P, Beaton S, Capewell S, Lye M, McDonald R, Roberts C, Connelly D. Why are patients in clinical trials of heart failure not like those we see in everyday practice? J Clin Epidemiol. 2003;56(12):1157–62.
12. McMaster WC, Sale K, Andersson GB, Bostrom MP, Gebhardt MC, Trippel SB, Clark DC. The conduct of clinical research under the HIPAA Privacy Rule. J Bone Joint Surg Am. 2006;88(12):2765–70. https://doi.org/10.2106/JBJS.F.00794.
13. Poolman RW, Swiontkowski MF, Fairbank JC, Schemitsch EH, Sprague S, de Vet HC. Outcome instruments: rationale for their use. J Bone Joint Surg Am. 2009;91(Suppl 3):41–9. https://doi.org/10.2106/JBJS.H.01551.
14. Pynsent P, Fairbank J, Carr A. Outcome measures in orthopaedics and orthopaedic trauma. New York: Oxford University Press; 2004.
15. Stannard JP, Brown SL, Farris RC, McGwin G Jr, Volgas DA. The posterolateral corner of the knee: repair versus reconstruction. Am J Sports Med. 2005;33(6):881–8. https://doi.org/10.1177/0363546504271208.
16. Wilson IB, Cleary PD. Linking clinical variables with health-related quality of life. A conceptual model of patient outcomes. JAMA. 1995;273(1):59–65.
How to Perform a Clinical Study:
Level 4 Evidence—Case Report
34
Andrew J. Sheean, Gregory V. Gasbarro,
Nasef M. N. Abedelatif, and Volker Musahl
A. J. Sheean (*) · G. V. Gasbarro · V. Musahl
Department of Orthopaedic Surgery, University of Pittsburgh, Pittsburgh, PA, USA
N. M. N. Abedelatif
Beni Suef University, Beni Suef, Egypt
34.1 The Case for Case Reports
The advent of evidence-based medicine has revolutionized orthopaedic clinical research. Surgeons, now more than ever, seek answers to diagnostic and therapeutic dilemmas in the pages of medical journals, which have raised the bar for what constitutes a “publishable” study. Study methodology is now routinely scrutinized with various quality assessment tools, and, depending upon the type of study, authors are required to describe their methods using standardized flow diagrams and predetermined checklists. Better methods yield more precise results, which support stronger conclusions. These conclusions are more likely to alter practice habits and improve care. Moreover, authors recognize readers’ (and journal editorial boards’) appetites for higher levels of evidence and now increasingly strive to complete prospective initiatives that routinely compare unique patient cohorts or different treatment strategies. These observations are substantiated by a recent systematic review of available randomized controlled trials (RCT) pertaining to anterior cruciate ligament (ACL) reconstruction, which noted that 60% of the 412 relevant studies have been published in the last 10 years [2]. Considering the current state of orthopaedic literature, one cannot help but ask: “Is there still a place for case reports?”
The utility of case reports is derived from the novelty of both musculoskeletal conditions and their treatments. New conditions are discovered, previously described conditions present themselves in unique ways, and modifications to existing therapies or new therapies altogether can enhance patient care. The publication of a well-conceived case report need not be at odds with contemporary evidence-based medicine [1, 6]. Rather, the case report should occupy a special place within the orthopaedic literature. The case report allows its author to alert readers to a novel aspect of medicine, which can improve the clinician’s diagnostic acumen and/or stimulate larger scale initiatives to better understand the benefits of a particular innovation. The purpose of this chapter is to provide a blueprint for the publication of a case report that is focused, informative, and capable of reaching the widest audience possible.
34.2 Case Report Taxonomy
The case report is a retrospective analysis of one, two, or three clinical cases. These types of studies in the orthopaedic literature can generally be classified as one of three types based upon the
Fact Box 34.1: Last Published Case Report in Orthopaedic Subspecialty Journals.
Impact Factor According to the 2016 Journal Citation Report (Clarivate Analytics)
Fact Box 34.2: Example of Author Instructions for Journals That Have Published a Case Report After January 2017 and Still Currently Accept Submissions
Orthopaedic journal name, followed by its instructions for authors
Knee Surgery, Sports Traumatology and Arthroscopy
• Short description of one (or few) case(s) that are more or less unique in their presentation and/or a new unique form of treatment
• Please note that we only publish unique and rare case reports and generally discourage them
• Specifications: 100-word abstract, 1000-word text including references
• http://www.kssta.org/authors/instructions/casereport/
The Journal of the American Academy of Orthopaedic Surgeons
• Pre-submission approval with a proposal or formal invitation is not required
• Summarize pertinent unusual or unexpected elements of case with discussion reviewing scientific/educational value
• Informed consent must be obtained from participating subject(s)
• Specifications: 150-word abstract, 1500-word text, 3 figure panels, and 10 references
• http://edmgr.ovid.com/jaaos/accounts/authinst.pdf
Journal of Shoulder and Elbow Surgery
• Encourage submission to JSES open access, a quarterly, online publication
• $750 submission fee
• Minimum 2-year follow-up is not required
• Specifications: No abstract, include keywords at the end of the introduction, and 2250-word text
• http://www.jshoulderelbow.org/content/authorinfo
Foot and Ankle International
• Very few case reports are accepted for publication
• Case reports must offer either new information that has been previously unpublished and offer completely new information or information that will change the current practice patterns of our readers
• Entities that are unique in and of themselves bizarre, or common, will not be accepted as case reports
• Sections should include introduction, case report, discussion, and summary/conclusion
• Specifications: no abstract
• https://us.sagepub.com/en-us/nam/journal/foot-ankle-international#CaseReports
espousing the utility of novel treatments without sufficient clinical follow-up are unlikely to compel readers to seriously consider the implementation of the proposed therapy.
34.5 Discussion
The discussion section summarizes the key aspects of the clinical scenario that explain its worthiness of a case report. For anatomic variations, “normal” should be defined based upon prior epidemiologic descriptions, and the implications of the variation should be discussed. For entirely new conditions, the clinical presentation should be detailed and possible treatment strategies proposed. Novel presentations of known musculoskeletal conditions should be discussed in relation to the modalities typically used to make the diagnosis. Was there something unique about the current case that obscured the diagnosis, and, if so, how was the diagnosis made? For new treatments, the rationale for a unique approach should be explained. Does the new treatment address deficiencies in more traditional therapies? Additionally, the proposed advantages of the new treatment should be clearly described.
In each of these circumstances, a focused review of the relevant literature is critical in order to place the case report in a broader context. The literature review serves the purpose of explaining how the case report differs from what has already been described. However, only the most pertinent references should be cited, as most case reports are limited to including between 10 and 15 references.
34.6 Conclusion
clinical presentations, and treatment approaches. Through a concise, focused description of a unique clinical scenario, authors can employ the case report to share important details about diagnosis and treatment. The dissemination of this information has the potential to sharpen diagnostic practices and catalyze treatment innovation. In these ways, the case report remains a worthwhile addition to the orthopaedic literature.
Take-Home Message
• In the era of evidence-based medicine, the importance of case reports stems from their utility in describing new conditions, communicating a novel presentation of a previously described condition, or detailing new treatment approaches.
References
1. Friedman JN. The case for ... writing case reports. Paediatr Child Health. 2006;11:343–4.
2. Kay J, Memon M, Sa D, Simunovic N, Musahl V, Fu FH, et al. A historical analysis of randomized controlled trials in anterior cruciate ligament surgery. J Bone Joint Surg Am. 2017;99:2062–8.
3. Schmitz MR, Jenne J. Acute tetraparesis caused by a cervical spine synovial cyst associated with an os odontoideum: a case report. JBJS Case Connect. 2012;2:e17.
4. Sheean AJ, Barrow AE, Burns TC, Schmitz MR. Iatrogenic hip instability treated with periacetabular osteotomy. J Am Acad Orthop Surg. 2017;25:594–9.
5. Sun Z. Tips for writing a case report for the novice author. J Med Radiat Sci. 2013;60:108–13.
6. Vandenbroucke JP. In defense of case reports and case series. Ann Intern Med. 2001;134:330–4.
7. Warner JJ, Paletta GA, Warren RF. Accessory head of the biceps brachii. Case report demonstrating clinical relevance. Clin Orthop Relat Res. 1992;179:181.
where the evidence is gathered in real-life clinical settings and there is greater emphasis on the external validity of the evidence (generalizability) rather than on its internal validity (validity of causal inference) in order to develop better interventions [1, 2].
Despite the obvious benefit of expert consensus and guidelines, these methods are not without their limitations.
Firstly, the methods of developing expert opinions or consensus statements are often not reported clearly, and thus, one cannot be certain how the evidence has been collected or assessed. For example, the recent “World Heart Expert Consensus Statement on Antiplatelet Therapy in East Asian Patients with acute coronary syndrome (ACS) or undergoing percutaneous coronary intervention (PCI)” summarized the latest data on the use of antiplatelet in the East Asian populations with ACS or for whom PCI is performed [3]. An expert panel of a wide array of experiences and knowledge writes the document, and yet, the methodology of the process summarizing the evidence cited in this statement remains unreported [4].
A second limitation of level 5 evidence is that recommendations and consensus statements are influenced by the opinions, clinical experience, and composition of the group that develops them. The beliefs and opinions to which experts subscribe can be based on misconceptions and personal bias recollections that misrepresent population norms or may not reflect the view of their respective profession as a whole [5, 6].
Finally in contrast to a formal evidence-based medicine (EBM) approach, consensus statements and guidelines may be unduly influenced by ascertainment bias (inclusion of subjects not representative of the general population or selective analysis of groups at opposite extremes) or an emotional case study (more recent or more publicized stories or successful patient encounters) to influence the final recommendations. Hence higher-level evidence-based methodologies are usually free of such influences attributed to requirements and guidelines for eligibility criteria, protocol registration, and classification of research quality to name a few [7].
Considering the obvious limitations of level 5 evidence-based methods, we can recommend the utilization of these methods in contributing to informed clinical decision-making and informing clinical practice. An important counter-question to ask is: How were some types of evidence assigned to be higher in the hierarchy than others? The answer is “expert consensus.” Expert consensus is seen as a suitable method to validate hierarchies of evidence or indeed other components of the evidence-based medicine enterprise, such as the CONSORT Statement for reporting randomized controlled trials and principles for developing practice guidelines [8, 9]. More generally, consensus has an important role in the scientific process. New theories gain ground as more members of the scientific community see a new theory as giving a better account of the evidence than older ones and thus act as a fundamental starting point in the evidence-based medicine approach.
35.2 Consensus Group Methods: Definition and Rationale
Level 5 evidence-based methods may be achieved through many formal methods, the most common of which are expert opinions or consensus group methods such as the Delphi or nominal group technique (NGT) [2]. The goal of these methodologies is to establish how well experts agree on a particular topic or issue, with the premise that accurate and reliable assessment can be best achieved by consulting a panel of experts and generating a group consensus on that particular issue. Consensus group methods have demonstrated acceptable construct validity and reliability [6, 10]. Utilizing these methods may assist in the development of an evidence base that can guide decisions, policy makers, and practitioners, thus ensuring practitioners can move beyond relying on their own individual experiences by drawing on the accumulated
35 Level 5: Evidence 315
body of knowledge of a larger, expert group [2]. point, five-point, and seven-point scales have
Common features necessary for carrying out also been used [14–17]. Respondents are asked,
consensus group methods include anonymity, to rate both items, as well as to compose free-
iteration, controlled feedback, statistical group text comments that, for example, explain their
response, and structured interaction all of which rating and express disagreement with the state-
differentiate informal group meetings from for- ment’s relevance [18]. While systematic litera-
mal consensus methods [11]. Consensus tech- ture searches and focus groups are often used for
niques such as the NGT and Delphi technique are similar to focus groups. A key strength of consensus methods is the balanced participation from group members, unlike a focus group, whereby the facilitator must control for, and minimize the risk of, a dominant participant influencing the discussion [12].
35.3 The Delphi Method
The Delphi technique involves the gathering of expert opinion by means of structured or semi-structured rounds of questioning, subsequently leading to a consensus of opinion on a particular topic [13]. The Rand Corporation first developed the Delphi technique in 1953 [14]. There is no standard method to calculate a panel size for the Delphi technique; a sample of about 15 is commonly accepted in the literature [12].
Carrying out a Delphi study involves a series of steps and choices. The first step in the Delphi technique involves a sufficient review of the available literature in order to generate and clarify the research question. The next step involves the selection of a suitable expert panel or individuals with expertise relevant to the question. Following the formation of a suitable panel of individuals, a questionnaire is constructed by the coordinator, which will then be used to gather opinions from the expert panel.
The expert panel is then provided with an initial questionnaire (round one) that uses open-ended questions to generate a list of ideas or concepts related to the research question. The first-round questionnaire will present a series of statements that the respondent is asked to rate on a clearly defined Likert scale. Often a nine-point Likert scale is used for the rating, although three-
the initial Delphi questionnaire, expert panel members are given the opportunity to suggest additional items when completing the questionnaire [19].
Members of the research team analyze, collate, and compile an inclusive list of responses for subsequent resubmission to the expert panel in the form of a second-round questionnaire. In all subsequent rounds of the Delphi process, the panel members are provided with feedback pertaining to the individual responses provided and those of the other panelists. The process of questionnaire circulation and controlled opinion feedback is continued until consensus has been reached. Ordinarily the Delphi process requires between two and four rounds to gather a consensus of judgment, opinion, or choice.
The results of the multiple rounds of a Delphi study may be difficult to communicate due to their complexity. One common method involves the use of flowcharts showing the fate of items at each round. The simplest output from a Delphi study is a list of accepted items; where the number of accepted items is small, simply reporting the list may in fact be sufficient [19].
Some methodological concerns regarding the Delphi method should be considered and addressed before carrying out such a study. These include the definition of an expert panel member, the potential bias of selecting panel members, anonymity of individual responses, questionnaire design, and scoring methods [20].
A graphical representation of the multiple stages involved in the Delphi process is outlined in Fig. 35.1.
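The rating step described for the Delphi questionnaire can be sketched in Python as follows. This is a minimal illustration only: the chapter does not define a consensus rule, so the 70% threshold, the 7 to 9 "agree" band, the 1 to 3 "disagree" band, the function names, and the example ratings are all assumptions made for the sake of the example.

```python
from statistics import median

# Assumed consensus rule (not from the chapter): a statement is accepted
# when at least 70% of panelists rate it 7-9 on a nine-point Likert scale,
# rejected when at least 70% rate it 1-3, and otherwise re-circulated.
AGREE_BAND = range(7, 10)
DISAGREE_BAND = range(1, 4)
THRESHOLD = 0.70


def classify_statement(ratings):
    """Return 'accept', 'reject', or 'recirculate' for one statement."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 9 for r in ratings):
        raise ValueError("ratings must lie on the 1-9 scale")
    n = len(ratings)
    agree = sum(r in AGREE_BAND for r in ratings) / n
    disagree = sum(r in DISAGREE_BAND for r in ratings) / n
    if agree >= THRESHOLD:
        return "accept"
    if disagree >= THRESHOLD:
        return "reject"
    return "recirculate"


def summarize_round(responses):
    """Summarize one round: median rating and outcome per statement."""
    return {
        statement: {
            "median": median(ratings),
            "outcome": classify_statement(ratings),
        }
        for statement, ratings in responses.items()
    }


# Hypothetical ratings from a 15-member panel for two statements.
round_two = {
    "statement_A": [8, 9, 7, 8, 9, 7, 8, 8, 9, 7, 6, 8, 9, 7, 8],
    "statement_B": [5, 6, 4, 7, 3, 5, 8, 6, 5, 4, 7, 5, 6, 2, 5],
}
summary = summarize_round(round_two)
```

Under these assumed rules, statements marked "recirculate" would be returned to the panel in the next round together with the group feedback, while accepted and rejected statements would drop out of further rounds.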
316 S. Mc Auliffe and P. D’Hooghe
35.4 Nominal Group Technique (NGT)
The NGT method incorporates a structured face-to-face group interaction. The NGT involves a four-stage process, namely, silent generation, round robin, clarification, and voting (ranking or rating) [21].
To initiate the NGT process, one to two questions are sent to participants in advance of a face-to-face group meeting. Subsequently, at the beginning of the meeting, participants are given time to reflect or record their individual ideas in response to a question, in a process known as “silent generation.” The next stage of the NGT process is known as the “round robin” process, in which the facilitator requests one participant at a time to state a single idea to the group. Participants are afforded as much time as required until no further ideas are generated, with ideas recorded verbatim on, for example, a flipchart or white board. The third stage of the NGT process is known as the “clarification stage” [21]. This stage provides the opportunity for clarification of the ideas generated in previous rounds, as well as providing ample opportunity for a grouping step, where similar ideas are grouped together with agreement from all participants. Participants are also afforded the opportunity to exclude, include, or alter ideas, as well as generate grouping themes. The round robin and clarification phases may each take up to 30 min [12].
The next stage of the NGT method is known as the “voting or ranking” stage where participants are provided with a ranking sheet, requiring them to identify their top preferences from the ideas generated in the previous stages, with larger numbers reflecting greater importance. The number of items chosen by participants may vary according to the topic of interest, but the ranking of five ideas is common in the literature [12, 22–27]. The final stage of the NGT process is referred to as the “discussion stage,” where the scores for each idea are summed and presented to the group for discussion. The timing for this stage is likely to depend on a number of factors, including the complexity of the topic and how many items need to be prioritized. A graphical illustration of the NGT method is outlined in Fig. 35.2 [12].
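The voting and summing steps of the NGT can be sketched in Python as below. The sketch assumes each participant ranks up to five ideas and gives 5 points to the most important and 1 to the least, following the "larger numbers reflecting greater importance" convention; the function name, the tie-breaking by idea name, and the example sheets are hypothetical.

```python
from collections import Counter


def tally_rankings(ranking_sheets, top_n=5):
    """Sum the points each idea received across all ranking sheets.

    Each sheet maps an idea to the points one participant gave it, where
    top_n points marks the most important idea and 1 the least.
    Returns (idea, total) pairs, highest total first.
    """
    totals = Counter()
    for sheet in ranking_sheets:
        points = list(sheet.values())
        if len(points) > top_n:
            raise ValueError(f"a sheet may rank at most {top_n} ideas")
        if len(set(points)) != len(points):
            raise ValueError("each point value may be used once per sheet")
        if any(p < 1 or p > top_n for p in points):
            raise ValueError(f"points must lie between 1 and {top_n}")
        totals.update(sheet)  # adds this sheet's points to the running totals
    # Ties are broken alphabetically by idea name (an arbitrary choice).
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical ranking sheets from three participants.
sheets = [
    {"idea_1": 5, "idea_2": 4, "idea_3": 3, "idea_4": 2, "idea_5": 1},
    {"idea_2": 5, "idea_1": 4, "idea_6": 3, "idea_3": 2, "idea_4": 1},
    {"idea_2": 5, "idea_3": 4, "idea_1": 3, "idea_6": 2, "idea_5": 1},
]
ranked = tally_rankings(sheets)
```

The resulting ordered list is what would be presented back to the group in the discussion stage.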
35 Level 5: Evidence 317
[Figure: flowchart with boxes reading Silent Generation, Round Robin, Clarification, Voting (Ranking), Discussion; a further box reads Idea generation (Literature or Surveys)]
The consensus meeting which took place on the 17th–18th of November 2017 at the University of Pittsburgh (Fig. 35.3) represents a culmination of the Delphi process methodology which began in early 2017 and ultimately aims to help clinicians provide appropriate treatment of ankle articular cartilage injuries based on expert-level opinion.
35.5 Conclusion
Fig. 35.3 A practical example on the Delphi method used at a surgical meeting in Pittsburgh on cartilage repair of the
ankle
35.6 Website
• Oxford Centre for Evidence-based Medicine Levels of Evidence (May 2001). http://www.cebm.net/oxford-centre-evidence-based-medicine-levels-evidence-march-2009/
References
1. Green LW. Making research relevant: if it is an evidence-based practice, where’s the practice-based evidence? Fam Pract. 2008;25(Suppl 1):20–4.
2. Minas H, Jorm AFO. Where there is no evidence: use of expert consensus methods to fill the evidence gap in low-income countries and cultural minorities. Int J Ment Health Syst. 2010;4:33. https://doi.org/10.1186/1752-4458-4-33.
3. Levine G, Jeong Y, Goto S, Anderson J, Huo Y, Mega J, Taubert K, Smith S. Expert consensus document: World Heart Federation expert consensus statement on antiplatelet therapy in East Asian patients with ACS or undergoing PCI. Nat Rev Cardiol. 2014;11(10):597–606. https://doi.org/10.1038/nrcardio.2014.104. Epub 2014 Aug 26.
4. Kwong JSW, Chen H, Sun X. Development of evidence-based recommendations: implications for preparing expert consensus statements. Chin Med J (Engl). 2016;129:2998–3000.
5. Kane RL. Creating practice guidelines: the dangers of over-reliance on expert judgment. J Law Med Ethics. 1995;23:62–4.
6. Pacini D, Murana G, Leone A, Di Marco L, Pantaleo A. The value and limitations of guidelines, expert consensus, and registries on the management of patients with thoracic aortic disease. Korean J Thorac Cardiovasc Surg. 2016;49(6):413–20. https://doi.org/10.5090/kjtcs.2016.49.6.413.
7. Echemendia RJ, Giza CC, Kutcher JS. Developing guidelines for return to play: consensus and evidence-based approaches. Brain Inj. 2015;29(2):185–94.
8. Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomized trials. BMJ. 2010;340:32.
9. Shekelle PG, Woolf SH, Eccles M, Grimshaw J. Developing clinical guidelines. West J Med. 1999;170(6):348–51.
10. Hutchings A, Raine R, Sanderson C, Black N. A comparison of formal consensus methods used for developing clinical guidelines. J Health Serv Res Policy. 2006;11(4):218–24.
11. Jones J, Hunter D. Consensus methods for medical and health services research. BMJ. 1995;311:376–80.
12. McMillan SS, Kelly F, Sav A, Kendall E, King MA, Whitty JA, et al. Using the nominal group technique: how to analyse across multiple groups. Health Serv Outcomes Res Method. 2014;14:92–108.
13. O’Neill S, Watson PJ, Barry S. A Delphi study of risk factors for achilles tendinopathy- opinions of world tendon experts. Int J Sports Phys Ther. 2016;11(5):684–97.
14. Linstone HA, Turoff M. The Delphi survey: method techniques and applications. Reading: Addison-Wesley; 1975.
15. Cantrill JA, Sibbald B, Buetow S. Indicators of the appropriateness of long term prescribing in general practice in the United Kingdom: consensus development, face and content validity, feasibility and reliability. Qual Health Care. 1998;7:130–5.
16. Cassar Flores A, Marshall S, Cordina M. Use of the Delphi technique to determine safety features to be included in a neonatal and paediatric prescription chart. Int J Clin Pharmacol. 2014;36(6):1179–89.
17. Chan A, Tan SH, Wong CM, Yap KY, Ko Y. Clinically significant drug–drug interactions between oral anticancer agents and nonanticancer agents: a Delphi survey of oncology pharmacists. Clin Ther. 2009;31(Pt 2):2379–86; 59(4):334–41.
18. McMillan SS, King M, Tully MP. How to use the nominal group and Delphi techniques. Int J Clin Pharmacol. 2016;38:655.
19. Jorm A. Using the Delphi expert consensus method in mental health research. Aust N Z J Psychiatry. 2015;49(10):887–97.
20. Thangaratinam S, Redman CW. The Delphi technique. Obstet Gynaecol. 2005;7:120–5. https://doi.org/10.1576/toag.7.2.120.27071.
21. Delbecq AL, van de Ven AH, Gustafson DH. Group techniques for program planning, a guide to nominal group and Delphi processes. Scott, Foresman and Company: Glenview; 1975.
22. Aljamal M, Ashcroft DM, Tully MP. Development of indicators to assess the quality of medicines reconciliation at hospital admission: an e-Delphi study. Int J Pharm Pract. 2016;24(3):209–16.
23. Cross H. Consensus methods: a bridge between clinical reasoning and clinical research? Int J Lepr Other Mycobact Dis. 2005;73(1):28–32.
24. Dening KH, Jones L, Sampson EL. Preferences for end-of-life care: a nominal group study of people with dementia and their family careers. Palliat Med. 2012;27(5):409–17.
25. Kuhn TS. The structure of scientific revolutions. 2nd ed. Chicago: University of Chicago Press; 1970.
26. Sandrey MA, Bulger SM. The Delphi Method: an approach for facilitating evidence based practice in athletic training. Athl Train Educ J. 2008;3(4):135–42.
27. Humphrey-Murto S, Varpio L, Gonsalves C, Wood TJ. Using consensus group methods such as Delphi and Nominal Group in medical education research, Medical Teacher; 2016.
Part VI
How to Perform a Review Article?
36
Type of Review and How to Get Started
Matthew Skelly, Andrew Duong,
Nicole Simunovic, and Olufemi R. Ayeni
Narrative reviews, also known as literature reviews, provide an overview of the current state of knowledge on a given topic. For example, a narrative review may look at the broad topic of humeral fractures. They do not necessarily follow a systematic methodology and typically focus on key articles or even expert opinions in a given topic area. Narrative reviews sometimes use non-peer-reviewed sources such as editorials, book chapters and interviews. Narrative reviews are not required to describe the methodology by which authors assembled the literature cited in the review, so their findings can depend heavily on the literature that the authors chose to include. This may introduce significant selection bias and create the potential for independent authors to arrive at different conclusions, despite seeking to answer the same research question.
• Systematic reviews follow a stringent, planned and reproducible process of searching and identifying relevant articles to answer their proposed research question.
• Narrative reviews do not necessarily follow a systematic methodology and typically focus on key articles or even expert opinions in a given topic area.
36.2.1 Practical Example
In 1998, a narrative review was performed to determine what patient-related factors affect the functional outcome of total hip arthroplasty [19]. This review conducted a brief literature search of one database and included other additional articles identified as relevant by the authors. The authors concluded that the best functional outcomes were reported by patients between the ages of 45 and 75, who weighed less than 70 kg and who had a better preoperative functional status, with few to no baseline comorbidities [19]. The review also indicated that women had better functional outcomes and prosthesis survival rates than men but stated that this may be the result of confounding factors [19]. In 2004 a systematic review was published on the same topic. It specifically explored factors impacting the health-related quality of life of patients undergoing total hip and knee arthroplasty [5]. It not only examined which patient group had the best functional outcomes but also took into account confounding factors. Among its conclusions were that patient age was not an obstacle to effective surgery, men improved more than women from these total arthroplasties, and after a follow-up time of 1 year, there was no difference in health-related quality of life between weight groups [5]. This helped to dispel prior thoughts of refusing to perform hip arthroplasty on obese patients on the basis of weight [5]. The systematic review also used its data to examine the effect of comorbidities, levels of preoperative function, wait time for surgery and procedure type [5]. The difference in findings and the extent to which each factor was explored can be directly attributed to methodological and source differences between narrative and systematic reviews [8].
Overall it could be said that systematic reviews fall on the more objective end of the spectrum, while narrative reviews are often more subjective [8, 16, 17].
36.3 Why Conduct a Narrative Review over a Systematic Review?
Narrative reviews can be written relatively quickly and provide readers with current knowledge about a certain topic. They tend to be written by authors who are experts in the field and are therefore able to elaborate on their conclusions through their own personal experiences, theories or models and educated opinions. This additional insight can be invaluable in new areas of research that are lacking a sufficient body of literature. In comparison, systematic reviews are time-intensive to perform, making them most useful when there is a large body of primary research studies available to address a specific research question (Table 36.1). In the event that the reviewed studies share a common outcome, a
meta-analysis may be performed. A meta-analysis involves the careful consideration of the quality and strength of each study included in the systematic review to create a larger and more accurate pooled estimate of the effect of a treatment.
• Narrative reviews can be written quickly, they tend to be written by experts of the field they concern, and they are invaluable in new areas of research with little available research.
• Systematic reviews are more appropriate when there is a large body of evidence that could benefit from summarization or pooling of results to answer the research question.
36.4 How to Get Started
36.4.1 Background Research
The first step before choosing both a research topic/question and type of review to address it is to get a preliminary sense of the available literature. Background research can inform how best to identify more sources of information and what search terms may be relevant. The amount and quality of literature can also help determine whether one should write a narrative review or systematic review.
Most authors begin with a literature search. PubMed, Embase, MEDLINE and Google Scholar are databases for scientific articles where most published medical studies can be found. In addition, authors can reach out to subject matter experts (SMEs) for their thoughts on the topic. Although difficult to cite, SMEs have an understanding of current evidence and the latest ongoing research and can provide feedback regarding the planned search and direct one towards appropriate resources.
One often overlooked area for understanding emerging topics of interest in a given field is the ‘grey literature’. This includes conference proceedings, reports and other documents that are not published in scientific journals. In most cases these sources are not peer-reviewed, and thus their level of quality can be quite varied. Conference proceedings can be reviewed prior to acceptance, but they can be based on incomplete data, as the authors can choose to present preliminary findings before the study is complete [12].
36.4.2 The Outline
After deciding on an appropriate topic, the next step is to design an outline for the review. There is no set structure for designing a narrative review, but typically narrative reviews of medical literature include introduction and discussion sections [7]. In contrast, systematic reviews have a well-established structure. They require a con-
densed abstract, an introduction, a reproducible description of the methodology, a summary of the available literature in the results and a discussion section drawing overall conclusions from the findings. Guidelines, such as the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) or the Consolidated Standards of Reporting Trials (CONSORT), provide reporting standards that many journals now require be implemented in a published systematic review to aid in their critical appraisal and interpretation [14, 15, 18]. More recently, journals have encouraged the registration of systematic reviews. The Cochrane Library (http://cochranelibrary.com) and PROSPERO (http://crd.york.ac.uk/prospero/) offer repositories for ongoing and completed reviews.
Regardless of how the review is structured, the outline should be as detailed as possible. It should address the planned structure of the paper and what information will be addressed in each section. Creating an outline early on allows the paper to be revised without interruption to the literary flow and to coordinate what key messages are to be conveyed throughout each section. Ideally, when using this outline to write the article, the author should be able to complete the paper without having to conduct additional research. More details about conducting a systematic review can be found in the next chapter.
36.5 Writing the Review
With the outline completed, the next step is to write the review. While this may seem like a difficult task at first, there are strategies that can make the job easier.
The first is the use of reference management software that can be found online [e.g. Mendeley (Elsevier, 2017), BibMe (Chegg, 2017)]. Systematic reviews in particular often require an extensive number of articles to be screened and referenced in the final paper. The software can act as an organized repository of information during manuscript preparation. The majority of reference management software can also automate formatting of references to save time when it comes time to submit the review to a journal.
Another strategy is the order in which the review is written. The important sections of a narrative review of medical literature include the figures and tables (when appropriate), the abstract, the title and the main text. The main text of literature reviews can be further broken down into the introduction and discussion sections, with some authors choosing to include methods and/or results sections. In comparison, systematic reviews are required to report all of these components in order to remain transparent.
Figures and tables can also be drafted early on to help organize the structure of the review and focus the main findings. It is essential to create these early in the writing process, as they will be referred to throughout the review. Systematic reviews typically include a flow diagram depicting the screening process (following PRISMA guidelines), a study characteristics table and the detailed search terms in an appendix to ensure the results can be reproduced. Narrative reviews may or may not include these figures or tables.
36.6 Reviews in the Orthopaedic Literature: A Cautionary Tale
Narrative reviews are not typically relied upon within orthopaedics to guide clinical decisions. This weight falls upon systematic reviews, which are historically cited more than their nonsystematic counterparts [2]. In the past, surgical literature, including orthopaedic literature, has been found to be lacking in quality [4]. Bhandari et al. applied the Detsky scale to assess the reporting quality of 72 randomized controlled trials (RCTs) published in the Journal of Bone and Joint Surgery from 1988 to 2000. Only 32 (43%) of those studies were found to be of high quality [3]. The poor quality of this literature was attributed to two reasons. The first was a reliance by systematic reviews on low-quality evidence, such as case studies and case series. These are inherently limited by their retrospective nature, potential for bias and lack of comparative groups. The second is that high quality systematic reviews regularly failed to sufficiently report the study parameters, including methodological parameters of the study, their sources of funding and of potential
conflicts, as well as the quality of their evidence [4]. In the last decade, there has been a dramatic growth in the number of randomized controlled trials and systematic reviews published in orthopaedic literature. Publications in orthopaedics on the whole are estimated to have doubled between 2000 and 2011 [9]. However, the quality of these publications and reviews was still poor, despite publication in top journals [6]. This reduces the impact of published systematic reviews and their ability to aid in clinical decision making. By using the strategies described in this chapter to carefully plan any type of review, authors can minimize bias and ensure adequate reporting to demonstrate higher quality.
36.7 Conclusion
Narrative reviews can be written relatively quickly and often provide the most current information on a chosen topic. They are commonly written by authors who are experts in the discussed topic, allowing those experts to put forward their educated opinions and ideas. Systematic reviews are conducted using a planned methodological structure, they attempt to encompass all available literature for a chosen topic, and they provide reproducible and typically less biased results. This chapter has elucidated the differences between these two types of reviews and serves to provide a starting point for any author thinking of conducting a narrative review of their own.
References
1. Alper B, Hand J, Elliott S. How much effort is needed to keep up with the literature relevant for primary care. J Med Libr Assoc. 2004;92:429.
2. Bhandari M, Montori VM, Devereaux PJ, Wilczynski NL, Morgan D, Haynes RB, Hedges Team. Doubling the impact: publication of systematic review articles in orthopaedic journals. J Bone Joint Surg Am. 2004;86:1012–6.
3. Bhandari M, Richards RR, Sprague S, Schemitsch EH. The quality of reporting of randomized trials in the Journal of Bone and Joint Surgery from 1988 through 2000. J Bone Joint Surg Am. 2002;84:388–96.
4. Chaudhry H, Mundi R, Singh I, Einhorn TA, Bhandari M. How good is the orthopaedic literature? Indian J Orthop. 2008;42:144–9.
5. Ethgen O, Bruyère O, Richy F, Dardennes C, Reginster J-Y. Health-related quality of life in total hip and total knee arthroplasty. A qualitative and systematic review of the literature. J Bone Joint Surg Am. 2004;86:963–74.
6. Gagnier JJ, Kellam PJ. Reporting and methodological quality of systematic reviews in the orthopaedic literature. J Bone Joint Surg Am. 2013;95:e771–7.
7. Green B, Johnson C, Adams A. Writing narrative literature reviews for peer-reviewed journals: secrets of the trade. J Chiropr Med. 2006;5:101–17.
8. Hitchcock M. Review vs systematic review vs ETC. In: LibGuides Nurs. Resour. 2017. http://researchguides.ebling.library.wisc.edu/c.php?g=293229&p=1953452. Accessed 6 Oct 2017.
9. Hui Z, Yi Z, Peng J. Bibliometric analysis of the orthopedic literature. Orthopedics. 2013;36:e1225–32.
10. Hurwitz S, Slawson D, Shaunessy A. Orthopaedic information mastery: applying evidence-based information tools to improve patient outcomes while saving orthopaedists’ time. J Bone Joint Surg Am. 2000;82(6):888–94.
11. Hussain N, Turvey S, Bhandari M. Keeping up with best evidence: what resources are available? J Postgrad Med Edu Res. 2012;46:4–7.
12. Kay J, Memon M, Rogozinsky J, de Sa D, Simunovic N, Seil R, Karlsson J, Ayeni OR. The rate of publication of free papers at the 2008 and 2010 European Society of Sports Traumatology Knee Surgery and Arthroscopy congresses. J Exp Orthop. 2017;4:15.
13. Miller LE, Gondusky JS, Bhattacharyya S, Kamath AF, Boettner F, Wright J. Does surgical approach affect outcomes in total hip arthroplasty through 90 days of follow-up? A systematic review with meta-analysis. J Arthroplasty. 2017;33(4):1296–302.
14. Moher D. Consort: an evolving tool to help improve the quality of reports of randomized controlled trials. JAMA. 1998;279:1489–91.
15. Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, Shekelle P, Stewart LA. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015;4:1.
16. Riemsma RP, Pattenden J, Bridle C, Sowden AJ, Mather L, Watt IS, Walker A. Systematic review of the effectiveness of stage based interventions to promote smoking cessation. Br Med J. 2003;326:1175–7.
17. Rother ET. Systematic literature review X narrative review. Acta Paul Enferm. 2007;20:5–6.
18. Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMC Med. 2010;8:18.
19. Young NL, Cheah D, Waddell JP, Wright JG. Patient characteristics that affect the outcome of total hip arthroplasty: a review. Can J Surg. 1998;41:188–95.
Part VII
How to Perform a Systematic Review or
Meta-analysis?
37
What Is the Difference Between a Systematic Review and a Meta-analysis?
the most effective and efficient method in synthe- eligibility criteria, are reported. Narrative reviews
sizing available data in order to make evidence- employ a qualitative approach to critically
based decisions is by conducting systematic appraise literature from a specific context or the-
reviews and meta-analyses. These methods iden- oretical perspective. A narrative review is highly
tify, critically appraise, and evaluate several stud- susceptible to bias and lacks reproducibility.
ies pertaining to a single prespecified research Researchers supplement narrative reviews with
question [21]. Both methodologies allow expert intuitive and experiential evidence, reflect-
researchers to combine large portions of litera- ing the qualitative approach [20]. The lack of a
ture and produce results that are widely general- systematic process predisposes this type of
izable. Additional benefits include the ability to review to bias, specifically subjective selection
limit biases found in individual studies, which bias, which has implications on the validity and
increases the reliability of the findings. Although generalizability of the study.
the systematic review and meta-analysis method- A systematic review is governed by carefully
ologies synergistically analyse numerous studies, constructed stages which guide the cumulative
they both serve different yet complementary search method, screening, reviewing, catalogu-
roles. Systematic reviews are the cornerstone in ing, and reporting process of the selected studies
evidence-based medicine as the meticulous, to answer a research question of interest. The
exhaustive, systematic, and structured approach search is intended to provide an exhaustive
is effective in summarizing and critically analys- review of the current literature, which captures
ing the findings of relevant literature pertinent to all appropriate studies ensuring the research
a research question. A meta-analysis may be question to be answered to its fullest extent. The
conducted in conjunction with a systematic carefully crafted and exhaustive methodology
review to enhance the credence of findings by minimizes bias and provides reliable and accu-
increasing the level of evidence. Through com- rate conclusions to be drawn. It has facilitated an
bining and analysing several studies, researchers improved dissemination of information to health-
aim to extrapolate results that more accurately care providers, the public, policymakers, and the
reflect true effects. Synthesizing data from mul- scientific community. A notable benefit is the
tiple studies allows researchers to achieve a resulting mitigation in the delay in bringing
greater level of statistical power, which is pre- research from bench to bedside leading to well-
cluded in individual studies. Researchers practis- informed policy decisions and direction for future
ing this methodology are encouraged to follow research. Although systematic reviews rank high
criteria outlined by research quality improvement in the hierarchical chain of evidence, they are not
bodies, such as the Preferred Reporting Items for without flaws. Depending on the construction of
Systematic Reviews and Meta-analysis the search and screening criteria, a researcher
(PRISMA) statement [16]. may inadvertently exclude relevant studies.
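The gain in statistical power from combining studies, described above for meta-analysis, can be made concrete with a small Python sketch. It uses fixed-effect inverse-variance weighting, which is one common pooling approach; the chapter does not specify a pooling model, so this choice, the function name, and the study values are illustrative assumptions only.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    if not estimates or len(estimates) != len(standard_errors):
        raise ValueError("need one standard error per estimate")
    if any(se <= 0 for se in standard_errors):
        raise ValueError("standard errors must be positive")
    # Each study is weighted by the inverse of its variance, so more
    # precise studies contribute more to the pooled estimate.
    weights = [1.0 / se**2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical mean differences and standard errors from three studies.
estimates = [0.30, 0.10, 0.25]
standard_errors = [0.20, 0.10, 0.20]

pooled, pooled_se = pool_fixed_effect(estimates, standard_errors)
# Approximate 95% confidence interval under a normal approximation.
ci_low = pooled - 1.96 * pooled_se
ci_high = pooled + 1.96 * pooled_se
```

In this hypothetical example the pooled standard error is smaller than that of any single study, which is the sense in which pooling yields a more precise estimate. When studies are heterogeneous, a random-effects model or a purely descriptive synthesis may be more appropriate.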
However, given the systematic, explicit, and
comprehensive methodology of the review, the
37.3 Systematic Review risk of missing relevant studies is decreased in
comparison to a narrative review. Additionally,
37.3.1 What Is a Systematic Review? the generalizability of the findings may be lim-
ited as patients included in the review must be
Literature reviews are commonly classified as homogenous to the target population the
either a narrative or systematic review. A narra- researcher attempts to generalize. Heterogeneity
tive review is a synthesis of literature in a specific is a marked concern among researchers as a bal-
field constructed using a specific contextual or ance between internal and external validity must
theoretical point of view. The method is not be prioritized; although stringent eligibility crite-
nearly as robust or rigorous as no methodological ria produce homogenous data, generalizability is
approaches, such as the search strategy or affected as patients with differing characteristics
may be excluded [28]. The systematic review methodology is indicated in this context as it can provide a descriptive and analytical summary of heterogeneous and dissimilar studies, which usually preclude the use of a meta-analysis [28].
37.3.2 A Guide to Performing a Systematic Review
Six essential steps in conducting a systematic review are described.
Step 1: The Research Question. The first and foremost step in the process of performing quality research is formulating appropriate research questions. Commonly underestimated, formulating the research question requires careful consideration of a multitude of factors and can potentially consume a notable amount of time. The research question should include the problem or question of interest, population of interest, intervention, and comparator, along with the outcomes of interest [1]. Using the population, intervention, comparator, and outcomes (PICO) format helps ensure the question is direct and relevant. Although sometimes loosely stated, authors should explicitly present the question in a structured format as it allows consumers to directly understand the goal of the project [12].
Step 2: Constructing Eligibility Criteria. Selecting appropriate eligibility criteria is fundamental in ensuring the most relevant and appropriate studies are included in the review. Researchers can include all key components needed to comprehensively answer their question if the criteria are reflective of the PICO points identified in Step 1. Additionally, researchers should clearly identify which type of studies they are interested in (e.g. randomized trials, cohort studies, etc.) and other operational factors relevant to the studies including, but not limited to, year published, language, and number of participants [25].
Step 3: Constructing the Search Strategy. Using appropriate search terms is essential in ensuring an exhaustive search of literature relevant to the research question. An incomplete search can exclude important studies that may have significant implications to the validity of the review. An extensive search without language restrictions is recommended [12]. A comprehensive search often has three or more online databases included. Examples of commonly used databases include MEDLINE, EMBASE, CENTRAL, CINAHL, PubMed, and others. These platforms allow the researcher to conduct a comprehensive and exhaustive search of literature published on the World Wide Web, download results, and perform a citation analysis [17]. Although not strictly necessary, seeking a professional librarian to assist in the search strategy development and implementation is highly recommended [25]. The terms should also be reflective of the PICO format to produce an effective search strategy that identifies all relevant studies and balances sensitivity and specificity [25]. In this context, good sensitivity and specificity are reached when the search produces a high number of relevant studies and a low number of irrelevant studies, respectively. With an increasing trend in technological dominance in research, many authors choose to solely search electronic databases, but searches of reference lists on articles and specific journals can also be done [25]. A search of not only published data via online databases but also of ‘grey literature’ decreases the risk for potential publication bias of results. Grey literature refers to scholarly literature that is not formally published and can include dissertations, policy documents, book chapters, conference abstracts, research reports, and unpublished research data [9].
Step 4: Screening, Selection, and Extraction. This process begins upon completion of the search. A list of all article abstracts that appeared in the search undergoes an abstract screening, in which researchers make decisions about including or excluding studies based on their relevance to the eligibility criteria elicited from the abstract. A recommended practice to establish inter-rater reliability and increase validity of the study is to have two reviewers conduct this process independently. Often, reviewers may disagree about whether to include or exclude studies. Attempts to reach agreements between the two reviewers should be made, but
334 S. Akhter et al.
if unsuccessful, another study researcher should be consulted. Researchers should log all activities in this process, such as which studies were included and excluded and the reasons for each [25]. This log will help researchers create a flow diagram to visually represent articles included. Figure 37.1 is an example taken from the full-text article of Clinical Vignette 2. Once all titles and abstracts are screened and a significantly shorter list of studies remains, a second full-text screening process should be conducted by the same reviewers to ensure reliability and consistency in study methods. This process will exclude all remaining irrelevant studies and identify which have extremely high potential to be included in the review. Finally, researchers should create a data table or a standardized data extraction form to systematically organize and extract data. This table or form is unique to each systematic review and may include patient clinical and demographic characteristics, author names, type of study design, and outcomes. It is most commonly electronic (i.e. a constructed table in Excel) given its associated considerable efficiency and mitigation of data errors due to mismanagement, but paper also may be used [28]. The data extraction process also serves as the final screening process, as the extracted data will guide final decisions regarding which studies will be included or excluded [25, 28]. Researchers must also be prepared for when they face studies that have missing or incomplete data, for which they must contact the individual study authors to obtain [16].
Articles identified by electronic search n = 944
    Duplicates excluded n = 410
Titles and abstracts screened n = 534
    Articles excluded n = 528
    • No meniscal pathology n = 146
    • No arthroscopic intervention n = 88
    • No nonoperative comparison n = 139
    • Not degenerative tears n = 13
    • Not RCTs n = 142
Full texts screened n = 6
    Manual search of references
    • Articles included n = 2
    Article excluded
    • Lack of outcomes reported n = 1
Studies included n = 7
Fig. 37.1 Article selection flow diagram
Step 5: Quality Assessment—Appraising the study quality, although not required, is highly recommended to produce a comprehensive and highly valid systematic review. A standardized operational definition of ‘study quality’ is elusive but generally refers to the confidence a researcher holds in the study’s design and methods to minimizing bias [19, 21, 28]. In other words, a quality assessment is a critical appraisal to determine if the design, conduct, and methods of a study will reduce systematic error and bias, which in turn has implications on its internal and external validity [4, 28]. Bias is defined by the Cochrane Collaboration as a deviation from truth or a systematic error that can lead to either an under- or overestimation of the true effect [6]. There are multiple types of bias (selection, detection, reporting, attrition, performance biases) that range from small to substantial and can markedly limit the generalizability of a research study [6]. However, researchers should be cautious in the interpretation of a quality assessment as the process is inherently limited by factors such as a lack of information provided by the author as well as the absence of a ‘gold standard’ to reference level of quality [21, 26]. Nonetheless, recommended guidelines such as the five-point Oxford Quality Rating Scale or the Consolidated Standards of Reporting Trials (CONSORT statement) are available for comprehensive quality assessments and should be used when appropriate. It is also recommended that at least two independent reviewers conduct the quality assessment process and increase inter-rater reliability and study validity [8]. A consensus within available literature outlines four main biases that affect study quality: performance, selection, attrition, and detection bias [4, 13, 24, 28]. The quality assessment is a critical step in the systematic review process as these
37 What Is the Difference Between a Systematic Review and a Meta-analysis? 335
biases can lead to under- or overestimations of the true effect, and the resulting spurious associations may have detrimental implications for clinical practice [12, 28].
Quality assessment tools depend on what type of evidence is being synthesized. The Cochrane risk of bias (ROB) tool is considered the gold standard for the quality assessment of randomized controlled trials, whereas the methodological index for non-randomized studies (MINORS) is one of many available measures for assessing observational studies. The Cochrane ROB tool ensures a systematic and critical appraisal of all possible bias domains including selection, performance, detection, attrition, reporting, and other forms of biases [29]. Assessing for these potential biases ensures the researcher takes into account systematic differences in baseline characteristics, provision of care, outcome ascertainment, participant withdrawals, and reported and unreported findings within the studies. The MINORS is a 12-item tool that is used for, but not limited to, quality assessment of systematic reviews that employ observational studies (Fig. 37.2) [23]. This tool has demonstrated strong external validity through inter-reviewer agreement, reliability via the kappa coefficient, and consistency via Cronbach’s alpha coefficient [23]. The kappa coefficient is a commonly used statistical measure of inter-rater reliability, and Cronbach’s alpha coefficient elicits the consistency and correlation of the items of an instrument, such as a survey, to determine its overall reliability [18, 22].
Providing an overall confidence in the estimate of effect of a particular treatment or intervention is increasingly being performed utilizing the Grading of Recommendations, Assessment, Development and Evaluations (GRADE) criteria. This GRADE method provides a reliable estimate of the effect of the evidence across all studies for outcomes, as opposed to individual study evaluation. An example of one of the factors critiqued by GRADE is shown in Fig. 37.3. Methodological flaws, treatment effects, consistency, and generalizability comprise the main assessment domains seen in this tool [5]. Great care should be taken to select the appropriate tool for quality assessment, which should reflect the type of studies being reviewed.
1. A clearly stated aim: the question addressed should be precise and relevant in the light of available literature
2. Inclusion of consecutive patients: all patients potentially fit for inclusion (satisfying the criteria for inclusion) have been
included in the study during the study period (no exclusion or details about the reasons for exclusion)
3. Prospective collection of data: data were collected according to a protocol established before the beginning of the study
4. Endpoints appropriate to the aim of the study: unambiguous explanation of the criteria used to evaluate the main outcome
which should be in accordance with the question addressed by the study. Also, the endpoints should be assessed on an
intention-to-treat basis.
5. Unbiased assessment of the study endpoint: blind evaluation of objective endpoints and double-blind evaluation of subjective
endpoints. Otherwise the reasons for not blinding should be stated
6. Follow-up period appropriate to the aim of the study: the follow-up should be sufficiently long to allow the assessment of
the main endpoint and possible adverse events
7. Loss to follow up less than 5%: all patients should be included in the follow up. Otherwise, the proportion lost to follow up
should not exceed the proportion experiencing the major endpoint
8. Prospective calculation of the study size: information of the size of detectable difference of interest with a calculation of
95% confidence interval, according to the expected incidence of the outcome event, and information about the level for
statistical significance and estimates of power when comparing the outcomes
Additional criteria in the case of comparative study
9. An adequate control group: having a gold standard diagnostic test or therapeutic intervention recognized as the optimal
intervention according to the available published data
10. Contemporary groups: control and studied group should be managed during the same time period (no historical comparison)
11. Baseline equivalence of groups: the groups should be similar regarding the criteria other than the studied endpoints. Absence
of confounding factors that could bias the interpretation of the results
12. Adequate statistical analyses: whether the statistics were in accordance with the type of study with calculation of confidence
intervals or relative risk
Each domain is scored from 0-2. Items will either be scored as 0 (not reported), 1 (reported but
inadequate), or 2 (reported and adequate) [29]. A score for non-comparative studies of 16 and
comparative studies of 24 is globally accepted [29].
Fig. 37.2 The validated and updated version of the MINORS questionnaire [29]
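As a rough illustration of the scoring rule stated with Fig. 37.2, the Python sketch below sums item scores of 0, 1, or 2 over the 8 items used for non-comparative studies or all 12 items used for comparative studies, which gives maximum totals of 16 and 24. The function name `minors_total` and the example scores are hypothetical and are not part of the MINORS instrument.

```python
def minors_total(item_scores, comparative):
    """Return (total, maximum) for a list of MINORS item scores.

    Assumption for this sketch: non-comparative studies are scored on
    items 1-8 and comparative studies on items 1-12, each item 0, 1 or 2.
    """
    expected_items = 12 if comparative else 8
    if len(item_scores) != expected_items:
        raise ValueError(f"expected {expected_items} item scores")
    if any(score not in (0, 1, 2) for score in item_scores):
        raise ValueError("each item must be scored 0, 1 or 2")
    return sum(item_scores), 2 * expected_items


# Hypothetical scores for a non-comparative study (illustration only).
total, maximum = minors_total([2, 2, 1, 2, 0, 2, 1, 1], comparative=False)
print(f"MINORS score: {total}/{maximum}")
```

Validating the number of items and the allowed values guards against silently mixing comparative and non-comparative score sheets.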
Unclear risk of bias. | Most information is from studies at low or unclear risk of bias. | Plausible bias that raised some doubt about the results. | Potential limitations are unlikely to lower confidence in the estimate of effect. | No serious limitations, do not downgrade.
High risk of bias. | The proportion of information from studies at high risk of bias is sufficient to affect the interpretation of results. | Plausible bias that seriously weakens confidence in the results. | Crucial limitation for one criterion, or some limitations for multiple criteria, sufficient to lower confidence in the estimate of effect. | Serious limitations, downgrade one level.
 | | | Crucial limitation for one or more criteria sufficient to substantially lower confidence in the estimate of effect. | Very serious limitations, downgrade two levels.
Adopted from the Cochrane Handbook of Systematic Reviews of Interventions, this is an extension
of factor 1 of 5 of GRADE that extends the risk of bias assessment to make conclusions regarding
the limitations relating to outcomes [6].
A comprehensive quality assessment adds to the review’s generalizability and validity and therefore leads to a greater clinical impact.
Step 6: Data Analysis and Interpretation of Results—The final steps of data analysis and interpretation should be conducted following a comprehensive quality assessment [4, 13, 24, 28]. This step requires the researcher to create a concise and simple descriptive summary of each included study [28]. Many researchers choose to display study characteristics in a tabular format. An example can be found in Table 1 in the full article of the Clinical Vignette 1. The descriptive table is a comprehensive summary of study characteristics and therefore should include, but is not limited to, author name, year published, study methods, biases, quality assessment, total number of patients in treatment and control groups, intervention(s), control, outcomes, and any other relevant information. The inclusion of items in the table should reflect the research question [20]. Interestingly, these information-rich tables provide sufficient information for the researcher to determine if statistically pooling the data for a meta-analysis is possible [20]. If the study includes sufficient data to be meta-analysed, that approach will be adopted as it is a higher level of evidence. As a result, a systematic review can be, and is commonly, accompanied by a meta-analysis (Clinical Vignette 2).
When interpreting results, the researcher should carefully extrapolate conclusions based
on the summarization and analysis of the included studies. In this section, the findings must be briefly reiterated, and the final impressions from this synthesis of the best available evidence should be explicitly stated. Here, the researcher should highlight the strengths and limitations of their study and indicate how these findings may or may not have clinical implications. Suggestions for future directions of research should also be included as they are valuable statements in systematic reviews, given that this methodology summarizes available evidence pertaining to a specific field and highlights lacking areas. Systematic reviews directly improve patient care as they provide information that is a requisite for evidence-based decision-making.
Clinical Vignette 1 displays a systematic review on the efficacy of antifibrinolytic therapy in reducing patient transfusions in orthopaedic surgery. The authors clearly and explicitly state their research question of interest in the background section (Step 1). The eligibility criteria and the use of two independent reviewers were highlighted (Step 2). Four databases were searched producing 283 articles, of which 29 were included following the screening process (Step 4). Results and figures for study and patient characteristics, along with results for study quality and clinical outcomes, were reported (Step 5). This information was also visually displayed using histograms. The results of the individual studies were reported cumulatively as a single article of evidence, without the use of statistical methods. The authors finally critically interpreted the results in the discussion section (Step 6). Evidently, the authors clearly report their methods allowing reproducibility and demonstrating a strong internal validity in their review.
included studies; and (6) analysing results and interpreting the findings.
Step 6: Data Analysis and Interpretation of Results—Calculation of the effect sizes and reporting them with a 95% confidence interval (CI) are common practice with meta-analyses [25]. Numerous statistical programmes that conduct the meta-analysis processes, including the Cochrane-endorsed Review Manager programme (RevMan), MIX 2.0, and MetaStat, are widely available. The results (effect sizes and CIs) should be reported graphically and quantitatively [25]. A common graphical representation of meta-analysis results is a forest plot (Fig. 37.4). In Fig. 37.4, the forest plot visually depicts each study as a square where the middle is the effect size (SMD) and each end point of the corresponding line represents the upper and lower CI limits. The right portion of the plot (>0) favours the control or comparator, whereas the left (<0) favours the intervention [25]. The large diamond at the bottom represents the pooled effect of all the individual studies. As the left side of the graph favours the intervention, researchers hope to see this diamond, or pooled effect, below 0, indicating efficacy of the intervention. Calculation of inter-rater agreement and tests of heterogeneity are also done in this phase. Measurement of inter-rater reliability is a key component in the validity of a study as it represents how well the data collected are accurate representations of the variables of interest by quantitatively measuring the extent to which raters (data collectors) agree in their independent measurements [18]. A multitude of statistical methods to test for inter-rater reliability exist, including percent agreement, the contingency coefficient, Pearson’s r, the correlation coefficient, the concordance correlation coefficient, and the most commonly used Cohen’s kappa for two raters or Fleiss kappa for three or more raters [18]. Although heterogeneity can be judged from graphical representations of data, such as looking at the error bars in forest plots, researchers should conduct a statistical test of heterogeneity to address concerns of dissimilarities in study results within the meta-analysis. This test determines if the variation in the study results is due to genuine measurable differences (heterogeneity) or chance alone (homogeneity) but has the inherent limitation of sensitivity to the number of included trials [8]. The I² statistic is commonly used to quantitatively measure the variability between results (effect sizes of each study). This assessment of consistency is critical as it directly relates to generalizability; the more consistent the studies, the more generalizable the pooled result [18]. The statistic Cochran’s Q is commonly used to test the null hypothesis that all included studies evaluate the same effect [8]. This test statistic is calculated as the weighted sum of the squared deviations of the individual study results from the overall pooled result [14]. Finally, the Q test statistic is compared with a chi-squared (χ²) distribution with k − 1 degrees of freedom, where k is the number of studies, to obtain P values [9].
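The heterogeneity statistics described in this step can be sketched in a few lines of Python. The sketch assumes a fixed-effect, inverse-variance pooling scheme (each study weighted by 1/SE²), which is one common convention and is not stated in the text; the function name `heterogeneity` and the effect sizes and standard errors below are hypothetical. I² is taken as (Q − df)/Q, truncated at zero and expressed as a percentage.

```python
def heterogeneity(effects, std_errors):
    """Return (pooled effect, Cochran's Q, degrees of freedom, I-squared in %).

    Assumes inverse-variance weights w_i = 1 / SE_i**2 (fixed-effect pooling).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations of each study from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, df, i_squared


# Hypothetical standardized mean differences and standard errors.
smd = [-0.30, -0.10, -0.45, 0.05]
se = [0.12, 0.15, 0.20, 0.10]
pooled, q, df, i_squared = heterogeneity(smd, se)
print(f"pooled = {pooled:.3f}, Q = {q:.2f} on {df} df, I2 = {i_squared:.1f}%")
```

A P value would then be read from a chi-squared distribution with `df` degrees of freedom, for example with a statistics library; that step is left out here to keep the sketch dependency-free.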
When interpreting the results, the researcher should practise caution similar to that outlined in the systematic review process. Summarizing findings, making evidence-based conclusions, highlighting patient care implications, and suggesting future directions for research should be included.
Clinical Vignette 2 [14] displays a meta-analysis on arthroscopic surgery for degenerative tears of the meniscus. Authors systematically searched three databases to ensure all relevant literature is captured. With a focus on randomized controlled trials, the search was conducted and studies screened and assessed for eligibility by two independent reviewers. To ensure a systematic and consistent process, the same reviewers independently conducted risk of bias assessments followed by data extraction using an electronic extraction form. The authors employed statistical methods to report interobserver agreement and outcomes. All steps of the statistical methods are expertly reported and can be readily reproduced. A test of heterogeneity was also performed to determine if variability was a result of chance or inter-study heterogeneity. Subgroup analyses were outlined a priori, and sensitivity analyses were done to elucidate the consequences of missing data and studies at risk for bias. Figures of the study selection process and study descriptions were provided. Results for the search, each individual outcome, adverse events, and sensitivity analysis were reported and interpreted. Authors followed the PRISMA statement for reporting findings. The authors concluded with limitations, implications, and a conclusion.
14. Khan M, Evaniew N, Bedi A, Ayeni OR, Bhandari M. Arthroscopic surgery for degenerative tears of the meniscus: a systematic review and meta-analysis. Can Med Assoc J. 2014;186(14):1057–64.
15. Lau J, Ioannidis JP, Schmid CH. Quantitative synthesis in systematic reviews. Ann Intern Med. 1997;127(9):820.
16. Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Clarke M, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. Ann Intern Med. 2009;151(4):W65–94.
17. Matthew EF, Eleni EP, George AM, Georgios P. Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses. Fed Am Soc Exp Biol. 2015;20 Sep 2007.
18. McHugh ML. Interrater reliability: the kappa statistic. Biochem Med. 2012;22(3):276–82.
19. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials. 1995;16:62–73.
20. Pae C-U. Why systematic review rather than narrative review? Psychiatry Investig. 2015;12(3):417.
21. Russell RM. Issues and challenges in conducting systematic reviews to support development of nutrient reference values: workshop summary. Rockville: U.S. Dept. of Health and Human Services, Agency for Healthcare Research and Quality; 2009.
22. Santos JRA. Cronbach’s alpha: a tool for assessing the reliability of scales. J Ext. 1999;37:2.
23. Slim K, Nini E, Forestier D, Kwiatkowski F, Panis Y, Chipponi J. Methodological index for non-randomized studies (MINORS): development and validation of a new instrument. ANZ J Surg. 2003;73(9):712–6.
24. Torgerson C. Systematic reviews. London: Continuum; 2003.
25. Uman LS. Systematic reviews and meta-analyses. J Can Acad Child Adolesc Psychiatry. 2011;20(1):57–9.
26. Verhagen AP, Vet HCD, Bie RAD, Boers M, Brandt PAVD. The art of quality assessment of RCTs included in systematic reviews. J Clin Epidemiol. 2001;54(7):651–4.
27. Weil RJ. The future of surgical research. PLoS Med. 2004;1(1):e13.
28. Wright RW, Brand RA, Dunn W, Spindler KP. How to write a systematic review. Clin Orthop Relat Res. 2007;455:23–9.
29. Zeng X, Zhang Y, Kwong JS, Zhang C, Li S, Sun F, et al. The methodological quality assessment tools for preclinical and clinical studies, systematic review and meta-analysis, and clinical practice guideline: a systematic review. J Evid Based Med. 2015;8(1):2–10.
Reliability Studies and Surveys
38
Kelsey L. Wise, Brandon J. Kelly,
Michael L. Knudsen, and Jeffrey A. Macalena
Clinical Vignette 1
Dr. Sessions measures the internal consistency of a newly formulated survey that includes five
questions about the quality of life of patients with below-knee amputations. The hypothesis is
that individuals of similar age, gender, mental functioning, and pain ratings have comparable
level of function, self-image, mental health, and pain ratings at a mean follow-up of 3 years.
For responses, yes = 1 and no = 0.
Patient Item 1 Item 2 Item 3 Item 4 Item 5 Summed scale score
1 1 0 1 1 1 4
2 0 1 0 1 0 2
3 1 1 1 1 1 5
4 0 1 0 1 0 2
5 1 1 1 0 1 4
Coefficient alpha for a series of dichotomous items is
$$\alpha = \left[\frac{k}{k-1}\right]\left[1 - \frac{\sum_{i=1}^{k} (\%\text{Positive})_i\,(\%\text{Negative})_i}{s^2}\right]$$
where k is the number of items in the scale and s² is the variance of the summed scale scores.
Coefficient alpha is
$$\left[\frac{5}{5-1}\right]\left[1 - \frac{(0.6)(0.4) + (0.8)(0.2) + (0.6)(0.4) + (0.8)(0.2) + (0.6)(0.4)}{1.44}\right] = 0.35$$
The internal consistency, measured with Cronbach’s coefficient alpha, is 0.35, indicating a
very poor correlation between items.
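The calculation in Clinical Vignette 1 can be reproduced with a short Python sketch. It assumes, as the value of 1.44 in the vignette implies, that s² is the population variance (divisor n) of the summed scale scores; the function name `coefficient_alpha_dichotomous` is hypothetical.

```python
from statistics import pvariance

# Responses from Clinical Vignette 1: one row per patient, one column per item.
responses = [
    [1, 0, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1],
]


def coefficient_alpha_dichotomous(rows):
    """Coefficient alpha for yes/no items scored 1/0."""
    k = len(rows[0])  # number of items
    n = len(rows)     # number of respondents
    pq_sum = 0.0
    for item in range(k):
        p = sum(row[item] for row in rows) / n  # proportion answering yes
        pq_sum += p * (1 - p)                   # (%Positive)(%Negative)
    # Assumption: population variance of the summed scale scores.
    total_variance = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - pq_sum / total_variance)


alpha = coefficient_alpha_dichotomous(responses)
print(round(alpha, 2))  # 0.35
```

Using the sample variance (divisor n − 1) instead would give a different value, so the choice of variance estimator should be reported alongside the coefficient.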
Fact Box 38.2
Internal consistency reliability statistic interpretation
Population              Cronbach’s coefficient alpha   Interpretation
Individual comparison   ≥0.90                          Acceptable agreement
Group comparison        ≥0.70                          Acceptable agreement
38.1.2.2 Test-Retest
Test-retest reliability measures response stability over time [34]. Participants respond to a survey at two distinct time points to measure response stability. Subsequently, correlation coefficients (r values) are determined to compare responses [26, 34]. R values are considered strong if they equal or exceed 0.70 [26]. A source of error in this study is a change in characteristics of the test setting with multiple administrations.
Clinical Vignette 2
Dr. Siljander wishes to evaluate postoperative pain scores using a visual analog scale (VAS) in
patients undergoing open reduction internal fixation (ORIF) of the distal radius. Fifteen patient
responses are assessed at 2 and 4 h postoperatively. Responses to the VAS are scaled 0–100, and
the response at 2 h postoperatively (time 1) is compared to 4 h postoperatively (time 2). The
correlation coefficients of the two sets of data are compared. The r value is calculated to be 0.98,
which indicates an excellent test-retest reliability at 2 and 4 h postoperatively.
Time 1 Time 2
Patient 1 71 68
Patient 2 75 70
Patient 3 78 70
Patient 4 84 83
Patient 5 81 77
Patient 6 85 88
Patient 7 90 83
Patient 8 22 21
Patient 9 44 36
Patient 10 52 48
Patient 11 50 44
Patient 12 66 61
Patient 13 38 30
Patient 14 84 82
Patient 15 85 73
Σ 1005 934
Correlation coefficient, r:
r = 0.98
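The correlation coefficient in Clinical Vignette 2 can be checked with a minimal Python sketch of Pearson's r applied to the fifteen paired VAS scores in the table. The helper name `pearson_r` is hypothetical, and the vignette does not state which correlation coefficient was used, so Pearson's r is an assumption here.

```python
from math import sqrt

# VAS scores from Clinical Vignette 2 (patients 1-15).
time1 = [71, 75, 78, 84, 81, 85, 90, 22, 44, 52, 50, 66, 38, 84, 85]
time2 = [68, 70, 70, 83, 77, 88, 83, 21, 36, 48, 44, 61, 30, 82, 73]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(time1, time2)
print(f"r = {r:.2f}")  # about 0.98, in line with the vignette
```

A high r shows that the two sets of scores rise and fall together; it does not by itself show that the scores are equal at the two time points.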
Clinical Vignette 3
Dr. Todhunter desires to evaluate the intraobserver reliability of three-dimensional computed
tomography (3D CT) for measuring femoral torsion. He performs this measurement five times
a week, for 5 weeks, on the same imaging study. Dr. Todhunter calculates the variance between
all 25 images to be 3.72° and variance between weekly samples to be 5.30°. With an instrument-
recorded measurement error of 1.33°, he derives an intraobserver reproducibility of 0.96:
$$\frac{\mathrm{SD}^2_{\text{between-subject}} + \mathrm{SD}^2_{\text{between-observer}}}{\mathrm{SD}^2_{\text{between-subject}} + \mathrm{SD}^2_{\text{between-observer}} + \mathrm{SD}^2_{\text{measurement-error}}} = \frac{(3.72)^2 + (5.30)^2}{(3.72)^2 + (5.30)^2 + (1.33)^2} = 0.96$$
$$\frac{\mathrm{SD}^2_{\text{between-subject}}}{\mathrm{SD}^2_{\text{between-subject}} + \mathrm{SD}^2_{\text{between-observer}} + \mathrm{SD}^2_{\text{measurement-error}}} = \frac{(3.11)^2}{(3.11)^2 + (1.11)^2 + (1.33)^2} = 0.76$$
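Both ratios above have the same shape: the squared standard deviations counted as reproducible variation, divided by those plus the squared standard deviations counted as error. A minimal Python sketch, with the hypothetical helper name `variance_ratio`, reproduces the two reported values.

```python
def variance_ratio(signal_sds, error_sds):
    """Share of total variance attributed to the 'signal' components.

    Each argument is a list of standard deviations; they are squared
    and summed before the ratio is taken.
    """
    signal = sum(sd ** 2 for sd in signal_sds)
    error = sum(sd ** 2 for sd in error_sds)
    return signal / (signal + error)


# First ratio: between-subject and between-observer SDs in the numerator.
first = variance_ratio([3.72, 5.30], [1.33])
# Second ratio: only the between-subject SD in the numerator.
second = variance_ratio([3.11], [1.11, 1.33])
print(round(first, 2), round(second, 2))  # 0.96 0.76
```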
$$\kappa = \frac{P_O - P_C}{1 - P_C}$$
$P_O$ is the proportion of observed agreements, and $P_C$ is the proportion of agreements expected by chance [43].
The value for perfect agreement is 1.0, and the value for no agreement beyond chance is 0.0. Negative values indicate that the measured agreement is worse than chance alone [22]. Kappa is often used in interobserver reliability (two or more clinicians rate same patient) or intraobserver reliability (single clinician rates same patient two or more times). Kappa is not as useful when skewed distribution of responses exists because agreement above chance is less likely.
38.1.4.4 Phi Statistic
The phi statistic can also be used for categorical data, and it has the benefit of measuring agreement
Fact Box 38.5
Data evaluation
Data types         Variables                 Example
Categorical data   Discrete or qualitative   “Absent” or “present”
Continuous data    Numeric or quantitative   Radiographic measurements

Statistic                                    Ideal data type                                Measurement
Kappa coefficient                            Categorical data                               Agreement
Phi statistic                                Categorical data                               Agreement independent of chance
Pearson correlation                          Continuous data                                Linear correlation
Intraclass correlation coefficient (ICC)     Continuous (most common) or categorical data   Reproducibility
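The kappa coefficient defined in this section, the observed agreement minus the chance agreement, divided by one minus the chance agreement, can be sketched for two raters as follows. The function name `cohens_kappa` and the ratings are hypothetical; chance agreement is estimated from each rater's own category proportions, which is the usual convention for Cohen's kappa and an assumption relative to the text.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same subjects."""
    n = len(rater_a)
    # P_O: proportion of subjects on which the raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # P_C: agreement expected by chance from each rater's marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_c = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    # Note: undefined (division by zero) if both raters use a single category.
    return (p_o - p_c) / (1 - p_c)


# Hypothetical ratings of ten radiographs by two clinicians.
clinician_1 = ["present", "present", "absent", "absent", "present",
               "absent", "present", "absent", "present", "present"]
clinician_2 = ["present", "absent", "absent", "absent", "present",
               "absent", "present", "present", "present", "present"]
print(f"kappa = {cohens_kappa(clinician_1, clinician_2):.2f}")
```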
The number of raters and subjects is controlled in reliability studies. To increase precision, increasing the number of subjects carries more weight than increasing the number of raters (particularly when there are more than four raters) [22]. However, increasing either subjects or raters will narrow the confidence interval. In an effective study design, the number of raters is selected first based on study generalizability and feasibility, and then subject number for desired precision is selected [22]. Generalizability of the raters is determined by their characteristics. Feasibility of the raters or ratings depends more on the subjects. Radiographs can be rated multiple times, whereas patients may not tolerate more than one examination. Sample size calculations are performed once the number of raters is decided.
For intraclass correlation coefficients, the number of subjects is determined by the minimum acceptable reliability [48]. Alternatively, the desired precision of the reliability estimate can be used to determine subject number [18].
38.1.6 Interpretation of Results
The majority of reliability statistics use a 0.0–1.0 scale. A value of 1.0 indicates all variability is due to true subject differences; a value of 0.0 indicates all variability is due to error [22]. Table 38.1 shows the agreement associated with certain kappa values. In reality, most studies have reliability that falls between 0.3 and 0.7 [4]. Ultimately, readers must determine how applicable study design, raters, subjects, and measurement tools are to their practices.
Table 38.1 Interpretation of the agreement with kappa coefficient
Kappa value   Agreement
0             Absent
>0–0.20       Slight
0.21–0.40     Fair
0.41–0.60     Moderate
0.61–0.80     Good
0.81–1.00     Excellent
Reliability testing of scales and items is a valuable tool in orthopedic surgery, as it provides quantitative data on the performance of an instrument [26]. Types of reliability studies include internal consistency, test-retest, intraobserver, interobserver, and alternate form. Categorical data typically use the kappa statistic, while continuous data tend to utilize the intraclass correlation coefficient. A correlation coefficient, or r value, with a value of 0.70 or greater is generally accepted and indicates good reliability.
38.2 Surveys
38.2.1 Introduction
Surveys are useful instruments in orthopedic research for gathering data on the demographics, practices, or ideas of a group of individuals at a specific time or to compare changes over time [11, 14, 21, 33, 38, 44]. Constructing surveys involves writing questions that may ultimately be converted into numbers to allow for statistical analysis [30]. The results of surveys give insight into the adoption of existing literature and help guide clinical practice and future research [44]. Surveys are administered by telephone, mail, fax, or electronically.
Surveys allow for convenient and inexpensive research that can cover a large population over a short period of time [21]. Surveys may be the only method of conducting certain types of research, such as understanding current viewpoints or attitudes of orthopedic surgeons. Additionally, surveys may act as preliminary studies for successive research ideas [21].
Survey validity is jeopardized by low response rates. Response rates have been as low as 15% among surgeons [44]. This has been attributed to various factors, including busy work schedules, an increasing number of commercial requests, increased paperwork, and low priority [6, 7, 25, 40, 44]. Low responses can result in nonresponse bias, as individuals who respond can have meaningful differences from those who do not. This
can lead to a nonrepresentative sample and a subsequent loss of survey credibility [44]. Biases tend to be limited by achieving response rates of at least 70% [33]. In addition to increasing the validity of surveys, a higher response rate may also eliminate the costs needed for resending surveys [37].
A systematic approach to survey design, development, testing, administration, and considering strategies for elevating response rate are imperative to decrease bias and increase generalizability [11].

38.2.2 Survey Design

38.2.2.1 Determining the Objective
A well-defined objective is important for a successful survey [11]. This involves careful consideration of the topic and target audience [21]. Strong research questions are specific, simple, meaningful, interesting, and answerable [33].

38.2.2.2 Develop the Survey Instrument
After a clearly stated research question has been formulated, investigators may modify an existing survey or develop a new instrument [33]. The survey instrument acts as the interface between study objectives and responses of individuals surveyed [21]. Strategies to identify item content include conducting focus groups of potential subjects, discussing with an expert panel, or utilizing the Delphi technique, a process by which items are selected and ranked by a group of experts until a consensus is reached [33]. Regardless of which strategy is employed, a team of colleagues should be involved in the development process to lend face validity to the instrument before it is tested.

38.2.2.3 Identifying the Sampling Frame
It is difficult to administer a survey to all individuals within a goal population as a result of the population size or the hardship of identifying and contacting all possible respondents [11, 38]. As a result, a fraction of the target population is surveyed. The “sampling frame” comprises the target population for the survey, while the “sampling element” includes the respondents from whom data is gathered and evaluated [1].
Sample selection may be random (probability design) or nonrandom (nonprobability design) [1]. Probability designs include simple random sampling, systematic random sampling, stratified sampling, and cluster sampling [11].

–– Simple random sampling: Each person has the same probability of being selected. Selection takes place through techniques such as a lottery process or a random-number generator [45]. This technique requires little prior knowledge of a population but may not capture certain groups or be efficient [1].
–– Systematic random sampling: A starting point is arbitrarily chosen on a list. Subjects are then selected in a methodical manner (e.g., every fifth subject) [11]. This technique has high precision and allows ease of analyzing data and measuring sampling errors [1]. However, the ordering of the list of subjects in the sample frame can create biases, certain groups may be excluded, and there may be a lack of efficiency [1].
–– Stratified random sampling: Possible respondents are divided into defined groups. Within these groups, individuals are randomly sampled by either simple or systematic sampling [11]. This technique allows disproportionate sampling to be possible and has high precision [1]. A disadvantage is that the advanced knowledge of a population can result in more complex analysis.
–– Cluster sampling: The population is broken down into heterogeneous clusters, and specific clusters are sampled [11]. This method has lower costs and allows sampling of groups if individuals are not available [1]. However, this method is more complex when analyzing data and evaluating sample errors, and it has low precision.

In nonprobability sampling, individuals in a population have an unequal chance of being involved in the sample [11]. Nonprobability sam-
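
The four probability designs listed in this section can be sketched in a few lines of Python. This is only an illustrative sketch and not a method taken from the cited sources: the sampling frame of 100 surgeons, the subspecialty strata, the hospital clusters, and all function names are hypothetical, and the cluster example assumes a one-stage design in which every member of a chosen cluster is surveyed.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every member of the sampling frame has the same chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, step, rng):
    """Pick an arbitrary starting point, then take every `step`-th subject."""
    start = rng.randrange(step)
    return frame[start::step]

def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Divide respondents into defined groups, then sample randomly within each."""
    strata = {}
    for person in frame:
        strata.setdefault(stratum_of(person), []).append(person)
    return {name: rng.sample(members, min(n_per_stratum, len(members)))
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and survey everyone inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

rng = random.Random(38)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 100 surgeons, each tagged with a subspecialty
specialties = ["sports", "trauma", "arthroplasty", "spine"]
frame = [(f"surgeon_{i:03d}", specialties[i % 4]) for i in range(100)]

srs = simple_random_sample(frame, 10, rng)
sys_ = systematic_sample(frame, 5, rng)                   # every fifth subject
strat = stratified_sample(frame, lambda p: p[1], 3, rng)  # 3 per subspecialty

# Hypothetical clusters: five hospitals with 20 surgeons each
hospitals = {f"hospital_{h}": frame[h * 20:(h + 1) * 20] for h in range(5)}
clus = cluster_sample(hospitals, 2, rng)
```

Passing an explicit `random.Random` instance keeps the draw reproducible, which makes it easier to document how a sample was selected.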
ing, and results are more difficult to convert into analyzable data [33]. Closed response formats include nominal, ordinal, interval, binary, and ratio measurements [11].

–– Nominal responses: These responses involve a list of mutually exclusive items (e.g., physicians, nurses, medical students) that reflect qualitative distinctions [11].
–– Ordinal responses: These responses imply a ranked order. The traditional Likert scale, with answers ranging from “strongly disagree” to “strongly agree,” is commonly used in surveys [33]. This format allows the grouping of like-minded attitudes and viewpoints. An odd number of points allows a neutral response, while an even number of points forces a commitment [33]. In a phenomenon called the “floor” or “ceiling” effect, respondents tend to choose responses clustered at the bottom or top of a scale. Increasing the number of points on a scale may exacerbate this phenomenon [33].
–– Interval and ratio measurements: These measurements show continuous responses. A ratio scale has an absolute zero (e.g., height and weight), while intervals do not have a true zero (e.g., interval time of day; 1 PM to 2 PM is the same as 5 PM to 6 PM because both are 1 h increments).

It is helpful to involve a biostatistician to ensure that answer choices are in a format where the data can easily be analyzed.
Question responses that should be avoided include vague answer choices (e.g., “other” or “unknown”), absolute terms (e.g., “always” or “never”), abbreviations, complex answers, and asking respondents to rank responses [11, 21]. There should be an equal number of positive and negative option choices for scaled questions.

38.2.3.3 Chronology
The chronology of the questions can have a major influence on the response rate of surveys. It is helpful to begin with demographic questions, as these are simple and nonthreatening [33]. The first non-demographic question should be clearly stated, the most applicable, and noteworthy [44]. Items can be grouped based on content to help with respondents’ thought process and memory [33]. The filter technique allows responders to omit questions based on prior responses, thus limiting frustration with irrelevant or non-applicable questions [50].

38.2.3.4 Survey Length
Surveys should be succinct. One study found significantly lower response rates to a threshold level of 1000 words [20]. Though this threshold was not directly aimed at orthopedic surgeons, this study suggests that response rate may be considerably affected by minor modifications in questionnaire length.

38.2.3.5 Survey Format
The format of a survey is a vital part of presentation. There should be explicit instructions throughout, as opposed to solely at the beginning of surveys. Arrows or symbols to provide visual navigation are beneficial for assistance with survey completion [44]. The size and style of the font should be easy to read [11]. Emphasizing important words or phrases, numbering questions, providing appropriate spacing, and listing answers vertically instead of horizontally are all helpful techniques [1, 44]. It is helpful to create a polished, yet unique, appearance that allows a survey to stand out from other questionnaires responders may receive.

38.2.3.6 Cover Letters
Cover letters provide the initial opportunity to persuade readers to complete surveys [44, 50]. The letter should clearly state the purpose of the survey and why participants were selected [33]. Confidentiality and optional completion of the survey should be conveyed [1, 50]. Details should be included regarding when the survey should be completed and who to contact if participants have questions. Finally, the cover letter should contain an affirmation that the recipient’s participation is vital to the success of the survey and an expression of gratitude to the recipient for his or her time [37].
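
As a concrete illustration of the ordinal response format and the “floor” or “ceiling” effect described earlier in this section, the short Python sketch below counts how many answers fall on the lowest and highest points of a five-point Likert scale. It is a minimal sketch under stated assumptions: the three middle labels, the function name, and the ten example answers are hypothetical, and no cut-off for deciding that an effect is present is implied.

```python
from collections import Counter

# Five-point Likert scale; only the two end labels come from the text,
# the three middle labels are assumed for illustration.
LIKERT_5 = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

def floor_ceiling_share(responses, scale=LIKERT_5):
    """Return the share of answers on the lowest and highest scale points."""
    if not responses:
        raise ValueError("no responses to summarize")
    counts = Counter(responses)
    n = len(responses)
    return counts[scale[0]] / n, counts[scale[-1]] / n

# Hypothetical answers to one item from ten respondents
responses = ["strongly agree"] * 6 + ["agree"] * 3 + ["strongly disagree"]

floor_share, ceiling_share = floor_ceiling_share(responses)
print(f"floor: {floor_share:.0%}, ceiling: {ceiling_share:.0%}")
```

A large share at either end of the scale is a prompt to discuss the item wording or the number of scale points with a biostatistician before the survey is finalized.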
Fact Box 38.7 Survey development
Survey parts         Features
Question stem        Succinct, clear, avoid objectionable questions
Question response    “Open” (free text) or “closed” (structured)
Chronology           Demographics first, personal questions last
Survey length        Concise (as short as possible to convey idea)
Survey format        Polished and unique with instructions throughout
Cover letter         Recruits and attracts participants

38.2.4 Pilot Testing

Pilot testing allows trial on methods of sampling, data collection, and survey administration. Investigators ask colleagues or other participants who are similar to desired respondents to evaluate questions and provide feedback [11]. This is important to evaluate for mistakes, gauge the time needed for completion, understand whether the survey conveys the proposed message, and test to see if the survey grabs the reader’s interest [50].

38.2.5 Methods of Survey Administration

Surveys can be given over the phone, through postal mail, via fax, or electronically. The selected administration technique depends on the targeted audience, the type of information desired, financial limitations, and whether test properties were constructed.

38.2.5.1 Telephone
Telephone surveys offer the advantage of increasing accuracy, as the administrator can ensure question comprehension and complete response from the responder. Downfalls include the expense, difficulty in achieving a high response rate, and the susceptibility to distortion and interview bias [33, 44].

38.2.5.2 Postal
Postal delivery is the traditional method of survey administration. Postal surveys allow responders to complete questionnaires in privacy, and interviewer distortion is minimized [6, 44, 50]. Postal surveys may also increase validity by giving responders time to formulate answers and utilize various sources [44]. Disadvantages of postal surveys include the cost of supplies, the time necessary for the labor of collecting responses, and the potential decrease in accuracy when manually recording responses as opposed to responses automatically being entered into a database [29, 44].

38.2.5.3 Fax
Respondents of fax surveys may return the survey through fax or by postal mail. Data can be collected manually or with utilization of optical character recognition by some fax machines [29]. Character recognition may enhance the accuracy of the recorded data. Costs of fax surveys are similar to costs of postal surveys [44].

38.2.5.4 Electronic
The current mainstay of survey administration is through electronic distribution [9, 21, 25, 47]. Electronic surveys may be web-based, where respondents fill out a questionnaire on a website. Alternatively, surveys can be conducted through email, as an attachment, or embedded in the text of the email.
Advantages of electronic surveys include ease of completion, ability to reach a large audience, cost-effectiveness, and the immediate availability of responses [9, 21, 25, 29, 44, 47]. Simple descriptive statistics usually are embedded, allowing concurrent analysis for researchers, while more complex statistical analysis is generally accomplished by exporting data to statistical software [47]. This reduces time and resources, as well as the possibility of human error. Electronic surveys allow control of question order, thus preventing respondents from changing prior answers [9]. Finally, email surveys have been shown to be associated with a lower number of unanswered questions [29].
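
For the export step mentioned for electronic surveys, a small Python sketch shows what simple descriptive statistics and a count of unanswered questions might look like once the data have left the survey platform. Everything here is assumed for illustration: the comma-separated layout, the column names, the 1 to 5 coding, and the four example respondents do not come from any particular survey tool.

```python
import csv
import io
import statistics

# Hypothetical export: one row per respondent, one column per question,
# answers coded 1-5, and an empty cell meaning the question was skipped.
EXPORT = """respondent,q1,q2,q3
1,4,5,2
2,3,,4
3,5,4,
4,4,4,3
"""

def summarize(export_text):
    """Return per-question counts, unanswered counts, mean, and median."""
    rows = list(csv.DictReader(io.StringIO(export_text)))
    questions = [col for col in rows[0] if col != "respondent"]
    summary = {}
    for q in questions:
        answered = [int(row[q]) for row in rows if row[q].strip()]
        summary[q] = {
            "n": len(answered),
            "unanswered": len(rows) - len(answered),
            "mean": statistics.mean(answered),
            "median": statistics.median(answered),
        }
    return summary

summary = summarize(EXPORT)
for question, stats in summary.items():
    print(question, stats)
```

More complex analyses would normally be run in dedicated statistical software, as the text notes; the point of the sketch is only that skipped items should be counted explicitly rather than silently dropped.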
min”) [30]. For email surveys, embedding the link of the survey in the text rather than including the survey as an attachment is likely to increase response rate. If utilizing postal surveys, providing stamped return envelopes, as opposed to franked return envelopes or no envelope at all, is a simple, low-cost, and effective method of increasing response rate [40].

38.2.6.2 Precontacts
Informing participants of an upcoming survey may enhance questionnaire participation. There have been mixed reports on increasing response rate with prenotification letters [1, 40, 50], though it is thought that a prenotification is particularly important with email surveys for these questionnaires to avoid being viewed as “junk mail” and deleted [47]. A prenotification letter should be personalized, concise, and positive. It should also aim at building anticipation as opposed to providing excessive detail about the survey [1].

electronic surveys [15, 39]. If known, the current response rate should be included in each reminder, acting as an incentive for readers to participate.

38.2.6.6 Principal Investigator Contact
Finally, a personal gesture that may increase response is having the principal investigator (PI) reach out to subjects directly.

Fact Box 38.9 How to enhance response rate?
• Convenience: decrease time and increase availability
• Precontacts: build anticipation and awareness
• Multiple contacts: Dillman approach [35] for Internet and postal surveys
• Incentives: infrequently cost-effective but may increase responses
• Reminders: increased responses in dose-dependent relationship
• PI contact subject her/himself
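
Because the chapter suggests quoting the current response rate in each reminder and cites a response rate of at least 70% as the level at which bias tends to be limited [33], the arithmetic can be sketched as follows. This is a hedged illustration only: the function names, the reminder wording, and the choice of delivered surveys as the denominator are assumptions, not a standard taken from the cited sources.

```python
TARGET_RATE = 0.70  # response rate cited in the chapter as limiting bias [33]

def response_rate(returned, delivered):
    """Share of delivered surveys that were returned (assumed denominator)."""
    if delivered <= 0:
        raise ValueError("delivered must be a positive number of surveys")
    return returned / delivered

def reminder_line(returned, delivered, target=TARGET_RATE):
    """Build a one-line status that could be included in a reminder."""
    rate = response_rate(returned, delivered)
    status = "target reached" if rate >= target else "below target"
    return f"Current response rate: {rate:.0%} ({returned} of {delivered}); {status}."

# Hypothetical survey: 200 questionnaires delivered, 126 returned so far
print(reminder_line(126, 200))
```

Investigators should state which denominator they use (for example, surveys sent versus surveys confirmed as delivered), since the reported rate depends on it.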
43. Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther. 2005;85(3):257–68.
44. Sprague S, Quigley L, Bhandari M. Survey design in orthopaedic surgery: getting surgeons to respond. J Bone Joint Surg Am. 2009;91(Suppl 3):27–34.
45. Sudman S. Applied sampling. In: Rossi PH, Wright JD, Anderson AB, editors. Handbook of survey research. San Diego: Elsevier; 1983. p. 145–94.
46. Thelen P, Delin C, Folinais D, Radier C. Evaluation of a new low-dose biplanar system to assess lower-limb alignment in 3D: a phantom study. Skelet Radiol. 2012;41(10):1287–93.
47. VanDenKerkhof EG, Parlow JL, Goldstein DH, Milne B. In Canada, anesthesiologists are less likely to respond to an electronic, compared to a paper questionnaire. Can J Anaesth. 2004;51(5):449–54.
48. Walter SD, Eliasziw M, Donner A. Sample size and optimal designs for reliability studies. Stat Med. 1998;17(1):101–10.
49. Wright RW, MARS Group. Osteoarthritis classification scales: interobserver reliability and arthroscopic correlation. J Bone Joint Surg Am. 2014;96(14):1145–51.
50. Zelnio RN. Data collection techniques: mail questionnaires. Am J Hosp Pharm. 1980;37(8):1113–9.
39 Registries
R. Kyle Martin, Andreas Persson, Håvard Visnes, and Lars Engebretsen
standardized database prospectively collecting information on a patient population with a common disease or intervention and followed over time [6].
The first known medical registry was the National Leprosy Registry of Norway which was established in 1856 [14, 15]. More than a century later, the Swedish Knee Arthroplasty Registry became the first national registry in orthopaedic surgery in 1975 [17]. Finland (1980), Norway (1987), and Denmark (1995) also created arthroplasty registers and were soon expanded to include all joint replacements. The primary objective of the first arthroplasty registers was the early detection of inferior results based on implant revision data. This proved successful in 1995 when two studies identified inferior implants at an early stage, a determination made possible through the arthroplasty registry [8, 11, 12].
Today, the use of registries in orthopaedic surgery has expanded to include many subspecialties other than arthroplasty, and several national, regional, and local registries have been developed. Building on the early experience of the arthroplasty registers, important changes have also been made to the data collection and outcome measures that are captured. The first knee ligament surgery registry was created in Norway in 2004, followed in 2005 by Sweden and Denmark [9]. While revision surgery and conversion to total knee arthroplasty were determined to be important outcome measures, it was also recognized that inferior results and graft failures may not always go on to further surgery and therefore may go undetected by the registry [8]. To account for this, the knee ligament registries also include patient-reported outcome measures (PROMs), specifically the Knee Injury and Osteoarthritis Outcome Score (KOOS) [28], preoperatively and at standard postoperative time points. Thus, the ability to detect inferior results and early failures is improved without relying on subsequent surgery as the sole end point.
In 2014 the International Society of Arthroplasty Registries created a PROM Working Group [26, 27]. The working group outlined the rationale for the inclusion of PROMs in the arthroplasty registries and noted that several have incorporated PROMs into their data collection [27]. They also made several recommendations regarding the use of these outcome measures in arthroplasty registers [26].

Fact Box 39.1
Drolet and Johnson developed a definition of a medical registry based on the presence of five characteristics which they called MDR-OK. The information must be mergeable (M) into a centralized data set, use standardized data (D), and must follow a protocol outlining rules (R) for systematic and prospective data collection. Additionally, the observation (O) of patient data is followed over time to collect knowledge (K) about patient outcomes and results. The authors state that all five characteristics need to be present in order to differentiate a medical data registry from a non-registry database [6].

Clinical Vignette: In the Operating Room
Dr. Ng is just beginning a standard ACL reconstruction in a 26-year-old soccer player. The initial injury occurred 6 months ago, and the clinical evaluation and MRI both suggested an isolated ACL rupture. In the preoperative waiting room, the patient stated that he had a giving-way recently with immediate joint swelling. Upon insertion of the arthroscope, Dr. Ng finds a grade four cartilage lesion which likely developed during the recent instability episode. He has previously managed similar lesions with microfracture treatment, but he recalls a recent registry study that found microfracture at the time of ACL reconstruction
Fig. 39.1 Norwegian National Knee Ligament Registry Perioperative surgeon registration form
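[Figure residue, caption not recoverable. Footnotes: (1) Protocol approved by the NKLR board members; subject to regional ethics evaluation and applicable laws. (2) Deidentified.]
[Figure residue, caption not recoverable. Plot with series 2004-2006, 2007-2009, and 2010-2016; x-axis: Years After Primary Operation (0 to 14); y-axis label not recoverable.]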
Register: 11 years and 73,000 arthroplasties. Acta Orthop Scand. 2000;71:337–53. https://doi.org/10.1080/000164700317393321.
11. Havelin LI, Espehaug B, Vollset SE, Engesaeter LB. The effect of the type of cement on early revision of Charnley total hip prostheses. A review of eight thousand five hundred and seventy-nine primary arthroplasties from the Norwegian Arthroplasty Register. J Bone Joint Surg Am. 1995;77:1543–50.
12. Havelin LI, Espehaug B, Vollset SE, Engesaeter LB. Early aseptic loosening of uncemented femoral components in primary total hip replacement. A review based on the Norwegian Arthroplasty Register. J Bone Joint Surg Br. 1995;77:11–7.
13. Horan FT. Judging the evidence. J Bone Joint Surg Br. 2005;87:1589–90. https://doi.org/10.1302/0301-620X.87B12.17247.
14. Irgens LM. The origin of registry-based medical research and care. Acta Neurol Scand. 2012;126:4–6. https://doi.org/10.1111/ane.12021.
15. Irgens LM, Bjerkedal T. Epidemiology of leprosy in Norway: the history of The National Leprosy Registry of Norway from 1856 until today. Int J Epidemiol. 1973;2:81–9.
16. Ivers N, Jamtvedt G, Flottorp S, Young JM, Odgaard-Jensen J, French SD, O’Brien MA, Johansen M, Grimshaw J, Oxman AD. Audit and feedback: effects on professional practice and healthcare outcomes. In: The Cochrane Collaboration, editor. Cochrane Database of Systematic Reviews. Chichester: Wiley; 2012.
17. Knutson K, Lewold S, Robertsson O, Lidgren L. The Swedish knee arthroplasty register. A nation-wide study of 30,003 knees 1976-1992. Acta Orthop Scand. 1994;65:375–86.
18. Lind M, Menhert F, Pedersen AB. The first results from the Danish ACL reconstruction registry: epidemiologic and 2 year follow-up results from 5,818 knee ligament reconstructions. Knee Surg Sports Traumatol Arthrosc. 2009;17:117–24. https://doi.org/10.1007/s00167-008-0654-3.
19. Malchau H, Herberts P, Eisler T, Garellick G, Söderman P. The Swedish total hip replacement register. J Bone Joint Surg Am. 2002;84-A(Suppl 2):2–20.
20. Maloney WJ. National Joint Replacement Registries: has the time come? J Bone Joint Surg Am. 2001;83-A:1582–5.
21. Mygind-Klavsen B, Grønbech Nielsen T, Maagaard N, Kraemer O, Hölmich P, Winge S, Lund B, Lind M. Danish Hip Arthroscopy Registry: an epidemiologic and perioperative description of the first 2000 procedures. J Hip Preserv Surg. 2016;3:138–45. https://doi.org/10.1093/jhps/hnw004.
22. Naylor CD, Guyatt GH. Users’ guides to the medical literature. X. How to use an article reporting variations in the outcomes of health services. The Evidence-Based Medicine Working Group. JAMA. 1996;275:554–8.
23. Persson A, Fjeldsgaard K, Gjertsen J-E, Kjellsen AB, Engebretsen L, Hole RM, Fevang JM. Increased risk of revision with hamstring tendon grafts compared with patellar tendon grafts after anterior cruciate ligament reconstruction: a study of 12,643 patients from the Norwegian Cruciate Ligament Registry, 2004-2012. Am J Sports Med. 2014;42:285–91. https://doi.org/10.1177/0363546513511419.
24. Pocock SJ, Elbourne DR. Randomized trials or observational tribulations? N Engl J Med. 2000;342:1907–9. https://doi.org/10.1056/NEJM200006223422511.
25. Robertsson O. Knee arthroplasty registers. J Bone Joint Surg Br. 2007;89:1–4. https://doi.org/10.1302/0301-620X.89B1.18327.
26. Rolfson O, Bohm E, Franklin P, Lyman S, Denissen G, Dawson J, Dunn J, Eresian Chenok K, Dunbar M, Overgaard S, Garellick G, Lübbeke A, Patient-Reported Outcome Measures Working Group of the International Society of Arthroplasty Registries. Patient-reported outcome measures in arthroplasty registries: Report of the Patient-Reported Outcome Measures Working Group of the International Society of Arthroplasty Registries Part II. Recommendations for selection, administration, and analysis. Acta Orthop. 2016;87:9–23. https://doi.org/10.1080/17453674.2016.1181816.
27. Rolfson O, Eresian Chenok K, Bohm E, Lübbeke A, Denissen G, Dunn J, Lyman S, Franklin P, Dunbar M, Overgaard S, Garellick G, Dawson J, The Patient-Reported Outcome Measures Working Group of the International Society of Arthroplasty Registries. Patient-reported outcome measures in arthroplasty registries: Report of the Patient-Reported Outcome Measures Working Group of the International Society of Arthroplasty Registries Part I. Overview and rationale for patient-reported outcome measures. Acta Orthop. 2016;87:3–8. https://doi.org/10.1080/17453674.2016.1181815.
28. Roos EM, Roos HP, Lohmander LS, Ekdahl C, Beynnon BD. Knee Injury and Osteoarthritis Outcome Score (KOOS)--development of a self-administered outcome measure. J Orthop Sports Phys Ther. 1998;28:88–96. https://doi.org/10.2519/jospt.1998.28.2.88.
29. Røtterud JH, Sivertsen EA, Forssblad M, Engebretsen L, Årøen A. Effect on patient-reported outcomes of debridement or microfracture of concomitant full-thickness cartilage lesions in anterior cruciate ligament-reconstructed knees: a nationwide cohort study from Norway and Sweden of 357 patients with 2-year follow-up. Am J Sports Med. 2016;44:337–44. https://doi.org/10.1177/0363546515617468.
30. Singh J, Politis A, Loucks L, Hedden DR, Bohm ER. Trends in revision hip and knee arthroplasty observations after implementation of a regional joint replacement registry. Can J Surg. 2016;59:304–10. https://doi.org/10.1503/cjs.002916.
31. de Steiger RN, Miller LN, Davidson DC, Ryan P, Graves SE. Joint registry approach for identification of outlier prostheses. Acta Orthop. 2013;84:348–52. https://doi.org/10.3109/17453674.2013.831320.
Part VIII
How to Perform an Economic Health Care Study?

40 How to Perform an Economic Healthcare Study
Jonathan Edgington, Xander Kerman, Lewis Shi, and Jason L. Koh
Orthopedic care often focuses on increasing quality of life or increasing patient mobility through improving function of the musculoskeletal system. These tasks have proved costly, with Medicare estimates of orthopedic expenditures surpassing $8.2 billion annually [14]. These high costs provide an incentive for clinicians to prove the value of their services.

The research question is the foundation of all parts of the study, and it is worth putting additional time and effort into crystallizing before beginning work. The question may arise out of personal interest, gaps in literature, or translation of topics from other fields. Often deep thought about the topic leads to a plethora of further questions or ambiguity about the purpose of the study. When
dealing with potentially ambiguous topics like costs and benefits in economic healthcare studies, it is important to clearly define the question in order to guide both planning and execution.

40.2.2 What Is the State of Current Knowledge on the Topic?

Knowing what other researchers have learned on the topic is important in order to prevent wasted effort duplicating results and to further focus the research question. While mainstream medical journals increasingly publish healthcare economic studies, economic- and quality improvement-focused publications often feature these types of projects as well. Finding literature that addresses the same question does not preclude the investigator from performing a study but is useful for comparison and learning from previous shortcomings. Review of current literature will focus the question and justify prioritizing this question.

40.2.3 Why Is the Question Important?

Having a solid understanding of the current literature and knowledge helps define the deficiencies in evidence and establish the importance of the question. This will add purpose and meaning to the study. Answering the question “Why would it be helpful to know the answer to this question?” establishes how the study will help future clinicians, researchers, and policymakers.

40.2.4 Hypothesis

With this information the clinician can form an explicit and concise hypothesis. The hypothesis should be a theory or prediction of what the outcome of the study may show. The hypothesis should be established well before data is collected or analyzed, meaning researchers should avoid collecting data and then retrospectively developing the hypothesis. It should be obvious how the hypothesis will be tested given the circumstances of the study. The hypothesis should not only be valuable if found to be true but also should be valuable if found to be false. With a clear research question and hypothesis, one can proceed with appropriate selection of an economic evaluation for a given study.

40.3 Economic Evaluation and Application to Research Question

Once a research question and a hypothesis have been identified, the next step is identifying the appropriate economic analytic method (Table 40.1). As seen in Table 40.2, there are four common analytic approaches in healthcare economics, each with advantages and disadvantages: cost-minimization (CMA), cost-effectiveness (CEA), cost-utility (CUA), and cost-benefit (CBA) analyses. All four rely on accurately quantifying costs but vary in their considerations of outcomes. Deciding on which best suits the study is based on the research question, hypothesis, and purpose of the study, as well as the data available.
Cost may be quantified in a variety of ways depending on the perspective of interest. Common perspectives include total costs to society (societal costs), costs to patients (patient costs), and costs to third-party payors (payor costs). Direct costs may measure medical or nonmedical costs associated with treatment. Other
Table 40.2 Multiple sources of possible financial and cost data with examples in parentheses
Source of cost or financial data
Individual patient records
Practice records
Hospital records
Healthcare system records (Kaiser Permanente)
Healthcare consortium databases (e.g., the National Surgical Quality Improvement Project, Vizient)
State or regional databases (New York State, California)
National government databases (e.g., in the United States, the National Inpatient Survey)
Private insurance databases (PearlDiver)

Analysis                             Outcomes
Cost-minimization analysis (CMA)     Assumed equivalent
Cost-effectiveness analysis (CEA)    Natural units (objective, quantity, OR quality)
Cost-utility analysis (CUA)          Utility scale (subjective + objective, quantity, AND quality)
Cost-benefit analysis (CBA)          Monetary scale (subjective + objective, quantity, AND quality)
Cost (all four analyses): Dependent on perspective of interest (patient, payor, societal)
intangible costs, such as opportunity costs, may be considered in the quantification depending on what is pertinent to the research question [7]. For example, when a patient elects to undergo surgery, the time spent recovering cannot be spent earning income, and therefore lost wage income is an opportunity cost associated with the surgery.
Outcomes may be quantified in a variety of ways as well. Objective clinical outcomes can be defined very broadly based on the research topic, ranging from functional benefits and improved outcome scores to morbidity. Subjective clinical outcomes can also be measured in a variety of ways using validated instruments to assess patients’ perspectives of their health (e.g., patient-reported outcome tools). Objective and subjective outcomes can also be combined in a variety of ways. One commonly used combined measure is the quality-adjusted life year (QALY), which provides a standardized score based on not only remaining quantity of life but also the quality, or health state, of those years. A variety of instruments exist to broadly measure health state on a 0 to 1 scale with 0 being death and 1 being perfect health. Another form of combined subjective and objective outcome measurement is the monetary value that patients would be willing to pay to achieve different outcomes. Outcomes and costs can be seen in Fact Box 40.2.

40.3.1 Cost-Minimization Analysis

Cost-minimization analysis (CMA) is used when the evaluation targets a difference in cost of interventions with the assumption that outcomes are equivalent. In order for this technique to be valid, equivalent outcomes must be clearly and comprehensively demonstrated. The advantage of a cost-minimization analysis is simplicity. The disadvantage of CMA is that establishing complete equivalency is often difficult or impossible. Even if equivalent outcomes were felt to be true, there is always the possibility that equivalence may actually be a failure to detect a difference in two interventions. If outcomes cannot be proved equivalent, cost-minimization analysis is not an appropriate economic evaluation technique. Some economists argue that CMA techniques are very seldom appropriate, with possible exceptions in rare cases where randomized controlled trials have already demonstrated equivalent outcomes [2]. CMA are most frequently relevant
for policy or third-party payor decisions on resource allocations where concerns of specific outcomes come secondary to cost considerations. If one assumed the outcomes of hyaluronic acid injections and corticosteroid injections were equally effective, one may estimate cost difference with a CMA evaluation.

40.3.2 Cost-Effectiveness Analysis

Cost-effectiveness analysis (CEA) is a method used to compare the costs and outcomes of one intervention versus the costs and outcomes of an alternative. This type of evaluation provides a value of additional units of cost per additional unit of benefit. The set of outcomes must be measured with the same units of effectiveness between both interventions. As seen in Table 40.3, this leads to four possible outcomes. The advantage of CEA is the ability to directly compare costs and outcomes of two interventions. The disadvantage of CEA is that only one outcome may be measured even if the intervention may have multiple relevant outcomes. For example, a total hip arthroplasty may provide measurable outcomes such as relief of pain, increased range of motion, and decreased narcotic usage, but only one can be considered at a time in CEA. An excellent example of this type of evaluation is demonstrated by Rajan et al. (2018) in their study on surgical fixation of distal radius fractures [10]. Because of this shortcoming, CEA does not fully provide an answer to the value question: Is the additional collective benefit gained worth the extra cost?

Table 40.3 An example of how comparator choice may impact conclusions
                        Cost    Time     Accuracy (%)
Osteotome and mallet    $500    2 min    80
Hand saw                $25     5 min    95
Hand rasp               $25     4 h      99.9

40.3.3 Cost-Utility Analysis

Cost-utility analysis (CUA) differs from CEA in that CUA outcomes are measured using a health index that attempts to quantify all health outcomes associated with an intervention. Most often quality-adjusted life years (QALY) is the index of choice, which attempts to objectify length of life as well as quality of life related to the intervention. A variety of instruments exist to calculate QALYs by broadly estimating health quality states on a 0 to 1 scale using a multi-attribute utility model (MAU, 0 = no utility of life/death and 1 = perfect health) and multiplying this number by the life expectancy benefit of a procedure: QALY = (MAU) * (LEB). Several MAUs are commonly used including the EuroQoL-5 Dimensions, Health Utilities Index, Short Form-6 Dimensions, Assessment of Quality of Life, 15 Dimensions, and Quality of Well-Being [13]. This type of analysis is highly appropriate in orthopedics, as research in this field typically looks at increased quality of life rather than simply increasing length of life. CUA using QALYs from the societal perspective is recommended by the Panel on Cost-Effectiveness in Health and Medicine convened by the US Public Health Service in 1996 and 2016 [15]. CUA using QALYs has been effectively used to evaluate total hip arthroplasty [3–5].

40.3.4 Cost-Benefit Analysis

Cost-benefit analysis (CBA) is an analysis that places a monetary value on both the cost of the intervention and the outcomes of the intervention. This allows for an analysis that establishes a net benefit of an intervention. This evaluation attempts to comprehensively consider what patients value in their healthcare. The advantage of this analysis is that it yields an objective net benefit, which can answer the question “Is the additional benefit gained worth the additional cost?” The disadvantage of this type of evaluation is that assigning monetary values to clinical outcomes is often difficult and sometimes controversial, requiring social science research expertise and validation. An example of CBA in orthopedics would be assigning dollar values to the net benefits gained following total hip arthroplasty, which
378 J. Edgington et al.
would allow a patient-centric conclusion about Table 40.4 Steps to performing an economic healthcare
the value of THA. study
Identify the relevant question
Understand current knowledge
40.4 Perspective Formulate hypothesis
Obtain relevant financial information
Obtain relevant outcome information
The perspective of the study is the lens through Perform appropriate statistical analysis
which the study is viewed and drives the assess- Report results
ments of cost and benefit. Relevant perspectives Discuss results with respect to question of interest
in healthcare economic studies include the
patients’, providers’, hospitals’, insurers’, or a in calculating costs depending on the methodol-
comprehensive societal perspective. The perspec- ogy used can be found in Palsis et al. (2018) [8].
tive can be thought of as the target audience for Several examples of cost or financial data are
any conclusions drawn from the study. It is listed in Table 40.4. In many cases, it can be dif-
important as the priorities of the different groups ficult or expensive to obtain cost or financial data.
likely differ significantly. Changing the perspec- For investigators, identifying an accurate
tive changes both the numerator and denominator source of information is critical. Typically, data
of the value equation because different elements that is derived more directly from clinical records
must be included depending on whose perspec- is more accurate than that derived from adminis-
tive is under consideration. For example, if an trative databases, which can be subject to errors
evaluation is executed from an insurer’s perspec- in coding. Nevertheless, valuable information
tive, the definition of cost would likely exclude can still be obtained from administrative
patients’ out-of-pocket expenses. In certain databases.
instances, it may be valuable to include multiple Additionally, data that is obtained exclusively
perspectives in each study, in which case results from one clinical site (e.g., a hospital) may not
of evaluations should be reported separately from accurately reflect the total costs of care, including
each perspective. items such as costs related to postoperative reha-
bilitation. As financial analysis of healthcare-
related costs becomes more sophisticated, there
40.5 Challenges is increasing ability to more fully understand
costs. In addition, allocation of costs can be done
Healthcare economic studies present several in multiple ways: they can be related to charges,
types of challenges not frequently encountered in known costs (such as for implants), or activity-
other projects. These challenges include diffi- based costing.
culty defining and acquiring data on costs and Finally, the economic data must be tied to
benefits, as well as the underlying issue of what clinical outcomes for patients. Assessment of
interventions are compared to one another. The clinical outcomes is a critically important area
conclusions reached in healthcare economic and is further discussed in other chapters.
studies also may be difficult to apply in practice Relating economic and outcome data should be
or be limited in their generalizability to other pro- performed with appropriate statistical rigor.
cedures or practice settings.
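As a rough illustration of the arithmetic behind Sects. 40.3.2 to 40.3.4, the following Python sketch computes QALYs as MAU × LEB, the additional cost per additional unit of benefit (commonly called an incremental cost-effectiveness ratio), and a monetary net benefit. The function names and every number in it are hypothetical and chosen only to make the calculation easy to follow; none of them come from the studies cited in this chapter.

```python
def qaly(mau: float, leb_years: float) -> float:
    """QALY = MAU * LEB, with MAU on a 0 (death) to 1 (perfect health) scale."""
    if not 0.0 <= mau <= 1.0:
        raise ValueError("MAU must lie between 0 and 1")
    return mau * leb_years


def incremental_ratio(cost_new: float, cost_old: float,
                      effect_new: float, effect_old: float) -> float:
    """Additional cost per additional unit of benefit (used in CEA and CUA)."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        # Equal effectiveness is the cost-minimization (CMA) situation.
        raise ValueError("equal effectiveness: compare costs only")
    return (cost_new - cost_old) / delta_effect


def net_benefit(monetary_benefit: float, cost: float) -> float:
    """CBA: benefit and cost are both expressed in money."""
    return monetary_benefit - cost


# Hypothetical inputs, for illustration only.
qaly_surgery = qaly(mau=0.75, leb_years=20)  # 15.0 QALYs
qaly_nonop = qaly(mau=0.5, leb_years=20)     # 10.0 QALYs
cost_per_qaly = incremental_ratio(30_000, 5_000, qaly_surgery, qaly_nonop)
print(cost_per_qaly)                # 5000.0 dollars per QALY gained
print(net_benefit(40_000, 30_000))  # 10000
```

Whether a given cost per QALY represents good value depends on the perspective chosen and on a willingness-to-pay threshold, which this sketch deliberately leaves out.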
benefit or outcome information would be from a double-blind randomized controlled trial (RCT). Few orthopedic RCTs comparing surgical outcomes exist, and therefore much of the orthopedic outcomes data is observational [1, 6, 11]. Observational studies can be pooled or meta-analyses can be used; however, these introduce bias or error. Outcome data should be as accurate and applicable as possible, and as above, methods of obtaining the data should be explained in detail.

knee replacement. Studies may have limited clinical applicability in cases where cost differences are statistically different but not clinically significant. For example, suppose a study showed cost savings of $50 per patient for anterior versus posterior hip arthroplasty. This may not be significant to the patient or payors if total costs are in the $10,000–$20,000 range.

40.5.5 Generalizability
3. Chang RW, Pellisier JM, Hazen GB. A cost-effectiveness analysis of total hip arthroplasty for osteoarthritis of the hip. JAMA. 1996;275:858–65.
4. Daigle ME, Weinstein AM, Katz JN, Losina E. The cost-effectiveness of total joint arthroplasty: a systematic review of published literature. Best Pract Res Clin Rheumatol. 2012;26:649–58. https://doi.org/10.1016/j.berh.2012.07.013.
5. Givon U, Ginsberg GM, Horoszowski H, Shemer J. Cost-utility analysis of total hip arthroplasties. Technology assessment of surgical procedures by mailed questionnaires. Int J Technol Assess Health Care. 1998;14:735–42.
6. Maniadakis N, Gray A. Health economics and orthopaedics. J Bone Joint Surg Br. 2000;82:2–8.
7. Napper M, Newland J. Health economics information resources: a self-study course: U.S. National Library of Medicine; 2016.
8. Palsis JA, Brehmer TS, Pellegrini VD, Drew JM, Sachs BL. The cost of joint replacement: comparing two approaches to evaluating costs of total hip and knee arthroplasty. J Bone Joint Surg Am. 2018;100:326–33. https://doi.org/10.2106/JBJS.17.00161.
9. Porter ME. What is value in health care? N Engl J Med. 2010;363:2477–81. https://doi.org/10.1056/NEJMp1011024.
10. Rajan PV, Qudsi RA, Dyer GSM, Losina E. The cost-effectiveness of surgical fixation of distal radial fractures: a computer model-based evaluation of three operative modalities. J Bone Joint Surg Am. 2018;100:e13. https://doi.org/10.2106/JBJS.17.00181.
11. Rajan PV, Qudsi RA, Wolf LL, Losina E. Cost-effectiveness analyses in orthopaedic surgery: raising the bar. J Bone Joint Surg Am. 2017;99:e71. https://doi.org/10.2106/JBJS.17.00509.
12. Reidpath DD, Olafsdottir AE, Pokhrel S, Allotey P. The fallacy of the equity-efficiency trade off: rethinking the efficient health system. BMC Public Health. 2012;12:S3. https://doi.org/10.1186/1471-2458-12-S1-S3.
13. Wisløff T, Hagen G, Hamidi V, Movik E, Klemp M, Olsen JA. Estimating QALY gains in applied studies: a review of cost-utility analyses published in 2010. Pharmacoeconomics. 2014;32:367–75. https://doi.org/10.1007/s40273-014-0136-z.
14. Trends in orthopaedics: an analysis of medicare claims, 2000–2010. https://www.healio.com/orthopaedics/journals/ortho/2013-3-36-3/%7Bed4e1047-4f2f-46bc-9f01-65f36a2efd58%7D/trends-in-orthopaedics-an-analysis-of-medicare-claims-20002010. Accessed 13 Feb 2018.
15. Recommendations from the second panel on cost-effectiveness in health and medicine | Guidelines | JAMA | The JAMA Network. https://jamanetwork.com/journals/jama/article-abstract/2552214?redirect=true. Accessed 15 Feb 2018.
Part IX
Multi-Center Study: How to Pull It Off?
41 Conducting Multicenter Cohort Studies: Lessons from MOON
José F. Vega and Kurt P. Spindler
However, it did not take long to appreciate that the outcome of an ACL reconstruction is influenced by a myriad of factors beyond bone bruising and articular cartilage lesions, and, as a result, 54 patients would not be enough. This was also a period of transition, as the use of primary ACL repair was nearly abandoned, while the utilization of ACL reconstruction was growing in popularity. Thus, new questions were appearing at a rate much greater than we could answer.

As the sun began to rise high in the midmorning sky, the senior author and one of his Cleveland Clinic colleagues (JTA) found themselves cycling through the elevation changes of Sun Valley, Idaho, mulling over potential ways to address the seemingly endless list of questions surrounding ACL reconstruction.

It was then that they admitted, reluctantly, that the only feasible way to answer so many questions would be to assemble another cohort, this time with 10 or 20 times the number of patients they had enrolled in Cleveland. Naturally, the discussion then turned to how they could possibly enroll and follow hundreds (and more likely thousands) of patients undergoing a procedure which, at the time, was being performed fewer than 90,000 times per year across the entire country [3]. The answer, of course, was to develop a multicenter network.

the same time, the VSM-CCF ACL Reconstruction Registry gained a new partner in the form of The Ohio State University, which was maintaining its own ACL reconstruction database. This three-institution cohort, containing 2286 ACL reconstructions performed over a 10-year period, yielded a total of eight publications on a wide range of topics [2, 9, 10, 15, 22, 23, 29, 31].

What made this multicenter cohort so unique was not only its size (at the time, it was one of the largest prospective ACL reconstruction cohorts in the country) but also its use of patient-reported outcome measures (PROMs) as primary endpoints. The collection of preoperative and intraoperative variables, along with the follow-up PROMs, allowed for complex multivariable regression analysis to identify clinically relevant predictors of outcomes at 5 years [31]. Although difficult to imagine given the state of today’s literature, use of PROMs as a primary endpoint was tantamount to clinical research heresy in the early and mid-2000s [12, 35]. Nonetheless, the developers of this registry adopted two recently validated PROMs to measure outcomes following ACL reconstruction: the International Knee Documentation Committee Subjective Knee Form (IKDC-SKF) and the Knee injury and Osteoarthritis Outcome Score (KOOS).
struction (or the prognostic significance of pre- and intraoperative variables), as there was no baseline PROM data, which would be needed in order to observe change over time. Thus, once again, there was a need for a new cohort, one that would be larger, composed of patients from more centers, and from which more variables could be collected.

41.3 Designing MOON

41.3.1 Assembling the Clinical Research Team

The MOON principal investigator (PI) and co-investigators recognized that designing a prospective longitudinal cohort would require a large team composed of individuals with clinical research expertise beyond orthopedic surgeons. They were aided by one of orthopedic surgery’s original clinician-scientists, Sandy Kirkley, who had specialized training in clinical research long before it became commonplace (although still rare today, a master’s in public health [MPH] or MPH equivalent was virtually unheard of in orthopedics in the 1990s). In addition, the team included several experts in prospective trial and cohort design, an epidemiologist, and a PhD biostatistician with specialized training in multivariable analysis. Furthermore, they hired a program manager and several research assistants. Without this expertise, all outside of orthopedics, MOON would certainly not have been successful at both securing NIH funding and maintaining high rates of longitudinal follow-up.

The first step in designing the MOON cohort was to identify which questions were to be answered by MOON. Among the many options, the co-investigators settled on three primary aims:

1. What preoperative and intraoperative variables predict both short- and long-term outcomes following ACL reconstruction?
2. What predicts graft failure?
3. What is the natural progression of posttraumatic osteoarthritis, and what variables modify the trajectory?

41.3.3 Sample Size

Because the co-investigators wanted to assess long-term outcomes, as well as the development of osteoarthritis, this new cohort would need to be followed for at least 10 years. Since they also wanted to identify predictors of graft failure, this new cohort would need to be large. Considering that failure was seen in roughly 10% of the three-institution ACL reconstruction registry, and that multivariable logistic regression was to be done to identify significant predictors of graft failure, MOON would need to enroll at least 2250 participants (15 participants per variable included in the logistic regression, multiplied by 15 variables to be included, multiplied by 10 because failure would likely occur in only 10% of the cohort).

To enroll such a large number of patients in a reasonable period of time, the designers of MOON set out to assemble a team of surgeons and various high-volume ACL reconstruction institutions that could enroll roughly 600 patients per year. Building on the VSM-CCF ACL reconstruction registry with the help of personal connections, the final tally of MOON surgeons and sites came to 17 surgeons at 7 institutions (Cleveland Clinic Foundation, Vanderbilt Orthopaedic Institute, The Ohio State University, University of Iowa, Washington University in St. Louis, Hospital for Special Surgery, and University of Colorado).

In the same fashion as the co-investigators’ three-institution ACL reconstruction registry, the decision was made to use PROMs as a primary endpoint. Again, they opted to utilize the IKDC and the KOOS to track PROMs from baseline until the completion of the study (tentatively projected to be 10 years post-op).

The decision to use PROMs as a primary endpoint was twofold. First, PROMs had been
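The sample size reasoning in Sect. 41.3.3 (15 participants per variable, 15 variables, and a 10% failure rate) can be written out as a short Python sketch. The function name and its parameters are hypothetical, and the rule of thumb is taken directly from the text rather than from a formal power calculation.

```python
import math
from fractions import Fraction


def required_cohort_size(per_variable: int, n_variables: int, event_rate: str) -> int:
    """Cohort size needed so that the expected number of events (here, graft
    failures) reaches per_variable * n_variables.

    event_rate is passed as a decimal string such as "0.10" so that the
    division is exact and not affected by floating-point rounding.
    """
    events_needed = per_variable * n_variables
    return math.ceil(Fraction(events_needed) / Fraction(event_rate))


cohort = required_cohort_size(per_variable=15, n_variables=15, event_rate="0.10")
print(cohort)        # 2250
print(cohort / 600)  # 3.75 years at roughly 600 enrolled patients per year
```

Under these assumptions, adding predictors or expecting a lower failure rate increases the required cohort proportionally, which is one way to see why a single site could not have enrolled enough patients in a reasonable time.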
The next challenge to overcome was to develop a data collection system that would not be overly burdensome to patients and/or surgeons. With some guidance from their late friend (SK), the designers compiled a series of paper questionnaires that captured all of the pertinent variables and trialed the Compaq iPAQ (Fig. 41.1) as a means for electronic data capture. This was no small feat, especially considering that the Apple iPhone would not debut for another 6 years. Luckily, electronic databases such as REDCap are now available and make excellent options for collecting large amounts of data rapidly across multiple sites.

Fig. 41.1 Compaq iPAQ, 2004

Clinical Vignette
In the early stages of MOON, the typical workflow involved completion of paper PROMs, which were then faxed to a central location for data collection. As mentioned above, the iPAQ was trialed briefly but, because of difficulties with reliable syncing and data upload, was abandoned quickly.

41.3.6 Agreement Studies

While having so many fellowship-trained surgeons was undoubtedly an advantage, managing a group as large and experienced as that was not without its own challenges. The most obvious hurdle facing the group was how to come to an agreement on how certain variables would be classified (e.g., meniscal tear location, depth, type, etc.) and then treated (e.g., meniscectomy vs. meniscal repair). The group performed a pair of inter-rater agreement studies to demonstrate, in a scientific way, that their methodology for classifying each particular variable (specifically, meniscus tears and articular cartilage lesions) was reliable across the group [6, 20].

A similar challenge emerged with respect to each surgeon’s technique for performing a typical ACL reconstruction. Again, the group conducted two studies in which they demonstrated that the differences between the ways in which each surgeon performed his preferred ACL reconstruction were minimal and likely clinically irrelevant [32, 33]. In the first study, 12 surgeons each performed 6 ACL reconstructions on cadaveric knees using their technique of preference (e.g., two-incision, trans-tibial, or anteromedial). The 72 knees were then imaged in order to identify any significant differences in tunnel characteristics (none were present) [33]. Four surgeons then went on to duplicate the same study design in MOON patients, complete with postoperative CT scans to assess each tunnel. Again, no differences were identified [32].

The last obstacle to overcome with regard to agreement was which graft type to use. Rather than attempt to convince all of the participating surgeons to perform their ACL reconstructions with BTB or hamstring (which would have been impossible), the group showed, through a systematic review, that no clinically relevant difference exists between BTB and hamstring autograft [28]. In hindsight, this was actually a judicious approach, as the inclusion of both BTB and hamstring ACL reconstructions only provided further data to conclusively demonstrate that no clinically relevant difference exists between the two approaches when done correctly.

41.3.7 Funding

As expected, enrolling and following such a large cohort would come with significant cost (e.g., the estimated annual cost of operationalizing MOON at Vanderbilt University alone was nearly $200,000).

MOON enrolled and followed patients for nearly 3 full years before submitting its first application for a National Institutes of Health (NIH) R01 grant in February 2004 (which, by the way, was not selected for funding). By the end of 2005 (still before MOON was funded by the NIH), MOON had enrolled 2340 patients. At 2 years post-op, 93% had phone follow-up, and 85% had returned completed PROMs. Of course, achieving such high follow-up rates for a cohort as large as that came with a large price tag. MOON had four full-time employees devoted to seeking follow-up (one employee could follow roughly 600 patients per year), and, still, MOON surgeons had to personally call 25% of their patients in order to get them to follow up. The nested cohort within MOON (aimed at understanding the development of posttraumatic osteoarthritis) also came with significant expense, which included training radiology personnel at three separate sites, specialized bilateral weight-bearing radiographs, and blinded evaluation by surgeons and physical therapists. It was not until 2006, after three submissions and three rejections, that MOON received NIH funding.

In total, MOON cost roughly $1.4 million to operate between 2001 (when the first patient was enrolled) and 2006 (when MOON was first awarded grant funding). Of the $1.4 million, roughly half was provided by a combination of unrestricted gifts (Smith & Nephew [$450,000] and Aircast [$200,000]) and a grant from the National Football League (NFL) Charities ($125,000). The other half was covered by an internal tax placed on one of MOON’s designers (KPS) and two of his partners (ECM, JK) at Vanderbilt University ($450,000) and the remaining money from an OREF Prospective Clinical Research Grant ($149,000).

Fact Box 41.2
The award rate (defined as the number of awards given in a fiscal year divided by the total number of applications reviewed, including new applications and resubmissions from the previous review period) for NIH grant proposals between 1990 and 2014 fluctuated between 15 and 30%, although it has not exceeded 20% since 2005 [24].

41.4 MOON Becomes a Reality

Finally, in January of 2002, MOON was ready to enroll its first patient. But the challenges did not stop there. It was only because of teamwork and dedication that MOON was able to become a reality. New and/or ongoing concerns were discussed during monthly conference calls (a call that has taken place on the second Monday of each month for the last 17 years and continues to this day). Each site continues to maintain an active institutional review board (IRB) application.

Since it received initial funding in 2006, MOON has successfully renewed its funding on three separate occasions, allowing for 6- and 10-year patient follow-up. Currently, MOON has >4400 ACL reconstructions in its database and has achieved >80% follow-up at 2, 6, and 10 years post-op [5, 13, 27].

To date, MOON has produced more than 40 peer-reviewed publications and has laid the foundation for other multicenter orthopedic research groups [4, 18, 19, 34].
19. Lynch TS, Parker RD, Patel RM, et al. The impact of the Multicenter Orthopaedic Outcomes Network (MOON) research on anterior cruciate ligament reconstruction and orthopaedic practice. J Am Acad Orthop Surg. 2015;23(3):154–63. https://doi.org/10.5435/JAAOS-D-14-00005.
20. Marx RG, Connor J, Lyman S, et al. Multirater agreement of arthroscopic grading of knee articular cartilage. Am J Sports Med. 2005;33(11):1654–7. https://doi.org/10.1177/0363546505275129.
21. Odensten M, Lysholm J, Gillquist J. Suture of fresh ruptures of the anterior cruciate ligament. A 5-year follow-up. Acta Orthop Scand. 1984;55(3):270–2.
22. Paul JJ, Spindler KP, Andrish JT, Parker RD, Secic M, Bergfeld JA. Jumping versus nonjumping anterior cruciate ligament injuries: a comparison of pathology. Clin J Sport Med. 2003;13(1):1–5.
23. Piasecki DP, Spindler KP, Warren TA, Andrish JT, Parker RD. Intraarticular injuries associated with anterior cruciate ligament tear: findings at ligament reconstruction in high school and recreational athletes. An analysis of sex-based differences. Am J Sports Med. 2003;31(4):601–5. https://doi.org/10.1177/03635465030310042101.
24. Rockey S. What are the chances of getting funded? NIH Extramur Nexus. June 2015. https://nexus.od.nih.gov/all/2015/06/29/what-are-the-chances-of-getting-funded/. Accessed Feb 1 2018.
25. Roos EM, Roos HP, Lohmander LS, Ekdahl C, Beynnon BD. Knee Injury and Osteoarthritis Outcome Score (KOOS): development of a self-administered outcome measure. J Orthop Sports Phys Ther. 1998;28(2):88–96. https://doi.org/10.2519/jospt.1998.28.2.88.
26. Siljander MP, McQuivey K, Fahs A, Galasso L, Serdahely K, Karadsheh MS. Current trends in patient reported outcome measures in total joint arthroplasty: a study of four major orthopedic journals. J Arthroplasty. 2018. https://doi.org/10.1016/j.arth.2018.06.034.
27. Spindler KP, Huston LJ. O’Donoghue sports injury award 10 year outcomes and risk factors after ACL reconstruction: a multicenter cohort study. Orthop J Sports Med. 2017;5(7 Suppl 6):2325967117S00247. https://doi.org/10.1177/2325967117S00247.
28. Spindler KP, Kuhn JE, Freedman KB, Matthews CE, Dittus RS, Harrell FE. Anterior cruciate ligament reconstruction autograft choice: bone-tendon-bone versus hamstring: does it really matter? A systematic review. Am J Sports Med. 2004;32(8):1986–95.
29. Spindler KP, McCarty EC, Warren TA, Devin C, Connor JT. Prospective comparison of arthroscopic medial meniscal repair technique: inside-out suture versus entirely arthroscopic arrows. Am J Sports Med. 2003;31(6):929–34. https://doi.org/10.1177/03635465030310063101.
30. Spindler KP, Schils JP, Bergfeld JA, et al. Prospective study of osseous, articular, and meniscal lesions in recent anterior cruciate ligament tears by magnetic resonance imaging and arthroscopy. Am J Sports Med. 1993;21(4):551–7. https://doi.org/10.1177/036354659302100412.
31. Spindler KP, Warren TA, Callison JC, Secic M, Fleisch SB, Wright RW. Clinical outcome at a minimum of five years after reconstruction of the anterior cruciate ligament. J Bone Joint Surg Am. 2005;87(8):1673–9. https://doi.org/10.2106/JBJS.D.01842.
32. Wolf BR, Ramme AJ, Britton CL, Amendola A, MOON Knee Group. Anterior cruciate ligament tunnel placement. J Knee Surg. 2014;27(4):309–17. https://doi.org/10.1055/s-0033-1364101.
33. Wolf BR, Ramme AJ, Wright RW, et al. Variability in ACL tunnel placement: observational clinical study of surgeon ACL tunnel variability. Am J Sports Med. 2013;41(6):1265–73. https://doi.org/10.1177/0363546513483271.
34. Wright RW, Huston LJ, Haas AK, et al. Effect of graft choice on the outcome of revision anterior cruciate ligament reconstruction in the Multicenter ACL Revision Study (MARS) Cohort. Am J Sports Med. 2014;42(10):2301–10. https://doi.org/10.1177/0363546514549005.
35. Zarins B. Are validated questionnaires valid? J Bone Joint Surg Am. 2005;87(8):1671–2. https://doi.org/10.2106/JBJS.E.00554.
42 MARS: The Why and How of It
Rick W. Wright, Amanda K. Haas, and Laura J. Huston
cohort early in the MOON experience. In a series of ACL reconstructions reported at minimum 5-year follow-up, revision was the strongest predictor for worse outcome across the board, but in the editing process the journal reviewers and editor requested that the revisions be removed from the study [9].

Unfortunately, little Level 1 or 2 evidence existed to help us confirm this discrepancy in outcomes between primary and revision ACL-reconstructed patients. In a mixed model meta-analysis of 21 studies with minimum 2-year follow-up after revision reconstruction, these worse results were demonstrated [13]. Of the 21 studies, however, only 4 were Level 1 or 2, while 1 was Level 3, and 16 were Level 4 studies. Objective failure (defined as a re-revision, KT-1000 >5 mm, or a positive pivot shift) occurred in 13.7 ± 2.7%, much higher than typical primary failure rates. Patient-reported outcomes were worse than expected compared with primary ACL results, and the difference usually exceeded the known clinically important difference for these outcome scores.

Given these findings, the Multicenter Orthopaedic Outcomes Network (MOON) Group reviewed their prospectively collected cohort, which began as a mixed primary and revision patient ACL reconstruction cohort [10, 12]. One working hypothesis for the study was that revision ACL reconstruction results in worse outcome compared to primary reconstruction, as measured by validated patient-based outcome measures including the Marx activity level, Knee injury and Osteoarthritis Outcome Score (KOOS), and International Knee Documentation Committee Subjective form (IKDC). A total of 487 ACL reconstructions met the following inclusion/exclusion criteria: (1) all meniscal/chondral treatment included and (2) osteotomy, posterior cruciate ligament (PCL), or collateral ligament surgery excluded. 408/487 (84%) were available at minimum 2-year follow-up, with 39/47 (83%) revisions available at 2-year follow-up. At 2 years, median Marx scores had dropped from 12 to 9 points in the primary reconstructions vs. 10 to 6 points in revisions (p = 0.009). While the minimal clinically important difference (MCID) is not known for the Marx score, it is assumed on a 16-point scale that 2 points (representing more than a 10% change or difference) would be clinically significant. The IKDC at 2 years was 85.6 for primary ACL reconstructions and 79.6 for revisions, which was statistically different (p = 0.005), but not clinically significant (MCID: 11.5 points). For the 5 KOOS subscales, a difference of 8–10 points is clinically significant. In this study, the KOOS Knee-related Quality of Life (KRQOL) subscale was lower in revisions (75 vs. 62.5) at 2 years (p < 0.001). The KOOS Sports and Recreation subscale also was worse for revisions (85 vs. 75; p = 0.004) at 2 years. KOOS Pain was lower in revisions and potentially clinically significant (91.7 vs. 83.3). KOOS Symptoms (85.7 vs. 78.6) and ADLs (98.7 vs. 97.1) did not demonstrate a clinically significant difference between primary and revision reconstructions.

Thus, this represented a prospectively collected cohort of primary and revision ACL reconstructions evaluated identically by validated patient-reported outcome measures with worse scores across the board for revisions, but with no obvious factors contributing to these worse outcomes. Also, revisions represented only 10% of the cohort. Multivariable analysis requires ~10–15 subjects per variable assessed, and with the multiple factors (50–75 or more) potentially contributing to revision ACL reconstruction outcomes, it would require quick assimilation of 750–1000 patients. It became apparent that the MOON Group, with fewer than 20 members, could not enroll an adequate number of patients quickly enough for this type of study. As we jokingly say, “a simple moon could not get it done, we needed a planet.” With this in mind, we set out to establish a larger group of interested sports medicine surgeons.

We felt the basic approach utilized by the MOON Group would be appropriate for evaluating the revision patient but realized there were different factors involved in revision surgeries that would need to be captured. A small group developed a standard operating procedure (SOP) manual that outlined rules of engagement for surgeons and patients. We very early on engaged the American Orthopaedic Society for Sports
Medicine (AOSSM), through the late Bart Mann, PhD, the research director. He, the AOSSM Research Committee, and the AOSSM society were excited about the chance to offer the research opportunity to their members. Based on this we utilized their website and email to advertise to members to participate. Once interested members were identified, we set up three meetings to educate surgeons and describe how the study would proceed. We also engaged these attendees in designing forms and determining the variables we would collect. Over 100 members originally expressed interest, and we currently have 83 surgeons participating at 52 IRB-approved sites. The surgeons are a near 50/50 mix of academic and private practice surgeons, adding to the generalizability of our results [5].
In determining study design, we debated a prospective cohort design versus a randomized trial. Ultimately, we believed we did not know a single critical variable to randomize and felt a cohort to determine predictors would best serve a revision series with its rich number of potential factors. As compared to primary reconstructions, additional variables involved all of the previous reconstruction issues such as tunnel position, widening, graft choice, fixation, meniscus and chondral procedures, etc. We thus chose a prospective longitudinal cohort similar to the Framingham study for cardiovascular disease many years ago [1].
Surgeon inclusion was based on AOSSM membership, attendance at an introduction meeting, and a willingness to follow the procedural issues identified in the SOP. This included utilizing a Musculoskeletal Transplant Foundation graft if allograft was chosen. In the introductory meetings, we determined that radiographs would be a critical data point and made a required and recommended x-ray list. Required baseline radiographs included a bilateral standing AP view and a full extension lateral view. Additional recommended views, if they were taken as part of standard clinical care, included a bilateral bent knee weight-bearing view at 45° (Rosenberg), a bilateral patellofemoral view (Sunrise or Merchant), and a bilateral long leg standing x-ray for alignment.
After obtaining informed consent, the patient filled out a 13-page questionnaire that included questions regarding demographics, sports participation, injury mechanism, comorbidities, and knee injury history. Within this questionnaire, each participant also completed a series of validated general and knee-specific outcome instruments, including the Knee injury and Osteoarthritis Outcome Score (KOOS), the International Knee Documentation Committee Subjective form (IKDC), and the Marx activity rating scale. Contained within the KOOS was the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC). This was filled out preoperatively and at planned postoperative follow-up points of 2, 6, and 10 years. Patients were paid $20 for filling out the questionnaire each time. There were no other patient incentives since the treatment was standard care for a revision ACL reconstruction. Surgeons filled out a 42-page questionnaire at the time of the revision surgery that included the impression of the etiology of the previous failure, physical exam findings, surgical technique utilized, the intra-articular findings, and surgical management of meniscal and chondral damage.
Two-year patient follow-up was completed by mail with readministration of the same questionnaire as the one they completed at baseline. Patients were also contacted by phone to determine whether any subsequent surgery had occurred to either knee since their initial revision ACL reconstruction. If so, operative reports were obtained, whenever possible, in order to document pathology and treatment.
Completed data forms were mailed from each participating site to the data coordinating center. Data from both the patient and surgeon questionnaires were scanned with Teleform™ software (Cardiff Software, Inc., Vista, CA) utilizing optical character recognition, and the scanned data was verified and exported to a master database. Teleform paper forms were utilized so data was available in real time. If starting today we would utilize electronic data capture, preferably REDCap; in 2006, however, the paper-based approach was felt to be best practice. Data was housed at Vanderbilt similar to MOON. Our data capture was relatively amazing
with >99% of all patient and surgeon data points completed when assessed on our first 900 patients (Table 42.1). We relied on MOON personnel originally, but quickly identified the need for a national coordinator to assist Laura Huston, and Amanda Braun joined our team and was based at the coordinating center at Washington University in St. Louis. Our coordinator was initially funded by personal research funds, donated industry support, and the AOSSM. Truly running this study with the necessary personnel depended on a larger sustained grant process such as the NIH or Department of Defense, which was the impetus to pursue NIH funding. Once 2-year follow-up began, we added a full-time follow-up coordinator who contacts all patients to send out and obtain questionnaires and perform phone follow-up. Each site manages its own operations with its own health care or research personnel. Beyond initial enrollment there is little personnel involvement for the sites. Staff developed an electronic newsletter (Fig. 42.1) that was sent monthly to all surgeons and coordinators and stimulated patient enrollment by listing total and monthly enrollment figures for surgeons. Everyone wanted to appear on the Top 10 list.
Keys that helped us early on were frequently based on our experience with MOON and included the use of conference calls and the ability to communicate by email, which was relatively new at the time. The group's experience with IRB helped new sites get approval, and the knowledge regarding grants helped immensely in obtaining funding.
We developed a Scientific Advisory Board for advice. This eight-member Board meets at least annually and provides advice and oversight for the study. This has included deciding which research studies should be performed, based on an application form submitted by members. This has been critical for governance issues. The makeup is balanced by geography, gender, and practice type (Table 42.2).

42.2 Findings of MARS

1215 patients were enrolled in the study. Median age was 26 with a range from 12 to 63 years. 505 (42%) were female. For 87% it was their first revision [5], while 13% were undergoing their second or higher revision ACL reconstruction. Seventy-three percent were injured while playing a sport. The most common sports were soccer and basketball, which involve both genders. For the first decade of the modern ACL reconstruction with appropriate grafts and tunnel location, the etiology of ACL graft failure was felt to be technical issues [6] (Tables 42.3 and 42.4). In this cohort traumatic failure was felt to be more common, which may reflect two issues: (1) improved technical ability with improved training and education and (2) surgeons self-reporting on their own failures in this cohort. 334 (28%) were the surgeons' own failures. Associated surgeries included high tibial osteotomies (n = 21), medial meniscus transplants (n = 34), and lateral meniscus transplants (n = 10).
The cause of technical failure was most commonly felt to be due to the femoral tunnel (Table 42.5) with the tibial tunnel the second most common cause. Seldom was fixation felt to be an issue. Varus and valgus malalignment was
Table 42.2 Time from last ACL reconstruction
                    #       %
<1 year             149     12
1–2 years           389     32
>2 years            654     55
Unknown             8       <1
Total               1200    100

Table 42.3 Cause of failure
                    #       %
Traumatic           671     56
Technical           615     51
Biologic            325     27
Other               35      3
Blank               2       <1
Note: 36% (427/1200) of these responses listed a combination of these components as reason for failure. The percentages sum to >100% due to the multiple choice option of this question (surgeons were instructed to "check all that apply")

Table 42.4 Venn diagram of cause of failure

Table 42.6 Previous approach
                    #       %
One incision        972     80
Two incisions       201     17
Trad'l arthrotomy   12      1
Mini arthrotomy     10      <1
Blank               5       <1
Total               1200    100
Note: 18 previous double bundle

Table 42.7 Previous graft source
                    Autograft     Allograft
BTB                 485 (40%)     133 (11%)
HS STG              285 (24%)     14 (1%)
HS ST               27 (2%)       3 (<1%)
Quad                4 (<1%)
ITB                 3 (<1%)
Achilles                          39 (3%)
Ant tib                           61 (5%)
Post tib                          13 (1%)
Other/unk           5 (<1%)       73 (6%)
Blank                             1 (<1%)
Combined            6 (<1%)       8 (<1%)
Total               815 (68%)     345 (29%)
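The percentages in Table 42.3 can be checked with a few lines of arithmetic. The following sketch, written in Python purely for illustration, divides each count by the 1200 surgeon responses and shows why the column adds up to more than 100% when a question allows "check all that apply." The variable and function names are our own assumptions and are not part of the MARS data dictionary.

```python
# Counts taken from Table 42.3; each surgeon could check more than one cause.
N_RESPONSES = 1200
cause_counts = {
    "Traumatic": 671,
    "Technical": 615,
    "Biologic": 325,
    "Other": 35,
    "Blank": 2,
}


def percent_of_responses(count, total=N_RESPONSES):
    """Return count as a percentage of all responses (not of all checks)."""
    return 100.0 * count / total


percentages = {cause: percent_of_responses(n) for cause, n in cause_counts.items()}
total_percent = sum(percentages.values())

for cause, pct in percentages.items():
    print(f"{cause:<10} {cause_counts[cause]:>4}  {pct:5.1f}%")

# The sum exceeds 100% because the denominator is responses, not checks.
print(f"Sum of percentages: {total_percent:.1f}%")
```

Because every percentage uses the number of responses as its denominator, the categories overlap and cannot be read as mutually exclusive shares of the cohort.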
[Figure: plot of predictor variables (e.g., DOCTOR.IMP, AGE, SEX, BMI, SMOKE, MARX.t0) against χ² – df; x-axis 0–80]
Table 42.11 Meniscal and chondral damage
                                  Articular cartilage pathology
                                  Normal        Abnormal
Meniscal pathology   Normal       109 (9%)      146 (12%)
                     Abnormal     226 (19%)     719 (60%)

and articular cartilage damage on patient outcomes at 2 years [2]. A lateral meniscectomy performed before the revision resulted in significantly worse patient-reported outcomes (Table 42.12), and a previous medial meniscectomy less so. Articular cartilage damage of grade 2 or worse also impacted patient-reported outcomes at 2 years. The most significant finding was the impact of trochlear groove chondrosis (Table 42.13).

42.2.3 Surgical Factors

Surgical factors were analyzed to determine impact on patient-reported outcome measures at 2 years [7]. A variety of factors were analyzed, and in many cases, it is difficult to determine intuitively why certain factors impacted outcome. With regard to surgical approach, having undergone a prior arthrotomy decreased IKDC scores (p = 0.037, OR = 2.43) and decreased all KOOS subscales (p < 0.05, OR range = 2.38–4.35).
A double femoral tunnel resulted in worse KOOS QOL scores (p = 0.027, OR = 3.13). An ideal tibial position that was not enlarged resulted in worse KOOS, WOMAC, and IKDC scores (p = 0.001–0.03, OR 1.19–2.68). Using a femoral tunnel declared "optimum" vs. drilling an entirely new femoral tunnel resulted in worse KOOS QOL scores (p = 0.025, OR 1.79). Undergoing a notchplasty decreased KOOS, IKDC, and WOMAC scores (p = 0.013–0.034, OR 1.40–1.49). Factors that did not impact outcome included blended tunnels and knee position at time of graft fixation.
Graft fixation as a surgical factor impacting outcome was also analyzed. Femoral fixation with a metal screw had better 2-year outcomes in KOOS and WOMAC scores (p = 0.01–0.05, OR 1.41–1.96) when compared to using a bioabsorbable screw, cross-pin, or combination. Tibial fixation other than metal screw resulted in worse IKDC (p = 0.017, OR 1.67) and WOMAC stiffness (p = 0.013, OR 1.72) scores.
Use of biologics and bone grafting was analyzed. Use of biologic enhancement in the revision setting resulted in lower 2-year MARX activity level scores (p = 0.025, OR 1.79). Utilizing femoral bone grafts resulted in lower MARX activity scores at 2 years (p = 0.048, OR = 2.04). Conversely, not bone grafting the tibia resulted in worse KOOS Pain scores (p = 0.046, OR 1.95) and WOMAC Pain scores (p = 0.004, OR 3.31).

42.2.4 Rehabilitation Factors

Rehabilitation factors were analyzed regarding their potential impact on revision ACL reconstruction outcomes [3]. Two rehabilitation factors predicted outcome: (1) use of an ACL derotation brace for return-to-sport had better KOOS sports/rec scores at 2 years (odds ratio = 1.50; 95% CI = 1.07–2.11; p = 0.019) and (2) use of an ACL derotation brace for postoperative rehabilitation period were 2.3 times more likely to have a
least one patient and to require surgeon enrollment logs be submitted in order to verify that they were enrolling 80–90% of the patients in the study that were eligible. Additionally, I would ask and expect that they would assist in contacting and helping follow their patients.
to deal with multiple recommendations for improving the manuscript, but the final product is always improved.

42.3.5 Peer-Reviewed Journal Submissions
and the MARS patients may not be their top priority. Personally, it has been difficult enough that I would never use sites again in a multicenter study that required site vs. central follow-up.

42.4 Conclusions

While there have been multiple challenges encountered in the MARS study, they have for the most part been surmountable. The level of research and the questions that can be asked and answered in a cohort of this size and type is unmatched by any other approach. We believe the study design and scaffolding we have developed for this type of truly multi-surgeon multicenter research can be a model for future groups.

References

1. Cook NR, Moons KG, Harrell FE Jr. Assessing predictive performance beyond the Framingham risk score. JAMA. 2010;303:1368–9; author reply 1369.
2. MARS Group. Meniscal and articular cartilage predictors of clinical outcome following revision anterior cruciate ligament reconstruction. Am J Sports Med. 2016;44:1671–9.
3. MARS Group (Wright RW corresponding author). Surgical predictors of clinical outcome after revision anterior cruciate ligament reconstruction. Am J Sports Med. 2017;45(11):2586–94.
4. MARS Group. Effect of graft choice on the outcome of revision anterior cruciate ligament reconstruction in the Multicenter ACL Revision Study (MARS) Cohort. Am J Sports Med. 2014;42:2301–10.
5. MARS Group, Wright RW, Huston LJ, Spindler KP, Dunn WR, Haas AK, et al. Descriptive epidemiology of the Multicenter ACL Revision Study (MARS) cohort. Am J Sports Med. 2010;38:1979–86.
6. Johnson DL, Swenson TM, Irrgang JJ, Fu FH, Harner CD. Revision anterior cruciate ligament surgery: experience from Pittsburgh. Clin Orthop Relat Res. 1996;100–9.
7. MARS Group, Allen CR, Anderson AF, Cooper DE, DeBerardino TM, Dunn WR, et al. Surgical predictors of clinical outcomes after revision anterior cruciate ligament reconstruction. Am J Sports Med. 2017;45:2586–94.
8. Spindler KP, Kuhn JE, Freedman KB, Matthews CE, Dittus RS, Harrell FE Jr. Anterior cruciate ligament reconstruction autograft choice: bone-tendon-bone versus hamstring: does it really matter? A systematic review. Am J Sports Med. 2004;32:1986–95.
9. Spindler KP, Warren TA, Callison JC Jr, Secic M, Fleisch SB, Wright RW. Clinical outcome at a minimum of five years after reconstruction of the anterior cruciate ligament. J Bone Joint Surg Am. 2005;87:1673–9.
10. Wright R, Spindler K, Huston L, Amendola A, Andrish J, Brophy R, et al. Revision ACL reconstruction outcomes: MOON cohort. J Knee Surg. 2011;24:289–94.
11. Wright RW, Dunn WR, Amendola A, Andrish JT, Bergfeld J, Kaeding CC, et al. Risk of tearing the intact anterior cruciate ligament in the contralateral knee and rupturing the anterior cruciate ligament graft during the first 2 years after anterior cruciate ligament reconstruction: a prospective MOON cohort study. Am J Sports Med. 2007;35:1131–4.
12. Wright RW, Dunn WR, Amendola A, Andrish JT, Flanigan DC, Jones M, et al. Anterior cruciate ligament revision reconstruction: two-year results from the MOON cohort. J Knee Surg. 2007;20:308–11.
13. Wright RW, Gill CS, Chen L, Brophy RH, Matava MJ, Smith MV, et al. Outcome of revision anterior cruciate ligament reconstruction: a systematic review. J Bone Joint Surg Am. 2012;94:531–6.
14. Wright RW, Magnussen RA, Dunn WR, Spindler KP. Ipsilateral graft and contralateral ACL rupture at five years or more following ACL reconstruction: a systematic review. J Bone Joint Surg Am. 2011;93:1159–65.
43 Multicenter Study: How to Pull It Off? The PIVOT Trial

Eleonor Svantesson, Eric Hamrin Senorski, Alicia Oostdyk, Yuichi Hoshino, Kristian Samuelsson, and Volker Musahl
Fig. 43.1 The KiRA inertial sensor system for quantifying lateral tibial acceleration during the pivot shift test
Fig. 43.2 Image analysis system on iPad for quantifying lateral tibial translation during the pivot shift test
ing on testing the hypothesis. As honest researchers, we need to be aware that this type of research is biased, and also that this type of approach to research surely exists in the published literature.
In a multicenter trial, it is important to clearly frame the purpose of the study. If all participating centers are in agreement regarding the aims of the study and what is expected and relevant prior to study start, not only can unnecessary disputes and misunderstandings be avoided, but the contribution to publication bias is also limited. It is important that the purpose and research questions are not too vague, but clear and answerable. The agreement of the research questions in the PIVOT trial was accomplished through an open communication in which all centers participated and shared their opinions until a consensus was reached. Thereafter, the aims and research questions were documented in the manual of operations and procedures to function as the overall common guide during study performance. The PIVOT trial had three specific aims, all accompanied by a hypothesis and specific research questions.
Another important factor for a well-functioning collaboration was that it was decided which aim each center was primarily responsible for, i.e., each center was appointed to take the lead in the manuscript writing for specific aims from the beginning. To be clear about this before knowing the results was, from our point of view, important for promoting a collaborative environment and avoiding potential future friction in the group.

Fact Box 43.3
It is essential that the aim and hypothesis are clearly formulated together with the specific research questions prior to start of the study. The agreement of the research questions in the PIVOT trial was accomplished through an open communication in which all centers participated and shared their opinions until a consensus was reached.

43.5 Recruitment Plan and Study Population

As for all studies, the study protocol needs to be clear regarding the inclusion and exclusion criteria. Furthermore, a standardized way of recruiting patients needs to be established in order to avoid selection bias. It is important to acknowledge that each center may have different routines for this purpose, especially in a situation where centers from three continents are included.
Influencing factors may include legal policies, cultural factors, volume of the patients, and the participating researchers' own preferences. Therefore, this part of the study was discussed thoroughly among the participating sites with focus on how to include a population that could answer our research questions while eliminating as many confounding factors as possible. Strict inclusion and exclusion criteria were written down in the manual of operation. Moreover, a standardized screening process was established and a power analysis, taking into account an anticipated loss to follow-up, was performed to yield the total number of patients to be recruited. In agreement with the collaborators, a definition of evaluable patients was established, and it was also decided how to handle patient withdrawal. As clinicians we are all aware of the hectic clinical work and the limited time for patient consultation, which could contribute to accidental deviation from the strict eligibility criteria. To minimize such events, and to further standardize the recruitment screening, special interview forms were developed as well as a checklist for inclusion and exclusion criteria.

Fact Box 43.4
A standardized way of recruiting patients needs to be established in order to avoid selection bias. The manual of operation included strict inclusion and exclusion criteria, a definition of evaluable patients, and how to handle patient withdrawal. To further standardize the recruitment screening, special interview forms were developed as well as checklists.

43.6 Time Line and Follow-Up Schedule

It was decided that the PIVOT trial would examine patients over a 24-month period. An important factor for the study purpose was that specific and clearly stated endpoints were formulated for each aim of the study. Without defined study endpoints, it is simply not possible to conduct a good study since potential bias may be introduced. It was decided that all patients would be evaluated at 3, 6, 12, and 24 months after ACL reconstruction, and a detailed section for scheduled follow-up evaluation was written. For each follow-up occasion, an exact list of which variables to collect and how to collect them was created. All sites obtained standardized questionnaires and protocols for collection of data. Moreover, specific information on all patient-reported outcome measures and clinical tests was distributed to the clinics. Due to the language differences among the countries in the PIVOT trial, forms to be completed by the patients were translated. A specific person was assigned at the lead site to periodically check the status of follow-up in each center and send notices for upcoming follow-up patients. This person plays a key role in ensuring that no patients are missed during the follow-up phase of the study.

Fact Box 43.5
Specific and clearly stated endpoints were formulated for each aim of the study. It was decided that all patients would be evaluated at 3, 6, 12, and 24 months after ACL reconstruction, and a detailed section for scheduled follow-up evaluation was written.

43.7 Standardization of Clinical Testing

A multicenter trial entails that several examiners are involved in the trial. Therefore, it is important to undertake every possible action to standardize the exams and to have a common approach to the clinical testing. When planning for the PIVOT trial, the main goal was to create a standardized approach for data collection. A lot of effort has been undertaken to ensure that a strict training plan existed for clinicians that were to be involved in the examination of patients. The most challenging clinical testing facing the PIVOT trial was the execution of the pivot shift test.
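Returning briefly to the recruitment plan, the idea of a power analysis that takes anticipated loss to follow-up into account can be illustrated with a short calculation. The sketch below, in Python, inflates the number of evaluable patients required by the power analysis so that enough patients remain after dropout. The function name and the example figures (90 evaluable patients, 15% loss) are hypothetical assumptions chosen for illustration and are not the values used in the PIVOT trial.

```python
import math


def patients_to_enroll(n_evaluable, expected_loss):
    """Inflate a required sample size for anticipated loss to follow-up.

    n_evaluable:   patients needed at final follow-up (from the power analysis)
    expected_loss: anticipated proportion lost to follow-up, 0 <= loss < 1
    """
    if not 0.0 <= expected_loss < 1.0:
        raise ValueError("expected_loss must be in the interval [0, 1)")
    # Dividing by the retained fraction, then rounding up to a whole patient.
    return math.ceil(n_evaluable / (1.0 - expected_loss))


# Hypothetical example: 90 evaluable patients needed, 15% expected dropout.
n_enroll = patients_to_enroll(90, 0.15)
print(f"Enroll at least {n_enroll} patients")
```

Dividing by the retained fraction, rather than simply adding the dropout percentage to the target, keeps the expected number of evaluable patients at or above the number required by the power analysis.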
The pivot shift test is a dynamic knee laxity test which has been referred to as the most specific test for detection of ACL deficiency [2]. However, there are several execution techniques described for the test [1, 14, 16], which entails that the test is limited by intra- and inter-variability [8, 9]. In an attempt to overcome this, we described and analyzed the results of a standardized execution technique for the pivot shift test among surgeons prior to study start [6, 14]. Moreover, a separate section in the manual of operations and procedures was written to describe the training plan, clearly stating the training requirements for participation. When the PIVOT trial subsequently was started, each examiner was provided a video outlining the steps of the standardized pivot shift test as well as written instructions for the performance. Additionally, the investigators involved in the PIVOT study underwent training for the pivot shift test during the University of Pittsburgh Panther Global Summit on ACL Reconstruction held in Pittsburgh in August 2011. Apart from the specific training and standardization of the pivot shift test, all examiners received written instructions for how to perform all other examinations in a standardized manner. Finally, all investigators and team members of the PIVOT trial had to complete Good Clinical Practices Training prior to the start of the study.

Fact Box 43.6
A lot of effort was undertaken to ensure that clinical examination was standardized among clinicians that were to be involved in the examination of patients. All investigators received oral and written instructions of the clinical exams and also underwent training prior to study start.

43.8 Regulatory Responsibility and Trial Documentation

To conduct a clinical study, ethical concerns and the regulatory process must be addressed. Each institution has an internal process for reviewing research trials for ethical purposes. With a study across four international centers, the legal standards regarding research conducted in each country must be addressed as well. Additionally, with studies recruiting human subjects, issues of informed consent must be addressed. For the PIVOT trial, the lead site developed protocols, data collection forms, and an informed consent template to be used to obtain all necessary approvals at each site.
It cannot be underlined enough that ensuring a system for documentation prior to start of the study is absolutely crucial for a multicenter study. Not only is this a matter of legal and ethical responsibility, but it also involves the database of your collected data and the documentation of communication between centers. One central database was used for all data collection to ensure standardization of all forms completed for the study. Make sure that there are persons appointed to take responsibility and coordinate documentation, and also ascertain that everyone involved in entering data is trained in how the documentation system works, through both training sessions and written instructions. Furthermore, arrangements for how to document correspondence, teleconferences, and videoconferences among the sites need to be made. If everything is documented, the room for contretemps is minimized.

Fact Box 43.7
The legal standards regarding research conducted in each country were addressed, as well as issues of informed consent. One central database was used for all data collection to ensure standardization of all forms completed for the study. Furthermore, arrangement for how to document correspondence, teleconferences, and videoconferences among the sites was made.
Fig. 43.4 PIVOT team meeting at the 15th ESSKA Congress in Geneva, 2012
Fact Box 43.8
The investigators of the PIVOT trial discussed and reached consensus regarding how communication should primarily take form and established a policy around this. All in-person meetings were documented, and minutes were distributed to all members of the group who were unable to attend.

Fact Box 43.9: Key Points for a Successful Performance of a Multicenter Trial in the Experience of the PIVOT Trial
• Frame the purpose of the study and establish research questions and hypotheses prior to study start.
• Define how specific responsibilities should be distributed across the participating sites and prepare a plan for manuscript writing early in the planning process.
• Apply a standardized methodology across all parts of study performance including recruitment of patients, the screening process, the intervention, and the clinical examination. The use of checklists and protocols promotes a standardized approach.
• Document everything! Have persons appointed to take responsibility and coordinate documentation.
• Have an open and consistent communication. A multicenter trial is teamwork, and everyone must contribute to a collaborative environment.
• Follow all regulatory requirements of each institution and country, respectively.
43.10 The Results of the PIVOT Trial

43.10.1 Data Collection and Analysis

The collection of data was evaluated continuously during the enrollment phase. The enrollment phase ended when a total of 107 patients had been enrolled, who were subsequently followed for 24 months. As part of the study protocol, the statistical analysis for each study aim was planned in advance, and the setup of a common dataset facilitated data analysis. The principal site (University of Pittsburgh) took the lead in performing data analysis to increase consistency, ensuring that the investigators who performed the statistical analysis were well-familiar with the database. The PIVOT trial team followed the initial plan of which center was responsible for writing the manuscript of each specific study aim. Additionally, a progress report was updated successively to keep track of study progress, publications, and assignments. All adverse events, reoperations, and contralateral ACL ruptures during the study period were also documented.

Fact Box 43.10
The statistical analysis for each study aim was planned in advance, and the setup of a common dataset facilitated data analysis. A progress report was updated successively to keep track of study progress, publications, and assignments.

43.10.2 Presentation of the Results

The first results presented from the PIVOT trial came from a validation study of the noninvasive technology for quantitative pivot shift [13]. The paper concluded that both techniques were valid and that the technology was able to detect differences between clinically graded low- and high-grade pivot shift, findings which were indeed encouraging for continued analysis of the data. Since then, a total of four other papers have been published in various journals [3, 10, 18, 19], and additional manuscripts are either submitted or in preparation. Over 20 presentations of the PIVOT trial have been held at international meetings, and several abstracts and posters have been presented. The PIVOT trial has also received attention in newsletter articles [4, 5] and in a podcast episode by the American Journal of Sports Medicine [12]. In 2017, the book Rotatory Knee Instability: An Evidence Based Approach, edited by the principal investigator of each participating center, was published [15]. The book could be seen as a deepened overview of all major aspects of the assessment of rotatory knee instability, including the pivot shift phenomenon, and highlights current knowledge as well as future aspects that need to be addressed by further research.

Fact Box 43.11
A validation study of the noninvasive technology for quantitative pivot shift was primarily performed. Since then, a total of four other papers have been published and several manuscripts are in progress. Over 20 presentations of the PIVOT trial have been held at international meetings, and in 2017, the book Rotatory Knee Instability: An Evidence Based Approach was published.

Fact Box 43.12: Key Findings from the PIVOT Trial
• Both devices for quantification of the pivot shift were found valid and able to detect differences between clinically graded low- and high-grade pivot shift.
• Quantitative pivot shift detected a significantly higher tibial acceleration and lateral compartment translation in patients under anesthesia compared with the awake state.
• Preoperative tibial acceleration and lateral compartment translation were significantly reduced by anatomic single-bundle ACL reconstruction.
• Generalized joint laxity does not appear to correlate with quantitative pivot shift in the ACL-injured knee.
• Static anteroposterior knee laxity tests are poorly correlated to quantitative pivot shift in the ACL-injured knee. Static and rotatory knee joint laxity should be considered as separate entities of the knee examination.

43.11 Future Directions

During the course of the PIVOT trial, the collaboration among the centers has grown stronger, and the network has expanded. Conducting a multicenter trial has encouraged us to continue applying this methodology, i.e., quantitative evaluation of the pivot shift test, for high-quality research. Some of the preliminary research questions have been answered; however, they have also resulted in identification of new areas of research that need to be undertaken. For example, the results from the PIVOT trial have led to the setup of another study which aims to compare the outcomes and the rotatory laxity between ACL reconstruction and ACL reconstruction complemented by tenodesis.

Take-Home Message
• The multicenter PIVOT trial has provided novel knowledge about the use of quantitative pivot shift in ACL-injured patients across three continents.
• Performance of a multicenter trial requires teamwork, which is promoted by an open and consistent communication.
• A cornerstone for the successful execution of the PIVOT trial was the stringent preparation prior to study start, where consensus was reached across all participating sites for a standardized methodology regarding recruitment of patients, the screening process, the intervention, and the clinical examination.

References

1. Anderson AF, Rennirt GW, Standeffer WC Jr. Clinical analysis of the pivot shift tests: description of the pivot drawer test. Am J Knee Surg. 2000;13(1):19–23.
2. Benjaminse A, Gokeler A, van der Schans CP. Clinical diagnosis of an anterior cruciate ligament rupture: a meta-analysis. J Orthop Sports Phys Ther. 2006;36(5):267–88. https://doi.org/10.2519/jospt.2006.2011.
3. Hamrin Senorski E, Svantesson E, Sundemo D, Musahl V, Zaffagnini S, Kuroda R, et al. Preoperative knee laxity measurements predict the achievement of a patient-acceptable symptom state after ACL reconstruction: a prospective multicenter study. J ISAKOS: Joint Disord Orthop Sports Med. 2018;3:26–32.
4. Hoshino Y, Musahl V, Kuroda R, Zaffagnini S, Samuelsson K, Lopomo N, et al. PIVOT Study Group Project Update ISAKOS Newsletter. 2014.
5. Hoshino Y, Musahl V, Ryosuke K, Zaffagnini S, Samuelsson K, Lopomo N, et al. Quantified pivot shift test by accelerometer and iPad App. ESSKA Newsletter. 2015.
6. Hoshino Y, Araujo P, Ahlden M, Moore CG, Kuroda R, Zaffagnini S, et al. Standardized pivot shift test improves measurement accuracy. Knee Surg Sports Traumatol Arthrosc. 2012;20(4):732–6. https://doi.org/10.1007/s00167-011-1850-0.
7. Hoshino Y, Araujo P, Ahlden M, Samuelsson K, Muller B, Hofbauer M, et al. Quantitative evaluation of the pivot shift by image analysis using the iPad. Knee Surg Sports Traumatol Arthrosc. 2013;21(4):975–80. https://doi.org/10.1007/s00167-013-2396-0.
8. Kim SJ, Kim HK. Reliability of the anterior drawer test, the pivot shift test, and the Lachman test. Clin Orthop Relat Res. 1995;(317):237–42.
9. Kuroda R, Hoshino Y, Kubo S, Araki D, Oka S, Nagamune K, et al. Similarities and differences of diagnostic manual tests for anterior cruciate ligament insufficiency: a global survey and kinematics assessment. Am J Sports Med. 2012;40(1):91–9. https://doi.org/10.1177/0363546511423634.
10. Lopomo N, Signorelli C, Rahnemai-Azar AA, Raggi F, Hoshino Y, Samuelsson K, et al. Analysis of the influence of anaesthesia on the clinical and quantitative assessment of the pivot shift: a multicenter international study. Knee Surg Sports Traumatol Arthrosc. 2017;25(10):3004–11. https://doi.org/10.1007/s00167-016-4130-1.
11. Muller B, Hofbauer M, Rahnemai-Azar AA, Wolf M, Araki D, Hoshino Y, et al. Development of computer tablet software for clinical quantification of lateral knee compartment translation during the pivot shift test. Comput Methods Biomech Biomed Engin. 2016;19(2):217–28. https://doi.org/10.1080/10255842.2015.1006210.
12. Musahl V. Validation of quantitative measures of rotatory knee laxity. Am J Sports Med. 2016;44(9):2393–8.
13. Musahl V, Griffith C, Irrgang JJ, Hoshino Y, Kuroda R, Lopomo N, et al. Validation of quantitative measures of rotatory knee laxity. Am J Sports Med. 2016;44(9):2393–8. https://doi.org/10.1177/0363546516650667.
14. Musahl V, Hoshino Y, Ahlden M, Araujo P, Irrgang JJ, Zaffagnini S, et al. The pivot shift: a global user guide. Knee Surg Sports Traumatol Arthrosc. 2012;20(4):724–31. https://doi.org/10.1007/s00167-011-1859-4.
15. Musahl V, Jón K, Kuroda R, Zaffagnini S. Rotatory knee instability: an evidence based approach: Springer International Publishing; 2017.
16. Noyes FR, Grood ES, Cummings JF, Wroble RR. An analysis of the pivot shift phenomenon. The knee motions and subluxations induced by different examiners. Am J Sports Med. 1991;19(2):148–55. https://doi.org/10.1177/036354659101900210.
17. Rosenberg W, Donald A. Evidence based medicine: an approach to clinical problem-solving. BMJ. 1995;310(6987):1122–6.
18. Sundemo D, Blom A, Hoshino Y, Kuroda R, Lopomo NF, Zaffagnini S, et al. Correlation between quantitative pivot shift and generalized joint laxity: a prospective multicenter study of ACL ruptures. Knee Surg Sports Traumatol Arthrosc. 2018;26(8):2362–70. https://doi.org/10.1007/s00167-017-4785-2.
19. Svantesson E, Hamrin Senorski E, Mårtensson J, Zaffagnini S, Kuroda R, Musahl V, et al. Static anteroposterior knee laxity tests are poorly correlated to quantitative pivot shift in the ACL-deficient knee: a prospective multicentre study. J ISAKOS: Joint Disord Orthop Sports Med. 2018;3:83–88.
20. Zaffagnini S, Lopomo N, Signorelli C, Marcheggiani Muccioli GM, Bonanzinga T, Grassi A, et al. Innovative technology for knee laxity evaluation: clinical applicability and reliability of inertial sensors for quantitative analysis of the pivot-shift test. Clin Sports Med. 2013;32(1):61–70. https://doi.org/10.1016/j.csm.2012.08.007.
21. Zaffagnini S, Signorelli C, Grassi A, Yue H, Raggi F, Urrizola F, et al. Assessment of the pivot shift using inertial sensors. Curr Rev Musculoskelet Med. 2016;9(2):160–3. https://doi.org/10.1007/s12178-016-9333-z.
44 Conducting a Multicenter Trial: Learning from the JUPITER (Justifying Patellar Instability Treatment by Early Results) Experience
Name of surgeon:
Institution Name:
Affiliated University:
Phone No
Research Coordinator
Type of Practice:
Knee Arthroscopy %
2. Of all operative patellar stabilization, how often patients have open femoral physis?
Open Physis %
centers. Posttreatment outcome assessment was to be performed at 6, 12, and 24 months, including assessment of function, activity level, health-related quality of life, patellar stability, knee motion, and complications.
44.3.2 Clinical Assessment
In JUPITER, a draft assessment tool for data collection was developed. The initial tool was relatively lengthy, and multiple conference calls were made by the group of investigators to help further refine and develop the protocol. Critically, it was felt that it was important to simplify the initial form to minimize the burden on the investigators located at multiple sites. A simplified assessment tool also allows for improved patient compliance and reproducibility of data. Validated tools are important to use to make sure the data appropriately reflects desired outcome evaluation; we use Pedi-IKDC, Kujala, HSS Pedi-FABS, Banff Patellofemoral Instability instrument 2.0, and KOOS Knee survey. Initially, the assessment tool was a paper document; however, specialty society grant funding was received allowing investigators to use a Web-based system for data collection and management (Oberd™, Columbia, Missouri USA), and the study transitioned to this during the course of enrollment. Other investigators have used REDCap (Research Electronic Data Capture), which is a free research data management system sponsored by Vanderbilt University and supported by the National Institutes of Health. Electronic databases have several advantages: they (1) allow remote collection of PRO by any electronic media, (2) send automated updates related to follow-up and incomplete data, and (3) help with data analysis through advanced data output functions.
Table 44.2 JUPITER institutions and investigators
Institution: Investigators
Cincinnati Children’s (Coordinating Center): Shital Parikh, PI; Eric Wall
Hospital for Special Surgery (Coordinating Center): Beth Shubin Stein; Dan Green; Sabrina Strickland; Peter Fabricant
Boston Children’s: Yi-Meng Yen; Dennis Kramer; Benton Heyworth; Matthew Milewski
Columbia University: Charlie Popkin; Lauren Redler
Mayo Clinic: Diane L. Dahm; Todd Milbrandt; Aaron Krych
NorthShore University Health System: Jason L Koh; David Roberts; Verena Schreiber
OrthoIndy: Jack Farr; Kosmas Kayes
Oregon Health Sciences University: Jackie Brady; Dennis Crawford; Matthew Halsey
Ohio State University: Robert Magnussen
Texas Scottish Rite: Henry Ellis; Philip Wilson
University of Minnesota/TRIA: Elizabeth A. Arendt; Marc Tompkins
University of Missouri: Seth Sherman
Protocol development was conducted by a team, and clinical and radiographic assessment tools were selected. Assessment is aided by use of Web-based data collection and management for clinical outcomes and centralized radiographic evaluation. Data is collected centrally.
44.3.3 Radiographic Evaluation
Regarding radiographic evaluation, an extensive amount of time was spent in investigator meetings to develop standard radiographic methods. Given the relative complexity of radiographic evaluation (e.g., for the measurement of anterior tibial tubercle–trochlear groove distance on MRI) in the JUPITER study, training to establish common standards for imaging evaluation was necessary. Training was performed by using a standard set of images reviewed at in-person meetings and also distributed electronically. Ultimately, concerns still remained about variability in image interpretation across sites, and part of the way through enrollment, the investigators were able to obtain sufficient funding to have images electronically sent to a central site to be interpreted by a specifically trained team of musculoskeletal radiologists. For this aspect, REDCap was used to collect and store the data. This evolution to centralized radiographic evaluation and repository for image assessment is expected to improve consistency of this aspect of evaluation and also reduce the time commitment of individual sites. If there is a potential for significant variability in radiographic interpretation, then centralized imaging analysis at a single site is preferred.
44.3.4 Centralized Data Repository
The data from multiple sites must be collected and aggregated at a centralized site. This data management can be time-consuming and expensive; however, it is critically important to be able to have the data in a secure location that remains accessible to appropriate researchers. This typically requires financial support, which can be either provided by the sponsoring institution or external grants.
44.3.5 Funding and Grants
Once the research protocol has been established, it is often valuable to submit for research grants from various funding sources. In many cases, the initial pilot study is self- or institution-funded; however, extrapolating the study to multiple sites requires an additional level of funding. In most nonindustry-sponsored research, support for
2a. Reviewers: 1.
2. Based upon: Designing Clinical Research: An Epidemiologic Approach by Stephen B. Hulley,
Steven R. Cummings, Warren S. Browner, Deborah G. Grady, Thomas B. Newman. Lippincott
Williams & Wilkins. Third Edition, November 1, 2006.
3. Hypotheses:
Adapted from Vanderbilt University Sports Medicine and MOON Forms
4. Outcome Measures: A.
B.
C.
6. Data/Information
Required From
Coordinating Centers:
7. Power Analysis.
Can The Cohort Answer
The Question?
8. Statistical Analysis
Required? Who Will
Perform This?
writing and editing. Up-to-date criteria with additional detail are provided online at http://www.icmje.org/recommendations.
Research questions are evaluated using the FINER (feasible, interesting, novel, ethical, relevant) criteria. Presentation and publication guidelines should be developed in advance so that there is a clear understanding about authorship.
With respect to multicenter trials, authorship questions become more complex. Fortunately, several models for publication authorship and priority exist. Historically, many journals limited the number of named authors; however, with the advent of electronic publication and indexing, it has become easier to credit multiple authors on a publication. In many cases, papers will be published with several principal authors and the rest of the investigators credited as a group, with each individual investigator’s name listed and searchable electronically. For JUPITER, authorship criteria were first discussed among the executive committee and then circulated among the larger group of investigators for comment. Criteria were modeled after the PRISM (Pediatric Research in Sports Medicine) group criteria and included assessment of active participation in data collection, as well as ICMJE criteria. To encourage multiple author participation, it was recommended that principal authors from different sites be listed on each research paper. Publications would be submitted to the group for editorial review and input as required by ICMJE.
44.4 Execution
44.4.1 Communication and Coordination
Throughout the conduct of the trial, it is critical to keep investigators and sites engaged in the research process. Too often, multicenter trials lose focus and energy since geographic distance and limited face-to-face engagement can result in investigator focus being directed elsewhere. JUPITER has successfully addressed this with regularly scheduled monthly conference calls that include clinical research staff (such as research assistants and coordinators) as well as investigators. During these calls, critical operational updates can be provided to the group and especially the personnel that are typically performing much of the day-to-day enrollment and data collection activity. There is also time for investigators or coordinators to bring up and discuss questions or areas where further clarification is needed. Summary minutes provide valuable information that can be used to update an existing standardized protocol.
Communication and monitoring are critical to study progress. The use of regularly scheduled meetings improves communication and consistency; transparency about site-specific trial milestones helps monitor progress and encourages continued investigator participation.
Another successful tool has been to send out frequent regular score cards indicating progress to specific trial milestones, including investigators and sites, IRB status, and their current enrollment numbers. Subject visit and follow-up compliance are additional measures to be potentially added. The score cards serve several functions. First, they update the entire group of investigators as to the current status with respect to overall enrollment in the project. It is motivating to see the progress being made across the different locations as the trial progresses. Secondly, they allow the group to identify and learn from the sites that are most successful in terms of enrollment. Finally, they can spur some friendly competition and additional engagement to increase enrollment activity.
Face-to-face group meetings are also valuable to engage the group and continue active participation. We have tried to have face-to-face meetings at major medical conferences where it is anticipated that there will be multiple investigators available. This can be challenging since not all investigators will be at every conference, and even if investigators attend the conference, they may have other commitments that limit their availability. Some authors have suggested that separate investigator meetings help to address this issue; however, this can be a significant time and expense burden.
44.4.2 Data Monitoring
Enrollment, ongoing participation, and continued follow-up need to be monitored during the course of the trial. In this way, accurate progress to milestones can be assessed and communicated. We recommend significant transparency throughout this process as there can be loss of trial participation at every step. Digital forms can significantly help in monitoring progress.
44.5 Publication
begin to work in a timely fashion. Appropriate involvement of a biostatistician in planning the study design can make the analytical work more straightforward when the data has been collected. Trying to make sense of a pile of data after the fact without appropriate preparation can be challenging.
Paper drafts should be completed promptly, and coauthors should commit to providing rapid review and comments to the drafts, hopefully within 1–2 weeks. The main author/s should assess and appropriately incorporate these comments and prepare for submission. Determination of the appropriate journal to submit to should involve the lead authors.
Following submission, it is not uncommon for high-quality journals to either reject or request significant revisions to the article. The lead author/s should take the responsibility to respond to comments and revise the article for resubmission or submission to a new journal.
It is appropriate for the group to celebrate after publication!
• Challenges are likely to arise during the IRB process and while conducting the study, so additional leeway should be included for possible delays.
• Communication between investigators is critical.
• Multicenter trials can build collegial relationships and further collaborations.
References
1. Askenberger M, Janarv PM, Finnbogason T, Arendt EA. Morphology and anatomic patellar instability risk factors in first-time traumatic lateral patellar dislocations: a prospective magnetic resonance imaging study in skeletally immature children. Am J Sports Med. 2017;45(1):50–8.
2. Christensen TC, Sanders TL, Pareek A, Mohan R, Dahm DL, Krych AJ. Risk factors and time to recurrent ipsilateral and contralateral patellar dislocations. Am J Sports Med. 2017;45(9):2105–10.
3. Chung KC, Song JW, Group WS. A guide to organizing a multicenter clinical trial. Plast Reconstr Surg. 2010;126(2):515–23.
4. Hulley SB, Cummings SR, Browner WS, Grady DG, Newman TB. Designing clinical research: an epidemiologic approach. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2006.
5. Jaquith BP, Parikh SN. Predictors of recurrent patellar instability in children and adolescents after first-time dislocation. J Pediatr Orthop. 2017;37(7):484–90.
6. Koh JL, Stewart C. Patellar instability. Orthop Clin North Am. 2015;46(1):147–57.
7. Krych AJ, O’Malley MP, Johnson NR, Mohan R, Hewett TE, Stuart MJ, et al. Functional testing and return to sport following stabilization surgery for recurrent lateral patellar instability in competitive athletes. Knee Surg Sports Traumatol Arthrosc. 2018;26(3):711–8.
8. Sanders TL, Pareek A, Hewett TE, Stuart MJ, Dahm DL, Krych AJ. High rate of recurrent patellar dislocation in skeletally immature patients: a long-term population-based study. Knee Surg Sports Traumatol Arthrosc. 2018;26(4):1037–43.
9. Tompkins MA, Arendt EA. Patellar instability factors in isolated medial patellofemoral ligament reconstructions—what does the literature tell us? A systematic review. Am J Sports Med. 2015;43(9):2318–27.
10. Vavken P, Wimmer MD, Camathias C, Quidde J, Valderrabano V, Pagenstert G. Treating patella instability in skeletally immature patients. Arthroscopy. 2013;29(8):1410–22.
11. Weber AE, Nathani A, Dines JS, Allen AA, Shubin-Stein BE, Arendt EA, et al. An algorithmic approach to the management of recurrent lateral patellar dislocation. J Bone Joint Surg Am. 2016;98(5):417–27.
45 How to Organise an International Register in Compliance with the European GDPR: Walking in the Footsteps of the PAMI Project (Paediatric ACL Monitoring Initiative)
ACL Monitoring Initiative, as an illustrative example of an international register recently implemented in Luxembourg, Europe. The reader is cautioned not to consider the information provided in this chapter to be exhaustive in any way and is strongly recommended to take legal counsel before setting up his/her research project.
Fact Box 45.1: A New European Law on Data Protection
The European Union (EU) General Data Protection Regulation (GDPR) applies to all organisations that hold or process personal data of data subjects residing in the EU (whether they are EU citizens or not), independently of the organisation’s location. It also applies to research organisations and concerns the vast majority of scientific activities. GDPR is a binding legislative act that must be applied in its entirety across the EU without requiring national enabling legislation. It came into force on May 25, 2018. Any processing under way before that date should be made compliant with the GDPR within a period of 2 years.
45.2 Background and Aim of GDPR
The general aim of the GDPR is to guarantee a consistent and high protection of the rights and freedoms of natural persons within the EU, while at the same time facilitating the flows of personal data within the Union. These aspects are of particular interest in international clinical orthopaedic research involving human participants, where most of the collected data is related to health and thus by its nature considered as “sensitive”.
Before May 2018, data protection laws in the EU were implementations by its member states
the free movement of such data”.2 Discrepancies in the specific national laws of EU member states resulted in a lack of harmonisation, which is why the EU adopted the GDPR 2016/679 in April 2016. GDPR came into force on May 25, 2018, thus replacing the previous directive. Being a regulation, GDPR is a binding legislative act that must be applied in its entirety across the EU without requiring national enabling legislation.
45.3 Some Important Definitions
In its article 4, the GDPR specifies a number of definitions that the researcher in clinical orthopaedics should be familiar with. The most important ones are presented hereafter.
Personal data: any information, electronic or not, that relates to an identified or identifiable natural person, i.e. a “data subject”. Examples include the person’s name, address, title and contact details but also a study identification number, IP address or a social security number. In other words, compliance with the GDPR is not required if no link whatsoever can be established between the processed data and the person they belong to. This is, however, very rarely the case. All data pertaining to a person’s health (clinical outcome questionnaire, treatment, biological sample, biometrical measurement, etc.) is considered sensitive. The processing of sensitive data requires that the concerned data subject has provided explicit consent, as will be further explained below.
Processing: any operation performed on personal data, automated or not, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction.
2 Directive 95/46/EC of the European Parliament and of
The ultimate goals of the PAMI project are:
1. To describe current treatment options following a paediatric ACL injury
2. To analyse the associated short-, medium- and long-term clinical outcome
3. To extend the evidence base on optimal treatment choices
4. To propose international treatment guidelines
To reach these aims, an electronic data collection platform has been created, the PAMI database, to store systematic and standardised information relevant to the problem of paediatric ACL injuries in different European countries. Based on a long-term patient follow-up, this project will provide important insights into the current outcomes of ACL injured knees in children and will make it possible to distinguish those patients needing operative treatment from those who benefit most from a nonoperative treatment. Furthermore, large-scale objective outcome data will provide the knowledge base necessary for a first-time-ever proposal of international treatment guidelines.
authentication of the visited website as well as confidentiality and integrity of the data exchanged through encryption. To maximise data security, a two-factor authentication (2FA) solution has been implemented using SMS (short message service), thus adding a second level of authentication to a regular account login by project participants.
Finally, partner institutions/hospitals from different European countries providing treatment to the patient group of interest can engage in this project. They provide data on current surgical and non-surgical treatments, follow-up treatments and clinical outcome using the dedicated Web application and act as local data administrators and data processors. Only pseudonymised patient information is uploaded into the PAMI database, to ensure maximal data protection and avoid legal issues related to data transfer between different European countries. As a consequence, each site coordinator manages a correspondence table between patient ID information and a primary key generated by the system. Figure 45.1 below depicts the data flow of the upload of pseudonymised patient data.
The system:
- Creates a new patient without nominative data
- Assigns the patient a random ID
- Gives the user the generated ID linked to the patient
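The upload steps listed above can be sketched in a few lines of Python. This is an illustrative sketch only, not the PAMI implementation: the class names `CentralRegisterSketch` and `SiteCorrespondenceTable`, the record fields, the list of forbidden fields, and the use of `secrets.token_hex` to generate the random ID are all assumptions made for the example.

```python
import secrets


class CentralRegisterSketch:
    """Hypothetical stand-in for the central database: holds no nominative data."""

    def __init__(self):
        self.records = {}

    def create_patient(self):
        # Assign a random ID that carries no information about the patient.
        patient_id = secrets.token_hex(8)
        while patient_id in self.records:  # regenerate on the (unlikely) collision
            patient_id = secrets.token_hex(8)
        self.records[patient_id] = {}
        return patient_id

    def upload(self, patient_id, clinical_data):
        # Reject obviously nominative fields (illustrative list, not exhaustive).
        forbidden = {"name", "address", "date_of_birth", "social_security_number"}
        leaked = forbidden & set(clinical_data)
        if leaked:
            raise ValueError(f"nominative fields not allowed: {sorted(leaked)}")
        self.records[patient_id].update(clinical_data)


class SiteCorrespondenceTable:
    """Kept locally by the site coordinator; never uploaded."""

    def __init__(self):
        self._by_local_ref = {}

    def register(self, local_patient_ref, register):
        # The system creates the patient and hands back the generated ID.
        generated_id = register.create_patient()
        self._by_local_ref[local_patient_ref] = generated_id
        return generated_id

    def lookup(self, local_patient_ref):
        return self._by_local_ref[local_patient_ref]


register = CentralRegisterSketch()
site = SiteCorrespondenceTable()
pid = site.register("hospital-A/chart-0042", register)  # hypothetical local reference
register.upload(pid, {"treatment": "non-surgical", "follow_up_months": 12})
```

The point of the split is that re-identification needs both halves: the central store sees only the generated ID, while the link back to the patient stays in the table held at the treating site.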
local person with access rights to the PAMI database. Since only pseudonymised data are collected within the PAMI project, each site coordinator manages a correspondence table between patient ID information and a primary key generated by the system. Site coordinators are also responsible for the long-term follow-up of patients by sending them annual electronic questionnaires on patient-reported clinical outcomes and physical activity/sport participation.
Prior to participation in the PAMI project, institutions/hospitals have to seek ethics clearance from their local or national ethics committee when applicable, in accordance with their national laws and regulations. Formal proof of ethics clearance must be provided to the data controller and is a prerequisite for participation in the PAMI project. Written consent will be sought by the participating institutions from both legal guardians for patients under the age of majority, which may be different in each country; when the patient reaches the age of majority, he/she will be contacted and asked to provide written consent himself/herself. It is the responsibility of each participating partner to obtain written consent of their participants and to archive all original hard copies. The PAMI steering committee, acting on behalf of the data controller, reserves the right to perform on-site audits of participating institutions/hospitals. The date of provided consent or, if applicable, the date of consent withdrawal will be stored in the patient’s record within the PAMI database and be visible to the data controller. Every participating institution will retain the right to access the pseudonymised data concerning their patients.
Ideally, the PAMI database will enable very long-term follow-up of (originally) skeletally immature children and adolescents who have suffered an ACL injury. The foreseen timeframe of follow-up will be 30 years for each patient. At the end of the 30-year follow-up, the data will be stored for an additional 10 years before being deleted. Data will be recorded for as long as the patients are willing to comply with the annual data collection.
Take-Home Message
• A new General Data Protection Regulation of the European Union has been enforced since May 2018.
• Research organisations processing personal data of residents of the EU must comply with this regulation.
• Researchers in clinical orthopaedics should take into account these principles as part of risk management of their research projects.
• The present chapter presents the most important aspects that need to be considered, as well as an illustrative example of an international register on paediatric ACL treatments.
• Researchers are strongly advised to take legal counsel with a qualified data protection officer as soon as the project planning phase starts.
References
1. Ardern CL, et al. 2018 International Olympic Committee consensus statement on prevention, diagnosis and management of paediatric anterior cruciate ligament (ACL) injuries. Knee Surg Sports Traumatol Arthrosc. 2018;26:989–1010. https://doi.org/10.1007/s00167-018-4865-y.
2. Astur DC, Cachoeira CM, da Silva Vieira T, Debieux P, Kaleka CC, Cohen M. Increased incidence of anterior cruciate ligament revision surgery in paediatric verses adult population. Knee Surg Sports Traumatol Arthrosc. 2018;26:1362–6. https://doi.org/10.1007/s00167-017-4727-z.
3. Caine D, DiFiori J, Maffulli N. Physeal injuries in children’s and youth sports: reasons for concern? Br J Sports Med. 2006;40:749–60. https://doi.org/10.1136/bjsm.2005.017822.
4. Calvo R, et al. Transphyseal anterior cruciate ligament reconstruction in patients with open physes: 10-year follow-up study. Am J Sports Med. 2015;43:289–94. https://doi.org/10.1177/0363546514557939.
5. Dekker TJ, Godin JA, Dale KM, Garrett WE, Taylor DC, Riboh JC. Return to sport after pediatric anterior cruciate ligament reconstruction and its effect on subsequent anterior cruciate ligament injury. J Bone Joint Surg Am. 2017;99:897–904. https://doi.org/10.2106/JBJS.16.00758.
6. Frosch KH, et al. Outcomes and risks of operative treatment of rupture of the anterior cruciate ligament in children and adolescents. Arthroscopy. 2010;26:1539–50. https://doi.org/10.1016/j.arthro.2010.04.077.
7. Kaeding CC, Flanigan D, Donaldson C. Surgical techniques and outcomes after anterior cruciate ligament reconstruction in preadolescent patients. Arthroscopy. 2010;26:1530–8. https://doi.org/10.1016/j.arthro.2010.04.065.
8. Kocher MS, Saxon HS, Hovis WD, Hawkins RJ. Management and complications of anterior cruciate ligament injuries in skeletally immature patients: survey of the Herodicus Society and The ACL Study Group. J Pediatr Orthop. 2002;22:452–7.
9. Lawrence JT, Argawal N, Ganley TJ. Degeneration of the knee joint in skeletally immature patients with a diagnosis of an anterior cruciate ligament tear: is there harm in delay of treatment? Am J Sports Med. 2011;39:2582–7. https://doi.org/10.1177/0363546511420818.
10. Moksnes H, Engebretsen L, Seil R. The ESSKA paediatric anterior cruciate ligament monitoring initiative. Knee Surg Sports Traumatol Arthrosc. 2016;24:680–7. https://doi.org/10.1007/s00167-015-3746-x.
11. Pierce TP, Issa K, Festa A, Scillia AJ, McInerney VK. Pediatric anterior cruciate ligament reconstruction. Am J Sports Med. 2017;45:488–94. https://doi.org/10.1177/0363546516638079.
12. Reider B. A matter of timing. Am J Sports Med. 2015;43:273–4. https://doi.org/10.1177/0363546515570023.
13. Seil R, Theisen D, Moksnes H, Engebretsen L. ESSKA partners and the IOC join forces to improve children ACL treatment. Knee Surg Sports Traumatol Arthrosc. 2018;26:983–4. https://doi.org/10.1007/s00167-018-4887-5.
14. Shaw L, Finch CF. Trends in pediatric and adolescent anterior cruciate ligament injuries in Victoria, Australia 2005-2015. Int J Environ Res Public Health. 2017;14. https://doi.org/10.3390/ijerph14060599.
15. Wall EJ, Ghattas PJ, Eismann EA, Myer GD, Carr P. Outcomes and complications after all-epiphyseal anterior cruciate ligament reconstruction in skeletally immature patients. Orthop J Sports Med. 2017;5:232596711769360. https://doi.org/10.1177/2325967117693604.
Part X
Helpful Further Information
46 Common Scales and Checklists in Sports Medicine Research
Alberto Grassi, Luca Macchiarola, Marco Casali,
Ilaria Cucurnia, and Stefano Zaffagnini
pertain to the overall health of the patient, including physical, mental, and social well-being, and offer the advantage of being able to use them to compare different diseases, severity, and interventions. However, since they represent a generic measure, their ability to detect small but important changes could be limited. On the other hand, the “disease-specific measures,” which pertain to a specific disorder treated in a patient, measure the physical, mental, and social aspects of health affected by the specific disorder. Therefore, they are able to detect small but important changes but have a limited value in comparison of health status across different diseases. For the aforementioned reasons, a complete picture of treatment effect on a patient could be provided only with the assessment through a “disease-specific measure” in combination with a “generic measure.”
Fact Box 46.1
Too often, surgeons show little interest in the patient’s global health self-perception and mental status; however, the patient’s perception of changes in health status and disability is the main indicator of the success of a treatment.
46.2 General Scale Characteristics
sions separated by a time interval under stable health conditions. It is considered when the raters are not involved or the raters’ effect is negligible. It could be assessed with the intraclass correlation (ICC) or Cohen’s kappa statistic.
Intra-rater reliability: is defined as the agreement between two or more repeated score evaluations performed by a single rater. Also in this case, intraclass correlation (ICC) or Cohen’s kappa statistic could be used.
Inter-rater reliability: is defined as the agreement between the scores obtained from two or more raters’ assessment. It measures how much consensus or heterogeneity there is in the rating given by judges. Similarly, intraclass correlation (ICC) or Cohen’s kappa statistic is employed.
Internal consistency: is defined as the correlations between different items on the same test and measures whether several items that propose to measure the same general construct produce similar scores. It is assessed using the Cronbach’s alpha statistic.
Responsiveness to change: is defined as the ability of an instrument to detect clinically important changes between the patient’s pre-intervention and post-intervention state, assuming all other factors remain constant.
Minimal detectable change (MDC): is defined as the minimal change that falls outside the measurement error in the score of an instrument used to measure a symptom.
The main characteristics and features of clinical Minimal clinically important difference
scales that should be known to choose the appro- (MCID): is defined as the minimal change in the
priate outcome measure are the following [54]. score that is meaningful for patients or that is
Construct validity: is defined as the ability of required for the patient to feel a difference in the
an instrument to measure what it is supposed to variable that is measured.
measure. It depends on how the items that make Standard error of measurement (SEM): It
up the scale include all relevant aspects of the measures the range within which a score would
pathology or disability that is measured. The con- likely fall in the case of re-measurement.
vergent validity indicates how the score could Standardized response mean (SRM): It
correlate with other scores that measure the same measures the responsiveness to change and is
construct. Meanwhile, predictive validity indi- defined as the mean change in score divided by
cates whether the score could predict a patient’s the standard deviation of the change scores.
score on a measure of some related construct. Floor effect: Floor effects occur when a mea-
Repeatability (Test/retest reliability): is sure’s lowest score is unable to assess a patient’s
defined as the agreement between the observa- level of ability. The test is considered poor if the
tions on the same patients on two or more occa- floor effect is >20%.
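Several of the definitions above translate directly into short calculations. The following Python sketch is an illustration only and is not taken from the cited literature: it computes the SRM exactly as defined above, Cronbach's alpha using its standard formula (the text names the statistic but does not spell it out), and the share of patients scoring at the lowest possible value, checked against the 20% threshold. All function names and sample values are hypothetical.

```python
from statistics import mean, stdev, variance


def standardized_response_mean(pre, post):
    """SRM: mean change divided by the standard deviation of the change scores."""
    changes = [after - before for before, after in zip(pre, post)]
    return mean(changes) / stdev(changes)  # fails if all changes are identical


def cronbach_alpha(items):
    """Cronbach's alpha (standard formula).

    `items` holds one list of scores per questionnaire item,
    with patients in the same order in every list.
    """
    k = len(items)
    totals = [sum(patient_scores) for patient_scores in zip(*items)]
    sum_item_variances = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


def extreme_score_share(scores, extreme):
    """Fraction of patients scoring exactly at the given extreme value."""
    return sum(1 for s in scores if s == extreme) / len(scores)


def has_floor_effect(scores, lowest, threshold=0.20):
    """True if more than `threshold` of patients sit at the lowest score."""
    return extreme_score_share(scores, lowest) > threshold


# Hypothetical pre- and post-intervention scores for four patients
srm = standardized_response_mean([40, 50, 60, 55], [50, 65, 70, 75])
```

Passing the highest possible value to `extreme_score_share` gives the corresponding proportion at the top of the scale, so the same 20% rule can be applied at either end.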
Ceiling effect: ceiling effects occur when a measure’s highest score is unable to assess a patient’s level of ability. This might be particularly common for measures used over multiple occasions. The test is considered poor if the ceiling effect is >20%.

46.3 Measures of Shoulder Function

There are many instruments that measure symptoms and function of the shoulder, and some that evaluate both the glenohumeral joint and the whole upper limb. The most widespread and best tested is the disabilities of the arm, shoulder, and hand questionnaire (DASH). The shoulder pain and disability index (SPADI), the Constant-Murley score (CMS), and the American shoulder and elbow surgeons (ASES) questionnaire, which are more specific for shoulder pathologies, are also extensively employed. The simple shoulder test (SST), the shoulder disability questionnaire (SDQ), the Oxford shoulder score, and the Western Ontario shoulder instability index (WOSI) complete the panorama of the most common tools.
The basic psychometric characteristics, strengths, and weaknesses of the most common scales for shoulder function are described in Table 46.1.

46.3.1 Disabilities of the Arm, Shoulder, and Hand Questionnaire (DASH)

The DASH is a patient-completed scale which includes 30 items regarding symptoms, pain, physical function, and social function [58]. The 11-item QuickDASH short version is also available [5]. The DASH is the best tool for the comprehensive assessment of upper extremity conditions, since it is easy to apply, analyze, and interpret; moreover, it is good for research purposes in various upper extremity conditions and has a good correlation with SPADI, HAQ, CMS, ASES, and EQ-5D with Pearson’s or Spearman’s test. It is particularly useful when polyarticular conditions are to be evaluated or when symptoms and function of the entire upper extremity are investigated. It is also useful in all elbow and hand conditions. However, the DASH is region specific and not joint specific; therefore, its specificity and responsiveness are lower than those of shoulder-specific tools [3].
Conditions: Any or multiple disorders of the upper extremity, in particular painful conditions including rheumatoid arthritis, multiple sclerosis, adhesive capsulitis, shoulder impingement and tendinitis, proximal humerus fracture, distal radius fractures, hand osteoarthritis or fractures, arthroscopic acromioplasty.

46.3.2 Shoulder Pain and Disability Index (SPADI)

The SPADI is a patient-completed scale that includes 13 items regarding symptoms and pain, scored on a VAS/NRS scale [94]. It is one of the most representative shoulder instruments and has been tested in numerous settings; moreover, it is easy to administer, understand, and complete. It has a good correlation with the DASH, ASES, and CMS. One possible weakness in construct validity could be that only one item assesses overhead work [3].
Conditions: Any disorder of the shoulder joint, particularly adhesive capsulitis, rotator cuff pathologies.

46.3.3 American Shoulder and Elbow Surgeons Society Shoulder Assessment Form (ASES)

The ASES is a patient self-evaluation scale of 11 items evaluating pain and function, which is integrated with a clinician-dependent part [92]. It has good reliability, construct validity, and responsiveness. However, it uses different types of scales (binary, Likert, VAS), and the clinician part could be time-consuming. It has been developed to be applied to all shoulder patients regardless of the diagnosis, since it also evaluates activities of
daily living. It has a good correlation with both SPADI and DASH questionnaires [3].
Conditions: Any disorder of the shoulder joint, particularly rotator cuff disease, shoulder impingement, shoulder arthritis, calcific tendonitis.

46.3.4 Constant-Murley Score (CMS)

The CMS is a both patient- and clinician-reported score which includes eight items regarding pain, ADLs, mobility, and strength. It is a method to record individual parameters, providing an overall clinical functional assessment, irrespective of diagnosis or radiographic abnormalities [22, 23]. Based on the difference between the normal and the abnormal side, the index shoulder could be graded as excellent (<11), good (11–20), fair (21–30), or poor (>30). Although the CMS is highly accepted throughout the clinical community, there are several limitations to its use due to the low inter-tester reliability, non-standardized measurement of strength, and the few items evaluating pain and ADL. It is useful for measurement protocols but does not provide an adequate self-assessment of patient pain and function. It has a good correlation with ASES, DASH, and SPADI [3].
Conditions: Mainly rotator cuff-related disorders, impingement, degenerative or inflammatory pathologies, instability, osteoarthritis.

46.3.5 Simple Shoulder Test (SST)

The SST is a patient-reported score which includes 12 dichotomous (yes/no) items regarding pain, strength, and range of motion [71]. It assesses the functional disability of the shoulder in a very simple and short manner; however, due to the binary response option, its use as a comprehensive measure of outcomes could be questioned. It has a good correlation with the SPADI, ASES, DASH, and CMS scores [3].
Conditions: General shoulder injuries and rotator cuff pathology.

46.3.6 Oxford Shoulder Score (OSS)

The OSS is a patient-reported scale of 12 items evaluating pain and daily function [32, 35]. It provides a self-assessment of shoulder pain and function. It is short and easy to complete but not frequently used in the current literature. Correlation with SPADI, DASH, and CMS is good [3].
Conditions: Degenerative and inflammatory shoulder conditions, subacromial impingement, rotator cuff, osteoarthritis, and proximal humerus fractures.

46.3.7 UCLA Shoulder Score

The UCLA (University of California at Los Angeles) shoulder score is a five-item, both patient- and clinician-reported scale which evaluates pain, function, ROM, strength, and patient’s satisfaction [2]. Despite being one of the earliest available shoulder outcome measures, it has not formally been validated. It is simple and fast but requires manual evaluation by a physician; for this reason, it could have poor validity or responsiveness, which does not make it ideal for the research setting. The UCLA has a good correlation with the DASH, SPADI, and SF-36 and could be dichotomized as good/excellent (>27) or fair/poor (<27) [61].
Conditions: Common shoulder pathologies.

46.3.8 Western Ontario Shoulder Instability Index (WOSI)

The WOSI is a 21-item patient-reported scale that evaluates physical symptoms, pain, sport, work, lifestyle, and emotions related to shoulder instability [62, 63]. It has been developed to assess disease-specific quality of life in patients with symptomatic shoulder instability. It has the advantage of being specific for this condition, but due to the lack of testing data, caution is necessary at the individual patient level. It has a good correlation with the VAS for function and the DASH score [3].
Conditions: Shoulder instability.
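The Constant-Murley grading described in Sect. 46.3.4 is based on the side-to-side difference in score, which can be sketched as a small function. This is a hypothetical illustration: the function name is invented, the "poor" band is read as a difference greater than 30 points, and fractional differences falling between the stated integer bands are assigned to the next worse grade, which is an assumption rather than something the text specifies.

```python
def grade_constant_murley(normal_side, affected_side):
    """Grade the affected shoulder from the CMS difference to the normal side.

    Bands follow the text: excellent (<11), good (11-20), fair (21-30).
    Assumption: differences above 30 are graded as poor.
    """
    difference = normal_side - affected_side
    if difference < 11:
        return "excellent"
    if difference <= 20:
        return "good"
    if difference <= 30:
        return "fair"
    return "poor"


# Hypothetical patient: normal shoulder 90 points, affected shoulder 75 points
grade = grade_constant_murley(90, 75)  # difference of 15 points
```

Because the grade depends on the contralateral shoulder, the function takes both scores rather than a single total.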
Note: HHS (Harris hip score), HOOS (hip disability and osteoarthritis outcome score), OHS (Oxford hip score), LISOH (Lequesne index of severity for osteoarthritis of the hip), NAHS (non-arthritic hip score), iHOT33 (international hip outcome tool-33), HAGOS (the Copenhagen hip and groin outcome score)
a Nilsdotter A, Bremander A. Measures of Hip Function and Symptoms. Arthritis Care & Research Vol. 63, No. S11, November 2011, pp. S200–S208
b Martinelli N, Longo UG, et al. Cross-cultural adaptation and validation with reliability, validity, and responsiveness of the Italian version of the Oxford Hip Score in patients with hip osteoarthritis. Qual Life Res (2011) 20:923–929
c Ramisetty N, Kwon Y and Mohtadi N. Patient-reported outcome measures for hip preservation surgery. A systematic review of the literature. Journal of Hip Preservation Surgery Vol. 2, No. 1, pp. 15–27
d Mohtadi NG, Griffin DR, Pedersen ME et al. The development and validation of a self-administered quality-of-life outcome measure for young, active patients with symptomatic hip disease: the international hip outcome tool (iHOT-33). Arthroscopy 2012; 28: 595–605
e Kemp JL, Collins NJ, Roos EM et al. Psychometric properties of patient-reported outcome measures for hip arthroscopic surgery. Am J Sports Med 2013; 41: 2065–73
f Thorborg K, Holmich P, Christensen A, Petersen F, Roos EM. The Copenhagen Hip and Groin Outcome Score (HAGOS): development and validation according to the COSMIN checklist. Br J Sports Med. 2011;45:478–491
46.5.1 Harris Hip Score (HHS)

The HHS is a clinician-based outcome score which includes ten items that evaluate pain, function, absence of deformity, and range of motion [44]. There are two versions of the score: the original one, published in 1969, and the modified HHS (MHHS). The latter only includes the pain and function components and has been widely used to evaluate outcomes in hip arthroscopy surgery. The HHS is widely used throughout the world for evaluating outcome after THR, and it has also been proven appropriate to measure outcome after surgical interventions for femoral neck fractures. It seems to be useful for short-term follow-up; however, there are unacceptable ceiling effects that severely limit its validity. The HHS has been used in many different countries (Sweden, the Netherlands, Denmark, etc.), but there are no validated versions available in other languages. It has a good correlation with WOMAC, NHP, NAHS, and SF-36 for the pain and function domains. Based on the obtained score, it could be graded as excellent (90–100), good (80–90), fair (70–80), or poor (<70) [86].
Conditions: Femoral neck fractures, osteoarthritis of hip, impingement.

46.5.2 Hip Disability and Osteoarthritis Outcome Score (HOOS)

replacement. The HOOS is an extension of the WOMAC and is suggested to be valuable for younger and more active people due to the added subscales. It is suitable for use in research as a disease-specific questionnaire. It has a good correlation with the Oxford hip score, the Lequesne index, and the VAS for pain. Based on its score, it could be graded as excellent (>41), good (34–41), fair (27–33), or poor (<27) [86].
Conditions: Osteoarthritis, general hip disorders.

46.5.3 Oxford Hip Score (OHS)

The OHS is a 12-item patient-reported outcome score regarding pain and function of the hip in relation to daily activities such as walking, dressing, and sleeping. It is designed for the assessment of joint replacement and has been used in several countries in large registry studies [31, 83]. The OHS was developed to supplement other generic outcome measures in systematic studies of hip replacement surgery with long-term follow-up. It has also been validated and used in revision hip replacement. Due to its shortness, the OHS questionnaire is feasible for surveys by mail, and it yields a high response rate and is therefore preferred for larger studies. High correlation was found between the OHS and the HHS in THR patients [86].
Conditions: Osteoarthritis of hip.
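The Harris hip score bands given in Sect. 46.5.1 overlap at their boundaries (80–90 and 90–100, for example). One way to make the grading unambiguous is a table-driven lookup. The sketch below is hypothetical and assumes that each boundary value belongs to the better grade (so that 90 is excellent and 80 is good); the constant and function names are illustrative only.

```python
from bisect import bisect_right

# Lower bounds of the "fair", "good", and "excellent" bands (assumed inclusive)
HHS_LOWER_BOUNDS = [70, 80, 90]
HHS_GRADES = ["poor", "fair", "good", "excellent"]


def grade_harris_hip_score(score):
    """Map a 0-100 Harris hip score to its grade using the bands in the text."""
    if not 0 <= score <= 100:
        raise ValueError("HHS is expected in the range 0-100")
    # bisect_right counts how many lower bounds the score has reached
    return HHS_GRADES[bisect_right(HHS_LOWER_BOUNDS, score)]


# Hypothetical postoperative score
grade = grade_harris_hip_score(84)  # falls in the 80-90 band
```

Keeping the cutoffs in a list rather than in nested conditions makes it easy to substitute the bands of another scale, provided its direction (higher is better) is the same.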
the need for hip replacement. It has limited construct validity; the convergent validity of the questionnaire has also been questioned. Recommendations are to use the LISOH only for group comparisons. Based on its score, the handicap derived from hip osteoarthritis could be graded as extremely severe (>14), very severe (11–13), severe (8–10), moderate (5–7), mild (1–4), or none (0) [86].
Conditions: Osteoarthritis, the effectiveness of pharmacologic interventions, and to help with indications for surgery like THA.

46.5.5 Non-arthritic Hip Score (NAHS)

The non-arthritic hip score (NAHS) consists of 20 items distributed in four domains of pain, mechanical symptoms, functional symptoms, and activity level. It was developed for young active patients with higher demands and expectations. This is a patient-based, self-administered questionnaire that was developed as a modification of the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC). Input from patients, surgeons, physical therapists, and epidemiologists was used in creating the NAHS scoring system. The NAHS has satisfactory internal consistency in each of its four domains, but there is no further evidence about internal consistency from head-to-head comparison studies with other outcome measures. Hence, the summation score for internal consistency for the NAHS is good. The summation score for test-retest reliability is excellent. Construct validity is satisfactory between the NAHS and the Harris hip score (HHS) and short form (SF)-12, respectively [18].
Conditions: All non-arthritic hip conditions.

46.5.6 International Hip Outcome Tool-33 (iHOT-33)

These 33 questions were formulated into a self-administered questionnaire using a visual analog scale response format from 0 to 100 (worst–best outcome) [81]. The iHOT-33 was developed with the cooperation of the multicenter arthroscopy of the hip outcomes research network (MAHORN). It has a short version, the iHOT-12, which includes 12 items instead of the original 33, was designed to be more easily used in clinical settings, and has been validated and tested for reliability. The appropriate population for this tool includes patients aged between 18 and 60 years who have a Tegner activity level of 4 or higher, meaning that they are engaged in recreational physical activities at least once a week or have an occupation involving moderately heavy labor. There were no floor or ceiling effects noted for the iHOT-33 in the original paper. Finally, construct validity was demonstrated with a correlation of 0.81 to the NAHS [57].
Conditions: Femoroacetabular impingement, hip arthroscopic surgery for intra-articular hip lesion.

46.5.7 The Copenhagen Hip and Groin Outcome Score (HAGOS)

The HAGOS is a patient-reported outcome questionnaire; it consists of 37 items distributed in six subscales of pain, symptoms, physical function in activities of daily living, physical function in sports and recreation, participation in physical activities, and hip- and/or groin-related quality of life [107].
The Copenhagen hip and groin outcome score was developed in 2011, and it was the first outcome measure developed with the COSMIN checklist guidelines. The goal of this instrument is to evaluate hip and/or groin disability related to impairment (body structure and function), activity (activity limitations), and participation (participation restrictions) according to the international classification of functioning, disability, and health (ICF) in young to middle-aged physically active patients with hip and/or groin pain. The HAGOS has excellent test-retest reliability properties; this was evident from ICC values ranging from 0.82 to 0.92 for all its subscales in the original paper, and it also has excellent
internal consistency properties and good content validity. Floor or ceiling effects were noted in some subscales of the HAGOS as described in the original paper; while there were no floor effects, ceiling effects were noted in the HAGOS ADL (32%) and physical activity (28%) subscales between 12 and 24 months after surgery. Finally, construct validity is satisfactory between the HAGOS and the SF-36 [57, 91].
Conditions: Young to middle-aged patients with long-standing hip and/or groin pain.

46.6 Measures of Knee Function

The knee is one of the most investigated joints in Orthopaedics and Sports Medicine; therefore many outcome measures exist for clinical or research use. The most relevant are those evaluating pain, function, quality of life, and activity. However, the clinician-reported scales that have been described allow the recording of objective characteristics of the joint such as deformity, ROM, and stability.
The basic psychometric characteristics, strengths, and weaknesses of the most common scales for knee function are described in Table 46.4.

46.6.1 International Knee Documentation Committee Subjective Score (Subjective IKDC)

The subjective IKDC form is an 18-item patient-reported score that examines knee symptoms, sport participation, and daily activities [53]. It was developed in 1994 and revised to its current form in 2001. Its strength is the comprehensive assessment of the patient status and, above all, responsiveness to change following surgical interventions. Its limited administrative and respondent burden makes it ideal in both clinical and research settings. Moreover, it has a good correlation with the Cincinnati knee rating system, VAS for pain, WOMAC, Lysholm, and SF-36. On the other hand, it lacks psychometric testing, which makes it suboptimal for the evaluation of osteoarthritic patients [21].
Conditions: Knee ligament injury and surgery, meniscal tears, cartilage lesions, knee dislocation.

46.6.2 International Knee Documentation Committee Objective Score (Objective IKDC)

The objective IKDC form is a clinician-reported score that evaluates all the aspects of knee findings. It is composed of 25 items, grouped in seven subgroups evaluating effusion, passive motion deficit, ligament examination, crepitus, harvest site pathology, X-ray findings, and functional evaluation with the one-leg hop test. Every item is rated on a four-grade Likert scale from A to D, or normal to severely abnormal, with respect to the contralateral healthy knee. The overall score is determined by the lowest grade among the first three subgroups only (swelling, passive ROM, and ligament stability). However, all the items should be compiled even if they do not contribute to the overall score. This form is frequently used when evaluating ligamentous surgery and allows the comparison of different treatment groups in a reliable manner. However, to increase its precision, it requires instrument-assisted evaluation of knee stability. Moreover, in the case of bilateral pathology or previous contralateral injury, the score cannot be used since it implies comparison to a healthy contralateral limb. Usually, grades C and D are considered as failure of the treatment [1].
Conditions: Especially knee ligament injury and surgery, but also other knee conditions investigated by the subjective IKDC form such as meniscal tears, cartilage lesions, knee dislocation.

46.6.3 Knee Injury and Osteoarthritis Outcome Score (KOOS)

The KOOS is a 42-item patient-reported scale, which includes five domains, each one scored separately: pain, symptoms, activity of daily life (ADL), sports and recreational activities, and
Table 46.4 Measures of knee function
Questionnaire | Items | Options | Administration | Range | Cutoffs | Time compilation (min) | Time calculation (min) | Internal consistency | Test-retest | SRM | MDC | MCID
Subjective IKDC^a | 18 | Likert (5–10) | Patient | 0 (worst) to 100 (best) | No | 10 | 5 | 0.92–0.97 | 0.87–0.89 | 0.94–1.5 | 6.7 | 11.5
Objective IKDC^a | 25 | Likert (4) | Clinician | A (best) to D (worst) | Yes | 15 | 3 | NR | NR | NR | NR | NR
KOOS^a | 42 | Likert (5) | Patient | 0 (worst) to 100 (best) | No | 10 | 5 | 0.25–0.90 | 0.0–0.97 | 0.61–2.12 | 5–21 | NR
Lysholm^a | 8 | Likert (3–6) | Patient | 0 (worst) to 100 (best) | Yes | 5 | 3 | 0.65–0.73 | 0.88–0.97 | 0.90–1.10 | 8.9–10.1 | NR
OKS^a | 12 | Likert (5) | Patient | 0 (best) to 48 (worst) | No | 5 | 5 | 0.97–0.93 | 0.91–0.94 | 0.7 | 6.1 | NR
CKRS^b | 13 | Mix | Patient + clinician | 0 (worst) to 100 (best) | Yes | 20 | 10 | NR | 0.75–0.98 | 0.72–2.48 | NR | NR
WOMAC^a | 24 | Likert (5) | Patient | 0 (best) to 20 pain, 8 stiff, 68 funct (worst) | No | 10 | 5 | 0.67–0.98 | 0.65–0.98 | 0.40–2.02 | 10.6–30.6 | 14–33
WOMAC VAS^a | 24 | VAS/NRS | Patient | 0 (best) to 500 pain, 200 stiff, 1700 funct (worst) | No | 5 | 3 | NR | NR | NR | NR | NR
KSS^c | 7 | Likert (2–25) | Patient + clinician | 0 (worst) to 100 (best) | Yes | 10 | 5 | 0.74–0.94 | 0.65–0.88 | NR | NR | NR
HSS^d | 13 | Mix | Patient + clinician | 0 (worst) to 100 (best) | Yes | 10 | 5 | 0.70 | 0.98–0.99 | NR | NR | NR
knee-related quality of life (QoL). It is a complete questionnaire, since it explores all the possible domains of a multitude of possible knee pathologies [99]. However, for this reason, acceptability and reliability could differ based on the patient’s age and condition, especially on the sport subscale. It has good correlation with the SF-36 score and the WOMAC. For these reasons and its relative simplicity, the KOOS is used extensively, especially in large-volume registries. Moreover, the individual score for each subscale, rather than an aggregate score, allows for clinical interpretation of different interventions in different dimensions. On the other hand, the KOOS has not been validated for telephone and interview administration, which could limit its applicability due to the need for direct patient involvement [21].
A short version of the KOOS which includes only seven items from the ADL and sport subscales is available as the KOOS-PS (physical function short form), which is shorter, faster, and easier to administer in the clinical setting [89].
Conditions: Young and middle-aged patients with post-traumatic OA (undergoing TKA), patients with chondral, ligamentous, or meniscal injuries.

46.6.4 Lysholm Score

The Lysholm score is an eight-item patient-reported scale that evaluates knee symptoms such as limp, locking, swelling, instability, pain, stair climbing, and squatting. It is one of the most commonly used clinical scores for knee evaluation, introduced in 1982 [74]. It is extremely popular and widely used in clinical and research settings. It has limited floor and ceiling effects, making it useful for tracking improvement after interventions or deterioration over time. Moreover, it has a good correlation with the subjective IKDC, Cincinnati knee ligament score, and the WOMAC. However, its main limitation is that it is clinician derived with no patient input. There are concerns about limited reliability and a lack of definition of the MCID. The Lysholm score is graded as excellent (95–100), good (84–94), fair (65–83), or poor (<65) [21].
Conditions: Ligament injuries and surgery, particularly knee conditions with symptoms of instability, but also meniscal tears, cartilage lesions, patellofemoral pain, and knee osteoarthritis.

46.6.5 Oxford Knee Score (OKS)

The OKS is a 12-item patient-reported score developed for patients undergoing TKR [34]. However, it could be used to evaluate OA and early OA. For these reasons it has a good correlation with WOMAC, KOOS, and SF-36 scores. It is valid, reliable, and responsive to change of score, which makes it useful in research settings. However, its development based on knee OA limits its use [21].
Conditions: TKR, OA, rheumatoid arthritis.

46.6.6 Cincinnati Knee Rating System (CKRS)

The CKRS, which was proposed in 1983 and further modified over the years, is a both patient- and clinician-reported form that is composed of 13 scales assessing symptoms (pain, swelling, and giving way), perception of overall knee condition, daily life function (walking, stair climbing, and squatting), sport function (running, jumping, and pivoting), sport activity, and occupation. The evaluation is completed with physical examination, functional testing with one-legged hop exercises, and radiographic measurement of joint narrowing. The overall score, rated from 0 to 100, is obtained by combining symptoms (20), functional activities (15), physical examination (25), stability (20), radiographic findings (10), and functional testing (10). Based on the results, it could be graded as excellent (>80), good (55–79), fair (30–54), or poor (<30). It is a comprehensive and rigorous scale, with good reliability and high responsiveness to detect changes. However, it can be quite time-consuming. Its use
is mostly limited to sports medicine ligamentous and meniscal knee conditions [4].
Conditions: Ligamentous injuries and surgery, meniscal allograft and repair.

46.6.7 Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC)

The WOMAC is a 24-item patient-reported scale that evaluates three domains, each one with a dedicated subscale: pain, stiffness, and functional activities. It is available both as a five-point Likert scale and as a 100-mm VAS or NRS; therefore, based on the type of scoring, different ratings are obtained for the three subscales. However, the obtained values could be converted to a simple 0–100 scale. The WOMAC is one of the most common scales to evaluate patients with knee OA and is validated in numerous languages. Moreover, it has the advantage of being validated for use in person, over the telephone, or electronically. The individual scores for the three domains, rather than the aggregate value, enhance its interpretation for each domain. However, the presence of items related to uncommon tasks could result in missing data, while the lack of difficult tasks makes it not optimal for more active patients. This scale is optimal for research purposes due to its reliability and ability to detect changes, especially after surgical and nonsurgical interventions for knee OA and chondral defects [6, 7, 21].
Conditions: Knee OA, cartilage lesions, and ACL injury.

46.6.8 Knee Society Score (KSS)

The KSS is a seven-item, both patient- and clinician-based score that integrates subjective assessment of pain with objective features such as flexion or extension lag, ROM, alignment, and laxity. For this reason, it is limited by low reliability and inter- and intra-observer variations. It is used mostly for TKA evaluation, and it has a good correlation with SF-36 and the OKS score. Based on the obtained values, it could be interpreted as excellent (80–100), good (70–79), fair (60–69), or poor (<60) [43, 102].
Conditions: Knee OA.

46.6.9 Hospital for Special Surgery Score (HSS)

The HSS is a 13-item scale, both patient- and clinician-reported. It evaluates pain, function, ROM, muscle strength, flexion deformity, instability, and alignment. It has similar features to the KSS score and could also be graded as excellent (85–100), good (70–84), fair (60–69), or poor (<60). It offers a precise evaluation of knee function but lacks a general quality of life assessment. Therefore, it should be used with other scores capable of depicting the patient’s general condition [84].
Conditions: Knee OA.

46.6.10 Kujala Anterior Knee Pain Scale (AKPS)

The AKPS, also known as “Kujala score,” is a 13-item patient-reported scale that evaluates the subjective response to six activities regarded as triggers for anterior knee pain, such as walking, running, jumping, climbing stairs, squatting, and sitting [66]. Moreover, the evaluation is integrated with objective basic knee characteristics such as swelling, thigh atrophy, flexion contracture, and abnormal patellar movements. Therefore, it is specifically dedicated to painful conditions of the anterior knee and specifically patellofemoral pathologies. It is a simple and fast questionnaire and has a good correlation with Lysholm, KOOS, and VAS pain; however, it does not distinguish between patients with one episode of patellar dislocation and recurrent instability [27].
Conditions: Anterior knee pain conditions, patellofemoral pathologies, especially instability.
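Two of the scoring rules described for the knee scales in this section lend themselves to a short illustration: the overall objective IKDC grade (Sect. 46.6.2), which is the worst grade among the first three subgroups, and the conversion of WOMAC subscale values to a 0–100 scale (Sect. 46.6.7). The Python sketch below is hypothetical. The function names are invented, the Likert subscale maxima (20, 8, 68) are taken from Table 46.4, and the linear rescaling that keeps 0 as the best value is an assumption, since the text does not specify the conversion.

```python
IKDC_GRADES = ("A", "B", "C", "D")  # A = normal ... D = severely abnormal

# Maximum (worst) Likert subscale values as listed in Table 46.4
WOMAC_LIKERT_MAX = {"pain": 20, "stiffness": 8, "function": 68}


def ikdc_objective_overall(effusion, passive_motion, ligament_exam):
    """Overall objective IKDC grade: the worst of the three determining subgroups."""
    grades = (effusion, passive_motion, ligament_exam)
    for grade in grades:
        if grade not in IKDC_GRADES:
            raise ValueError(f"unknown IKDC grade: {grade!r}")
    # Letters sort alphabetically, so the maximum is the worst grade
    return max(grades)


def womac_to_percent(raw, subscale, maxima=WOMAC_LIKERT_MAX):
    """Linearly rescale a raw WOMAC subscale value to 0-100 (assumed: 0 stays best)."""
    return 100 * raw / maxima[subscale]


# Hypothetical examination: normal effusion and motion, nearly normal ligaments
overall = ikdc_objective_overall("A", "A", "B")
pain_percent = womac_to_percent(10, "pain")
```

For the VAS/NRS version of the WOMAC, the same rescaling could be used by passing the corresponding maxima from Table 46.4 (500, 200, 1700) as the `maxima` argument.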
46.6.11 Victorian Institute of Sport Assessment-Patella (VISA-P)

The VISA-P score is an eight-item patient-reported questionnaire composed of a VAS and a Likert portion, which assesses pain during activity or functional tests and sport participation. It has been specifically developed for the measurement of patellar tendon-related conditions. It has good reliability and repeatability and a good correlation with the VAS pain. Moreover, since its MCID is available, the VISA-P represents one of the most utilized scores for the assessment of treatments for patellar tendinopathy [48].
Conditions: Patellar tendon disorders.

to 100 [45, 64]. The AOFAS scores are not purely patient-reported since they incorporate both subjective and objective data that require clinical assessment. Despite its popularity, the AOFAS score has limitations due to lack of validation, high inter-observer variability, and poor correlation with other generic PROMs. For these reasons the AOFAS itself recommended the usage of more validated and standardized outcome scores [90].
Conditions: These region-specific questionnaires have been used to evaluate patients in a wide variety of foot and ankle pathologies such as arthritis, cartilage defects, soft tissue pathologies, and toe and finger deformities.
(worst)
QOL^b | 5 | Likert (5) | Patient | 0 (worst) to 20 (best) | No | 3 | 1 | 0.85–0.91 | NR | NR | NR | NR
OMAS^d | 9 | Likert (3–5) | Patient | 0 (worst) to 100 (best) | No | 3 | 3 | 0.76 | 0.95 | NR | NR | NR
VISA-A^e | 8 | Mix | Patient | 0 (worst) to 100 (best) | No | 5 | 7 | 0.73 | 0.79 | NR | NR | NR
Note: AOFAS (American orthopaedic foot and ankle society score), AAOS-FAM (American Academy of Orthopaedic Surgeons foot and ankle module), FFI (foot function index), FAOS (foot and ankle outcomes score), FAAM (foot and ankle ability measure), FADI (foot and ankle disability index), ACFAS (American College of Foot and Ankle Surgeons), FHSQ (foot health status questionnaire), ROFPAQ (Rowan foot pain assessment questionnaire), QOL (sport ankle QOL), OMAS (Olerud-Molander ankle score), VISA-A (Victorian Institute of sports assessment-Achilles)
a De Boer AS, Meuffels DE, et al. Validation of the American Orthopaedic Foot and Ankle Society Ankle-Hindfoot Scale Dutch language version in patients with hindfoot fractures. BMJ Open. 2017;7(11):e018314
b Martin RL, Irrgang JJ. A survey of self-reported outcome instruments for the foot and ankle. J Orthop Sports Phys Ther. 2007;37(2):72–84
c Cook JJ, Cook EA, et al. Validation of the American College of Foot and Ankle Surgeons Scoring Scales. J Foot Ankle Surg. 2011;50(4):420–9
d Nilsson et al. The Swedish version of OMAS is a reliable and valid outcome measure for patients with ankle fractures. BMC Musculoskeletal Disorders. 2013;14:109
e Iversen, J. V., Bartels, et al. Danish VISA-A questionnaire. Scand J Med Sci Sports. 2016; 26: 1423–1427
454 A. Grassi et al.
suggests that FFI may be a good measure of both health status and patients’ outcomes [13, 14, 104].
Conditions: Generally used in older patients, rheumatoid patients, and orthotics outcomes; reliability is poor in professional athletes due to the reported ceiling effect.

46.7.4 Foot and Ankle Outcomes Score (FAOS)

The FAOS, released in 2001, is a 42-item patient-reported outcomes measure that consists of five subscales (pain, symptoms, ADL, sport, and ankle-related quality of life). Each subscale is graded separately and scored from 0 to 100. The FAOS demonstrated good reliability and validity, but the length of this survey can create a significant burden for the patient [41, 98].
Conditions: It has been validated for a variety of foot and ankle pathologies such as adult flatfoot deformity, hallux valgus, and hallux rigidus.

46.7.5 Foot and Ankle Ability Measure (FAAM)

The FAAM was developed in 2005; it is region-specific and composed of 29 patient-reported items divided into activities of daily living (ADL) and sport subscales. A recent study demonstrated that the FFI and FAAM are highly correlated for foot and ankle trauma patients [40, 77].
Conditions: It is valid for a range of foot and ankle conditions as well as for chronic ankle instability and diabetes mellitus-related conditions.

46.7.6 Foot and Ankle Disability Index (FADI)

Conditions: Sport-related foot and ankle pathologies and trauma evaluation.

46.7.7 American College of Foot and Ankle Surgeons (ACFAS) Universal Evaluation Scoring Scales

The American College of Foot and Ankle Surgeons developed these anatomically based scoring scales in 2005 as clinical instruments to evaluate objective and subjective parameters before and after surgery [106]. Four modules exist: the first metatarsal-phalangeal joint and first ray, the forefoot (excluding the first ray), the rearfoot, and the ankle. Each questionnaire is completed by both patient and clinician and includes subjective (pain, appearance, and functional capacity) and objective (radiographic and functional) parameters, for a total of 100 points. This instrument has been validated and shows good reliability and sensitivity to change [24].
Conditions: Foot and ankle musculoskeletal-related pathologies requiring surgical intervention.

46.7.8 Foot Health Status Questionnaire (FHSQ)

The FHSQ was developed for individuals undergoing surgical treatment for common foot conditions. It consists of four subscales with a total of 13 items representing the following four domains: pain (four items), function (four items), footwear (three items), and general foot health (two items). Scores from each subscale range from 0 to 100, with a higher score representing better outcomes [8].
Conditions: Foot- and ankle-related disorders including those affecting skin and nail.
39 items in the following four subscales for pain assessment: sensory (16 items), affective (ten items), cognitive (ten items), and comprehension (three items). Each subscale is scored independently from 1 through 5, and the item responses are merged together to produce a subscale score ranging from 1 to 5, with a higher score representing more pain [100].
Conditions: Chronic foot and ankle pain.

46.7.10 Sport Ankle QOL (Quality of Life)

The sport ankle rating system quality of life measure was developed as a region-specific measure including self-reported and clinician-completed outcome measures. The QOL measure, the clinical rating score, and a single numeric evaluation are the three outcome measures, which can be used together or independently. The QOL is a self-reported questionnaire designed to assess an athlete’s quality of life after an ankle injury; it contains five items that evaluate symptoms, work and school activities, recreation and sports activities, activities of daily living, and lifestyle.
The clinical rating score is composed of 11 items, both patient and clinician based; finally, with the numeric VAS evaluation, the patient is asked to score his/her ankle function from 0 to 100 [113].
Conditions: Ankle injuries and specifically ankle sprains.

46.7.11 Olerud-Molander Ankle Score (OMAS)

The OMAS is a disease-specific outcomes measure developed for patients with ankle fractures and has been frequently used to evaluate this group of subjects; furthermore, it has been reported to be a valid instrument for recording short-term changes after an acute ankle ligament injury. OMAS is a self-administered patient questionnaire; the scale ranges from 0 points (totally impaired function) to 100 points (completely unimpaired function) and is based on nine different domains: pain, stiffness, swelling, stair climbing, running, jumping, squatting, supports, and work/activity level [87, 88].
Conditions: Ankle fracture, ligament ankle injury.

46.7.12 Victorian Institute of Sports Assessment-Achilles (VISA-A)

The VISA-A is a disease-specific instrument designed to evaluate the clinical severity for patients with chronic Achilles tendinopathy. It is an easily self-administered questionnaire that evaluates symptoms and their effect on physical activity. The questionnaire contains eight questions, covering three necessary domains: pain, functional status, and activity. The first six questions use a visual analog scale so that the patient may report the magnitude of a continuum of subjective symptoms; the final two questions use a categorical rating scale. The final results range from 0 to 100, with asymptomatic persons expected to score 100 points [95].
Conditions: Chronic Achilles tendinopathy.

46.8 Measures of Activity Level

Returning patients to unrestricted physical activity is a main goal of clinicians; for this reason, several scores have been created to assess outcomes in terms of return to sport/activity (RTS). When considering these instruments, it should be noted that athletes differ from the general population: they have a higher level of physical function and perceived health, they often do not perceive symptoms during daily activities, and common outcome measures may not detect problems that only result from high-intensity training and competition.
The basic psychometric characteristics, strengths, and weaknesses of the most common scales for activity level are described below (Table 46.6).
Table 46.6 Measures of activity level
Questionnaire | Items | Options | Administration | Range | Cutoffs | Compilation time (min) | Calculation time (min) | Internal consistency | Test-retest | SRM | MDC | MCID
TEGNERa | 1 | Likert (11) | Patient | 0 (worst) to 10 (best) | No | 1 | 1 | NR | 0.8 | 1.0 | 1 | 1
UCLAb | 1 | Likert (10) | Patient | 1 (worst) to 10 (best) | Yes | 2 | 2 | NR | NR | NR | 1 | 1
ARSc | 4 | Likert (5) | Patient | 0 (worst) to 16 (best) | No | 1 | 1 | 8.87 | 0.81 | NR | NR | NR
AASd | 1 | Likert (11) | Patient | 0 (worst) to 10 (best) | No | 1 | 1 | 1.0 | NR | NR | 1 | 1
LEFSe | 20 | Likert (5) | Patient | 0 (worst) to 80 (best) | No | 5 | 3 | 0.96 | 0.86 | NR | 9 | 9
SASf | 7 | Likert (4–5) | Patient | 0 (worst) to 20 (best)/A–D | No | 1 | 1 | NR | 0.92 | NR | NR | NR
HASg | 44 | Likert (6) | Patient | 0 (worst) to 220 (best) | No | 120 | 30 | NR | 0.75 | NR | NR | NR
OSTRCh | 4 (each joint) | Likert (4–5) | Patient | 0 (best) to 100 (worst) | No | >5 (variable) | 3–7 | 0.86–0.91 | NR | NR | NR | NR
SQUASHi | 11 | Open | Patient | 1 min. to 9 max. (per item), no upper limit | No | 3–5 | 10 | NR | NR | NR | NR | NR
IPAQ-SFi | 7 | Open | Patient | 0 to no upper limit | No | 3–5 | 10 | <0.8 | NR | NR | NR | NR
HAPi | 94 | Likert (3) | Patient | 1 (worst) to 94 (best) | No | 10 | 5 | NR | 0.84 MAS; 0.79 AAS | NR | 6.5 MAS; 8.4 AAS | NR
Note: ARS (activity rating scale), AAS (ankle activity score), LEFS (lower extremity functional scale), SAS (shoulder activity scale), HAS (Heidelberg sport activity score), OSTRC (Oslo Sports Trauma Research Centre overuse injury questionnaire), SQUASH (short questionnaire to assess health-enhancing physical activity), IPAQ-SF (international physical activity questionnaire – short form), HAP (human activity profile)
a Briggs KK, Lysholm J, et al. The reliability, validity, and responsiveness of the Lysholm score and Tegner activity scale for anterior cruciate ligament injuries of the knee: 25 years later. Am J Sports Med. 2009;37(5):890–7
b Terwee CB, Bouwmeester W, et al. Instruments to assess physical activity in patients with osteoarthritis of the hip or knee: a systematic review of measurement properties. Osteoarthritis Cartilage. 2011;19(6):620–33
c Negahban H, Mostafaee N, Sohani SM, Mazaheri M, Goharpey S, Salavati M, Zahednejad S, Meshkati Z, Montazeri A. Reliability and validity of the Tegner and Marx activity rating scales in Iranian patients with anterior cruciate ligament injury. Disability and Rehabilitation. 2011;33(22–23):2305–2310
d Halasi T, Kynsburg Á, et al. Development of a New Activity Score for the Evaluation of Ankle Instability. The American Journal of Sports Medicine. 2004;32:899–908
e Alcock GK, Stratford PW. Validation of the Lower Extremity Function Scale on athletic subjects with ankle sprains. Physiother Can. 2002;54:233–240
f Brophy RH, Beauvais RL, et al. Measurement of shoulder activity level. Clin Orthop Relat Res. 2005;439:101–8
g Seeger J, Weinmann S, et al. The Heidelberg Sports Activity Score - A New Instrument to Evaluate Sports Activity. The Open Orthopaedics Journal. 2013;7:25–32
h Clarsen B, Myklebust G, et al. Development and validation of a new method for the registration of overuse injuries in sports injury epidemiology: the Oslo Sports Trauma Research Centre (OSTRC) Overuse Injury Questionnaire. Br J Sports Med. 2013;47:495–502
i Terwee CB, Bouwmeester W, et al. Instruments to assess physical activity in patients with osteoarthritis of the hip or knee: a systematic review of measurement properties. Osteoarthritis Cartilage. 2011;19(6):620–33
46 Common Scales and Checklists in Sports Medicine Research 457
46.8.1 Tegner Activity Score

First described in 1985 [105] for the prospective evaluation of knee ligament injuries, the Tegner activity scale provides an arbitrary ranking based on the level of sport and leisure time activities and competition at which the patient is currently participating. It is a simple scale in which the subject indicates his/her current activity, ranging from 0 (no physical activity/disabled) to 10 (participation in competitive soccer or pivoting sports). It was created as a complement to the Lysholm score, but its use has also extended to other joints, including the hip and ankle [42, 78].
Conditions: Tegner score was developed and is mostly used for knee ligamentous injuries and reconstructions.

46.8.2 University of California at Los Angeles (UCLA) Activity Rating Scale

The ARS/Marx questionnaire quantifies the frequency of activities that challenge the dynamic stability of the knee; it consists of four questions about how frequently the patient performs activities such as running, cutting, decelerating, and pivoting. Each question is scored from 0 (<1 time/month) to 4 (>4 time/month), and the total score range is 0–16. ARS is based on the idea of measuring specific components of function/movement (that apply universally to the lower limb) to allow more accurate comparison among patients. This scale can be completed in a very short time frame [78].
Conditions: Sport activity involving complex articular motions of the knee and lower limb.

46.8.4 Ankle Activity Score (AAS)

The AAS is a joint-specific score that was published in 2004; it was based on the Tegner score. It contains 53 sports, three working activities, and four general activities; the patient is asked to select his/her most appropriate sport/activity and to indicate a level of participation (top level, lower competitive level, recreational level). The result, as with the Tegner score, is represented by a single number from 0 to 10 [42].
Conditions: Ankle injuries.

The Brophy-Marx SAS was developed in 2005 as an easy instrument to evaluate the patient’s overall shoulder activity level that could be generalized across different sports and completed in less than 1 min. It is composed of two parts: the first five items describe five common activities of the
shoulder, and the frequency of each item during the patient’s previous year is scored from 0 to 4 (never or less than once a month, once a month, once a week, more than once a week, or daily). The total numerical activity score ranges from a minimum of 0 points to a maximum of 20 points. In the second part of the score, the patients are asked if they participate in contact sports and sports that involve repetitive overhead throwing. The answers to these two questions range from A (no) to D (yes, at professional level). SAS has shown good reliability and validity [11, 12].
Conditions: This score has been developed on patients with rotator cuff tears.

The HAS was published in 2013; this validated instrument divides sport activities into 11 categories: walking, swimming, cycling, running, cross-country skiing, alpine skiing, golfing, dancing, racket sports, ball sports, and miscellaneous. For each of these activities, the patient is asked to grade between 0 and 5 the frequency, duration, level of importance, and impairment from the affected joint. For each activity, a score from 0 to 20 is calculated with a formula: (frequency + duration) × (1 + impairment/10 + importance/10). The scores are then added to obtain a final score between 0 and 220. HAS has proven high validity, reliability, and sensitivity. It can be used for elite-level athletes and athletes who perform different sports and is valid for different joints; nevertheless, its disadvantage is an extremely long time for compilation (120 min) [103].
Conditions: Evaluation of activity after trauma or surgery; it can be used in elite-level athletes.

46.8.8 Oslo Sports Trauma Research Center Overuse Injury Questionnaire (OSTRC)

The intention of the OSTRC was to create a questionnaire that could be applied to overuse injury problems in any area of the body. The instrument consists of four items for each affected joint; the final “severity score” ranges between 0 and 100 (25 points per item) for each overuse problem. In studies with multiple anatomical areas of interest, the four questions are repeated for each area. This questionnaire uses the term “problem” rather than “injury” since there is greater variation in interpretation of the term “injury” [20].
Conditions: OSTRC is mainly used for the evaluation of overuse problems in sports injury epidemiology.

The SQUASH was not designed to measure energy expenditure but to give an indication of the habitual activity level. It consists of 11 questions on commuting activities, leisure time and sports activities, household activities, and activities at work and school. The total activity score is calculated by taking the sum of the activity scores for separate questions [112].
Conditions: SQUASH is a short physical activity questionnaire with the general purpose of assessing habitual physical activity.

46.8.10 International Physical Activity Questionnaire-Short Form (IPAQ-SF)

The IPAQ-SF consists of seven questions about the frequency and duration of participation in strenuous, moderate, and walking activities, in addition to the time spent sitting, during the past week. The final score is expressed in metabolic equivalents (METs), where one MET represents the oxygen consumption of an individual sitting at rest (3.5 mL/kg/min) [26, 29].
Conditions: IPAQ has been validated, and it presents reasonable measurement properties for monitoring population levels of physical activity in diverse settings.
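
The arithmetic behind several of the activity scores described above (ARS, the numeric part of the SAS, HAS, and the OSTRC severity score) can be sketched in a few lines of Python. This is an illustrative sketch only, not an official scoring tool: the function names are hypothetical, responses are assumed to be already coded as the numeric grades stated in the text, and the OSTRC function assumes that each of the four answers has already been converted to its point value between 0 and 25.

```python
def _checked_sum(values, n_items, low, high):
    """Sum exactly n_items grades after checking each lies in [low, high]."""
    if len(values) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(values)}")
    for v in values:
        if not low <= v <= high:
            raise ValueError(f"grade {v} is outside the range {low}-{high}")
    return sum(values)


def ars_total(items):
    """ARS: four items graded 0-4, giving a total of 0-16."""
    return _checked_sum(items, 4, 0, 4)


def sas_numeric_total(items):
    """SAS, first part: five items graded 0-4, giving a total of 0-20."""
    return _checked_sum(items, 5, 0, 4)


def has_activity_score(frequency, duration, impairment, importance):
    """HAS score for one activity category (each grade 0-5, result 0-20)."""
    _checked_sum([frequency, duration, impairment, importance], 4, 0, 5)
    return (frequency + duration) * (1 + impairment / 10 + importance / 10)


def has_total(activities):
    """HAS: sum of per-activity scores over at most 11 categories (0-220).

    `activities` is a list of (frequency, duration, impairment, importance)
    tuples, one per activity category the patient reports.
    """
    if len(activities) > 11:
        raise ValueError("the HAS defines 11 activity categories")
    return sum(has_activity_score(*grades) for grades in activities)


def ostrc_severity(item_points):
    """OSTRC: four items worth up to 25 points each, severity score 0-100."""
    return _checked_sum(item_points, 4, 0, 25)


if __name__ == "__main__":
    print("ARS:", ars_total([4, 3, 2, 1]))
    print("SAS:", sas_numeric_total([0, 1, 2, 3, 4]))
    print("HAS (one activity):", has_activity_score(5, 5, 5, 5))
    print("OSTRC:", ostrc_severity([25, 13, 6, 0]))
```

The text does not state the direction in which the HAS impairment grade is coded, so the sketch simply applies the published formula to whatever grades are supplied.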
46.8.11 Human Activity Profile (HAP)

The HAP is a 94-item self-report measure of energy expenditure or physical fitness; it was developed as an outcome measure for medical rehabilitation for people with a wide spectrum of physical disorders. It consists of a list of activities for which patients should indicate if they are currently able to perform the activity, have stopped performing the activity, or have never performed the activity. Each of the selected activities has an estimated energy requirement between approximately 1 and 10 METs. Two scores are calculated: the maximum activity score (MAS) and the adjusted activity score (AAS) [38].
Conditions: Epidemiologic and population studies as well as rehabilitation medicine.

Fact Box 46.2
The athletic population differs in many ways from the general population: athletes have a much higher level of physical functioning and health status. For this reason, they often do not perceive symptoms during daily activities, and choosing an adequate outcome measure is essential.

46.9 Measures of Global and Mental Health

Generic measures of health-related quality of life are frequently used to evaluate the impact of treatments and clinical results and to monitor population health. Often these scales are composed of various independent domains/dimensions that together represent the notion of health-related quality of life. The items are weighted to indicate the relative importance attributed to them by the respondents and then aggregated into a single number reflecting the quality or value of a health state. To obtain such values, several instruments have been developed.
The basic psychometric characteristics, strengths, and weaknesses of the most common scales for global and mental health are described in Table 46.7.

46.9.1 36-Item Short-Form Health Survey (SF-36) and Short-Form 12 (SF-12)

The SF-36 is a general health measure, introduced in 1992, that includes 36 items addressing eight domains of overall health status: physical functioning (PF), bodily pain (BP), role limitations due to physical health problems (RP), role limitations due to personal or emotional problems (RE), general mental health (MH), social functioning (SF), energy/fatigue or vitality (VIT), and general health perceptions (GH). Although this scale has been validated for orthopedic use, experts recommend pairing the SF-36 with an orthopedic-specific measure since it is a general health scale, and it could be difficult to isolate orthopedic outcomes from other unrelated health conditions.
The SF-12 is a shortened version of the SF-36, developed in 1996, with the aim of reducing redundancies and time burden on the patient. It shortens the survey to 12 items and reports two scores, in the physical and mental domains. It has been validated for orthopedic patients. The SF-12 was included as a recommended PRO measure for “general quality of life” by the AAOS.
Both SF-36 and SF-12 are useful questionnaires for outcomes assessment in research; both need to be administered in conjunction with orthopedic-specific measures [15, 52, 110, 111, 115].

46.9.2 EuroQol-5 Domains-3 Likert (EQ-5D-3L)

The EQ-5D health status and quality-of-life measure is composed of five items (mobility, self-care, usual activity, pain/discomfort, and anxiety/depression), with three possible response levels (no problems, some/moderate problems, extreme problems). The EQ-5D index is
calculated from the five dimensions, ranging from −0.594 (worst) to 1.0. In addition to the EQ-5D index, the EQ-5D includes a VAS for rating overall health status from 0 (worst imaginable health) to 100 (best imaginable health). A common criticism of this measure is the lack of sensitivity to change, since only three levels of responses are available within each construct. With the aim of addressing this issue, a version of the measure with five response levels has been developed, called the EQ-5D-5L [47].

The AQoL is a 12-item instrument which loads onto four dimensions: independent living, social relationships, physical senses, and psychological well-being. These subscales are weighted between 0.0 (death) and 1.0 (full health). With its emphasis upon psychosocial dimensions of health, it offers significant advantages for evaluation studies where these dimensions are important [93].

46.9.4 Nottingham Health Profile (NHP)

The NHP is a self-administered questionnaire. It was developed in English and consists of two parts: Part 1 contains 38 “yes/no” questions covering six dimensions: pain, physical mobility, emotional reactions, energy, social isolation, and sleep. Part 2 has seven “yes/no” questions concerning problems of daily activities. It has been shown to be internally consistent, valid, reproducible, and sensitive [55].

46.9.5 Patient-Reported Outcomes Measurement Information System 10 Global Health (PROMIS-10 Global Health)

The PROMIS was established in 2004 with funding from the National Institutes of Health; this initiative develops and evaluates standard measures for key patient-reported health indicators and symptoms. PROMIS measures are standardized, allowing for assessment of many patient-reported outcome domains such as pain, fatigue, emotional distress, physical functioning, and social role participation.
Computerized adaptive testing (CAT) software has been implemented; this allows tailoring the PRO assessment to the individual patient by selecting the most informative set of questions based on responses to previous questions [17].

Pain is a complex and subjective experience, which implies several measurement challenges. It is important for clinicians to utilize sensitive and accurate pain outcome measures, although currently we rely mainly on self-report measures. The cutoff value for clinical significance of pain reduction must be determined on the minimal amount of change that is important to patients. A reduction of 10–20% of pain can be considered clinically significant [67].
The basic psychometric characteristics, strengths, and weaknesses of the most common scales for pain assessment are described in Table 46.8.

46.10.1 Visual Analog Scale for Pain (VAS for Pain)

The VAS for pain was introduced in 1976; it is a widely recognized and simple instrument that allows the patient to score his/her own pain level on a straight 100-mm line, with zero indicating “no pain” and 100 “worst imaginable pain.” Its usefulness in orthopedic surgery has been recognized, and VAS has proven high validity and responsiveness; on the other hand, its low specificity has been shown, with a 1.1 cm decrease corresponding to the minimal clinically important difference for pain. The patient acceptable symptomatic state is considered to be a value of less than 3 cm [37, 46, 59, 115].
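
The two VAS thresholds quoted above (a 1.1 cm decrease as the minimal clinically important difference and a value below 3 cm as the patient acceptable symptomatic state) can be applied to a pair of measurements as in the following Python sketch. The function name and the returned keys are hypothetical. Scores are taken in millimetres on the 100-mm line so that the comparisons use whole numbers, and expressing the change as a percentage of the baseline score is an assumption made here for illustration.

```python
VAS_MCID_MM = 11  # 1.1 cm decrease, the MCID quoted in the text
VAS_PASS_MM = 30  # PASS is a score below 3 cm


def interpret_vas_change(baseline_mm, followup_mm):
    """Compare two VAS pain scores (0-100 mm) against the MCID and the PASS."""
    for score in (baseline_mm, followup_mm):
        if not 0 <= score <= 100:
            raise ValueError("VAS scores must lie between 0 and 100 mm")
    decrease = baseline_mm - followup_mm
    # The percentage reduction is undefined when the baseline pain is zero.
    percent = 100 * decrease / baseline_mm if baseline_mm > 0 else None
    return {
        "decrease_mm": decrease,
        "percent_reduction": percent,
        "meets_mcid": decrease >= VAS_MCID_MM,
        "in_pass": followup_mm < VAS_PASS_MM,
    }


if __name__ == "__main__":
    print(interpret_vas_change(60, 25))
    print(interpret_vas_change(40, 35))
```

Working in millimetres avoids floating-point rounding at the 1.1 cm boundary; a study protocol that records centimetres with decimals would need to convert or round before comparing.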
46.10.2 Numerical Rating Scale for Pain (NRS for Pain)

The NRS is an 11-point scale consisting of integers from 0 through 10: 0 representing “no pain” and 10 representing “worst imaginable pain.” Respondents select the single number that best represents their pain intensity. It is considered to be more comprehensive compared to the VAS for pain; however, it may not capture the complex nature of the pain experience [46].

46.10.3 Verbal Rating Scale for Pain (VRS for Pain)

The VRS is a single-domain five-point scale consisting of a list of phrases (no pain, mild pain, moderate pain, intense pain, maximum pain) describing increasing levels of pain severity. Respondents select the single phrase that best characterizes their pain intensity [46].

46.10.5 Short-Form McGill Pain Questionnaire (SF-MPQ)

The SF-MPQ is a multidimensional measure with extensive clinical research use. Patients rate their pain in sensory terms (e.g., sharp or stabbing) and affective terms (e.g., sickening or fearful), with 15 total descriptors. Each item is rated on a four-point scale that ranges from none to severe. The SF-MPQ also has a single VAS item for pain intensity and a VRS for rating the overall pain experience. It is used particularly to measure the sensory and affective aspects of pain and pain intensity in adults with chronic pathologies [46, 79].

Fact Box 46.3
Since “pain” perception is perhaps the most relevant outcome in clinical practice, research protocols should include sensitive and accurate pain measures. Currently we rely mainly on self-report measures, where a reduction of 10–20% of pain can be considered as the minimal amount of change that is important to patients.

46.11 Measures of Sport-Related Psychological Aspects

46.11.1 Injury-Psychological Readiness to Return to Sport Scale (I-PRRS)

The I-PRRS is an easy-to-use tool developed to measure the athlete’s psychological readiness to return to sport after injury. It is a six-item scale; each item is scored from 0 (no confidence) to 100 (maximum confidence) with intervals of 10. The scores from the six items are summed and divided
Arthritis Care Res (Hoboken). 2011;63(Suppl 11):S208–28. https://doi.org/10.1002/acr.20632.
22. Constant CR, Gerber C, Emery RJ, Sojbjerg JO, Gohlke F, Boileau P. A review of the Constant score: modifications and guidelines for its use. J Shoulder Elb Surg. 2008;17:355–61.
23. Constant CR, Murley AH. A clinical method of functional assessment of the shoulder. Clin Orthop Relat Res. 1987;214:160–4.
24. Cook JJ, Cook EA, et al. Validation of the American College of Foot and Ankle Surgeons Scoring Scales. J Foot Ankle Surg. 2011;50(4):420–9.
25. Cooney WP, Bussey R, Dobyns JH, Linscheid RL. Difficult wrist fractures. Perilunate fracture-dislocations of the wrist. Clin Orthop Relat Res. 1987;(214):136–47.
26. Craig CL, Marshall AL, et al. International physical activity questionnaire: 12-country reliability and validity. Med Sci Sports Exerc. 2003;35(8):1381–95.
27. Crossley KM, Bennell KL, et al. Analysis of outcome measures for persons with patellofemoral pain: which are reliable and valid? Arch Phys Med Rehabil. 2004;85(5):815–22.
28. Dacombe PJ, Amirfeyz R, Davis T. Patient-reported outcome measures for hand and wrist trauma: is there sufficient evidence of reliability, validity, and responsiveness? Hand (New York, NY). 2016;11(1):11–21.
29. Daughton DM, Fix AJ, et al. Maximum oxygen consumption and the ADAPT quality-of-life scale. Arch Phys Med Rehabil. 1982;63:620–2.
30. Dawson J, Doll H, Boller I, et al. The development and validation of a patient-reported questionnaire to assess outcomes of elbow surgery. J Bone Joint Surg Br. 2008;90:466–73.
31. Dawson J, Fitzpatrick R, Carr A, Murray D. Questionnaire on the perceptions of patients about total hip replacement. J Bone Joint Surg Br. 1996;78:185–90.
32. Dawson J, Fitzpatrick R, Carr A. Questionnaire on the perceptions of patients about shoulder surgery. J Bone Joint Surg Br. 1996;78:593–600.
33. Dawson J, Fitzpatrick R, Carr A. The assessment of shoulder instability. The development and validation of a questionnaire. J Bone Joint Surg Br. 1999;81:420–6.
34. Dawson J, Fitzpatrick R, Murray D, Carr A. Questionnaire on the perceptions of patients about total knee replacement. J Bone Joint Surg Br. 1998;80:63–9.
35. Dawson J, Rogers K, Fitzpatrick R, Carr A. The Oxford Shoulder Score revisited. Arch Orthop Trauma Surg. 2009;129:119–23.
36. Dreiser R, Maheu E, Guillou GB, Caspard H, Grouin JM. Validation of an algofunctional index for osteoarthritis of the hand. Rev Rhum Engl Ed. 1995;62:43S–53S.
37. Ferreira-Valente MA, Pais-Ribeiro JL, et al. Validity of four pain intensity rating scales. Pain. 2011;152(10):2399–404.
38. Fix AJ, Daughton DM. Human activity profile professional manual. Odessa: Psychological Assessment Resources, Inc.; 1988.
39. Glazer DD. Development and preliminary validation of the Injury-Psychological Readiness to Return to Sport (I-PRRS) Scale. J Athl Train. 2009;44(2):185–9.
40. Goldstein CL, Schemitsch E, et al. Comparison of different outcome instruments following foot and ankle trauma. Foot Ankle Int. 2010;31(12):1075–80.
41. Golightly YM, Devellis RF, et al. Psychometric properties of the foot and ankle outcome score in a community-based study of adults with and without osteoarthritis. Arthritis Care Res. 2014;66(3):395–403.
42. Halasi T, Kynsburg A, et al. Development of a new activity score for the evaluation of ankle instability. Am J Sports Med. 2004;32:899–908.
43. Hamamoto Y, Ito H, et al. Cross-cultural adaptation and validation of the Japanese version of the new Knee Society Scoring System for osteoarthritic knee with total knee arthroplasty. J Orthop Sci. 2015;20(5):849–53.
44. Harris WH. Traumatic arthritis of the hip after dislocation and acetabular fractures: treatment by mold arthroplasty. An end-result study using a new method of result evaluation. J Bone Joint Surg Am. 1969;51:737–55.
45. Hasenstein T, Greene T, et al. A 5-year review of clinical outcome measures published in the Journal of the American Podiatric Medical Association and the Journal of Foot and Ankle Surgery. J Foot Ankle Surg. 2017;56(3):519–21.
46. Hawker GA, Mian S, et al. Measures of adult pain: Visual Analog Scale for Pain (VAS Pain), Numeric Rating Scale for Pain (NRS Pain), McGill Pain Questionnaire (MPQ), Short Form McGill Pain Questionnaire (SF MPQ), Chronic Pain Grade Scale (CPGS), Short Form 36 Bodily Pain Scale (SF 36 BPS), and Measure of Intermittent and Constant Osteoarthritis Pain (ICOAP). Arthritis Care Res. 2011;63:S240–52.
47. Herdman M, Gudex C, et al. Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L). Qual Life Res. 2011;20(10):1727–36.
48. Hernandez-Sanchez S, Hidalgo MD, et al. Responsiveness of the VISA-P scale for patellar tendinopathy in athletes. Br J Sports Med. 2014;48(6):453–7.
49. Hicks CL, von Baeyer CL, et al. The Faces Pain Scale – Revised: toward a common metric in pediatric pain measurement. Pain. 2001;93:173–83.
50. Hunt KJ, Hurwit D. Use of patient-reported outcome measures in foot and ankle research. J Bone Joint Surg Am. 2013;95(16):e118(1–9).
51. Hunt KJ, Lakey E. Patient-reported outcomes in foot and ankle surgery. Orthop Clin North Am. 2018;49(2):277–89.
52. American Academy of Orthopaedic Surgeons. Instruments for collection of orthopaedic quality data; 2016.
53. Irrgang JJ, Anderson AF, Boland AL, Harner CD, Kurosaka M, Neyret P, et al. Development and validation of the International Knee Documentation Committee subjective knee form. Am J Sports Med. 2001;29:600–13.
54. ISAKOS Scientific Committee, Audigé L, Ayeni OR, Bhandari M, Boyle BW, Briggs KK, Chan K, Chaney-Barclay K, Do HT FM, Fu FH, Goldhahn J, Goldhahn S, Hidaka C, Hoang-Kim A, Karlsson J, Krych AJ, RF LP, Levy BA, Lubowitz JH, Lyman S, Ma Y, Marx RG, Mohtadi N, Marcheggiani Muccioli GM, Nakamura N, Nguyen J, Poehling GG, Roberts LE, Rosenberg N, Shea KP, Sohani ZN, Soudry M, Voineskos S, Zaffagnini S, International Society of Arthroscopy, Knee Surgery and Orthopaedic Sports Medicine. A practical guide to research: design, execution, and publication. Arthroscopy. 2011;27(4 Suppl):S1–112.
55. Jenkinson C, Fitzpatrick R, et al. The Nottingham Health Profile: an analysis of its sensitivity in differentiating illness groups. Soc Sci Med. 1988;27:1411–4.
56. Johanson NA, Liang MH, et al. American Academy of Orthopaedic Surgeons lower limb outcomes assessment instruments. Reliability, validity, and sensitivity to change. J Bone Joint Surg Am. 2004;86-A(5):902–9.
57. Kemp JL, Collins NJ, Roos EM, Crossley KM. Psychometric properties of patient-reported outcome measures for hip arthroscopic surgery. Am J Sports Med. 2013;41:2065–73.
58. Kennedy CA, Beaton DE, Solway S, McConnell S, Bombardier C. The DASH outcome measure user’s manual. 3rd ed. Toronto: Institute for Work & Health; 2011.
59. Kersten P, White PJ, et al. Is the pain visual analogue scale linear and responsive to change? An exploration using Rasch analysis. PLoS One. 2014;9(6):e99485.
64. Kitaoka HB, Alexander I, et al. Clinical rating systems for the ankle-hindfoot, midfoot, hallux, and lesser toes. Foot Ankle Int. 1994;15(7):349–53.
65. Klassbo M, Larsson E, Mannevik E. Hip Disability and Osteoarthritis Outcome Score: an extension of the Western Ontario and McMaster Universities Osteoarthritis Index. Scand J Rheumatol. 2003;32:46–51.
66. Kujala UM, Jaakola LH, Koskinen SK, Taimela S, Hurme M, Nelimarkka O. Scoring of patellofemoral disorders. Arthroscopy. 1993;9:159–63.
67. Lee JS, Hobden E, et al. Clinically important change in the visual analogue scale after adequate pain control. Acad Emerg Med. 2003;10(10):1128–30.
68. Lequesne M. Indices of severity and disease activity for osteoarthritis. Semin Arthritis Rheum. 1991;20(Suppl 2):48–54.
69. Lequesne MG, Mery C, Samson M, Gerard P. Indexes of severity for osteoarthritis of the hip and knee: validation - value in comparison with other assessment tests. Scand J Rheumatol Suppl. 1987;65:85–9.
70. Lequesne MG. The algofunctional indices for hip and knee osteoarthritis. J Rheumatol. 1997;24:779–81.
71. Lippitt SB, Harryman DT, Matsen FA. A practical tool for evaluation of function: the Simple Shoulder Test. In: Matsen FA, Fu FH, Hawkins RJ, editors. The shoulder: a balance of mobility and stability. Rosemont: American Academy of Orthopedic Surgeons; 1993. p. 545–59.
72. Lo IKY, Griffin S, Kirkley A. The development and evaluation of a disease-specific quality of life measurement tool for osteoarthritis of the shoulder: The Western Ontario Osteoarthritis of the Shoulder Index (WOOS). Osteoarthritis Cartilage. 2001;9:771–8.
73. Longo G, Franceschi F, Loppini M, Maffulli N, Denaro V. Rating systems for evaluation of the elbow. Br Med Bull. 2008;87(1):131–61.
74. Lysholm J, Gillquist J. Evaluation of knee ligament surgery results with special emphasis on use of a
60. Kirkley A, Griffin S, Alvarez C. The development scoring scale. Am J Sports Med. 1982;10:150–4.
and evaluation of a disease-specific quality of life 75. Macdermid J. Update: the Patient-rated Forearm
measurement tool for rotator cuff disease: The Evaluation Questionnaire is now the Patient-
Western Ontario Rotator Cuff Index (WORC). Clin J rated Tennis Elbow Evaluation. J Hand Ther.
Sport Med. 2003;13:84–92. 2005;18:407–10.
61. Kirkley A, Griffin S, et al. Scoring systems for the 76. Martin R, Burdett R, et al. Development of the Foot
functional assessment of the shoulder. Arthroscopy. and Ankle Disability Index (FADI). J Orthop Sports
2003;19(10):1109–20. Phys Ther. 1999;29(1):A32–3.
62. Kirkley A, Griffin S, McLintock H, Ng L. The devel- 77. Martin RL, Irrgang JJ, et al. Evidence of validity for
opment and evaluation of a disease-specific quality the Foot and Ankle Ability Measure (FAAM). Foot
of life measurement tool for shoulder instability: the Ankle Int. 2005;26(11):968–83.
Western Ontario Shoulder Instability Index (WOSI). 78. Marx RG, Stump TJ, et al. Development and evalu-
Am J Sports Med. 1998;26:764–72. ation of an activity rating scale for disorders of the
63. Kirkley A, Werstine R, Ratjek A, Griffin knee. Am J Sports Med. 2001;29(2):213–8.
S. Prospective randomized clinical trial comparing 79. Melzack R. The short-form McGill Pain
the effectiveness of immediate arthroscopic stabili- Questionnaire. Pain. 1987;30:191–7.
zation versus immobilization and rehabilitation in 80. Michener LA, McClure PW, Sennett BJ. American
first traumatic anterior dislocations of the shoulder: Shoulder and Elbow Surgeons Standardized
long-term evaluation. Arthroscopy. 2005;21:55–63. Shoulder Assessment Form, patient self-report
46 Common Scales and Checklists in Sports Medicine Research 469
section: reliability, validity, and responsiveness. J Centre for Health Economics, Monash University;
Shoulder Elb Surg. 2002;11(6):587–94. 2011.
81. Mohtadi NGH, Griffin DR, Pedersen ME, et al. 94. Roach KE, Budiman-Mak E, Songsiridej N,
The development and validation of a self-adminis- Lertratanakul Y. Development of a Shoulder
tered quality-of-life outcome measure for young, Pain and Disability Index. Arthritis Care Res.
active patients with symptomatic hip disease: 1991;4:143–9.
the International Hip Outcome Tool (iHOT-33). 95. Robinson JM, Cook JL, et al. The VISA-A ques-
Arthroscopy. 2012;28(5):595–610.e1. tionnaire: a valid and reliable index of the clinical
82. Morrey BF, An KN. Functional evaluation of the severity of Achilles tendinopathy. Br J Sports Med.
elbow. In: Morrey BF, editor. The elbow and its dis- 2001;35(5):335–41.
orders. Philadelphia: WB Saunders; 1993. 96. Roelofs J, van Breukelen G, et al. Norming of
83. Murray DW, Fitzpatrick R, Rogers K, Pandit H, the Tampa Scale for Kinesiophobia across pain
Beard DJ, Carr AJ, et al. The use of the Oxford diagnoses and various countries. Pain. 2011;152:
Hip and Knee Scores. J Bone Joint Surg Br. 1090–5.
2007;89:1010–4. 97. Rompe JD, Overend TJ, et al. Validation of
84. Narin S, Unver B, et al. Cross-cultural adaptation, the Patient-rated Tennis Elbow Evaluation
reliability and validity of the Turkish version of the Questionnaire. J Hand Ther. 2007;20(1):3–10.
Hospital for Special Surgery (HSS) Knee Score. 98. Roos EM, Brandsson S, et al. Validation of the foot
Acta Orthop Traumatol Turc. 2014;48(3):241–8. and ankle outcome score for ankle ligament recon-
85. Nilsdotter AK, Lohmander LS, Klassbo M, Roos struction. Foot Ankle Int. 2001;22(10):788–94.
EM. Hip Disability and Osteoarthritis Outcome 99. Roos EM, Roos HP, Lohmander LS, Ekdahl C,
Score (HOOS): validity and responsiveness in Beynnon BD. Knee Injury and Osteoarthritis
total hip replacement. BMC Musculoskelet Disord. Outcome Score (KOOS): development of a self-
2003;4:10. administered outcome measure. J Orthop Sports
86. Nilsdotter A, Bremander A. Measures of hip func- Phys Ther. 1998;28:88–96.
tion and symptoms: Harris Hip Score (HHS), 100. Rowan K. The development and validation of a
Hip Disability and Osteoarthritis Outcome Score multi-dimensional measure of chronic foot pain:
(HOOS), Oxford Hip Score (OHS), Lequesne Index the Rowan Foot Pain Assessment Questionnaire
of Severity for Osteoarthritis of the Hip (LISOH), (ROFPAQ). Foot Ankle Int. 2001;22:795–809.
and American Academy of Orthopedic Surgeons 101. Scott J, Huskisson EC. Graphic representation of
(AAOS) Hip and Knee Questionnaire. Arthritis Care pain. Pain. 1976;2(2):175–84.
Res. 2011;63:S200–7. 102. Scuderi GR, Bourne RB, Noble PC, Benjamin
87. Nilsson, et al. The Swedish version of OMAS is a JB, Lonner JH, Scott WN. The New Knee Society
reliable and valid outcome measure for patients Knee Scoring System. Clin Orthop Relat Res.
with ankle fractures. BMC Musculoskelet Disord. 2012;470(1):3–19.
2013;14:109. 103. Seeger J, Weinmann S, et al. The Heidelberg Sports
88. Olerud C, Molander H. A scoring scale for symptom Activity Score - a new instrument to evaluate sports
evaluation after ankle fracture. Arch Orthop Trauma activity. Open Orthopaed J. 2013;7:25–32.
Surg. 1984;103:190–4. 104. SooHoo NF, Vyas R, et al. Responsiveness of the
89. Perruccio AV, Stefan Lohmander L, Canizares M, foot function index, AOFAS clinical rating systems,
Tennant A, Hawker GA, Conaghan PG, et al. The and SF-36 after foot and ankle surgery. Foot Ankle
development of a short measure of physical function Int. 2006;27(11):930–4.
for knee OA KOOS-Physical Function Shortform 105. Tegner Y, Lysholm J. Rating systems in the evalu-
(KOOS-PS): an OARSI/OMERACT initiative. ation of knee ligament injuries. Clin Orthop Relat
Osteoarthr Cartil. 2008;16:542–50. Res. 1985;(198):43–9.
90. Pinsker E, Daniels TR. AOFAS position statement 106. Thomas JL, Christensen JC, et al. ACFAS
regarding the future of the AOFAS clinical rating Scoring Scale user guide. J Foot Ankle Surg.
systems. Foot Ankle Int. 2011;32(9):841–2. 2005;44(5):316–35.
91. Ramisetty N, Kwon Y, et al. Patient-reported out- 107. Thorborg K, Holmich P, Christensen A, Petersen F,
come measures for hip preservation surgery: a sys- Roos EM. The Copenhagen Hip and Groin Outcome
tematic review of the literature. J Hip Preserv Surg. Score (HAGOS): development and validation
2015;2(1):15–27. according to the COSMIN checklist. Br J Sports
92. Richards RR, An KN, Bigliani LU, Friedman RJ, Med. 2011;45:478–91.
Gartsman GM, Gristina AG, et al. A standardized 108. van der Linde JA, van Kampen DA, et al. The Oxford
method for the assessment of shoulder function. J Shoulder Instability Score; validation in Dutch
Shoulder Elb Surg. 1994;3:347–52. and first-time assessment of its smallest detectable
93. Richardson J, Sinha K, et al. Modeling the utility of change. J Orthop Surg Res. 2015;10:146.
health states with the Assessment of Quality of Life 109. Walker N, et al. A preliminary development of the
(AQoL) 8D instrument: overview and utility scor- Re-Injury Anxiety Inventory (RIAI). Phys Ther
ing algorithm. Research paper, vol. 63. Melbourne: Sport. 2010;11(1):23–9.
470 A. Grassi et al.
110. Ware J Jr, Kosinski M, et al. A 12-item short-form individuals with acute lateral ankle sprains. Foot
health survey: construction of scales and prelimi- Ankle Int. 2003;24:274–82.
nary tests of reliability and validity. Med Care. 114. Zahiri CA, Schmalzried TP, et al. Assessing activ-
1996;34(3):220–33. ity in joint replacement patients. J Arthroplast.
111. Ware JE Jr, Sherbourne CD. The MOS 36-item short- 1998;13(8):890–5.
form health survey (SF-36). I. Conceptual frame- 115. Zampelis V, Ornstein E, et al. A simple visual ana-
work and item selection. Med Care. 1992;30(6): log scale for pain is as responsive as the WOMAC,
473–83. the SF-36, and the EQ-5D in measuring out-
112. Wendel-Vos GCW, Schuit AJ, et al. Reproducibility comes of revision hip arthroplasty. Acta Orthop.
and relative validity of the short questionnaire to 2014;85(2):128–32.
assess health-enhancing physical activity. J Clin 116. Zwiers R, Weel H, et al. Large variation in use of
Epidemiol. 2003;56:1163e9. patient-reported outcome measures: a survey of 188
113. Williams GN, Molloy JM, et al. Evaluation of the foot and ankle surgeons. Foot Ankle Surg. 2017. pii:
Sports Ankle Rating System in young, athletic S1268-7731(17)30050-4.
47 A Practical Guide to Writing (and Understanding) a Scientific Paper: Meta-Analyses
Alberto Grassi, Riccardo Compagnoni, Kristian Samuelsson, Pietro Randelli, Corrado Bait, and Stefano Zaffagnini

A. Grassi (*)
Dipartimento Scienze Biomediche e Neuromotorie DIBINEM, Università di Bologna, Bologna, Italy
IIa Clinica Ortopedica e Traumatologica, IRCCS Istituto Ortopedico Rizzoli, Bologna, Italy
SIGASCOT Arthroscopy Committee, Florence, Italy
e-mail: alberto.grassi3@studio.unibo.it

R. Compagnoni
Società Italiana del Ginocchio Artroscopia Sport Cartilagine Tecnologie Ortopediche (SIGASCOT) Arthroscopy Committee, Florence, Italy
1st Department, Azienda Socio Sanitaria Territoriale Centro Specialistico Ortopedico Traumatologico Gaetano Pini-CTO, Milan, Italy

K. Samuelsson
Department of Orthopaedics, Institute of Clinical Sciences, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
Department of Orthopaedics, Sahlgrenska University Hospital, Mölndal, Sweden

P. Randelli
1st Department, Azienda Socio Sanitaria Territoriale Centro Specialistico Ortopedico Traumatologico Gaetano Pini-CTO, Milan, Italy

C. Bait
Società Italiana del Ginocchio Artroscopia Sport Cartilagine Tecnologie Ortopediche (SIGASCOT) Arthroscopy Committee, Florence, Italy
Istituto Clinico Villa Aprica, Como, Italy

S. Zaffagnini
IIa Clinica Ortopedica e Traumatologica, IRCCS Istituto Ortopedico Rizzoli, Bologna, Italy
Laboratori di Biomeccanica e Innovazione Tecnologica, Istituto Ortopedico Rizzoli, Bologna, Italy
selected databases and the discarding of all the duplicates. To prepare a precise flow chart of study selection, the number of studies found, removed and excluded at each step should be noted and justified.
The crucial step in the phase of study selection is a clear and precise definition of inclusion and exclusion criteria [26]. Their goal is to create a homogeneous study population for the meta-analysis. The rationale for their choice should be stated, as it may not be apparent to the reader. Inclusion criteria may be based on study design, sample size and characteristics of the subjects, type of treatment and follow-up. Examples of exclusion criteria include studies not published in English or as full-length manuscripts, a high drop-out rate (usually >20%), doctoral theses and studies published in non-peer-reviewed journals. Moreover, the outcomes that should be analysed and the way they are presented could be a matter of inclusion and exclusion criteria (e.g. radiographic evaluation of osteoarthritis according only to the Kellgren-Lawrence scale). This search and the data extraction (presented below) should be performed by two independent investigators; the results should then be compared and any disagreement should be resolved by a third independent investigator. Usually, the first screening of all the results is based on title and abstract evaluation. The full text of the potentially eligible studies should then be obtained and carefully evaluated.

47.4.2 Extract All the Relevant Information

Two or more authors of a meta-analysis should abstract information from studies independently. “Data” is defined as any information about (or deriving from) a study, including details of methods, participants, setting, context, interventions, outcomes, results, publications and investigators [13]. Data should only be collected from separate sets of patients, and the authors should be careful not to count the same subjects, or overlapping groups of subjects, that appear in more than one publication. In this case, the report with the larger population or the most recent report can be chosen. The main items considered in data collection are presented in Table 47.1:

Table 47.1 Summary of the main items that should be extracted from each study included in a meta-analysis
Main items considered in data collection for a meta-analysis
Methods: Study design and duration, sequence generation, allocation sequence concealment, blinding, other concerns regarding bias, inclusion and exclusion criteria
Participants: Total number, setting, diagnostic criteria, age, gender, co-morbidities
Interventions: Number of intervention groups, specific interventions, intervention details
Outcomes: Outcomes and time points, definition, unit, scale
Results: Number of participants in each intervention group and for each outcome, summary data for each intervention group, any subgroup analysis
Miscellaneous: Funding source, key conclusion from the study authors

If all the relevant information cannot be obtained from the full text, the study authors could be contacted to request the missing information. This step could be crucial if several parameters are missing in the outcome report, since these data are fundamental for the statistical calculation of the meta-analysis. For dichotomous outcomes, only the number of patients that experience the outcome and the total number of patients are needed. Moreover, categorical values (e.g. Kellgren-Lawrence scale for knee osteoarthritis) should be analysed as dichotomous variables, after defining a clinically meaningful cut-off value to determine two groups. On the other hand, for continuous data, the mean values and especially the SDs are needed as well. Since the SD is mandatory for effect-size calculation, if it is not reported in the original study, there are several workarounds to approximate it from the known information, such as the standard error (SE), the range of values or the p-value [14] (Table 47.2). The last option is to impute the SD, borrowing the SD from a similar study included in the meta-analysis,
Table 47.2 Methods to derive the standard deviation from the data usually provided in a scientific paper
Obtain standard deviation from the available data
Obtaining standard deviation from the standard error
Standard deviation: $\mathrm{SD} = \mathrm{SE} \times \sqrt{n}$, where $n$ is the sample size of the group
Obtaining standard deviation from the ranges
Standard deviation: $\mathrm{SD} \approx (\text{upper range} - \text{lower range})/4$
Obtaining standard deviation from the p-value
Step 1: from p-value to t-value
Excel formula: =tinv(P-value,degrees of freedom)
Degrees of freedom: the number of patients in the experimental group + the number of patients in the control group − 2
Step 2: from t-value to standard error
Standard error: $\mathrm{SE} = (\text{mean of group 1} - \text{mean of group 2})/t$
Step 3: from standard error to standard deviation
Standard deviation: $\mathrm{SD} = \mathrm{SE} / \sqrt{1/n_1 + 1/n_2}$, where $n_1$ and $n_2$ are the numbers of patients in group 1 and group 2
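The conversions summarised in Table 47.2 can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the usual relations SD = SE × √n and SD = SE / √(1/n1 + 1/n2), and the function names (`sd_from_se`, `sd_from_range`, `t_from_p`, `sd_from_p`) are hypothetical. The helper `t_from_p` is a stand-in for the spreadsheet formula, obtained here by numerically inverting the two-tailed p-value of Student's t distribution; it is meant for ordinary p-values, not extremely small ones.

```python
import math

def sd_from_se(se, n):
    """SD of a group from the standard error of its mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)

def sd_from_range(lower, upper):
    """Rough SD from the range, using the rule of thumb (upper - lower) / 4."""
    return (upper - lower) / 4.0

def _t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value P(|T| > t), integrating the density on [0, t]
    with Simpson's rule (steps must be even)."""
    h = t / steps
    total = _t_pdf(0.0, df) + _t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * _t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)

def t_from_p(p, df):
    """Two-tailed inverse of the t distribution, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > p:
            lo = mid   # p-value still too large: the t-value must be larger
        else:
            hi = mid
    return (lo + hi) / 2

def sd_from_p(mean1, mean2, n1, n2, p):
    """Steps 1-3: p-value -> t-value -> SE of the difference -> common SD."""
    t = t_from_p(p, n1 + n2 - 2)
    se = abs(mean1 - mean2) / t
    return se / math.sqrt(1 / n1 + 1 / n2)

if __name__ == "__main__":
    print(sd_from_se(2.0, 25))             # SE of 2 with 25 patients
    print(sd_from_range(10, 50))           # values ranging from 10 to 50
    print(sd_from_p(25, 20, 6, 6, 0.05))   # two groups of 6, P = 0.05
```

Any SD obtained this way is an approximation, so it is sensible to flag the affected studies and re-run the analysis without them.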
using the highest or a “reasonably high” SD, or using the average SD. All imputation techniques involve making assumptions about unknown statistics, which introduces a source of bias, and it is best to avoid using them wherever possible. If most of the studies in a meta-analysis have missing standard deviations, these values should not be imputed, and the meta-analysis cannot be performed. A narrative presentation of the results is then preferable. On the other hand, when SDs are derived, a sensitivity analysis (see the later section) is recommended to establish whether the imprecision introduced by the derivation could affect the result of the meta-analysis and the effect of a treatment.

47.5 How to Analyse the Obtained Data

47.5.1 Choose Appropriate Statistical Software

The tabulation of data and the calculation of simple medians, means and SDs are possible with Microsoft Excel or the equivalent. However, to perform an appropriate statistical analysis of a meta-analysis, dedicated software is necessary. One of the most frequently used is Review Manager (RevMan, Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2014), the official software of the Cochrane Community, which is available free of charge and which has been designed to facilitate preparation of protocols and full reviews, including text, characteristics of studies, comparison tables and study data. Moreover, it can perform meta-analysis of the data entered and present the results graphically. Another valid free alternative is OpenMetaAnalyst, which is a simple open-source software with an Excel-like interface for performing meta-analyses of binary, continuous or diagnostic data, using a variety of fixed- and random-effect methods, including Bayesian and maximum likelihood analysis. It also allows meta-regression analysis, meta-analysis of proportions and continuous variables from single-arm studies and cumulative, leave-one-out or subgroup analyses. Another simple package is MedCalc (MedCalc Software, Ostend, Belgium), which also offers the possibility to perform a multitude of statistical tests and analyses, with a variety of graphic representations. However, its options for meta-analysis purposes are limited, and moreover it is available free of charge only as a trial version.

47.5.2 Define Correct Effect-Size Measurement

As previously stated, the outcomes could be presented in two ways: dichotomous or continuous. Based on this, the effect size of the treatment
should be presented properly, using one or more of the following parameters [5, 17].
The risk ratio (RR): also known as the relative risk, the RR is the ratio of the risk of an event in the two groups. It is used for dichotomous outcomes and ranges from 0 to infinity. The RR describes the factor by which the risk is multiplied when the experimental intervention is used. For example, an RR of 5 for a treatment implies that events with treatment are five times more likely than events with the control treatment. Alternatively, we can say that the experimental treatment increases the risk of events by 100 × (RR − 1)% = 400%. Conversely, an RR of 0.20 is interpreted as the probability of an event with experimental treatment being a fifth of that with control treatment. Alternatively, this reduction could be expressed using the relative risk reduction (RRR) (see below). The interpretation of the clinical importance of a given RR should be made considering the typical risk of events with the control treatment, since an RR of 5 could correspond to a clinically important increase in events from 10 to 50%, or a small, less clinically important increase from 1 to 5%.
The relative risk reduction (RRR): the RRR is an alternative way of re-expressing the RR as the percentage reduction of the relative risk after the experimental treatment compared with the control treatment. For example, an RR of 0.20 is interpreted as the probability of an event with experimental treatment being a fifth of that with control treatment. Since the RRR is calculated as 100 × (1 − RR)%, we can therefore say that the experimental treatment reduces the risk of events by 80%.
The odds ratio (OR): considering “odds” as the ratio between the probability that a particular event will occur and the probability that it will not occur, the OR is the ratio of the odds of an event in the two groups. It is used for dichotomous outcomes and ranges from 0 to infinity. The OR is more difficult to interpret, as it describes the factor by which the odds of the outcome are multiplied when the experimental intervention is used. To understand what an OR means in terms of changes in the numbers of events, it is simplest to convert it into an RR and then interpret the risk ratio in the context of a typical control group risk. Care should be taken not to confuse the RR with the OR.
The risk difference (RD): the RD is the difference between the observed risks (proportions of individuals with the outcome of interest) in the experimental and control groups. It is used for dichotomous outcomes and ranges from −1 to +1 (which means that it can easily be converted to per cent by multiplying it by 100). Like the RR, the clinical importance of an RD may depend on the underlying risk of events in the control treatment, since an RD of 0.05 (or 5%) may represent a small, clinically insignificant change from a risk of 55 to 60% or a proportionally much larger and potentially important change from 1 to 6%.
The number needed to treat (NNT): the NNT is the expected number of people who need to receive the experimental treatment to obtain one additional person either incurring or avoiding the considered event. It is used for dichotomous outcomes and is always a positive number, usually rounded up to the nearest whole number. It is derived from the absolute value of the RD and is calculated as 1/|RD|. An RD of 0.23 therefore corresponds to an NNT of 4.35, which is rounded up to 5, and it means that “it is expected that one additional (or one fewer) patient will incur the event for every five participants receiving the experimental intervention rather than control”. Since the NNT gives an “expected value”, it does not imply that one additional event will necessarily occur in every group of “n” patients treated with the experimental procedure.
The mean difference (MD): the MD is the absolute difference between the mean values of the experimental and control groups. It is used for continuous outcomes measured with the same scale and estimates the amount by which the experimental intervention changes the outcome on average compared with the control.
The standardised mean difference (SMD): the SMD is the size of the intervention effect in each study relative to the variability observed in the study. Since it is used when the same outcome is measured with different scales (e.g. knee function
measured with Lysholm, subjective IKDC or KOOS scores), the results should be statistically standardised to a uniform scale before being combined. Care should be taken when the scales have a different “direction” (e.g. one scale increases while the others decrease with disease severity); in this case, the values should be multiplied by −1 where needed, to ensure that all the scales point in the same direction.

47.5.3 Identify and Measure Heterogeneity

Heterogeneity is a term used to describe the variability of studies. Variability in the studied participants (e.g. males or females, older adult or adolescent patients), interventions (open or minimally invasive procedures, different grafts for reconstructions) and outcomes (objective or subjective) may be described as clinical heterogeneity, variability in study design and risk of bias may be described as methodological heterogeneity, while the variability in the treatment effects in the different studies is known as statistical heterogeneity [17, 21]. The last one is usually a consequence of clinical or methodological diversity, or both, between the studies. Studies with methodological flaws and small studies may overestimate treatment effects and can contribute to statistical heterogeneity. The statistical heterogeneity should therefore be examined and quantified using statistical tests, to implement measures to reduce the risk of bias [4]. The chi-square test (χ²) assesses whether observed differences in results are compatible with chance alone: a low p-value provides evidence of heterogeneity in intervention effects. Since this test does not provide the “amount” of heterogeneity, quantification according to the Higgins I² statistic should be performed. This statistic is a 0–100% value that represents the percentage of total variation across studies due to heterogeneity. According to the Cochrane Guidelines, this value is interpreted as follows: 0–40% might not be important, 30–60% may represent moderate heterogeneity, 50–90% may represent substantial heterogeneity and 75–100% considerable heterogeneity. However, there are no empirically developed cut-off points to determine when there is too much heterogeneity to perform a meta-analysis, and it is left to the author’s discretion to determine whether a meta-analysis is appropriate, based on the results of the test of heterogeneity and clinical judgement.

47.5.4 Apply Strategies to Address High Heterogeneity: The Random Effect and Others

Since high heterogeneity implies dissimilarity in the studies, a meta-analysis should be conducted with caution, and several strategies should be implemented to take this situation into account. These should be applied only after checking that the data are correct and that no errors have been made in data extraction [4, 5].
Perform random-effect meta-analysis: the two most frequently used models to conduct a meta-analysis are the fixed- (Mantel-Haenszel, inverse variance or Peto methods) [24] and random-effect (DerSimonian and Laird method) [6] models. The fixed-effect model investigates the question “What is the best estimate of the population effect size?”, assuming a common treatment effect and that the differences between studies are due to chance. This model should be used with low heterogeneity, as it gives greater weight to larger studies. On the other hand, the random-effect model investigates the question “What is the average treatment effect?”, assuming the distribution of the treatment effect along a range of values. This model should be used with high heterogeneity that cannot be explained, as it assumes that the treatment effect in the different studies is not identical due to clinical and methodological heterogeneity, and this model therefore gives relatively less weight to larger studies. If the heterogeneity is not extreme, the two models frequently lead to similar results. On the other hand, they can produce different conclusions, and the use of fixed- or random-effect models should therefore be carefully considered based on the amount of heterogeneity, although no firm guidelines exist on this point (random effect is usually used with I² > 50%) (Fig. 47.1).
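To make the difference between the two models concrete, the sketch below pools the nine fictitious studies listed in panel (a) of Fig. 47.1 on the log risk ratio scale. It is an illustration under stated assumptions, not a reproduction of RevMan: the fixed-effect estimate uses inverse-variance weights (the figure uses Mantel-Haenszel weights, so the numbers differ slightly), the between-study variance is the DerSimonian and Laird estimate, and the function names are hypothetical.

```python
import math

# Fictitious studies of Fig. 47.1, panel (a):
# (events surgical, total surgical, events conservative, total conservative)
STUDIES = {
    "Ampollini":  (86, 800, 117, 800),
    "Bait":       (11, 20, 5, 20),
    "Carulli":    (2, 15, 6, 15),
    "Compagnoni": (5, 33, 6, 33),
    "Ferrua":     (7, 45, 3, 45),
    "Fravisini":  (14, 51, 8, 51),
    "Grassi":     (4, 23, 6, 23),
    "Mazzitelli": (21, 250, 45, 250),
    "Simonetta":  (2, 11, 3, 11),
}

def log_rr_and_var(a, n1, c, n2):
    """Log risk ratio of one study and its approximate variance."""
    return math.log((a / n1) / (c / n2)), 1 / a - 1 / n1 + 1 / c - 1 / n2

def pool(effects, variances, tau2=0.0):
    """Inverse-variance pooling; tau2 > 0 gives the random-effect estimate.
    Returns the pooled RR with its 95% confidence limits."""
    weights = [1 / (v + tau2) for v in variances]
    est = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return math.exp(est), math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)

def heterogeneity(effects, variances):
    """Cochran's Q, its degrees of freedom, I^2 (%) and the
    DerSimonian-Laird between-study variance tau^2."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    return q, df, i2, tau2

effects, variances = zip(*(log_rr_and_var(*row) for row in STUDIES.values()))
q, df, i2, tau2 = heterogeneity(effects, variances)
fixed_rr = pool(effects, variances)            # common-effect assumption
random_rr = pool(effects, variances, tau2)     # adds tau^2 to every variance

if __name__ == "__main__":
    print(f"Q = {q:.2f} on {df} df, I2 = {i2:.0f}%, tau2 = {tau2:.3f}")
    print("fixed  RR %.2f [%.2f, %.2f]" % fixed_rr)
    print("random RR %.2f [%.2f, %.2f]" % random_rr)
```

Because the same τ² is added to the variance of every study, the weights become more even and the pooled confidence interval can only widen, which is the behaviour described for the random-effect model above.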
[Fig. 47.1, panel (a): forest plot, fixed-effect model. Rows recovered from the plot]
Study or Subgroup   Surgical (Events/Total)   Conservative (Events/Total)   Weight   Risk Ratio M-H, Fixed, 95% CI
Ampollini et al.    86/800    117/800   58.8%   0.74 [0.57, 0.95]
Bait et al.         11/20     5/20      2.5%    2.20 [0.93, 5.18]
Carulli et al.      2/15      6/15      3.0%    0.33 [0.08, 1.39]
Compagnoni et al.   5/33      6/33      3.0%    0.83 [0.28, 2.46]
Ferrua et al.       7/45      3/45      1.5%    2.33 [0.64, 8.46]
Fravisini et al.    14/51     8/51      4.0%    1.75 [0.80, 3.81]
Grassi et al.       4/23      6/23      3.0%    0.67 [0.22, 2.05]
Mazzitelli et al.   21/250    45/250    22.6%   0.47 [0.29, 0.76]
Simonetta et al.    2/11      3/11      1.5%    0.67 [0.14, 3.24]

Fig. 47.1 Example of forest plots evaluating the risk ratio of treatment failure between the conservative and surgical treatment for a fictitious pathology. If a fixed-effect method is used, larger studies are represented with the largest squares (red circle) proportional to their weight (red line); similarly, the smallest studies have small squares (purple circle) and weight (purple line). In this specific case, the diamond of the overall effect (blue circle) does not cross the central line and its confidence interval does not contain the null value (RR = 1); it could therefore be assumed that there is a significant reduction in failure risk after surgical treatment (P = 0.007). However, the results should be interpreted with extreme caution due to the high and significant heterogeneity (green circle) (a). If a random-effect method is used to account for the high heterogeneity, the final result changes dramatically: the largest studies and smallest studies are given less weight (red dotted line and circle) and more weight (purple dotted line and circle), respectively, and this produces an enlargement of the confidence interval for the overall effect, which crosses the central line containing the null value (RR = 1). So, when the random effect is used, we can affirm that there is no evidence of a significant effect of surgical treatment compared with conservative treatment in reducing failures (P = 0.57) (b)
[Fig. 47.2, panel (b): forest plot, subgroup rows recovered from the plot. Columns: study, surgical events/total, conservative events/total, weight, risk ratio with 95% CI]
1.1.2 Minimally-Invasive
Fravisini et al.    9/250   8/250   13.3%   1.13 [0.44, 2.87]
Grassi et al.       6/40    6/40    10.0%   1.00 [0.35, 2.84]
Mazzitelli et al.   6/45    7/45    11.7%   0.86 [0.31, 2.35]
Simonetta et al.    6/18    5/18    8.3%    1.20 [0.45, 3.23]
Subtotal (95% CI)   353     353     43.3%   1.04 [0.63, 1.71]
Total events        27      26
Heterogeneity: Chi² = 0.25, df = 3 (P = 0.97); I² = 0%
Test for overall effect: Z = 0.16 (P = 0.88)

Fig. 47.2 Example of forest plot evaluating the risk ratio of complications after the conservative or surgical treatment for a fictitious pathology (a). In this case, it appears to be correct to use a fixed-effect model since the heterogeneity is null (green circle); this is confirmed by the concordance of the effect size of most of the studies, which lies on the right side of the forest plot. Since the confidence intervals do not contain the null value (RR = 1) (red circle and line), it could be suggested that there is evidence of an increased risk of complications after surgical treatment for the fictitious pathology. However, if a subgroup analysis is performed separating open and minimally invasive surgery, the results differ from the main analysis (b). In the case of open surgery, the final overall effect is similar to the main analysis (blue line and circle); in the case of minimally invasive surgery, the confidence intervals of the overall effect (dotted blue line and circle) contain the null value (RR = 1), suggesting that there is no evidence of a difference in complications after minimally invasive surgery or conservative treatment for the fictitious pathology
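As a worked check of the heterogeneity figures reported for the minimally invasive subgroup (Chi² = 0.25 on 3 degrees of freedom), the I² statistic can be computed by hand, assuming the usual Higgins definition in which negative values are set to zero. Because both arms have the same size in every study of this subgroup, the Mantel-Haenszel pooled risk ratio also reduces to the ratio of the total events.

```latex
I^2 = \max\!\left(0,\ \frac{Q - \mathrm{df}}{Q}\right) \times 100\%
    = \max\!\left(0,\ \frac{0.25 - 3}{0.25}\right) \times 100\%
    = \max(0,\ -11) \times 100\% = 0\%

\mathrm{RR}_{\text{M-H}} = \frac{\sum_i a_i\, n_{2i} / N_i}{\sum_i c_i\, n_{1i} / N_i}
    = \frac{\sum_i a_i}{\sum_i c_i} = \frac{27}{26} \approx 1.04
    \qquad (n_{1i} = n_{2i} \text{ in every study})
```

Here $Q$ is the chi-square heterogeneity statistic, $a_i$ and $c_i$ are the events in the surgical and conservative arms of study $i$, $n_{1i}$ and $n_{2i}$ are the corresponding arm sizes and $N_i = n_{1i} + n_{2i}$.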
Perform a subgroup analysis: subgroup analyses may be conducted as a means of investigating heterogeneous results or to answer specific questions about patient groups, types of intervention or types of study (Fig. 47.2). They can be performed for a subset of participants or a subset of studies, and they consider the meta-analysis results from each group separately. Non-overlapping confidence intervals (CIs) of the subgroup estimates usually indicate a statistically significant difference and a different effect of the treatment within the subgroups, thereby explaining the heterogeneity to some extent. However, subgroup analysis should be limited to a few selected cases, since it could increase the risk of type-II error due to the reduction of patient cohort size.
Perform a meta-regression: meta-regression is an extension of subgroup analyses that allows the investigation of the effect of continuous or categorical characteristics simultaneously. Its
47 A Practical Guide to Writing (and Understanding) a Scientific Paper: Meta-Analyses 481
role is like that of simple regression, where the outcome variable is the effect estimate (RR, OR, MD), and the explanatory variables, or covariates, are the study characteristics that might influence the size of the intervention effect. The regression coefficient obtained from a meta-regression describes how the outcome variable changes with a unit increase in the covariate, while its statistical significance describes whether there is a linear relationship between them. It should be underlined that the characteristics to investigate (covariates) should be justified by biological and clinical hypotheses and should be as few as possible. One limitation of meta-regression is that more than ten studies in the meta-analysis are generally required for its use.
Perform a sensitivity analysis: while the aim of the subgroup analysis is to estimate a treatment effect for a particular subgroup, the aim of the sensitivity analysis is to investigate whether the meta-analysis findings change based on different arbitrary or unclear decisions related to the meta-analysis process. The main decisions that can generate the need for a sensitivity analysis could be related to the eligibility criteria of the studies (e.g. study design or methodological issues), the data analysed (e.g. imputation of missing SD) or analysis methods (fixed or random effect, choice of effect-size measurement). In practical terms, the sensitivity analysis consists of repeating the meta-analysis while excluding the studies burdened by unclear or arbitrary decisions, and of informally comparing the different ways in which the same thing is estimated. After the sensitivity analysis, when the overall conclusions are not affected by the different decisions made during the review process, more certainty can be assumed. On the other hand, if decisions are identified as influencing the findings, the results must be interpreted with an appropriate degree of caution if it is not possible to improve the process (Fig. 47.3). Unlike the subgroup analysis, the sensitivity analysis is not designed to estimate the effect of the intervention in the group excluded from the analysis, so it should be reported in a summary table.
Change the measurement of effect: the choice of the measurement of effect size may affect the degree of heterogeneity; however, it is unclear whether the heterogeneity of intervention effect alone is a suitable criterion for choosing between the different measurements.
Exclude studies: since heterogeneity could be due to the presence of one or two outliers, studies with conflicting results compared with the rest could be excluded, as their exclusion could address the problem of heterogeneity. However, it is not appropriate to exclude a study based on its result, since this may introduce bias. A study can therefore only be removed with confidence if there are obvious reasons. Unfortunately, there are no tests to determine the extent of clinical heterogeneity, and researchers must decide whether the studies contributing to a meta-analysis are similar enough clinically to make meta-analysis feasible. Refining inclusion criteria and excluding studies, even if this reduces heterogeneity, also decreases the total number of articles included on a topic. A sensitivity analysis is suggested to check whether the excluded study or studies could alter the meta-analysis results.
Do not perform a meta-analysis: if high heterogeneity cannot be addressed using the presented strategies, the investigator should consider whether the amount of heterogeneity is so large that the results of the meta-analysis are problematic. In this case, especially when there is inconsistency in the direction of the treatment effects that could make the use of an average value misleading, meta-analysis should be abandoned, and the evidence should be fairly expressed in a systematic review. Another frequent reason to avoid meta-analytic pooling of data is when too few studies, with no new findings, are obtained after the systematic search.
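The sensitivity analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' workflow: it re-pools the event counts shown in panel (a) of Fig. 47.3 with and without the two studies judged to be at high risk of bias. The DerSimonian-Laird inverse-variance random-effects method is an assumption made here for simplicity, so the numbers may differ slightly from those produced by dedicated software; the function names are illustrative.

```python
import math

# Events and totals (surgical, conservative) copied from panel (a) of Fig. 47.3.
STUDIES = {
    "Ampollini": (86, 800, 117, 800),
    "Bait": (11, 20, 5, 20),
    "Carulli": (2, 15, 6, 15),
    "Compagnoni": (5, 33, 6, 33),
    "Ferrua": (7, 45, 3, 45),
    "Fravisini": (14, 51, 8, 51),
    "Grassi": (4, 23, 6, 23),
    "Mazzitelli": (21, 250, 45, 250),
    "Simonetta": (2, 11, 3, 11),
}


def log_risk_ratio(e1, n1, e2, n2):
    """Return the log risk ratio and its variance for one two-arm study."""
    log_rr = math.log((e1 / n1) / (e2 / n2))
    variance = 1 / e1 - 1 / n1 + 1 / e2 - 1 / n2
    return log_rr, variance


def pool_random_effects(studies):
    """Pool log risk ratios with the DerSimonian-Laird method (assumed here)."""
    effects = [log_risk_ratio(*arms) for arms in studies.values()]
    weights = [1 / var for _, var in effects]
    total_w = sum(weights)
    fixed = sum(w * y for w, (y, _) in zip(weights, effects)) / total_w
    # Cochran's Q and the I2 statistic describe the heterogeneity.
    q = sum(w * (y - fixed) ** 2 for w, (y, _) in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Between-study variance (tau2), truncated at zero.
    c = total_w - sum(w * w for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1 / (var + tau2) for _, var in effects]
    mu = sum(w * y for w, (y, _) in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return {
        "rr": math.exp(mu),
        "low": math.exp(mu - 1.96 * se),
        "high": math.exp(mu + 1.96 * se),
        "i2": i2,
        "tau2": tau2,
    }


def sensitivity_analysis(studies, excluded):
    """Repeat the pooling without the studies named in `excluded`."""
    kept = {name: arms for name, arms in studies.items() if name not in excluded}
    return pool_random_effects(kept)


main = pool_random_effects(STUDIES)
reduced = sensitivity_analysis(STUDIES, excluded={"Bait", "Fravisini"})
for label, res in (("all studies", main), ("after exclusion", reduced)):
    print(f"{label}: RR {res['rr']:.2f} [{res['low']:.2f}, {res['high']:.2f}], I2 {res['i2']:.0f}%")
```

Comparing the two printed lines side by side is the "informal comparison" mentioned in the text: if the confidence interval includes 1 in one analysis and excludes it in the other, the conclusion depends on the exclusion decision and should be reported with caution.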
a Surgical Conservative Risk Ratio Risk Ratio
Study or Subgroup Events Total Events Total Weight M-H, Random, 95% CI M-H, Random, 95% CI
Ampollini et al. 86 800 117 800 22.0% 0.74 [0.57, 0.95]
Bait et al. 11 20 5 20 11.6% 2.20 [0.93, 5.18]
Carulli et al. 2 15 6 15 6.0% 0.33 [0.08, 1.39]
Compagnoni et al. 5 33 6 33 8.9% 0.83 [0.28, 2.46]
Ferrua et al. 7 45 3 45 7.0% 2.33 [0.64, 8.46]
Fravisini et al. 14 51 8 51 12.8% 1.75 [0.80, 3.81]
Grassi et al. 4 23 6 23 8.5% 0.67 [0.22, 2.05]
Mazzitelli et al. 21 250 45 250 17.9% 0.47 [0.29, 0.76]
Simonetta et al. 2 11 3 11 5.2% 0.67 [0.14, 3.24]
b
Total (95% CI) 1177 1177 100.0% 0.67 [0.50, 0.90]
Total events 127 186
Heterogeneity: Tau² = 0.03; Chi² = 7.24, df = 6 (P = 0.30); I² = 17%
Test for overall effect: Z = 2.64 (P = 0.008)
Risk ratio axis from 0.02 to 50: Favours [Surgical] on one side, Favours [Conservative] on the other
Fig. 47.3 Example of forest plots evaluating the risk ratio of treatment failure between conservative and surgical treatment for a fictitious pathology (a). After a random-effect meta-analysis due to the high heterogeneity (green circle), the final result is that there is no evidence of effect of surgical treatment in reducing the risk of failures, since the confidence intervals of the overall effect contain the null value (RR = 1). However, after risk-of-bias assessment, two studies with poor methodology and a high risk of bias have been found (Bait et al. and Fravisini et al.) and excluded through a sensitivity analysis (b). The results of the sensitivity analysis show a reduction in heterogeneity (green dotted circle) and the narrowing of the confidence intervals of the overall effect (blue dotted line and circle) that no longer contain the null value (RR = 1). In this case, the results of the meta-analysis should be interpreted with extreme caution since methodological issues and biases were able to affect the entity of the overall effect of treatment
47.6 How to Present and Evaluate the Results
47.6.1 Prepare the Forest-Plot Graphic for the Main Outcomes
The results section of a meta-analysis should summarise the findings in a clear, logical order, explicitly addressing the objective of the review [11, 30]. The characteristics of methods, participants, intervention and outcomes should be reported in a narrative manner or with reference to tables [31]. On the other hand, the data analysis is better presented through the so-called “forest-plot” graphic; this is a simple, immediate and visually friendly method for describing the raw data, estimate and CI of the chosen effect measurement, the choice between fixed- or random-effect meta-analysis, the heterogeneity, the weight of each study and a test for the overall effect (Fig. 47.1). Forest plots should not be used when an outcome has only been investigated in a single study.
The measurement of the effect of each study included in the meta-analysis and in the forest plot is represented by a square, with the dimension proportional to its weight (based on sample size and the choice of a fixed- or random-effect model) and a horizontal line corresponding to its CI. Since the CI describes a range of values within which we can be reasonably sure that the true effect lies, a narrow CI indicates an effect size that is known precisely, while a very wide CI indicates that we have little knowledge of the effect. The CI width for an individual study depends on the sample size, SD (for continuous outcomes) and risk of the event (dichotomous outcomes). When the CI crosses the central line (indicating an MD or SMD of 0 and an OR or RR of 1), it is possible that the experimental and control treatments have the same effect on the evaluated outcome. If the effect size of most of the included studies lies on the same side of the graphic, thus indicating a similar effect of the treatment, the overall heterogeneity is usually low.
The overall effect size of the meta-analysis is represented by a “diamond”. Its position indicates the value of the effect size, while its width indicates the CI. This width depends on the precision of the individual study estimates, the number of studies combined and the heterogeneity (in random-effect models, precision will decline with increasing heterogeneity). When the 95% CI for the effect of the meta-analysis does not cross the central line, it excludes the null value (MD or SMD of 0 and OR or RR of 1), and the p-value of the overall meta-analysis will therefore be <0.05. In this case, it can be concluded that the observed effect is very unlikely to have arisen purely by chance and that there is evidence of a difference between the effects of the experimental and control interventions.
The forest plot, in a certain study design, could present the effect size of continuous or dichotomous outcomes from single-arm case series; in this case, we only have the estimation of pooled outcomes, without the comparison between two treatments (Fig. 47.4).
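As a rough illustration of what a forest plot displays, the following Python sketch computes the rows (squares with their CIs and weights) and the pooled "diamond" with its Z test for the minimally invasive subgroup of Fig. 47.2. It assumes a simple inverse-variance fixed-effect model, which is an assumption of this sketch rather than a statement of how the figure was produced; the weights are expressed relative to this subgroup only, so they will not match the percentages printed in the figure, and the function names are illustrative.

```python
import math

# Events/totals (surgical, conservative) from the minimally invasive subgroup of Fig. 47.2.
SUBGROUP = {
    "Fravisini": (9, 250, 8, 250),
    "Grassi": (6, 40, 6, 40),
    "Mazzitelli": (6, 45, 7, 45),
    "Simonetta": (6, 18, 5, 18),
}


def study_row(e1, n1, e2, n2):
    """Log risk ratio, standard error and inverse-variance weight of one study."""
    log_rr = math.log((e1 / n1) / (e2 / n2))
    se = math.sqrt(1 / e1 - 1 / n1 + 1 / e2 - 1 / n2)
    return log_rr, se, 1 / se**2


def ci(log_effect, se):
    """95% confidence interval on the risk-ratio scale."""
    return math.exp(log_effect - 1.96 * se), math.exp(log_effect + 1.96 * se)


def fixed_effect_summary(studies):
    """Return per-study rows plus the pooled estimate and its Z test."""
    rows = {name: study_row(*arms) for name, arms in studies.items()}
    total_w = sum(w for _, _, w in rows.values())
    pooled = sum(w * y for y, _, w in rows.values()) / total_w
    pooled_se = math.sqrt(1 / total_w)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return rows, total_w, pooled, pooled_se, z, p_value


rows, total_w, pooled, pooled_se, z, p_value = fixed_effect_summary(SUBGROUP)
for name, (y, se, w) in rows.items():
    low, high = ci(y, se)
    print(f"{name:<11} RR {math.exp(y):.2f} [{low:.2f}, {high:.2f}]  weight {100 * w / total_w:.1f}%")
low, high = ci(pooled, pooled_se)
print(f"{'Pooled':<11} RR {math.exp(pooled):.2f} [{low:.2f}, {high:.2f}]  Z = {z:.2f}, p = {p_value:.2f}")
```

Each printed study line corresponds to one square and horizontal line of the plot, and the last line to the diamond: a pooled CI that includes 1 goes together with a p-value above 0.05, as described in the text.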
Fig. 47.4 Example of a forest plot of a continuous outcome from single-arm case series. In this case, no comparison between treatments is made, since a single treatment is evaluated in the included studies; the main result is therefore the mean of the considered score or outcome, based on the weight of each study
Table 47.4 Modified Coleman Score used for orthopaedic case series (adapted from: Magnussen RA, Carey JL,
Spindler KP. Does autograft choice determine intermediate-term outcome of ACL reconstruction? Knee Surg Sports
Traumatol Arthrosc. 2011;19(3):462–472)
Modified Coleman Methodology Score
Outcome Option Points
Part A: Only one score to be given for each section (total = 60)
1. Study size: number of patients >120 10
81–120 7
40–80 4
<40 or not stated 0
2. Mean follow-up (years) >6 years 5
3–6 years 3
<3 years, not stated, unclear 0
3. Percentage of patients with follow-up >90% 5
80–90% 3
<80% 0
4. No. of interventions per group or separate outcomes should be reported One procedure 10
More than one procedure but consistent among all patients in each group 5
Unclear or multiple interventions among patients in the same group 0
5. Type of study Randomised controlled trial 15
Prospective cohort study 10
Retrospective cohort study 5
6. Diagnostic certainty In all 5
In >80% 3
In <80%, not stated, unclear 0
7. Description of surgical technique Technique stated with details 5
Technique named without elaboration 3
Not stated, unclear 0
8. Description of postoperative rehabilitation Well described 5
Described without complete detail 3
Protocol not reported 0
Part B: Scores could be given for each option in each of the three sections (total = 40)
1. Outcome criteria Outcome measurements clearly defined 2
Timing of outcome assessment clear 2
Use of outcome with good reliability 3
Use of outcome with good sensitivity 3
2. Procedure for assessing outcomes Subjects recruited 5
Independent investigators 4
Written assessment 3
Patient-centred data collected 3
3. Description of subject selection process Selection criteria reported and unbiased 5
Recruitment rate reported >80% 5
Eligible subjects not included in the study satisfactorily accounted for 5
The total score (max = 100) of the evaluated study is calculated as the sum of Parts A and B
Table 47.5 Modified Newcastle-Ottawa Scale for non-randomised studies (adapted from: http://www.uphs.upenn.edu/cep/methods/Modified%20Newcastle-Ottawa.pdf)
Question Y\N
Study population:
1. All study groups derived from similar source/reference populations?
2. Attrition not significantly different across study groups?
Study validity:
3. The measurement of exposure is valid?
4. The measurement of outcome is valid?
5. Investigators blinded to end-point assessment?
Confounders:
6. Potential confounders identified (e.g. co-morbidities)?
7. Statistical adjustment for potential confounders made?
8. Funding source(s) disclosed and no obvious conflict of interests?
The scale does not require a calculation of a numeric score but can be used to map the study characteristics visually
Table 47.6 Reporting quality scale for randomised controlled trials based on the Consolidated Standards for Reporting
Trials (CONSORT) guidelines (adapted from: Huwiler-Müntener K, Jüni P, Junker C, Egger M. Quality of reporting of
randomized trials as a measure of methodologic quality. JAMA. 2002 Jun 5;287(21):2801–4)
Reporting quality scale based on 1996 CONSORT statement
Question Y\N
1. Does the title identify the study as a randomised controlled trial?
2. Is the abstract presented in structured format?
3. Are objectives stated?
4. Is hypothesis stated?
5. Is the study population described?
6. Are the inclusion and exclusion criteria described?
7. Are the interventions described?
8. Are the outcome measurements described?
9. Is a primary outcome specified?
10. Is a minimum (clinically?) important difference for the primary outcome reported?
11. Are power calculations described?
12. Is the rationale for the statistical analyses explained?
13. Are the methods for statistical analyses described?
14. Are stopping rules described?
15. Is the unit of randomisation described?
16. Is the method used to generate the allocation schedule described?
17. Is the method of allocation concealment described?
18. Is the timing of assignment described?
19. Is the method for separating those generating the allocation sequence from those assigning participants
to groups described?
20. Are the mechanisms of blinding described?
21. Is the number of eligible patients reported?
22. Is the number of randomised patients reported for each comparison group?
23. Are prognostic variables by treatment and control group described?
24. Is the number of patients receiving an intervention as allocated reported for each group?
25. Is the number of patients analysed reported for each comparison group?
26. Are withdrawals and drop-outs described for each comparison group?
27. Are protocol deviations described for each comparison group?
28. Is the estimated effect of intervention on primary and secondary outcomes stated, including
measurements of precision?
29. Are the results stated in absolute numbers?
30. Are summary data and inferential statistics presented in sufficient detail to permit alternative analyses
and replication?
The total score (max = 30) of the evaluated study is calculated as the number of “Yes” answers
Table 47.7 Strategies to identify high, low or unclear risk of bias for the main domains of the “Cochrane Risk of Bias
Tool” (adapted from: Table 8.5c of Higgins JPT, Altman DG (editors). Chapter 8: Assessing risk of bias in included
studies. In: Higgins JPT, Green S (editors). Cochrane Handbook for Systematic Reviews of Interventions. Chichester
(UK): John Wiley & Sons, 2008)
Risk-of-bias assessment according to the “Cochrane Risk of Bias Tool”
Selection Random sequence generation: method used to generate the allocation sequence in sufficient detail to
allow an assessment of whether it should produce comparable groups
Low risk Randomisation with random number table, coin toss, computer
generator, dice…
High risk Randomisation with date of birth, admission day, clinic record
number, judgement of clinician…
Unclear risk Insufficient information on randomisation process
Allocation concealment: method used to conceal the allocation sequence to avoid intervention
allocations being foreseen in advance or during enrolment
Low risk Participants could not foresee assignation due to central
allocation, identical drug containers, opaque sealed envelopes
High risk Participants could foresee assignation due to open allocation
schedule, alternation, unsealed envelopes
Unclear risk Insufficient information or concealment not described
Performance Blinding of participants and personnel: measures used to blind study participants and personnel from
knowing the intervention received
Low risk Blinding or use of outcomes not influenced by lack of blinding
High risk No or incomplete blinding that could influence outcomes
Unclear risk Insufficient information to permit judgement
Detection Blinding of outcome assessment: measures used to blind outcome assessors from knowledge of the
intervention a participant received
Low risk Blinding or use of outcomes not influenced by lack of blinding
High risk No or incomplete blinding that could influence outcomes
Unclear risk Insufficient information to permit judgement
Attrition Incomplete outcome data: completeness of outcome data for each main outcome, including attrition
and exclusion from the analysis
Low risk No missing outcome data, or missing data balanced between
groups and not related to outcomes
High risk Reason for missing data related to outcomes, or imbalance
between intervention groups
Unclear risk Insufficient reporting of exclusions to permit judgement
Reporting Selective reporting: possibility of selective outcome reporting
Low risk All the prespecified outcomes of interest have been reported
High risk Not all prespecified outcomes have been reported, or outcomes
reported incompletely, or lack of key outcomes
Unclear risk Insufficient information to permit judgement
Other Other bias: any important concerns about bias not addressed in the other domains in the tool
Low risk The study appears to be free from other sources of bias
High risk Bias related to specific study design, or fraudulent, or other
problems
Unclear risk Insufficient information to assess whether other biases exist
and any re-inclusions in analyses performed by the review authors.
• Selective reporting (reporting bias): state how the possibility of selective outcome reporting was examined by the review authors and what was found.
• Other (other bias): state any important concerns about bias not addressed in the other domains in the tool.
According to the quality and methodology of the study, each domain should be rated (Table 47.7). The overall risk of bias should
are determined
• Reporting bias: the systematic difference between reported and unreported findings
[Risk-of-bias summary figure: per-study ratings (+/–) for Ampollini et al., Bait et al. and Carulli et al.; one domain is labelled “Other bias”]
Fig. 47.6 In these risk-of-bias graphs, it is possible to see a visual presentation of the recurrent bias based on each domain. In this specific case, the “reporting bias” appears to be the bias with a lower risk, while the “performance bias” and the “detection bias” appear to be those with a higher risk
[Funnel plots, panels a and b: RD on the horizontal axis (–1 to 1), SE(RD) on the vertical axis (0 to 0.2)]
Fig. 47.7 In a funnel plot where the risk of bias is low, the symmetrical shape of an inverted funnel is seen (a). When the publication bias tends to exclude the publication of small studies without statistical significance, the funnel plot results are asymmetrical, with a gap in the bottom corner (bottom left corner, in this case) (b)
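Funnel plot asymmetry such as that shown in Fig. 47.7 is usually judged visually, but it can also be quantified. The sketch below uses Egger's regression, which is not described in this chapter and is brought in here only as one common option: the standardised effect (effect/SE) is regressed on precision (1/SE), and an intercept far from zero suggests asymmetry. The data are hypothetical, the function name is illustrative, and the guard of at least ten studies follows the rule of thumb given in the text.

```python
def egger_regression(effects, standard_errors, min_studies=10):
    """Ordinary least squares of effect/SE on 1/SE; returns (intercept, slope)."""
    if len(effects) != len(standard_errors):
        raise ValueError("effects and standard_errors must have the same length")
    if len(effects) < min_studies:
        raise ValueError("too few studies to assess funnel plot asymmetry")
    precision = [1 / se for se in standard_errors]
    standardised = [y / se for y, se in zip(effects, standard_errors)]
    n = len(effects)
    mean_x = sum(precision) / n
    mean_y = sum(standardised) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, standardised))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical standard errors for ten studies, from large (precise) to small.
ses = [0.05 * k for k in range(1, 11)]
symmetric = [0.5 for _ in ses]               # same effect whatever the study size
asymmetric = [0.2 + 1.0 * se for se in ses]  # smaller studies report larger effects

print("symmetric (intercept, slope):", egger_regression(symmetric, ses))
print("asymmetric (intercept, slope):", egger_regression(asymmetric, ses))
```

In the symmetric example the intercept is essentially zero, whereas in the asymmetric example it is clearly positive because the small studies systematically report larger effects. A formal test would also need a standard error for the intercept, which is omitted here to keep the sketch short.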
• Language bias: when the publication of research findings in a language other than English is regarded as of secondary importance, while studies with positive results might also be more likely to be published in English
• Outcome reporting bias: when the selective reporting of some outcomes but not others depends on the type of results found, whether they are positive or negative or whether they introduce new or repetitive findings.
One practical way to detect reporting bias is the use of the funnel plot graph [35]. This is a simple scatter plot of the intervention effect estimates from individual studies against some measurement of each study’s size or precision; the effect estimates are plotted on the horizontal axis and the measurement of study size on the vertical axis (Fig. 47.7). As effect estimates from small studies scatter more widely at the bottom of the graph and those of larger studies are scattered more narrowly at the top of the graph, the plot should assume the shape of a symmetrical inverted funnel. If, due to publication bias, smaller studies without statistical significance remain unpublished, the plot will be asymmetrical with a gap in a bottom corner. In this case, the
meta-analysis will tend to overestimate the intervention effect. Apart from publication bias, asymmetry of the funnel plot could also be due to poor methodological quality, true heterogeneity, artefacts or chance. Funnel plot asymmetry should only be assessed if at least ten studies are included in the meta-analysis and the studies are not all of similar size.
47.6.3 Correctly Approach and Evaluate Non-randomised Studies
Finally, a few words should be devoted to the meta-analysis of non-randomised controlled studies. Pooling together the results of non-randomised studies could be appropriate when they have a large effect; however, combining RCTs and non-randomised studies is not recommended, as their results should be expected to differ systematically, resulting in increased heterogeneity [29].
Meta-analyses of non-randomised studies have greater potential bias, and their results should therefore be interpreted with caution. In fact, serious concerns could be related especially to the differences between people in different intervention groups (selection bias), caused by the lack of randomisation.
If both RCTs and non-randomised studies of an intervention are available and the author also wants to include non-randomised studies due to the small number of RCTs, they should be presented separately, or the findings of the non-randomised studies should be discussed in the final discussion with the meta-analysis findings.
47.7 How to Interpret Your Findings Critically
47.7.1 Summarise Your Main Findings
After the results have been correctly and clearly reported and methodology and bias adequately evaluated, the main findings of the meta-analysis can be critically interpreted.
For this purpose, “summary of findings” tables could be useful, since they present the main findings in a simple format, providing key information on the quality of evidence, the magnitude of the effect of the interventions and the sum of available data on the main outcomes [31]. Six elements should be reported: a list of all important outcomes, a measurement of the typical burden of these outcomes, the absolute and relative magnitude of effect, numbers of participants and studies addressing these outcomes, a rating of the overall quality of evidence for each outcome and a space for comments. Special mention should be made of the quality of evidence, which is assessed through the Grading of Recommendations Assessment, Development and Evaluation (GRADE) tool [1]. It describes the body of evidence as “High”, “Moderate”, “Low” or “Very Low”, based on the methodological quality, directness of evidence, heterogeneity, precision of effect estimates and risk of publication bias.
47.7.2 Pay Attention: What Is “Statistically Significant” Is Also “Clinically Significant”
When numerical results are going to be interpreted, attention should be paid to the 95% CI, because, if it is narrow, the effect size is known precisely, while, if it is wider, the uncertainty is greater. The CI and the p-value of the meta-analysis are strictly linked, as a p-value of <0.05 corresponds to a 95% CI that excludes the null value (OR, RR of 1 or MD, SMD of 0), thus suggesting that the experimental treatment has an effect compared with the control treatment.
However, even if the findings are statistically significant, the clinical meaning of the benefit of the experimental treatment should be carefully weighed. When the treatment effect is measured with RR or RD, an interpretation of the clinical importance cannot be made without knowledge of the typical risk of events without treatment. In fact, a risk ratio of 0.75, for example, could correspond to a clinically important reduction in events from 80 to 60% or a small, less clinically important reduction from 4 to 3%. Conversely,
when dealing with continuous scales and mean differences, the proper minimum clinically important difference (MCID) of the outcome in question should be considered. Since the MCID represents the smallest change in a treatment outcome that a patient would identify as important, it is possible that a mean difference, despite being statistically significant, could be irrelevant from a clinical point of view (e.g. an MD of 4 points in the subjective IKDC, where the MCID is 11.5 points) [3, 8] (Table 47.8). Moreover, the minimum detectable change (MDC), which is the minimum amount of change in a patient’s score that ensures the change is not the result of measurement error, should be considered [3]. Recently, in a JBJS commentary, the superiority of double-bundle ACL reconstruction compared with single-bundle demonstrated in a Level I RCT has been questioned, since a difference of less than 1 mm in an arthrometric evaluation and around two points in a subjective IKDC were not considered clinically meaningful, despite being statistically significant [23].
Furthermore, conclusions should not be drawn too quickly without performing an accurate evaluation of heterogeneity through subgroup or sensitivity analysis. For example, Soroceanu et al. [33] reported a relative risk of re-rupture of 0.4 in favour of surgical repair compared with conservative treatment in the case of Achilles tendon rupture. However, since the authors found a non-negligible heterogeneity of 35%, they identified the item of “functional rehabilitation” as a cause of heterogeneity through meta-regression. So,
Table 47.8 The minimum detectable change (MDC) and minimum clinically important difference (MCID) of the
main clinical scores used in knee surgery (adapted from: Collins NJ, Misra D, Felson DT, Crossley KM, Roos
EM. Measures of knee function: International Knee Documentation Committee (IKDC) Subjective Knee Evaluation
Form, Knee Injury and Osteoarthritis Outcome Score (KOOS), Knee Injury and Osteoarthritis Outcome Score Physical
Function Short Form (KOOS-PS), Knee Outcome Survey Activities of Daily Living Scale (KOS-ADL), Lysholm Knee
Scoring Scale, Oxford Knee Score (OKS), Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC),
Activity Rating Scale (ARS) and Tegner Activity Score (TAS). Arthritis Care Res (Hoboken). 2011 Nov;63 Suppl
11:S208–28. doi: https://doi.org/10.1002/acr.20632)
The clinical scores most used for knee evaluation
Score Condition MDC MCID
Subjective IKDC Injuries 8.8–15.6 6.3 (6 m) to 16.7 (12 m)
Subjective IKDC Mixed pathologies 6.7 11.5 (sensitive) to 20.5
(specific)
KOOS pain Injuries 6–6.1 –
KOOS symptoms Injuries 5–8.5 –
KOOS ADL Injuries 7–8 –
KOOS sport/rec Injuries 5.8–12 –
KOOS Qol Injuries 7–7.2 –
KOOS pain OA 13.4 –
KOOS symptoms OA 15.5 –
KOOS ADL OA 15.4 –
KOOS sport/rec OA 19.6 –
KOOS Qol OA 21.1 –
Lysholm Injuries 8.9–10.1 –
Lysholm Mixed pathologies – –
Oxford Knee Scale OA 6.1 –
WOMAC pain OA 14.4–16.2 22.87 (TKR 6 m) to 27.98
(TKR 24 m)
WOMAC symptoms OA 22.9–30.6 14.43 (TKR 6 m) to 21.35
(TKR 24 m)
WOMAC function OA 10.6–15 19.01 (TKR 6 m) to 20.84
(TKR 24 m)
Tegner Activity Scale Injuries 1 –
Tegner Activity Scale OA – –
MDC minimum detectable change, MCID minimum clinically important difference
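The two checks discussed in Sect. 47.7.2 can be made concrete with a short Python sketch. It reuses the chapter's own examples: a risk ratio of 0.75 applied to baseline risks of 80% and 4%, and a mean difference of 4 points in the subjective IKDC compared with the MCID (11.5) and MDC (6.7) for mixed pathologies in Table 47.8. The function names and the three text labels returned are illustrative choices, not established terminology.

```python
def absolute_risk_reduction(baseline_risk, risk_ratio):
    """Absolute reduction in event risk implied by a risk ratio at a given baseline risk."""
    return baseline_risk - baseline_risk * risk_ratio


def interpret_mean_difference(mean_difference, mcid, mdc=None):
    """Classify a mean difference against the MCID (and, if given, the MDC)."""
    size = abs(mean_difference)
    if mdc is not None and size < mdc:
        return "within measurement error"
    if size < mcid:
        return "below MCID"
    return "clinically important"


# Same relative effect (RR = 0.75), very different absolute effects.
for baseline in (0.80, 0.04):
    arr = absolute_risk_reduction(baseline, 0.75)
    print(f"baseline {baseline:.0%}: absolute reduction {arr:.0%}")

# A statistically significant MD of 4 IKDC points versus the thresholds in Table 47.8.
print(interpret_mean_difference(4, mcid=11.5))
print(interpret_mean_difference(4, mcid=11.5, mdc=6.7))
```

The same relative effect therefore translates into a 20-point absolute reduction in one scenario and a 1-point reduction in the other, which is why the baseline risk has to be reported alongside the RR.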
after performing a subgroup analysis separating patients undergoing functional or conventional rehabilitation, they found no difference in the re-rupture rate between surgical treatment and conservative treatment with functional rehabilitation. On the other hand, as Foster et al. [7] intended to include as many data as possible in their meta-analysis of irradiated vs. nonirradiated allografts for ACL reconstruction, they performed a sensitivity analysis to evaluate whether imputing SD would have influenced the final results. After performing an analysis of only the studies reporting SD, they repeated the analysis also adding those studies in which the SD was imputed as the mean value, reporting no substantial differences in the results. In their final evaluation, they therefore disclosed this issue and presented the data relative to all the studies, independently of the source of the SD.
47.7.3 Translate Your Findings into Clinics
The final difficulty in interpreting the meta-analysis results lies in applying the results to clinical practice [32]. It is important to correctly disclose whether the individual studies pooled in the meta-analysis can be generalised to a specific clinical scenario. This includes ensuring similar patient populations, interventions and outcomes of interest. For example, Jiang et al. [18] reported no differences in the rates of return to sport between patients undergoing surgical repair or non-surgical treatment after Achilles tendon rupture. However, since the RCTs included in the meta-analysis evaluated patients with a mean age of around 40 years, this recommendation should be applied with extreme caution in young, professional athletes.
Finally, attention should be paid when interpreting inconclusive or counter-intuitive results, the misinterpretation of which is one of the most common errors in scientific manuscripts. When there is inconclusive evidence, it is not appropriate to state that “there is evidence of no effect”. It is instead more appropriate to state that “there is no evidence of an effect”.
When results are instead counter-intuitive, clinical judgement based on experience, education and current practices will be needed to decipher the unexpected results. The decision whether to accept the findings or question the statistical technique could be taken after looking back at the original articles, reassessing their inclusion and evaluating whether assumptions about the original research question are lost when the studies are combined.
47.8 Conclusion: How to Prepare the Manuscript
The very last step in meta-analysis is to prepare a manuscript that is complete, essential, clear to the reader and suitable for publication in a peer-reviewed journal. First, the guidelines of the target journal should be consulted to “tailor” the manuscript accordingly. Then, the PRISMA guidelines (Table 47.9) should be followed to fulfil the highest quality standard [25].
Title: should be concise and focused on the topic, identifying the paper as a meta-analysis.
Abstract: should be structured, including all the sections of the paper.
Introduction: should be short and focused on the topic, expressing the rationale of the meta-analysis, the purpose and the hypothesis.
Methods: should mention all the information regarding the databases used, the timing of the search, the keywords, the eligibility criteria, the methods of data extraction and the items evaluated. The statistical method used to combine the results, the bias evaluation and any sensitivity or subgroup analysis should be mentioned as well.
Results: should be clear and easy to understand. All included and excluded studies should be described in a flow diagram. It is recommended to present the data through forest plots and summary tables. The results of sensitivity or subgroup analysis and of bias evaluation should be provided in this section.
Discussion: should not be too long and should preferably focus on the main findings of the meta-analysis. The evidence should be sum-
47 A Practical Guide to Writing (and Understanding) a Scientific Paper: Meta-Analyses 493
Table 47.9 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist as guideline for the final production of a meta-analysis (adapted from: Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ. 2009 Jul 21;339:b2700)

| Section | No. | Item | Description | Page |
|---|---|---|---|---|
| Title | 1 | Title | Identify the report as a systematic review, meta-analysis, or both | |
| Abstract | 2 | Structured summary | Include, as applicable: background; objectives; data sources; eligibility criteria, participants and interventions; synthesis methods; results; limitations; conclusions | |
| Introduction | 3 | Rationale | Describe the rationale for the review in the context of what is already known | |
| Introduction | 4 | Objectives | Provide an explicit statement of questions being addressed with reference to participants, interventions, comparisons, outcomes and study design (PICOS) | |
| Methods | 5 | Protocol and registration | Indicate if a review protocol exists and, if available, provide registration information including registration number | |
| Methods | 6 | Eligibility criteria | Specify study characteristics and report characteristics used as criteria for eligibility, giving rationale | |
| Methods | 7 | Information sources | Describe all information sources and date last searched | |
| Methods | 8 | Search | Present full electronic search strategy for at least one database, including any limits used, such that it could be repeated | |
| Methods | 9 | Study selection | State the process for selecting studies | |
| Methods | 10 | Data collection process | Describe the method of data extraction from reports | |
| Methods | 11 | Data items | List and define all variables for the data that were sought and any assumptions and simplifications made | |
| Methods | 12 | Risk of bias in individual studies | Describe the methods used for assessing risk of bias of individual studies and how this information is to be used in any data synthesis | |
| Methods | 13 | Summary measurements | State the principal summary measurements (e.g. risk ratio, difference in means) | |
| Methods | 14 | Synthesis of results | Describe the methods for handling data and combining results of studies | |
| Methods | 15 | Risk of bias across studies | Specify any assessment of risk of bias that may affect the cumulative evidence (e.g. publication bias) | |
| Methods | 16 | Additional analyses | Describe methods of additional analyses (e.g. sensitivity or subgroup analyses, meta-regression) | |
| Results | 17 | Study selection | Give numbers of studies screened, assessed for eligibility and included in the review, with reasons for exclusion at each stage (flow diagram) | |
| Results | 18 | Study characteristics | For each study, present characteristics of the data that were extracted | |
| Results | 19 | Risk of bias within studies | Present data on risk of bias of each study | |
| Results | 20 | Results of individual studies | For all outcomes considered, present for each study simple summary data for each intervention group and effect estimates with CI (forest plot) | |
| Results | 21 | Synthesis of results | Present results of each meta-analysis conducted, including confidence intervals and measurements of consistency | |
| Results | 22 | Risk of bias across studies | Present results of any assessment of risk of bias across studies | |
| Results | 23 | Additional analysis | Give results of additional analyses, if performed (e.g. sensitivity or subgroup analyses, meta-regression) | |
(continued)
494 A. Grassi et al.
Table 47.9 (continued)

| Section | No. | Item | Description | Page |
|---|---|---|---|---|
| Discussion | 24 | Summary of evidence | Summarise the main findings including the strength of evidence for each main outcome | |
| Discussion | 25 | Limitations | Discuss limitations at study and outcome level and at review level | |
| Discussion | 26 | Conclusions | Provide a general interpretation of the results in the context of other evidence and implications for future research | |
| Funding | 27 | Funding | Describe sources of funding for the systematic review and other support; role of funders for the systematic review | |
4. Deeks JJ, Altman DG, Bradburn MJ. Statistical methods for examining heterogeneity and combining results from several studies in meta-analysis. In: Egger M, Davey Smith G, Altman DG, editors. Systematic reviews in health care: meta-analysis in context. 2nd ed. London: BMJ Publication Group; 2001.
5. Deeks JJ, Higgins JPT, Altman DG. Chapter 9: Analysing data and undertaking meta-analyses. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
6. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7:177–88.
7. Foster TE, Wolfe BL, Ryan S, Silvestri L, Kaye EK. Does the graft source really matter in the outcome of patients undergoing anterior cruciate ligament reconstruction? An evaluation of autograft versus allograft reconstruction results: a systematic review. Am J Sports Med. 2010;38(1):189–99. https://doi.org/10.1177/0363546509356530.
8. Grassi A, Ardern CL, Marcheggiani Muccioli GM, Neri MP, Marcacci M, Zaffagnini S. Does revision ACL reconstruction measure up to primary surgery? A meta-analysis comparing patient-reported and clinician-reported outcomes, and radiographic results. Br J Sports Med. 2016;50(12):716–24. https://doi.org/10.1136/bjsports-2015-094948.
9. Grassi A, Zaffagnini S, Marcheggiani Muccioli GM, Neri MP, Della Villa S, Marcacci M. After revision anterior cruciate ligament reconstruction, who returns to sport? A systematic review and meta-analysis. Br J Sports Med. 2015;49(20):1295–304. https://doi.org/10.1136/bjsports-2014-094089.
10. Grassi A, Zaffagnini S, Marcheggiani Muccioli GM, Roberti Di Sarsina T, Urrizola Barrientos F, Marcacci M. Revision anterior cruciate ligament reconstruction does not prevent progression in one out of five patients of osteoarthritis: a meta-analysis of prevalence and progression of osteoarthritis. J ISAKOS. 2016;1(1):16–24. https://doi.org/10.1136/jisakos-2015-000029.
11. Greco T, Zangrillo A, Biondi-Zoccai G, Landoni G. Meta-analysis: pitfalls and hints. Heart Lung Vessel. 2013;5(4):219–25. Review.
12. Higgins JPT, Altman DG. Chapter 8: Assessing risk of bias in included studies. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
13. Higgins JPT, Deeks JJ. Chapter 7: Selecting studies and collecting data. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
14. Higgins JPT, Deeks JJ, Altman DG. Chapter 16: Special topics in statistics. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
15. Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
16. Huwiler-Müntener K, Jüni P, Junker C, Egger M. Quality of reporting of randomized trials as a measure of methodologic quality. JAMA. 2002;287(21):2801–4.
17. Israel H, Richter RR. A guide to understanding meta-analysis. J Orthop Sports Phys Ther. 2011;41(7):496–504. https://doi.org/10.2519/jospt.2011.3333.
18. Jiang N, Wang B, Chen A, Dong F, Yu B. Operative versus nonoperative treatment for acute Achilles tendon rupture: a meta-analysis based on current evidence. Int Orthop. 2012;36(4):765–73. https://doi.org/10.1007/s00264-011-1431-3.
19. Koretz RL, Lipman TO. Understanding systematic reviews and meta-analyses. JPEN J Parenter Enteral Nutr. 2016. pii: 0148607116661841.
20. Lefebvre C, Manheimer E, Glanville J. Chapter 6: Searching for studies. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
21. Lefaivre KA, Slobogean GP. Understanding systematic reviews and meta-analyses in orthopaedics. J Am Acad Orthop Surg. 2013;21(4):245–55. Review. https://doi.org/10.5435/JAAOS-21-04-245.
22. Magnussen RA, Carey JL, Spindler KP. Does autograft choice determine intermediate-term outcome of ACL reconstruction? Knee Surg Sports Traumatol Arthrosc. 2011;19(3):462–72.
23. Marx RG. Anatomic double-bundle anterior cruciate ligament reconstruction was superior to conventional single-bundle reconstruction. J Bone Joint Surg Am. 2013;95(4):365. https://doi.org/10.2106/JBJS.9504.ebo804.
24. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959;22:719–48.
25. Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535. https://doi.org/10.1136/bmj.b2535.
26. O’Connor D, Green S, Higgins JPT. Chapter 5: Defining the review question and developing criteria for including studies. In: Higgins JPT, Green S, editors. Cochrane handbook of systematic reviews of interventions. Chichester: Wiley; 2008.
27. Olivo SA, Macedo LG, Gadotti IC, Fuentes J, Stanton T, Magee DJ. Scales to assess the quality of randomized controlled trials: a systematic review. Phys Ther. 2008;88(2):156–75.
28. Parlamas G, Hannon CP, Murawski CD, Smyth NA, Ma Y, Kerkhoffs GM, van Dijk CN, Karlsson J, Kennedy JG. Treatment of chronic syndesmotic injury: a systematic review and meta-analysis. Knee Surg Sports Traumatol Arthrosc. 2013;21(8):1931–9. Review. https://doi.org/10.1007/s00167-013-2515-y.
29. Reeves BC, Deeks JJ, Higgins JPT, Wells GA. Chapter 13: Including non-randomized studies. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
30. Russo MW. How to review a meta-analysis. Gastroenterol Hepatol (N Y). 2007;3(8):637–42.
31. Schunemann HJ, Oxman AD, Higgins JPT, Vist GE, Glasziou P, Guyatt GH. Chapter 11: Presenting results and ‘summary of findings’ tables. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
32. Schunemann HJ, Oxman AD, Vist GE, Higgins JPT, Deeks JJ, Glasziou P, Guyatt GH. Chapter 12: Interpreting results and drawing conclusions. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
33. Soroceanu A, Sidhwa F, Aarabi S, Kaufman A, Glazebrook M. Surgical versus nonsurgical treatment of acute Achilles tendon rupture: a meta-analysis of randomized trials. J Bone Joint Surg Am. 2012;94(23):2136–43. https://doi.org/10.2106/JBJS.K.00917.
34. Stang A. Critical evaluation of the Newcastle-Ottawa scale for the assessment of the quality of nonrandomized studies in meta-analyses. Eur J Epidemiol. 2010;25(9):603–5. https://doi.org/10.1007/s10654-010-9491-z.
35. Sterne JAC, Egger M, Moher D. Chapter 10: Addressing reporting biases. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
36. Wright JG, Swiontkowski MF, Tolo VT. Meta-analyses and systematic reviews: new guidelines for JBJS. J Bone Joint Surg Am. 2012;94(17):1537.
48 A Practical Guide to Writing (and Understanding) a Scientific Paper: Clinical Studies

Riccardo Compagnoni, Alberto Grassi, Stefano Zaffagnini, Corrado Bait, Kristian Samuelsson, Alessandra Menon, and Pietro Randelli
familiar with relevant guidelines and the structure and content of specific documents, and, finally, have a good set of writing skills [19].

The aim of this paper is to provide tips acquired from authors attempting to help to write good-quality clinical papers without wasting their time and how to publish them in the appropriate journals. We have selected specific topics, which are analyzed in dedicated paragraphs. The different selections are how to choose the topic, find the current literature, analyze the data, structure the paper, write the paper, and handle references and some final tips on how to manage the submission process.

48.1.1 How to Choose the Topic of Your Work

Before beginning to enroll patients, to collect data, or to research the literature, the first step is to choose a topic of interest for your work [17, 23]. To do this, the best way is to ask your leader what is of interest, considering the trends in surgical procedures or the introduction of innovative techniques [21]. Once a topic has been identified, an accurate search of the literature, using the most common databases, can help the researcher to identify the actual “hot” topic in that specific field of research. It is necessary to know the timelines to follow, the data that should be collected, and the review/approval process to be followed.

48.1.2 Find Current Literature

Medical science is based on data available in the literature. The widespread availability of information due to the use of the Internet has given clinicians an opportunity to access the literature well. The most used database in which life sciences and biomedical papers are collected is MEDLINE (Medical Literature Analysis and Retrieval System Online, MEDLARS Online). PubMed is a free search engine primarily accessing the MEDLINE database of references and abstracts on life sciences and biomedical topics [8].

It is important to remember that PubMed should not be confused with PubMed Central, a free digital archive of articles, accessible to anyone from anywhere via a basic web browser. The full text of all PubMed Central articles is free to read, with varying provisions for reuse.

To obtain more effective results, there are some tips to take into consideration in terms of using PubMed. The home page has a link to some tutorials, which will help the researcher to use the search engine correctly. MeSH (Medical Subject Headings) is the NLM (National Library of Medicine)-controlled vocabulary thesaurus used for indexing articles for PubMed. This vocabulary helps to identify the correct words that should be used in medical research and helps the authors to find the correct subject headings. It is very important to remember that, in MeSH, the subject headings are arranged in a hierarchy and a search for a descriptor will include all the descriptors in the hierarchy below the given one [8].

Regional health and medical databases have been compiled by the WHO (World Health Organization) to complement the internationally known bibliographic indices such as MEDLINE. The regional medical indexes, published by or under the auspices of WHO Regional Offices, provide access to bibliographic information about the health material published locally. They thus add a further dimension to the retrieval of information from developed country-oriented databases [10].

The Cochrane Library is a collection of databases in medicine and other health-care specialties provided by Cochrane and other organizations. At its core is the collection of Cochrane Reviews, a database of systematic reviews and meta-analyses that summarize and interpret the results of medical research. The Cochrane Library aims to make the results of well-conducted controlled trials readily available and is a key resource in evidence-based medicine [6, 9].

48.1.3 How to Decide on the Kind of Clinical Paper

There are different types of scientific article, some of which require original research (primary literature) and some that are based on other published work (secondary literature). It is important to have
correct and, if researchers do not write fluent English, it is suggested that they should use online services or mother tongue colleagues/copy editors. There are limits to word count, usually 3000–4000 words (different for different journals), and many papers are too long and may be rejected for that reason alone.

The following section will provide some tips for the correct writing of the different sections of a manuscript.

48.2.1 Abstract

The usual sections defined in a structured abstract are background/purpose, methods, results, and conclusions. The abstract should provide the background to the study and should state the aim, the basic procedures/methods (selection of study participants, settings, measurements, and analytical methods), the main findings (giving specific effect sizes and their statistical and clinical significance, but not repeating which statistical methods were used), and principal conclusions. The results section is the most important part of the abstract, and nothing should compromise its range and quality. It should emphasize new and important aspects of the study or observations, without overinterpreting findings. Most journals require abstracts that conform to a formal structure within a word count of 200–250 words. Even if the abstract is the first part of any article and the only section freely available in the most used search engine, it should be written after the paper is finished.

The ICMJE recommends that journals publish the clinical trial registration number at the end of the abstract [3, 7]. Level of Evidence is also frequently added at the end of the abstract.

48.2.2 Introduction

The scope of the introduction is to show what is already known on the subject of the paper and, more importantly, to introduce what is unknown to justify the structure of the study and what is intended to be examined.

Many reviewers regard the introduction as a section of little interest, and general (and lengthy) considerations about the topic or common knowledge should be avoided.

The introduction should finish with a statement of the clear aim of the study, with primary and secondary outcomes considered, and the hypothesis of the investigation, at least for clinical studies.

48.2.3 Methods

The methods section is an exact description of the work done by researchers involved in the study and should explain the structure of the study, the way in which the results were obtained, and all the steps relating to patient management, from enrollment to final evaluation.

The first statements have to describe the research design, the clinical diagnosis of the patients recruited, and in most cases the setting of the study.

If the study compares the results of the treatment in different groups, these groups have to be described carefully. Randomization is of central importance in clinical trials because it reduces bias and represents a basis for ensuring the validity of data analysis using statistical testing. The generation of an unpredictable allocation sequence represents the first crucial element of randomization in a randomized, controlled trial [20]. The two fundamental characteristics of randomization are that researchers must be unable to predict the group to which a patient will be assigned until the patient is unambiguously registered in the study and that researchers are unable to change a patient’s allocation once he/she has been randomized. Remember that randomization with a low-quality allocation sequence can result in a biased estimation of the treatment effect [14, 22].

The treatment given to patients has to be described in a detailed manner. The length of the study from enrollment to conclusion has to be reported. Outcome measurements using instruments or scoring systems should be described and explained. Care should be taken to select the appropriate outcomes for the specific disease or
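As an illustration of the allocation-sequence requirement described under 48.2.3 Methods, the following minimal Python sketch generates a sequence by permuted-block randomization. This is an assumption made for illustration only: the chapter does not prescribe a particular technique, and the function name `allocation_sequence`, the arm labels, the block size and the seed are all hypothetical. In a real trial the sequence would be generated and concealed by someone independent of recruitment, and a fixed block size can make the last allocation in each block guessable.

```python
import random


def allocation_sequence(n_patients, arms=("A", "B"), block_size=4, seed=None):
    """Return a list of arm labels built from randomly permuted blocks.

    Hypothetical helper for illustration; not taken from the chapter.
    Each block contains every arm equally often, so group sizes stay balanced.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # a seed makes the sequence reproducible for audit
    sequence = []
    while len(sequence) < n_patients:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order within the block
        sequence.extend(block)
    return sequence[:n_patients]


if __name__ == "__main__":
    # Example: 20 patients, two arms, blocks of four (all values are assumptions).
    print(allocation_sequence(20, seed=2019))
```

The sequence is produced once, before enrollment, and is then only read from; nothing in the sketch allows an allocation to be changed after the fact, which mirrors the second characteristic listed in the text. Python's `random` module is adequate for a demonstration but is not a substitute for a validated randomization service.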
The aim of the discussion section is to interpret and describe the significance of the study findings in the light of what was already known about the research problem being investigated and to explain the impact of the results on that specific topic. Each of the main results of the study should be analyzed and discussed, possibly in the same order as they are reported in the results section. The discussion section is based on interpretation, which is a subjective exercise. For this reason, the author must avoid overinterpreting the results of the study, one of the most frequent errors made. On the other hand, the discussion section is not a repetition of the introduction or methods sections. The discussion section has to acknowledge any limitations of the study, especially in terms of the methodology, and any alternative explanations of the findings [2]. A concluding take-home message can restate the answer one last time and/or indicate the importance of the work by stating implications, applications, or recommendations [24].

48.2.6 References

Authors should provide direct references to original research sources whenever possible. Secondary references should be avoided. Science is based on the results produced by the scientific community, and the method the community uses to share results is publication in scientific journals. This database of information has become larger in recent years, thanks to the use of the Internet. This is an important opportunity for researchers who can access a great deal of information about a specific topic. Every published paper cites previous articles supporting the statements and background of the research, and these articles are cited at the end of the paper.

References should not be used by authors, editors, or peer reviewers to promote self-interest. One suggestion for researchers is to use only references useful for the article, supporting specific purposes. Considering that science is always evolving and the impact of journals is different, references must be updated and, if possible, they should attempt to cite articles in the most relevant journals [15]. It is not superfluous to mention that references should be formatted and ordered according to the specific journal guidelines. All too often this is not adhered to.

48.3 How to Manage the Submission Process

The submission process is the last step in the preparation of the article. This step is often complex, due to the different submission manager platforms at different journals. One suggestion to help researchers to make it easier is to prepare all the materials before starting the submission, also collecting information on ethics committees and clinical trial registration. More and more journals nowadays require a disclosure of financial conflicts of interest, which should be collected from every author, possibly before submission, especially in the case of multicenter studies. Sometimes, minor mistakes such as line numbering or line spacing determine the need for revision and time loss.

One useful tip is to check the PDF generated by the submission system to check for errors or mistakes.

48.4 Conclusion

Becoming a good scientific writer requires a long learning curve. The authors of this short guide feel that, to obtain good results in writing, a passion for clinical practice and being interested in analyzing the results of the clinical work are crucial. Writing and sharing the results of clinical work with others means taking part in something greater, whose aim is to obtain better results in the treatment of the patients.

Good luck!

References

1. Altman DG, Simera I, Hoey J, Moher D, Schulz K. EQUATOR: reporting guidelines for health research. Lancet. 2008;371(9619):1149–50.
2. Annesley TM. The discussion section: your closing argument. Clin Chem. 2010;56(11):1671–4. https://doi.org/10.1373/clinchem.2010.155358.
3. Andrade C. How to write a good abstract for a scientific paper or conference presentation. Indian J Psychiatry. 2011;53(2):172–5. https://doi.org/10.4103/0019-5545.82558.
4. Barron JP. The uniform requirements for manuscripts submitted to biomedical journals recommended by the International Committee of Medical Journal Editors. Chest. 2006;129(4):1098–9.
5. Burns PB, Rohrich RJ, Chung KC. The levels of evidence and their role in evidence-based medicine. Plast Reconstr Surg. 2011;128(1):305–10. https://doi.org/10.1097/PRS.0b013e318219c171.
6. http://www.cochranelibrary.com.
7. http://www.icmje.org/recommendations/browse/manuscript-preparation/preparing-for-submission.html.
8. https://www.ncbi.nlm.nih.gov/pmc/.
9. https://en.wikipedia.org/wiki/Cochrane_Library.
10. http://www.who.int/library/country/regional/en/.
11. Kallestinova ED. How to write your first research paper. Yale J Biol Med. 2011;84(3):181–90.
12. Liumbruno GM, Velati C, Pasqualetti P, Franchini M. How to write a scientific manuscript for publication. Blood Transfus. 2013;11(2):217–26.
13. Levin KA. Study design IV. Cohort studies. Evid Based Dent. 2006;7(2):51–2.
14. Moher D, Pham B, Jones A, et al. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet. 1998;352(9128):609–13.
15. Nahata MC. Tips for writing and publishing an article. Ann Pharmacother. 2008;42(2):273–7. https://doi.org/10.1345/aph.1K616.
16. O’Connor TR, Holmquist GP. Algorithm for writing a scientific manuscript. Biochem Mol Biol Educ. 2009;37(6):344–8.
17. Randelli PS, et al. Needs and wishes from the arthroscopy community. In: Karahan M, Kerkhoffs G, Randelli P, Tuijthof G, editors. Effective training of arthroscopic skills. Berlin: Springer; 2015.
18. Shah J, Shah A, Pietrobon R. Scientific writing of novice researchers: what difficulties and encouragements do they encounter? Acad Med. 2009;84(4):511–6. https://doi.org/10.1097/ACM.0b013e31819a8c3c.
19. Sharma S. How to become a competent medical writer? Perspect Clin Res. 2010;1(1):33–7. PubMed PMID: 21829780.
20. Schulz KF, Grimes DA. Generation of allocation sequences in randomised trials: chance, not choice. Lancet. 2002;359:515–9.
21. Tuijthof G, Cabitza F, Ragone V, Compagnoni R, Dutch Arthroscopy Society Teaching Committee, Randelli P. What arthroscopic skills need to be trained before continuing safe training in the operating room? J Knee Surg. 2017;30(7):718–24.
22. Vickers AJ. How to randomize. J Soc Integr Oncol. 2006;4(4):194–8.
23. Whitlock EP, Lopez SA, Chang S, et al. Identifying, selecting, and refining topics. In: Methods guide for effectiveness and comparative effectiveness reviews [Internet]. Rockville: Agency for Healthcare Research and Quality (US); 2008.
24. Zeiger M. Essentials of writing biomedical research papers. New York: McGraw-Hill; 2000. p. 176–219.
49 Reporting Complications in Orthopaedic Trials

S. Goldhahn, Norimasa Nakamura, and J. Goldhahn
ested in the surgeons’ perspective nor in the legal perspective. They simply want to get function and quality of life re-established, and they regard everything as a complication that deviates from the normal course of healing and rehabilitation. In addition, they should get unbiased information about expected complication risk as a base for shared decision-making [1].

The outlined consequences demonstrate that it seems almost impossible to satisfy all perspectives at the same time. Therefore, a pragmatic approach is required that should acknowledge relevance. A severe complication may lead to a decrease in surgical reputation and/or a withdrawal of an implant with some financial consequences for the manufacturer. However, a patient may suffer from the consequences of a complication for the rest of his/her life or even die. Therefore, complications have the highest relevance for the person experiencing them. Consequently, definitions of complications should be patient-centred.

This leads to a hierarchical approach to complications (see Fig. 49.1). Whereas the surgical perspective is based on experience and always includes reasoning and causality, the patient perspective serves as a filter. Any event without any harm or consequences to the patients might not be considered as a complication.

On top of the hierarchy, the legal perspective determines the relation to any tested implant or treatment and classifies the severity according to established guidelines. It is mostly a subset of the complications that may matter to the patients as described above.

In accordance with the guidelines of the International Conference on Harmonization (ICH) [E2A and E6(R2) Integrated Addendum to GCP guideline] and the ISO 14155:2011(E), a serious adverse event (SAE) is defined as any untoward medical occurrence that:

• Results in death
• Is life-threatening (note: the term life-threatening in the definition of “serious” refers to an event in which the patient was at risk of death at the time of the event; it does not refer to an event which hypothetically might have caused death if it were more severe)
• Requires inpatient hospitalization or prolongation of existing hospitalization
• Results in persistent or significant disability/incapacity
• Necessitates medical or surgical intervention to prevent permanent impairment to a body structure or a body function
• Leads to foetal distress, foetal death or congenital abnormality or birth defect
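The SAE criteria listed above lend themselves to a simple checklist structure on a case report form. The following Python sketch is one possible way to encode them, offered as an assumption for illustration: the class name `AdverseEvent`, its field names and the example events are hypothetical and are not taken from the ICH or ISO documents. It is not a regulatory tool; the judgement whether a criterion is met remains clinical.

```python
from dataclasses import dataclass

# One flag per SAE criterion from the list above (field names are hypothetical).
SAE_CRITERIA = (
    "results_in_death",
    "life_threatening",  # at risk of death at the time of the event
    "hospitalization_required_or_prolonged",
    "persistent_or_significant_disability",
    "intervention_to_prevent_permanent_impairment",
    "foetal_distress_death_or_congenital_abnormality",
)


@dataclass
class AdverseEvent:
    description: str
    results_in_death: bool = False
    life_threatening: bool = False
    hospitalization_required_or_prolonged: bool = False
    persistent_or_significant_disability: bool = False
    intervention_to_prevent_permanent_impairment: bool = False
    foetal_distress_death_or_congenital_abnormality: bool = False

    def is_serious(self) -> bool:
        """An event counts as an SAE if at least one criterion is met."""
        return any(getattr(self, name) for name in SAE_CRITERIA)


if __name__ == "__main__":
    # Invented example events, for demonstration only.
    mild = AdverseEvent("transient skin irritation")
    readmitted = AdverseEvent(
        "wound infection with readmission",
        hospitalization_required_or_prolonged=True,
    )
    print(mild.is_serious(), readmitted.is_serious())
```

Keeping one explicit flag per criterion, rather than a single "serious" tick box, records which criterion triggered the classification, which is the information a later independent review would need.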
Fig. 49.2 Clear distinctions should be made between the complication and adverse event themselves, their most likely causal factors, their treatment (which could actually be no action) and their consequences or outcomes (Source: J. Karlsson et al. A practical guide to research: design, execution, and publication. Arthroscopy. 2011 April 27, 4 Suppl:S94)
Table 49.2 Proposed classification of complications based on their most likely causative factor

| Category | Class | Number | Example |
|---|---|---|---|
| Treatment related | Related to the surgical technique | 1a | Malpositioning of screws, wrong procedure |
| Treatment related | Related to the device/treatment | 1b | Loosening of polyethylene glenoid due to wear |
| Patient related | Related to local tissue condition | 2a | Cut-out of correctly placed screw due to poor bone quality |
| Patient related | Related to overall patient condition, e.g. systemic | 2b | Myocardial infarction |
The final complication review should be conducted based on complication/AE forms, as well as additional diagnostics to complete the case. Complication data is reviewed for its clinical pertinence, classification, severity as well as relation to the investigated treatment or medical device. All changes and data corrections should be thoroughly justified and documented.

49.8 Analysis

A minimum set of complication analyses should be conducted in any study. However, it should be noted that even if regulatory requirements oblige investigators to document all complications occurring during a study, only a specific clinically relevant subset may be analysed to answer a study objective. It is critical to clearly define which complications are included in such a subset and specify the period of observation (e.g. intraoperative, post-operative, follow-up periods) to allow appropriate interpretation of the results. In the context of prospective clinical investigations, the timing of observation for each patient starts with the initiation of treatment or primary surgery and ends at the end of the study. For the report of complications, complication risks can be calculated and presented as shown in Table 49.3.

Complication risks should be presented based on the number of patients experiencing complications and not on the total number of documented complications.

According to our experience, many surgeons do not consider an event that is unrelated to the treatment to be a complication and therefore see no need to document it. In addition, clinicians may sometimes feel they do not need to document events that have no or limited consequences for the patients to avoid documentation overload. Nevertheless, harmonized standards for the conduct of clinical trials define complication as “any untoward medical occurrence” not necessarily related to potential causal factors or severity [7]; even mild anticipated complications in the framework of clinical research require official reporting to authorities if their rate of occurrence is higher than what can reasonably be expected in any study. While all complications must be documented from a regulatory viewpoint, the primary analysis can be focused on the patient-relevant complications.

Conducting an independent review of complications is important for the credibility of safety data. Complication rates in the literature are most often elusive [5]. In addition, they are likely underestimated when documented by the inventor(s) of any surgical techniques. Despite all efforts at standardization, the assessment and reporting of complications will always require clinical judgement and therefore remain partly subjective. A Complication Review Board (CRB) can address such limitation, and can be established also for single-centre studies. A CRB can, for instance, consist of two to four orthopaedic surgeons (at least one of them should not be involved in the study), a radiologist and a methodologist. It is to be distinguished from any Data Monitoring Committee (DMC) [9] established as part of large multicentre studies; while the

Table 49.3 Example of presentation of complication risks

Type of complication                              n    Risk (%)  95% CI
Post-operative local implant/bone complications   18   10.2      (6.1–15.6)
Implant
  Blade migration                                 1    0.6       (0.01–3.1)
  Implant breakage                                3    1.7       (0.35–4.9)
  Cut-out                                         2    1.1       (0.14–4.0)
  Other implant complications                     2    1.1       (0.14–4.0)
Bone/fracture
  Loss of reduction                               1    0.6       (0.01–3.1)
  Neck shortening                                 8    4.5       (2.0–8.7)
  Other bone complications                        6    3.4       (1.3–7.2)
Number of patients N = 177
n: number of patients with at least one complication (a patient can have more than one complication, but for the risk calculation, the number of patients experiencing complication(s) is used)
Risk: number of patients having a specific complication divided by the number of patients enrolled in the study
95% CI: 95% binomial exact confidence interval
49 Reporting Complications in Orthopaedic Trials 513
The unregulated use of medical technologies can result in significant harm to patients as well as poor research. Failure by clinicians or researchers to comply with regulatory requirements can result in significant penalties, including being banned from performing certain research or clinical activities, civil liability, or criminal punishment including fines or even imprisonment. Therefore it is critical to be aware of the different regulatory issues regarding research and to be meticulously compliant.

The burden of meeting regulations may be quite high and can limit the speed of progress in research. There is a balance between the goals of protecting the public from inadequately tested drugs or devices and permitting the expeditious approval of new therapies that are of benefit to patients. The kind and type of regulation can vary related to politics, societal needs, or different weightings of ethical considerations.

50.2 Good Clinical Practice

New researchers must be aware of their moral and ethical obligation to conduct research in a manner that assures the public that the rights, safety, and well-being of human research participants (i.e., subjects) are protected.

Good Clinical Practice (GCP) is defined by the International Conference on Harmonisation (ICH) as an international ethical and scientific quality standard for designing, conducting, recording, and reporting trials that involve the participation of human subjects. Compliance with this standard provides public assurance that the rights, safety, and well-being of trial subjects are protected, consistent with the principles that have their origin in the Declaration of Helsinki, and that the clinical trial data are credible. The objective of the ICH GCP Guideline is to provide a unified standard for the European Union (EU), Japan, and the United States to facilitate the mutual acceptance of clinical data by the regulatory authorities in these jurisdictions. The guideline was developed with consideration of the current good clinical practices of the European Union, Japan, and the United States, as well as those of Australia, Canada, the Nordic countries, and the World Health Organization (WHO) [1].

The ICH E6 (R1) [2, 3] guidelines were developed in 1996 to harmonize the requirements for registering medicines in Europe, Japan, and the United States. Since they are internationally recognized, they permit all clinical trial evidence from one country to be accepted by another country. The general principles of ICH expand on the Declaration of Helsinki, which is a set of ethical principles applicable to human research developed by the World Medical Association (WMA). The ICH GCP E6 (R2) [4, 5] was finalized in 2016 and addresses the increased complexity of clinical trials and electronic data recording and reporting. It is the biggest revision of the guideline in over 20 years.

A new clinical researcher should be aware of the terms sponsor and investigator to understand his/her role and responsibilities. Clinical trials may be initiated by a sponsor, investigator, or someone serving as both. Clinical trials involve several teams responsible for specific roles. The investigator may delegate certain roles to his/her team, but is responsible for the conduct of the clinical trial.
Fact Box 50.1: ICH Guidance Is Divided into Four Categories, and ICH Topic Codes Are Assigned According to These Categories [1]
• Q: quality. Chemical and pharmaceutical quality of a drug (stability, validation, impurity testing)
• S: safety. Safety of the medicinal product (toxicology, carcinogenicity, genotoxicity)
• E: efficacy. Clinical studies in human subjects (dose response, GCP, trial design, conduct, analysis, AE reporting)
• M: multidisciplinary. Issues that do not fall in the other categories (MedDRA: standardized medical coding for adverse events)
50 Understanding and Addressing Regulatory Concerns in Research 517
Fact Box 50.3: The ICH GCP E6 (R2) Addendum Introduced 26 New Items Covering 3 Main Areas of Clinical Research: Data Management and Sponsor and Investigator Responsibilities [6, 7]

Number of amended items  Name of the amended section                              Number and name of the amendment
1                        Introduction                                             NA
4                        Glossary                                                 • Certified Copy 1.11.1
                                                                                  • Monitoring Plan 1.38.1
                                                                                  • Monitoring Report 1.39
                                                                                  • Validation of computerized systems 1.60.1
1                        The principles of ICH GCP                                • The principles of ICH GCP 2.10
3                        Investigator                                             • Adequate Resources 4.2.5, 4.2.6
                                                                                  • Records and Reports 4.9.0
16                       Sponsor                                                  • Quality Management 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7
                                                                                  • Contract Research Organization (CRO) 5.2.1, 5.2.2
                                                                                  • Trial Management, Data Handling, and Record Keeping 5.5.3(b), 5.5.3(h)
                                                                                  • Monitoring 5.18.3, 5.18.6(e), 5.18.7
                                                                                  • Noncompliance 5.20.1
1                        Essential documents for the conduct of a clinical trial  • Essential documents for the conduct of Clinical Trial 8.1
518 J. L. Koh et al.
Fact Box 50.5: Clinical Trial Registry Resources
www.clinicaltrials.gov [16]: ClinicalTrials.gov is a resource provided by the US National Library of Medicine.
http://www.isrctn.com [17]: The ISRCTN registry is a primary clinical trial registry recognized by WHO and ICMJE that accepts all clinical research studies (whether proposed, ongoing, or completed), providing content validation and curation and the unique identification number necessary for publication.
http://www.who.int/ictrp/network/primary/en/ [18]: Primary Registries in the WHO Registry Network meet specific criteria for content, quality and validity, accessibility, unique identification, technical capacity, and administration. Primary Registries meet the requirements of the ICMJE.

Fact Box 50.6: Important Dates Related to ICMJE Data Sharing [14, 15]
• As of July 1, 2018, manuscripts submitted to ICMJE journals that report the results of clinical trials must contain a data-sharing statement.
• Clinical trials that begin enrolling participants on or after January 1, 2019, must include a data-sharing plan in the trial’s registration. If the data-sharing plan changes after registration, this should be reflected in the statement submitted and published with the manuscript and updated in the registry record.

2005, will be entered in a public registry at the beginning of enrollment in order to be considered for publication. The ICMJE did not specify a registry, but provided criteria for a qualifying registry.

The ICMJE defines a clinical trial as any research project that prospectively assigns people or a group of people to an intervention, with or without concurrent comparison or control groups, to study the relationship between a health-related intervention and a health outcome. Health-related interventions are those used to modify a biomedical or health-related outcome; examples include drugs, surgical procedures, devices, behavioral treatments, educational programs, dietary interventions, quality improvement interventions, and process-of-care changes. Health outcomes are any biomedical or health-related measures obtained in patients or participants, including pharmacokinetic measures and adverse events. The ICMJE does not define the timing of first participant enrollment, but best practice dictates registration by the time of first participant consent [14, 15].

ClinicalTrials.gov [16] and the International Standard Randomized Controlled Trial Number (ISRCTN) [17] meet those criteria and have become the most widely used registries to meet the ICMJE requirement.

These registries contain information about the clinical studies such as the investigational product under research, study objectives, study design, participating investigators and institutions, and funding.

The purpose of registering a clinical trial is to prevent bias and selective reporting of outcomes, to prevent unnecessary duplication of research, to keep the public informed about planned or ongoing trials, and to give ethics boards a platform to review similar works they may be considering.

The ICMJE now requires that authors include a plan for data sharing as a component of clinical trial registration. This plan must include where the researchers will house the data and, if not in a public repository, the mechanism by which they will provide others access to the data, as well as other data-sharing plan elements outlined in the 2015 Institute of Medicine report (e.g., whether data will be freely available to anyone upon request or only after application to and approval by a learned intermediary, whether a data use agreement will be required) [19, 20]. ClinicalTrials.gov has provided data-sharing plans.

Data-sharing statements must indicate the following: whether individual de-identified participant data (including data dictionaries) will be
shared; what data in particular will be shared; whether additional, related documents will be available (e.g., study protocol, statistical analysis plan, etc.); when the data will become available and for how long; and by what access criteria data will be shared (including with whom, for what types of analyses, and by what mechanism).

As the clinical research industry moves toward increased transparency, the new researcher should be aware of the considerations when registering clinical trial results and the parameters surrounding manuscript publication.

50.4 Basic Principles, Considerations, and Essential Requirements of Any Clinical/Human Research Project

Prior to the start of any human clinical research, it is important to define the research and hypotheses, understand the experimental vs. investigational nature of the research, understand and be able to articulate risks, and have a plan for risk mitigation to minimize risks prior to and during the study. Additionally, there are local, regional, and federal requirements for the conduct of research involving human subjects that one must be aware of. Most institutions have criteria and other requirements for the evaluation, approval, and implementation of human research done on-site or with affiliates, so these are also important to understand prior to embarking on the human research project. Key responsibilities of the investigator, the institution, and those involved in supporting the study from a sponsorship perspective that are applicable to most human research projects are discussed below.

50.4.1 Responsibilities of the Investigator, Institution, and Sponsor

One of the most important responsibilities of each party involved is to ensure that the study is conducted in accordance with ethical principles and related institutional and government requirements for the protection of the human subjects involved in the research.

The ethics of clinical research have developed from historical lessons, learned when there were few rules and regulations to protect the clinical subject. Some ethical guides that have influenced current ethics principles include the Nuremberg Code (1947), the Declaration of Helsinki (1964; revised 2000), and the Belmont Report (1979). The National Institutes of Health [21, 22] defines seven main principles as a guide to conducting clinical research. We review these below, along with examples of questions that the researcher should consider in the development of an ethical and robust clinical program:

1. Social and clinical value: what is the potential value of this research to the subjects involved, to the community, and to the future of medicine and improvement in medical care?
2. Scientific validity: what study design features will reduce bias and support appropriate hypothesis testing, so that the results are interpretable and can be applied to future research and/or medical practice?
3. Fair subject selection: what a priori criteria should be in place to avoid inclusion of subjects that are outside the specific population for research and to avoid exclusion of subjects due to bias? What screening practices and documentation of the rationale for inclusion and exclusion of subjects need to be in place to reduce potential for bias and to allow for appropriate subject selection into the study?
4. Favorable risk-benefit rationale: what clinical and protocol risks are evident? How are risks mitigated and managed prior to the study and during the study? How will subjects be informed of the risks prior to enrollment? What action will be taken if new risks are identified?
5. Independent review: what processes are in place to support the requirement for independent review of the protocol and informed consent prior to starting the study and during the study? Will a separate data review committee be provided to support safety and medical monitoring? Will the study be audited by an
independent party or parties? Will FDA review be required prior to initiating the clinical research?
6. Informed consent (written and signed): what do subjects need to know about the research, the materials used in the research (e.g., drugs, biologics, devices, measures and tools applied, surgery or other interventions), the time commitment involved, their rights as a human subject, how their data will be used and what security will be applied to ensure privacy, and what to do if they have a question or an issue during the study, for example?
7. Respect for potential and enrolled subjects: what effort will be made to ensure that subjects are properly treated with respect and dignity throughout the study? How will access to subject information be controlled, limited, and maintained? What training will study staff be given to ensure respect, privacy, and best research practices?

Two of the most important aspects of clinical research are (1) independent review committees to assess and review the protocol prior to approving clinical research and (2) the informed consent documentation, process, and assignment/training of personnel involved in administration of informed consent. Independent review committees (often referred to as an IRB or ethics committee) comprise a group of individuals who will assess the clinical trial and will review the protocol and any materials that are intended for the subject such as the informed consent, recruitment phone scripts, and advertisements for supporting community awareness of the study. The clinical trial should not begin until written approval of the protocol and the submitted documents is received. Only the approved documents may be used for the study, and if any changes to the approved documents are desired, the committee must review and approve the changes prior to the use of those new materials. This oversight during human research supports proper consideration and independent perspective and reduces bias and the potential for gaps in human subject protection.

Each country has a federal authority, such as the Food and Drug Administration in the United States, which oversees and allows the research to occur in accordance with federal law and regulation. In some cases, the federal authority may require review and approval of the research prior to initiation, in addition to the local/regional IRB or ethics committee review and approval. Most often the deciding factor as to pre-approval of research by a federal health authority is the risk to human subjects associated with the protocol and the risk associated with the materials or products involved. See Sect. 50.3, below, for additional details regarding federal health authority considerations and investigational medicinal products, therapies, and/or medical devices. The investigator should check with each country-specific regulatory authority for application requirements and verify the level of federal authority involvement with the IRB or ethics committee involved in research review.

50.4.2 Purpose of the Research

The purpose of the research and intent for data use are important to define early on. An investigator may be conducting clinical research for the purposes of publication, institutional quality improvement, addressing questions regarding the safety or efficacy of a new procedure, and/or assessing the safety and effectiveness of an investigational product. The level of oversight and documentation required will be driven by the type of research involved, the purpose of the research, and what parties are interested.

Additionally, a clinical trial proposal should be supported by publications in a literature review that will serve as the foundation for manuscript references and also support the scientific validity and study design of the proposed research. Be descriptive of the approach, discuss what has changed since previous works were conducted, and propose the new research to fully develop hypotheses. Pilot studies may be used to support hypotheses and to demonstrate that subsequent studies are warranted.
and protocol amendments. Sections of a protocol should include:

• General Information
• Background Information
• Trial Objectives and Purpose
• Trial Design
• Selection and Withdrawal of Subjects
• Treatment of Subjects
• Assessment of Efficacy
• Assessment of Safety
• Statistics
• Direct Access to Source Data/Documents
• Quality Control and Quality Assurance
• Ethics
• Data Handling and Record Keeping
• Financing and Insurance
• Publication Policy
• Supplements

50.4.6 Qualified Medical Personnel and Trained Team

The principal investigator is responsible for the clinical trial conducted at his/her site. He/she may delegate qualified personnel to conduct certain aspects of the clinical trial; however, the principal investigator ultimately remains responsible for their actions. A Delegation of Authority log is provided to each site, and each member of the clinical trial team will outline the duties they will perform. ICH GCP [25, 26] outlines the investigator’s qualifications and agreements: the investigator should be qualified by education, training, and experience, should be familiar with the investigational product, and should comply with GCP; the investigator/institution should permit monitoring and auditing; and the investigator should maintain a list of qualified persons to whom he/she has delegated trial-related duties.

informed consent goes beyond a signature. The informed consent should allow for dialogue about the potential subject’s participation. The potential subject should be given adequate time to review the information and consider all options, so that he/she understands what is being asked in order to participate and can voluntarily agree to participate in the study. ICH GCP [25, 26] outlines informed consent of trial subjects in detail.

50.4.8 Data Collection and Documentation

Data collection can be paper-based or via electronic data capture (EDC), each having its pros and cons. Whether paper or electronic data capture is used in a clinical trial, the same supporting framework should be established to ensure minimal data errors. When developing case report forms (CRFs), ensure you are collecting pertinent data to be analyzed. Extra data points that are not meaningful to the research cause burden to the investigational sites. A sponsor or investigational site may use source document worksheets to capture data that they may not typically capture in private practice. Check with your regulatory agency to ensure worksheets are acceptable. Any type of source where the data is first documented is auditable, such as electronic medical records (EMR), radiology, and even the miscellaneous paper used to record data such as height, weight, and vital signs.

Ensure a data management plan (DMP) is in place prior to the study. Proper data quality management will help to ensure minimal errors and quality data that will support the study and hypothesis.
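One way to picture the "supporting framework" for minimizing data errors is a set of edit checks that a data management plan applies to every CRF record, whether it was transcribed from paper or entered through EDC. The following is a minimal sketch under stated assumptions: the field names, the plausibility ranges, and the `Query` and `check_record` names are hypothetical illustrations and are not taken from ICH GCP or from any particular EDC system.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical plausibility ranges for illustration only; a real study
# would define these in its data management plan.
RANGES = {
    "height_cm": (100.0, 230.0),
    "weight_kg": (30.0, 250.0),
    "systolic_mmhg": (60.0, 260.0),
}


@dataclass
class Query:
    """A data query to be sent back to the investigational site."""
    subject_id: str
    field: str
    message: str


def check_record(subject_id: str, record: dict) -> list[Query]:
    """Raise a query for every missing or implausible CRF value."""
    queries = []
    for field, (low, high) in RANGES.items():
        value = record.get(field)
        if value is None:
            queries.append(Query(subject_id, field, "missing value"))
        elif not low <= value <= high:
            queries.append(
                Query(subject_id, field, f"{value} outside {low}-{high}"))
    return queries


if __name__ == "__main__":
    # A weight of 780.0 looks like a decimal-point slip; systolic is missing.
    for q in check_record("S001", {"height_cm": 178.0, "weight_kg": 780.0}):
        print(q.subject_id, q.field, q.message)
```

Flagged values would be resolved against the source document (e.g. the EMR or worksheet), which is why the first place the data is recorded needs to remain auditable.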
major global regulatory or competent authorities (Tables 50.1, 50.2, and 50.3).

The websites can provide a source of information regarding regulatory requirements, processes, or expectations with regard to all aspects of the product. The competent authorities may also offer guidance documents which more definitively describe information and data requirements for the development, clinical studies, and regulatory process for defined indications. The guidance documents may also outline the expectations for each phase of clinical development (Phases I–III).

The competent authorities, singular or multiple in some cases, oversee clinical studies in their respective countries. Initially their responsibilities include reviewing and approving clinical trial protocols and then ensuring that clinical trials comply with national regulations and international guidelines. After the clinical studies, development, and subsequent approval, they also have quality assurance authority to ensure the production, distribution, labeling, and safety monitoring of new or existing devices, drugs, and biologics.
Table 50.1 National competent authorities in major global markets: European Medicines Agency

Country         Competent authority                                                                 Website
Austria         Austrian Agency for Health and Food Safety                                          http://www.ages.at/
Belgium         Federal Agency for Medicines and Health Products                                    www.fagg-afmps.be/
Croatia         Agency for Medicinal Products and Medical Devices (HALMED)                          http://www.halmed.hr/en/
Denmark         Danish Health and Medicines Authority                                               www.laegemiddelstyrelsen.dk
Germany         Federal Office of Consumer Protection and Food Safety                               www.bvl.bund.de
Greece          National Organization for Medicines                                                 www.eof.gr
Italy           Ministry of Health                                                                  http://www.salute.gov.it/
Netherlands     Healthcare Inspectorate                                                             www.igz.nl
Poland          Office for Registration of Medicinal Products, Medical Devices, and Biocidal Products  www.bip.urpl.gov.pl
Spain           Spanish Agency for Medicines and Health Products                                    www.aemps.gob.es
United Kingdom  Medicines and Healthcare products Regulatory Agency                                 https://www.gov.uk/government/organisations/medicines-and-healthcare-products-regulatory-agency
50.5.1 Role of Sponsors and Clinical Research Organizations (CRO)

If the research involves a sponsor, many times the investigator has limited interaction with the competent authority. A clinical trial sponsor can be an individual, an investigator, company, institution, or organization that takes responsibility for the initiation, management, and financing of a clinical trial. A sponsor can be a device, pharmaceutical or biotech company, a non-profit organization such as a research fund, a government organization or the institution where the trial is to be conducted, or the individual investigator. The sponsor assumes responsibilities such as protocol development and financing the trial, and seeks permission for trial initiation from the competent authority or authorities. The competent authority interacts with the sponsor and approves the trial protocol that is provided to the investigator. The sponsor can have the following responsibilities:

• Submitting the plan for the clinical trial to the competent authority for approval
• Informing clinical investigators about the test article, its safety and instructions for use, as well as training the staff and facility regarding its use and handling
• Making sure there is an appropriate number of test articles for the investigation
• Ensuring the trial protocol is properly reviewed by an experienced ethics committee (EC)
• Monitoring the trial to ensure the protocol is being followed, data collection is accurate, adverse events are reviewed and reported, and all regulations are complied with

If the research involves a clinical research organization (CRO), many of the responsibilities of the investigator are delegated to the CRO. CROs are independent companies providing research services for the device, drug, and biologic industry and function as outsourcing of tasks related to clinical trials. Such outsourcing services can be related to project management, trial monitoring, data collection, and medical statistics work. When a CRO is contracted by a sponsor, the CRO and its designated clinical trial monitor (CTM) or clinical research associate (CRA) take on many, and sometimes all, of the sponsor’s trial responsibilities. In some cases, central laboratory services are an important ingredient of clinical trials, conducting work such as processing blood samples and reading radiographic images. Sponsors and sometimes competent authorities may require that a single source process all samples so that there is a standardized process, results are reliable and reproducible, and data are collected and stored in a centralized facility. The central laboratory services can be considered a specialized CRO.

Of note, investigators must be able to report all results of the clinical trial, regardless of outcome. In the United States, there is a requirement that results of trials of FDA-approved products be posted within 12 months of trial completion on www.clinicaltrials.gov.

50.6 Elements of the Technical Document

At the conclusion of the clinical research on a device, drug, or biologic, the next steps may involve seeking regulatory approval for commercial marketing of the investigational product. The compilation of the technical file is a critical step in the regulatory approval process and includes detailed information about the design, function, composition, use, claims, and clinical evaluation of your drug, device, or biologic. The Common Technical Document (CTD) was designed to provide a common format between Europe, the United States, and Japan for the technical documentation included in an application for the registration of a human device, drug, or biologic product. The agreement to harmonize and assemble all the quality, safety, and efficacy information in a common format was a major advance in the global regulatory review process and enabled implementation of good review practices. For device, drug, or biologic companies, it eliminated the need to reformat the information for submission to the different ICH competent authorities. A CTD is a comprehensive description of your product and demonstrates compliance with the applicable regulatory requirements. For devices in the European Union, the directives that need to be met include Medical Devices
[Figure: CTD triangle. Module 1: regional administrative information (not part of the CTD). Module 2 (part of the CTD): quality overall summary, non-clinical overview, non-clinical summary, clinical overview, clinical summary.]
Fig. 50.1 The Common Technical Document triangle has five modules. Module 1 is region specific; Modules 2–5 are common to all regions [30]
around Good Clinical Practice (quality, safety, efficacy, and multidisciplinary) guidance, including documentation.
• Research should be performed in accordance with ethical principles.
• Failure to comply with regulation can result in subject harm and can result in civil liability or criminal charges.

References

1. ICH GCP. http://ichgcp.net/. Accessed 18 July 2018.
2. http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E6/E6_R1_Guideline.pdf
3. International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH). Guideline for good clinical practice E6(R1); 1996. p. 59.
4. http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500002874.pdf
5. Committee for Human Medicinal Products. Guideline for good clinical practice E6(R2). London: European Medicines Agency; 2017.
6. http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E6/E6_R2__Step_4_Presentation_06Feb2017.pdf.
7. ICH E6(R2) Expert Working Group. Integrated addendum to ICH E6(R1): guideline for good clinical practice E6(R2); 2017.
8. https://about.citiprogram.org/en/series/good-clinical-practice-gcp/.
9. Collaborative Institutional Training Initiative (CITI Program). Good clinical practice (GCP) - CITI Program. https://about.citiprogram.org/en/series/good-clinical-practice-gcp/. Accessed 18 July 2018.
10. http://www.barnettinternational.com/Publications/Good-Clinical-Practice%2D%2DA-Question%2D%2D-Answer-Reference-Guide-2017/?gclid=EAIaIQobChMI783lydi_2QIVyLbACh0ORwRVEAAYASAAEgL82vD_BwE.
11. Good clinical practice: a question & answer reference guide 2018 (electronic). Needham: Barnett International Publication; 2018. https://www.barnettinternational.com/good-clinical-practice-a-question-amp-answer-reference-guide-2018-electronic-0.
12. http://www.clinicaldevice.com/mall/Workshops.aspx.
13. Medical Device Workshops. http://www.clinicaldevice.com/mall/Workshops.aspx. Accessed 18 July 2018.
15. International Committee of Medical Journal Editors. ICMJE | recommendations | clinical trials. http://icmje.org/recommendations/browse/publishing-and-editorial-issues/clinical-trial-registration.html. Accessed 18 July 2018.
16. National Library of Medicine. https://www.clinicaltrials.gov/. Accessed 18 July 2018.
17. ISRCTN Registry. http://www.isrctn.com/. Accessed 18 July 2018.
18. World Health Organization. International Clinical Trials Registry Platform (ICTRP). http://www.who.int/ictrp/network/primary/en/. Accessed 18 July 2018.
19. http://www.nejm.org/doi/full/10.1056/NEJMe1515172.
20. Taichman DB, Backus J, Baethge C, Bauchner H, de Leeuw PW, Drazen JM, et al. Sharing clinical trial data - a proposal from the International Committee of Medical Journal Editors. N Engl J Med. 2016;374:384–6. https://doi.org/10.1056/NEJMe1515172.
21. https://clinicalcenter.nih.gov/recruit/ethics.html.
22. National Institutes of Health. NIH Clinical Center: Ethics in Clinical Research. https://www.ncbi.nlm.nih.gov/pubmed/. Accessed 18 July 2018.
23. http://ichgcp.net/8-essential-documents-for-the-conduct-of-a-clinical-trial.
24. Essential documents for the conduct of a clinical trial. Documents. http://ichgcp.net/8-essential-documents-for-the-conduct-of-a-clinical-trial. Accessed 18 July 2018.
25. http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E6/E6_R2__Step_4_2016_1109.pdf.
26. International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH). Integrated addendum to ICH E6(R1): guideline for good clinical practice E6(R2). ICH; 2016.
27. http://ichgcp.net/411-safety-reporting.
28. Safety Reporting. http://ichgcp.net/411-safety-reporting. Accessed 18 July 2018.
29. http://www.ich.org/products/ctd.html.
30. International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH). CTD: ICH: ICH; http://www.ich.org/products/ctd.html. Accessed 18 July 2018.
31. https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM073280.pdf.
32. US Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research. Guidance for industry M4Q: the CTD - quality. Rockville: U.S. Department of Health and Human Services; 2001.
33. https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM073299.pdf.
34. US Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation
14. http://icmje.org/recommendations/browse/publish- and Research. Guidance for industry M4S: The
ing-and-editorial-issues/clinical-trial-registration. CTD—safety. Rockville: U.S. Department of Health
html. and Human Services; 2001.
50 Understanding and Addressing Regulatory Concerns in Research 531
trust within the team is predicated on reliability. To be an effective team, everyone needs to be relied upon to do their job, such as meeting deadlines and fulfilling expectations. In addition, team members must trust each other’s abilities and expertise. There needs to be a balance of keeping each other accountable for their work without wasting time and resources. As a team member, it is important to trust one another and be trustworthy yourself, in terms of your character and work. Without trust, the productivity of the research team and quality of work may be compromised.
Communicating with people within one’s own discipline can be difficult. Interdisciplinary communication is even more challenging, yet essential for the success of collaboration between basic scientists and clinicians. Ineffective communication stems from not understanding each other, generally because of poor terminology and overuse of technical jargon. When communicating between disciplines, it is important to use terminology consistent with what is accepted by the scientific community and to limit technical jargon. Thus, effective communication can be achieved within your own research group, as well as the scientific community at large. Furthermore, effective communication can improve research efforts to produce high-quality studies. The key is to promote a diversity of opinions between team members. Diverse opinions may promote disagreement, but they can push your collaboration to become better (e.g., by strengthening relationships and trust and by generating new ideas and solutions to problems). However, too much disagreement can lead to major conflicts that drive collaborations apart. A formal mechanism for conflict resolution is important to establish. At our lab, we find our weekly lab meetings a perfect opportunity for diverse opinions. During the lab meetings, we critically evaluate each project being conducted within our research group and discuss lab philosophy. The lab meetings present an opportunity not only to critically evaluate the projects but also to give the presenter practice in communicating their project effectively to a diverse audience (i.e., undergraduate students, Ph.D. students, medical students, residents, fellows, clinicians, engineers, professors, etc.).
The importance of support from your institution for research collaboration efforts cannot be overstated. Institutional support that provides adequate time, funding, facilities, equipment, and personnel puts your collaboration efforts in a position to succeed. Moreover, institutional support can help maintain long-lasting collaborations and allow performance of research at a high level. For example, our lab was developed through the support of the Departments of Orthopaedic Surgery and Bioengineering. Their combined support provides an ideal environment for training and completion of research projects in a multidisciplinary environment.

51.6 Have Fun!

At the end of the day, what makes collaborations work is truly enjoying what you are doing and who you are doing it with. Working with people who share the same amount of enthusiasm and passion as you about what you are doing is fun. Answering complicated healthcare questions and pioneering scientific breakthroughs are very
536 R. E. Debski and G. A. Ferrer
spending in 2008 [10, 11], and in 2012, that number grew to 83%. Not to mention those with five or more chronic conditions had an average of almost 15 physician visits and filled over 50 prescriptions in a year. Osteoarthritis, a degenerative joint disease, affected 54 million Americans in 2014. According to the CDC, that number will rise to 67 million in the year 2025 [12].
ing persons with multiple chronic conditions, the current mechanism of coordination is lacking and needs reconfiguring to increase efficiency and ensure safety and proper treatment. The ultimate goal is to help, not hinder a patient. It appears obvious that coordination should be as smooth and with the least number of hand-offs as possible to minimize time delay in health-care delivery.
and reproducible evaluation of published medical literature. Evidence-based CPGs are designed to withstand the type of scrutiny and review that their “expert group/consensus-based” predecessors could not. They serve as an effective synthesis of an enormous literature database, providing a complete yet concise summary of available knowledge and a detailed treatment plan for a specific topic or condition. These “evidence-based guidelines” undergo a rigorous protocol to deliver the optimum care route for the patient [2]. Such a guideline should streamline patient care while ensuring patient safety and increasing outcome success. As with all information and technologies, CPGs are subject to regular updates as new research and clinical studies are published [3]. CPGs are beneficial in that they provide an efficient source of information for the best course of treatment while allowing for flexibility in a treatment pathway [3].

52.2.1 Trustworthy CPGs

The need for trustworthy guidelines is one of the main driving factors for the new guidelines. Guidelines must be developed by a qualified and diverse group of individuals. The development process critically analyzes the data, the source of the data, and those who conducted the study to ensure limited guideline bias. Bias and COIs can influence the efficacy and impact that published research findings have on a community [15–17]. Bias must be minimized in order to provide the public with the most trustworthy guideline. When bias and COIs are allowed to traverse the boundary line between good research and bad, the effect can be detrimental.
For instance, therapeutic drug research often is run more like a promotional campaign for pharmaceutical companies than a clinical research study, intended to increase sales rather than improve drug performance [18]. In 2008 the New England Journal of Medicine published a study [19] which reviewed the selective publication process of antidepressant drugs and the effect those selected publications have on drug efficacy. The study found that of the 74 clinical trials conducted on one specific antidepressant, 38 produced positive results and 36 found the drug to have “questionable or no efficacy” [18]. However, only 8% of the “questionable or no efficacy” studies were published, while 94% of the positive studies were published. Moreover, 15% of the 8% “questionable or no efficacy” studies were published in such a way as to spin the results in a positive form [19]. As drug companies can cherry-pick which data they wish to present, it is easy for physicians and medical providers to inadvertently develop a biased opinion about the drug. This can influence clinical practice and prescribing habits. Unsurprisingly, additional studies have found that industry-sponsored studies are significantly more likely to report favorable results and less likely to report unfavorable outcomes than their federally funded counterparts [15, 17]. This is troubling as many drugs are associated with serious adverse effects.
As such, it is vital that developers of CPGs look closely at the research evidence and develop CPGs in a trustworthy, reproducible, and transparent manner. Developers must consider not only the findings but also who sponsored the study. They must rigorously scrutinize whether the results are reported truthfully. Only then can guideline development occur with minimal bias and maintain reliability.

52.3 Development of CPGs

In response to Congress and to develop trustworthy guidelines, the IOM has established eight standards of developing CPGs [1]:

1. Transparency.
2. Management of conflicts of interest (COI).
3. Development group diversity.
4. Systematic review.
5. Evidence and recommendation strength.
6. Articulation of recommendations.
7. External review.
8. Updating.

Each of these standards is intended to create the most well-researched, trustworthy, and clinically relevant guideline possible. These standards, or “guidelines for the guidelines,” are imperative in ensuring the reproducibility and clarity of the guideline development process, a factor the previous recommendation development process lacked.
52 A Clinical Practice Guideline 541

52.3.1 Transparency

A transparent guideline serves two main purposes:
The first is to ensure unambiguous and reproducible guidelines. Transparency ensures the guideline is clear and easy to follow. The treatment pathway should be well articulated.
A transparent guideline also fully discloses author information, conflicts of interest (COIs), and guideline funding.
Transparency allows physicians and patients to evaluate for themselves the reliability of and potential biases within a guideline. Ideally, this transparent nature of the guideline development process will deter biases from crossing into the development process, further cementing the trustworthy quality of the guideline.

52.3.2 Management of Conflicts of Interest

52.3.3 Development Group Diversity

A trustworthy CPG depends greatly on its team of developers. A diverse team, one that includes a member from every discipline or party associated with its implementation or consumption, can provide a well-rounded guideline with the interest of all parties protected [4]. This team includes primary care physicians, specialists, nurses, other providers, and any other party who may utilize the guideline. Patients, or other proxies who may advocate for patients, must also be present. Patients and other proxies need no prior medical experience with the guideline topic as their role is to provide a voice for patients.
Such a diverse group ensures that patients’ needs are protected and concerns are respected.

52.3.4 Systematic Review

The systematic review (SR) process determines the inclusion and exclusion criteria for the literature search. During the SR process, articles are gathered, analyzed, and interpreted, and relevant data is summarized. The SR process begins with an all-inclusive search of the medical literature and ends with a preliminary draft of the guideline.
Another factor the guideline team must consider while making recommendations is the fact that a statistically significant finding may not be clinically relevant [21]. To resolve this discrepancy, the American Academy of Orthopaedic Surgeons (AAOS) has applied the minimal clinically important improvement (MCII) method for determining clinical significance in research. This is similar to the minimally important difference (MID), the smallest amount of change a patient may distinguish. Identifying clinical significance is important because a research finding that may be statistically significant to a researcher may not be relevant to patient treatment. Thus, certain research findings may not actually bear enough clinical weight to warrant a change in clinical treatment.

A statistically significant finding may not be clinically relevant [21].

For example, a pharmaceutical company has two drugs, Drug A and Drug B, which both decrease anxiety. Drug A has a success rate of 95%, whereas Drug B has a success rate of 89%. Drug A costs five times as much as Drug B and has much more severe side effects than Drug B. Statistically, Drug A has a significantly greater success rate; however, clinically, Drug B is far more appealing in the eyes of the clinician and the patient. It saves the patient money and potentially harmful side effects and yields nearly the same treatment outcome. In this case, the statistically significant finding isn’t applicable in the clinical setting as the patient wouldn’t be able to distinguish a difference between the successes of both drugs.
Medical literature is analyzed and ranked by its quality of study design: the highest-quality evidence corresponds to the lowest risk of bias. The AAOS has developed a reliable “Clinical Practice Guideline Strength of Recommendation” rubric (Table 52.1) [3] that has been proven to generate strong CPGs. The strength of a recommended treatment pathway in a CPG is based on the quality of its supporting evidence (Table 52.2) [3]. Evidence quality is based on the following hierarchy of study design [3]:

• High quality: <2 study design flaws.
• Moderate quality: ≥2 and <4 study design flaws.
• Low quality: ≥4 and <6 study design flaws.
• Very low quality: ≥6 study design flaws.

Table 52.2 AAOS recommendation language table
Guideline language | Strength of recommendation
Strong evidence supports that the practitioner should/should not do X, because… | Strong
Moderate evidence supports that the practitioner could/could not do X, because… | Moderate
Limited evidence supports that the practitioner might/might not do X, because… | Limited
In the absence of reliable evidence, it is the opinion of this work group that… | Consensus (a)
(a) Consensus-based recommendations are made according to specific criteria. These criteria can be found in Appendix VII

care maps or flow diagrams may be the best way to convey information as they are concise and easy to read and understand even when multiple variables are present. Multiple recommendations are included in the guideline to account for the variances that may arise. No two clinical cases are the same; as such the guideline provides a myriad of recommendations, so the physician may alter treatment as needed.

52.3.6 Articulation of Recommendations

Recommendations must be clearly written. They must be presented in the standardized format that includes a detailed treatment pathway as well as circumstances in which each recommendation should be used. Particular language is used to properly express the strength of the recommendation as well as the level of confidence the development team has in the recommendation. This information is vital as it allows the reader to evaluate how closely the guideline should be followed.
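As a minimal sketch, the evidence-quality hierarchy (by number of study design flaws) and the recommendation language of Table 52.2 described in this section can be written as two small lookups. The function and variable names below are hypothetical and chosen only for illustration; they are not part of any AAOS tool. The rubric that links evidence quality to recommendation strength (Table 52.1) is not reproduced in this text, so the sketch deliberately does not connect the two.

```python
# Illustrative sketch only: names are hypothetical, thresholds are the
# flaw-count cut-offs listed in the evidence-quality hierarchy.

def evidence_quality(design_flaws: int) -> str:
    """Classify a study by its number of study design flaws."""
    if design_flaws < 0:
        raise ValueError("design_flaws must be non-negative")
    if design_flaws < 2:
        return "High"
    if design_flaws < 4:
        return "Moderate"
    if design_flaws < 6:
        return "Low"
    return "Very low"


# Sentence stems taken from Table 52.2 (AAOS recommendation language).
GUIDELINE_LANGUAGE = {
    "Strong": "Strong evidence supports that the practitioner "
              "should/should not do X, because…",
    "Moderate": "Moderate evidence supports that the practitioner "
                "could/could not do X, because…",
    "Limited": "Limited evidence supports that the practitioner "
               "might/might not do X, because…",
    "Consensus": "In the absence of reliable evidence, it is the opinion "
                 "of this work group that…",
}


def recommendation_stem(strength: str) -> str:
    """Return the Table 52.2 sentence stem for a recommendation strength."""
    return GUIDELINE_LANGUAGE[strength]


if __name__ == "__main__":
    for flaws in (0, 2, 5, 7):
        print(flaws, "flaws:", evidence_quality(flaws))
    print(recommendation_stem("Moderate"))
```

Keeping the two lookups separate mirrors the text: one grades the evidence, the other fixes the wording once a strength has been assigned by the work group.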
2. Reliable guidelines are reproducible. They are guidelines in which a peer reviewer comes to the same conclusion as the focus group.
3. Feasible guidelines are easily understood by both patients and physicians and allow for both routine use and case-by-case modifications when necessary.

The review team’s written comments are collected into a single response form which is then reviewed and responded to by the chair of the guideline development team. Guideline development team members vote on all suggested revisions to recommendation language; revisions are accepted with a majority vote [24]. The revision process is documented and reported in the guideline document until final guideline approval [24].

52.3.8 Updating

CPGs are subject to routine updates and amendments as new information presents itself or as time passes. Certain organizations, such as the American Academy of Orthopaedic Surgeons and the American Academy of Otolaryngology-Head and Neck Surgery (AAO-HNS), update their guidelines at a minimum of 5 years after publication [21, 25]. Situations that warrant guideline updating may include but are not limited to [4, 26]:

• Changes/advancements in available treatment or intervention methods.
• New evidence that impacts current treatment.
• Changes in health-care availability, affordability, or access.

In addition to updates, CPGs may undergo amendments. There are three types of amendments:

1. Reaffirmations: This simply consists of a brief statement of the organization’s agreement with the current guideline. This occurs when the guideline requires no significant alterations such as when time passes but treatment methods and research findings have not changed.
2. Minor revisions: A minor revision includes any alteration to the guideline that doesn’t change the overarching treatment plan but changes minor steps. This may be due to new research findings.
3. Major revisions: A major revision is any revision that significantly alters the treatment plan, course of action, or main conclusion of the guideline.

Updates and amendments undergo an independent review and majority vote and are then published and distributed with an alert that the guideline has been revised.
Each guideline is accompanied by its “profile,” a short statement that discloses the entire decision-making process, including the development team’s values, the evidence quality and harm-benefit assessments, and the level of confidence the team has in the evidence. Limitations of the guideline are also expressed, such as intentional vagueness the team may have included [4]. CPGs will not always be correct, as they must be revised with new information as research is published, nor will they be entirely bias-free. The CPG process outlined by the IOM aims to limit the amount of bias that seeps into the recommendation development process, increase consistency within patient care, and streamline the health-care delivery process.
Often a brief disclaimer is added to the beginning of the guideline abstract, such as this one from the American Academy of Otolaryngology-Head and Neck Surgery Foundation (AAO-HNSF) [4]:

This clinical practice guideline is not intended as a sole source of guidance in managing [topic specified here]. Rather, it is designed to assist clinicians by providing an evidence-based framework for decision-making strategies. The guideline is not intended to replace clinical judgment or establish a protocol for all individuals with this condition, and may not provide the only appropriate approach to diagnosing and managing the problem.

AAOS includes similar language in the introduction to the guidelines:

This guideline should not be construed as including all proper methods of care or excluding methods of care reasonably directed to obtaining the same results. The ultimate judgment regarding any specific procedure or treatment must be made considering all circumstances presented by the patient and the needs and resources particular to the locality or institution.
52.3.9 Implementation

Guidelines are only as effective as those who implement them. The National Guideline Clearinghouse (NGC) is responsible for the announcement, promotion, and distribution of CPGs [14]. Once the guideline is ready for implementation, it is crucial that physicians and all health-care providers use the guideline to deliver the highest quality of care to the patient.

52.3.10 Outcomes Assessment

Outcomes assessments are important measures to determine whether or not treatment has been successful [14]. A CPG, like any treatment, undergoes an “outcome assessment.” However, the organizations typically involved in outcome assessments, such as the National Quality Measures Clearinghouse (NQMC) and others, are not involved in the outcome assessment of CPGs. This is because the evaluation of outcomes is built into the CPG development process and expressed by the ranking of the recommendations. As such, the IOM Committee on Quality Health Care in America resolves that there is no need for the rating of quality measure of CPGs as it would be redundant. Moreover, an additional rating could create conflicts of interest as some CPG developers also develop related outcome assessment rubrics [14].

Clinical Vignette
Management of Anterior Cruciate Ligament Injuries:
As of 2017, the AAOS has completed 18 CPGs. In 2015, the AAOS published a guideline for the “Management of Anterior Cruciate Ligament Injuries” providing physicians with a detailed, outlined plan to aid in the prompt and accurate treatment of ACL injuries [27].
The topic was chosen for guideline development as some controversy existed over best treatment and management options, and this condition impacts a significant number of patients in the USA. The extensive literature had not yet been concisely accumulated to distinguish best treatment options for appropriate cases. A systematic review of the literature was conducted in adherence with the aforementioned guideline criteria. In the ranking of their guideline recommendations, the guidelines were assigned a star grade to easily distinguish strength of recommendation (4 stars for a strong recommendation, 3 stars for a moderate-strength recommendation, etc. in accordance with Table 52.1). The stars aligned with the IOM’s “strong,” “moderate,” “limited,” and “consensus” rubric language terms.
Of the 20 recommendations put forth by this ACL injury management CPG, 5 had strong supporting evidence (4 stars), 6 had moderate supporting evidence (3 stars), 7 had limited supporting evidence (2 stars), and only 2 were consensus-based. This CPG shows not only great advancement in the strength of the orthopedic research being conducted but also great improvement in the guidelines themselves. Only 6 years prior, in 2009, AAOS published its first CPG on diaphyseal femur fractures in pediatrics. This guideline, although much stronger than its consensus-based predecessor, only had 1 recommendation out of its 14 that would have received a 4-star ranking and only 2 that would have received 3 stars. The 2015 CPG on ACL injury management shows great advances from both an orthopedic research perspective and a clinical practice guideline and care management perspective.
The following are a few examples of the strongest recommendations from the ACL injury management CPG [27]:

1. “Strong evidence supports that the practitioner should obtain a relevant history and perform a musculoskeletal exami-
546 A. Dingel et al.
CPGs provide clinical guidance for a wide variety of topics and help manage specific conditions. CPGs may be used to support care maps,
1. The process is time-consuming and expensive.
2. Patient feedback is ideal, but often not available in the published literature.
[Figure: DDH flow diagram. Recoverable node labels: “Normal Exam?” (Yes/No); “Trained Staff Accessible?” (Yes/No); “Wait Until Baby is 16 wks”; “Ultrasound”; “AP Pelvis X-ray”; “Normal?” (Yes/No, appears twice); “Continue Routine DDH Well Baby Exams”; “Refer to Peds Ortho Specialist”.]
3. Guidelines are subject to misinterpretation.
4. Guidelines must be continuously updated to reflect changes in the published literature.
5. Adequate literature is required to develop CPGs, so areas lacking in literature won’t qualify for CPG development.
6. Guidelines are only as effective as those who implement and abide by them.
7. For less common conditions, adequate research literature may not support the development of a CPG.

For those guideline patient care questions that lack relevant research or have an insufficient database, members of the clinical practice guideline work groups are allowed to create a companion consensus statement [14]. These are statements based on expert opinion and are published separately from the CPGs to ensure separation of expert-opinion-based recommendations and evidence-based recommendations.

52.6 Future Studies

Although the guidelines have limitations, the guidelines are beneficial for many reasons. The guideline process identifies medical areas which
tate networking as well. To this end, armed with this intimate knowledge of conference content, it would be wise to contact and establish meetings with people of interest at this juncture, well in advance of the meeting date, as it can be nearly impossible to exchange itineraries at the venue itself. In-depth review of the conference outline and content also will facilitate advance preparation for technical workshops and maximize educational yield. It will enable one to contribute to as well as gain from interactions with others in these more focused sessions.

53.4 Imaging

Though especially important for the large society annual meetings, reviewing the conference and accommodation venue layouts is paramount. Obtain the “big picture” of where large group (didactic) sessions are located versus such other elements as symposia, podium presentations, poster/e-poster presentations, surgical demonstrations, instructional course lectures, panel discussions/debates, technical/vendor exhibits, hands-on workshops, career development/practice management booths, etc. so that such knowledge of both the content and layout will enable thoughtful planning of an individualized schedule. Know when scheduled breaks are and take them. Do not plan on filling every moment of the day with conference-related activity, and make a conscious effort to sample all the different types of settings for delivering content, all while balancing scheduled personal time to focus on fatigue management [22] and sleep hygiene [4].

53.5 The Approach

Arrive early, well rested, and free from outside distractions. A “pearl,” if possible, would be to not bring unfinished work (e.g., dictations, manuscripts to review, etc.) to the meeting. While in attendance, one should be wholeheartedly focused on being mentally and physically present. If necessary, perform a pre-meeting “time-out” to again review one’s intended and individualized goals, the timeline, and plan for execution to maintain focus. However, understand that the pre-meeting plan is not “written in stone” [5], and those who truly benefit from meeting attendance often spend time to reflect at the end of each conference day on the value obtained from the pre-plan, maintaining flexibility to modify the remainder of the schedule to maximize experiences and follow up on new connections made. Again, thorough knowledge of all opportunities a given meeting presents ahead of time eases this transition.
Typically, during the check-in process, participants receive materials including the full program, an identification badge, and a tote bag. It is important at this juncture to identify key locations including the resource center, exhibit hall, lecture hall, and food and restroom locations and obtain Wi-Fi access at the conference location.

53.5.1 Education

Fact Box 53.2
Flipped sessions involve exposing learners to content before the classroom session, so that learners are more prepared and the classroom sessions can focus on interactive activities and questions the learners may have [13]. This model works well for scientific meeting sessions as learners have varying levels of experience, so the learners can tailor the sessions to fit their needs. The moderators of the sessions can then act as facilitators rather than lecturers [13].

While visiting symposia, podium presentations, and poster exhibitions, among other conference events, it is critical to be engaged. However, being engaged oftentimes does not lead to retention of learned information at the conference [17]. In fact, 3 and 90 days following a conference, the mean retention rates were reported to be only 14.9% and 11.3%, respectively [17]. One solution for this may be purposeful self-selection and attendance of poster presentations or podium presentations. Secondly, having access to conference
554 D. de SA et al.
materials in advance and preparing for meaningful dialogue can promote and enhance a “flipped session” format, which is a contemporary model of learning in which lecturer and viewers engage one another with question and answer, shown to aid learning and retention [13, 21]. As more conferences adopt this format [13, 21], preparation for conferences can greatly augment the conference experience. That being said, approach the conference as a learner, and set aside personal agenda and/or ego.
Conferences are significant in that current and up-to-date concepts are discussed among experts in the field. During meetings, listening to “buzzwords” or “themes” may help flavor future research directions as they provide a “snapshot” of the field. Likewise, learning about ongoing, impactful research projects generates new research ideas and innovation. Often, new ideas are invented during conversation among orthopedic surgeons at these conferences.
As the concept of a scientific meeting evolves, so too does technology, and as such, the Internet has drastically changed the way in which conference participants interact. An increased number of conferences are utilizing social media, such as Twitter® or smartphone meeting/conference applications, to engage conference participants as well [14]. For example, there are times when traditional poster presentations fail to elicit engaging questions and enthusiastic viewers. Traditional oral presentations can largely be lecture style and less interactive. Some conferences have therefore utilized Twitter® and other social media outlets to encourage conversation among conference participants and interested outsiders [14]. In a study among the urological community, for example, conference attendees using Twitter® found it beneficial for networking (97%), spreading information (96%), research (75%), advocacy (74%), and career development (62%) [1]. As the social media presence of orthopedic surgeons grows [18], the importance of utilizing Internet platforms to bolster conversation and dialogue cannot be ignored.
The Internet, however, can be a “double-edged sword,” and though it poses tremendous advantages to a conference participant, it is equally critical that it not be used to answer work-related emails or provide means to other distractions. This only detracts from the purpose and opportunity of the conference. Furthermore, compartmentalizing work-related materials to the end of the day and focusing on them with full attention is a strategy shown to provide improved efficiency and results [11].

53.5.2 If Presenting

Preparation is of the utmost importance for executing a successful presentation at a scientific meeting. Initially it is important to research effective presenting strategies and techniques in anticipation of the presentation. Constructing slides that are readable and concise in a format that is easily consumed by the audience is difficult and must be consistently at the forefront of the presenter’s mind during composition. Formatting and color schemes need to be consistent with other presentations within the department or laboratory where the work was performed. Furthermore, formatting should adhere to any listed guidelines for the meeting. Standard content includes a brief background, hypothesis and aims, methodology, results, conclusions, and discussion. Each slide should be purposeful, with a “take-home message” that is easy to convey. Use of images can be particularly helpful. Lastly, the slides and material must be tailored to the targeted audience. Attendees may include scientists, clinicians, vendors, mathematicians, statisticians, engineers, residents and fellows, or any combination thereof. The emphasis and message of the presentation may change depending on the target audience.
After the slides have been prepared and edited by the presenter and advisor, live practice to hone timing and delivery and anticipate potential audience questions is necessary. There should be no difficulty complying with time limits established under the presentation guidelines. Failure to do so demonstrates lack of preparation and will distract from the presentation and “take-home messages.”
Scripting the presentation so as to avoid unnecessary pauses or lapses is helpful. Lab meetings or educational conferences are advantageous venues to practice in front of an audience that will be similar to that of the scientific meeting. Monitor the timing, elicit feedback, and make final edits as needed to clarify the message of the presentation. Remember, you represent not only yourself but also your institution, and this must not be overlooked.
At the scientific meeting, the first order of business is to locate the “Speaker Ready” room to upload presentation materials. Most meetings will specify a location with technical assistants, ready to upload the appropriate materials. Follow the listed instructions to avoid any difficulties. Maintain a backup of the presentation on a portable memory device and/or in an e-mail inbox, should the upload encounter problems.
Finally, visit the venue in advance of the presentation in order to become familiar with the room and the technical tools available. Ensure that any pointers, clickers, or other presentation tools are in working order. It can be helpful to attend another presentation in the same venue earlier, to observe other technical shortcomings or mistakes so as to avoid repeating them. It can also help a great deal with any degree of public speaking anxiety.
53.5.2.1 How to Make the Most of Roundtable Discussions?
A roundtable provides a small group of participants (typically 10–12) the opportunity to, over a short 60–90 minute time period, partake in a group discussion across specific items of a topic of interest. Given its academic nature, a roundtable encompasses a rapid exchange of high-level information among peers, with its success resting on input from all members in attendance. Often, it is an opportunity to clarify approaches to current problems and provide an avenue to present and weigh all viewpoints. Therefore, it is paramount that you first attend a roundtable that not only is of particular interest to you but for which you have a sufficient knowledge base and that is being attended by peers, ideally those you are familiar with, so that contributions can, at the very least, be delivered with vigor and glean equal insights. Although often led by a moderator to keep to a particular task at hand, it is important to remember the group nature of a roundtable, and as such, one must ensure that when participating, all members present are being addressed and afforded opportunities to engage as well. Questions should be posed with appropriate tone, so as to convey a sense of open-mindedness and to be mindful of the process. To that end, all action items of the roundtable may not be addressed during the session, and emphasis should remain on thoroughly addressing a particular item, as opposed to rushing transitions through topics or pressing for a consensus resolution.
53.5.2.2 How to Make the Most of Panel Discussions?
On the contrary, a panel discussion involves members of the audience listening to the perspectives of experts who have been pre-selected to the panel. Each individual panel’s focus may be different, and as an audience member, it is important to realize that members of the panel have different roles, suited to the particular meeting. Not all panel members may agree with each other. Panel members may be pre-selected to present a viewpoint in keeping with their individual practice, to present a viewpoint contrary to their practice, to debate other panel members, etc. Thus, to maximize the yield from attending panel discussions, one should keep this in mind and anticipate a passive learning experience for the most part. It is helpful to not only prepare for the topic and anticipated viewpoints ahead of time but to also research background information on each panel member, to familiarize yourself with their prerogative in advance. This will further enable one to identify some key areas of controversy and possible areas that will be addressed. Though opportunities do exist to pose questions to the panel, it is advised to use this platform wisely, be mindful of the others in attendance, be respectful of everyone’s time, and not use this opportunity as an avenue to express your own perspectives, to debate the panelists, or to engage in lengthy one-on-one discussions.
physician development and training. In this respect, conferences are key opportunities to meet and connect with other physicians with shared interests.
Nowadays, business cards are less important, and there are countless methods to network online, such as through LinkedIn®, ResearchGate®, Facebook®, Twitter®, and Instagram®. For example, among nearly 1000 Pediatric Orthopaedic Society of North America members, 95% of members had a professional webpage, 36.8% a LinkedIn® page, 33% at least one YouTube® video, 25.8% a ResearchGate® page, 14.8% a professional Facebook® page, and 2.2% a professional Twitter® page [10]. Among members, private-practice physicians had double the utilization of social media [20]. Growing an Internet presence prior to the conference may additionally improve networking.
As alluded to earlier, further preparation for networking can include identifying presenters that one may be interested in meeting, or professional societies one may be interested in joining. Performing the necessary research on their areas of interest, and finding a niche for potential collaboration, can increase the likelihood of successful networking. A three-sentence “Tell me about yourself” pitch should also be prepared, rehearsed, and delivered confidently. Recent trends suggest the content of these “elevator pitches” has shifted toward why a business exists and that the average adult attention span is currently just 8 s, perhaps due to the advent of smartphones [16]. Seizing the brief windows of opportunity that arise during a conference, with esteemed professionals in the field, can dramatically affect career path. As such, the importance of attending the social events, alumni and society meetings, and approaching people has not gone unnoticed. Besides addressing one’s nutritional needs, breakfasts and lunches can be invaluable networking opportunities, so note their schedules; it is key that one sits at an active table, engages in discussion with others, and removes any distractions (e.g., concomitant laptop or smartphone use). Try to balance efforts on meeting new people with time spent with members from one’s own institution or previous colleagues at other institutions as well, as the latter provides an opportunity to reconnect, reinforce relationships, and share experiences in a less judgmental and/or less anxiety-provoking environment.
For junior learners going to conferences, networking can come in a variety of forms: among peers from other institutions, with faculty, and with potential mentors. Mentors are invaluable resources for learners and can provide intangible benefits such as letters of recommendation, job positions, and career-long advice and guidance. Junior learners can use the meeting as a chance to spend time outside of the work environment with their mentors and build a more personal relationship. Furthermore, by following a senior faculty member, learners can also have the opportunity to meet and network with other senior members of the community. For mid-career and senior leaders, it is mutually beneficial to develop their mentorship capabilities. While it can be daunting for learners to network, it is simultaneously imperative that junior learners seize the opportunity to learn from more experienced physicians and that they get sufficient face time with leaders in the field. This is especially important given the “Internet age,” as the younger generation is more likely to engage online rather than face to face [15]. Junior learners should also take notice of how more senior conference attendees act in order to learn and seek advice regarding the “unspoken rules of proper conference etiquette” [19]. Further, similar to resident and attending physicians, junior learners can build their reputation in the field by their behavior at a conference. Lastly, finding a genuine connection to junior learner peers can go a long way, since these fellow learners will most likely continue to become peers and leaders in the field in the future.
53.5.4 Equipment/Vendors/Suppliers
Sponsors, vendors, equipment suppliers, and many medical companies occupy a large and central role in scientific meetings. Many attendees avoid these areas and interactions with representatives to avoid the stigmatized label of being an
“industry-sponsored physician.” The relationship between industry and physicians has been scrutinized and more regulated recently due to past unscrupulous practices by some physicians and industry leaders [6]. However, attendees should not be apprehensive about going to these areas of the scientific meeting as industry representatives play vital roles in the hospital to provide patient care and assist physicians. Establishing relationships and developing them over time have benefits for both sharpening surgical technique and providing enhanced patient care. Mastering the plethora of tools in the surgical armamentarium and recognizing the capabilities and deficiencies of each is an arduous process of experience. Vendors and representatives from equipment suppliers are a wealth of technical product knowledge and can be used to help expedite this learning curve. They can share “tips and tricks” from other surgeons and provide discounted access to educational opportunities such as cadaver labs, sawbones sessions, and research opportunities to compare products. Lastly, these connections may increase surgeon exposure to other techniques and equipment serving similar existing purposes to those at home institutions but at a decreased cost. As the landscape of medicine transitions to a value-based healthcare approach with an evolving financial payout methodology [12], reduced equipment costs with similar function and outcome can play a pivotal role.
53.6 Closure
At the conclusion of the scientific meeting, if done right, one should be refreshed, be reinvigorated in one’s chosen field, and possess a sense of accomplishment. The initial moments after the meeting represent a vital period for both reflection and following up on connections established during the meeting. Though not necessary, any notes made should be revisited to reinforce key ideas, return to further reading materials/resources, or identify the springboard for the next set of personal activities. Be critical of your experience, evaluating what content and session format was most enjoyable/productive and what formats resulted in disengagement. This is important to help inform future conference planning, though a fair degree of tolerance should be exercised as poor experiences in one format may be one-off experiences due to a multitude of factors. Often meetings have post-conference access to the program, video presentations, copies of presentations, etc., and one should obtain copies of these for future reference.
53.7 Follow-Up
It is important to establish a platform to share the knowledge gained from the meeting with others, as this allows the reach of the particular conference to grow exponentially and further integrates the key concepts. Preferably, these summary meetings are established pre-departure. Lab members can collaborate by assigning certain topics beforehand to each member who is attending. Then, in the summary meeting, each member will summarize what he or she has learned for the rest of the group. This way, all members can expand their knowledge from the meeting even if they did not attend.
The importance of following up with key connections made during the meeting should not be underestimated. Do not hesitate to use social media, e-mail, etc. to send a short message to new connections. Not only does this reinforce to them the value of their personal connection, but it may also lead to future collaborations and networking opportunities. Finally, if not already done, this remains a last opportunity to not only join the society hosting the meeting, if of interest, but to also provide meeting feedback and ensure continuing medical education (CME) certificates are received/statuses updated.
53.8 Conclusion
In conclusion, scientific meetings are excellent avenues for pursuing continuing medical education. To take advantage of these meetings, a
3. Bowman J, Mogensen L, Marsland E, Lannin N. The development, content validity and inter-rater reliability of the SMART-goal evaluation method: a standardised method for evaluating clinical goals. Aust Occup Ther J. 2015;62(6):420–7. https://doi.org/10.1111/1440-1630.12218.
4. Brick CA, Seely DL, Palermo TM. Association between sleep hygiene and sleep quality in medical students. Behav Sleep Med. 2010;8(2):113–21. https://doi.org/10.1080/15402001003622925.
5. Devlin R. Value of networking. Emerg Nurse. 2016;24(7):17. https://doi.org/10.7748/en.24.7.17.s23.
6. Flacco ME, Manzoli L, Boccia S, Capasso L, Aleksovska K, Rosso A, Scaioli G, De Vito C, Siliquini R, Villari P, Ioannidis JP. Head-to-head randomized trials are mostly industry sponsored and almost always favor the industry sponsor. J Clin Epidemiol. 2015;68(7):811–20. https://doi.org/10.1016/j.jclinepi.2014.12.016.
7. Kay J, Memon M, de Sa D, Duong A, Simunovic N, Athwal GS, Ayeni OR. Five-year publication rate of clinical presentations at the open and closed American shoulder and elbow surgeons annual meeting from 2005-2010. J Exp Orthop. 2016;3(1):21. https://doi.org/10.1186/s40634-016-0059-z.
8. Kay J, Memon M, de Sa D, Duong A, Simunovic N, Ayeni OR. Does the level of evidence of paper presentations at the Arthroscopy Association of North America annual meetings from 2006-2010 correlate with the 5-year publication rate or the impact factor of the publishing journal? Arthroscopy. 2017;33(1):12–8. https://doi.org/10.1016/j.arthro.2016.05.032.
9. Kay J, Memon M, Rogozinsky J, de Sa D, Simunovic N, Seil R, Karlsson J, Ayeni OR. The rate of publication of free papers at the 2008 and 2010 European Society of Sports Traumatology Knee Surgery and Arthroscopy Congresses. J Exp Orthop. 2017;4(1):15. https://doi.org/10.1186/s40634-017-0090-8.
10. Lander ST, Sanders JO, Cook PC, O’Malley NT. Social media in pediatric orthopaedics. J Pediatr Orthop. 2017;37(7):e436–9. https://doi.org/10.1097/BPO.0000000000001032.
11. Maloney S, Tunnecliff J, Morgan P, Gaida J, Keating J, Clearihan L, Sadasivan S, Ganesh S, Mohanty P, Weiner J, Rivers G, Ilic D. Continuing professional development via social media or conference attendance: a cost analysis. JMIR Med Educ. 2017;3(1):e5. https://doi.org/10.2196/mededu.6357.
12. Porter ME, Lee TH. From volume to value in health care: the work begins. JAMA. 2016;316(10):1047–8. https://doi.org/10.1001/jama.2016.11698.
13. Ramnanan CJ, Pound LD. Advances in medical education and practice: student perceptions of the flipped classroom. Adv Med Educ Pract. 2017;8:63–73. https://doi.org/10.2147/AMEP.S109037.
14. Randviir EP, Illingworth SM, Baker MJ, Cude M, Banks CE. Twittering about research: a case study of the World’s first twitter poster competition. F1000Res. 2015;4:798. https://doi.org/10.12688/f1000research.6992.3.
15. Ritchie A. How to network with fellow physicians to land the job you want. Med Econ. 2013. http://www.medicaleconomics.com/modern-medicine-feature-articles/how-network-fellow-physicians-land-jobyou-want.
16. Robinson R. The art of the elevator pitch: 4 tips for making an impression. Forbes. 2017. https://www.forbes.com/sites/ryanrobinson/2017/09/05/elevator-pitch-tips-making-impression/.
17. Saperstein AK, Lennon RP, Olsen C, Womble L, Saguil A. Information retention among attendees at a traditional poster presentation session. Acta Med Acad. 2016;45(2):180–1. https://doi.org/10.5644/ama2006-124.178.
18. Sculco PK, McLawhorn AS, Fehring KA, De Martino I. The future of social media in orthopedic surgery. Curr Rev Musculoskelet Med. 2017;10(2):278–9. https://doi.org/10.1007/s12178-017-9412-9.
19. Sohn E. Networking: Hello, stranger. Nature. 2015;526(7575):729–31.
20. Sumrein BO, Huttunen TT, Launonen AP, Berg HE, Felländer-Tsai L, Mattila VM. Proximal humeral fractures in Sweden-a registry-based study. Osteoporos Int. 2017;28(3):901–7. https://doi.org/10.1007/s00198-016-3808-z.
21. Torre D, Manca A, Durning S, Janczukowicz J, Taylor D, Cleland J. Learning at large conferences: from the ‘sage on the stage’ to contemporary models of learning. Perspect Med Educ. 2017;6(3):205–8. https://doi.org/10.1007/s40037-017-0351-3.
22. Wong LR, Flynn-Evans E, Ruskin KJ. Fatigue risk management: the impact of Anesthesiology Residents’ work schedules on job performance and a review of potential countermeasures. Anesth Analg. 2018;126(4):1340–8. https://doi.org/10.1213/ANE.0000000000002548.
54 How to Write a Scientific Article
Lukas B. Moser and Michael T. Hirschmann
Generally, a scientific paper is structured following the IMRAD acronym. IMRAD stands for Introduction, Methods, Results, and Discussion [16].
After the preparation of a proper study protocol (see Chap. 8), one should start the writing process with the Introduction and Methods followed by the Results and Discussion. The Introduction and Methods can be written after having finished the study protocol and might need to be adapted later on [16].
Although many authors believe that a good title is found easily, the opposite is true. A good title is brief and concise. However, it should give a brief but comprehensive summary of your study and most important findings. Letchford et al. investigated the benefits of having a shorter title and found that journals publishing papers with shorter titles are more frequently cited [13].
Jargon or colloquial wording has no place in the title. Do not use abbreviations or acronyms in the title. Never pose a question or use exclamation marks in the title. For an easier identification in search engines, it is recommended to use words indexed as Medical Subject Headings (MeSH) in your title [1].
Early isolated ACL reconstruction shows increased risk of arthrofibrosis: a prospective randomized controlled study.
54.4 How to Write a Good Abstract
Together with the title, the abstract is often the first and only part read by other researchers. Hence, writing a good abstract directly influences the penetration and scientific power of your study. It decides if your study is acknowledged and cited by your research colleagues.
However, in practice often only limited time is spent on the preparation of a good abstract. Although the abstract is the last written part of a scientific article, in a last effort, one should take enough time to prepare it.
The abstract should be a comprehensive summary of the scientific article. You should carefully check the instructions for authors of your target journal before submitting the abstract.
Most journals require structured rather than unstructured formatting. Abstracts are generally limited to 150–350 words, which highlights the importance of being brief and concise [1].
Checklist Abstract
• Comprehensive summary
• Avoid abbreviations
• No passive voice
• Sample size if percentages are reported
• Effect size with confidence intervals
• Abstract can be read independently from the main text
• Purpose/introduction/background: What is known? Why is this study needed?
• Methods: What did I do?
• Results: What did I find?
• Discussion: What does it mean?
54.4.1 Practical Case Example
Prospective study using 3D-CT investigating the influence of the approach in total knee arthroplasty (TKA) on TKA component rotation.
54.4.2 Poor Abstract
Purpose: TKA is a successful treatment option for end-stage osteoarthritis of the knee. Outcome after TKA is influenced by numerous surgery and patient related factors. The purpose of this study was to investigate which factors influence TKA position.
Methods: This study included 200 patients after TKA using either a parapatellar medial or parapatellar lateral approach with tibial tubercle osteotomy. TKA components’ position and the whole leg axis were assessed on 3D reconstructed CT scans (sagittal, coronal, and rotational). Mean values of TKA component position and the whole leg alignment of both groups were compared. The tibial and femoral components were graded as internally rotated, neutral rotation, and externally rotated.
Results: Two groups were investigated: patients who underwent a medial parapatellar approach (MPA) and a lateral parapatellar approach using a tibial tubercle osteotomy (LPA). Means of tibial component rotation were 2.7323° ER ± 6.12323 (MPA) and 7.62323° ER ± 5.4232 (LPA). Patients of group LPA presented a significantly less internally rotated (LPA, 18.43%; MPA, 48.83%) and more externally rotated (LPA, 52.63%; MPA, 22.83%) tibial component (p < 0.001). No significant differences were seen for the femoral component position, tibial valgus/varus, and tibial slope.
Conclusion: Patients of group LPA presented a significantly less internally rotated and more externally rotated tibial component. It appears that a LPA tends to externally rotate the tibial TKA component.
54.4.3 Good Abstract
Purpose: The purpose of this study was to investigate if the type of approach [medial parapatellar approach (MPA) versus lateral parapatellar approach with tibial tubercle osteotomy (LPA)] influences rotation of femoral and/or tibial component in total knee arthroplasty (TKA). It was the hypothesis that MPA leads to an internally rotated tibial TKA component.
Methods: This study included 200 consecutive patients in whom TKA was performed using either a parapatellar medial (n = 162, MPA) or parapatellar lateral approach with tibial tubercle osteotomy (n = 38, LPA). All patients underwent clinical follow-up, standardized radiographs, and computed tomography (CT). TKA components’ position and the whole leg axis were assessed on 3D reconstructed CT scans (sagittal, coronal, and rotational). Mean values of TKA component position and the whole leg alignment of both groups were compared using a t-test. The tibial component was graded as internally rotated (<3° of external rotation (ER)), neutral rotation (equal or between 3° and 6° of ER), and externally rotated (>6° ER). The femoral component was graded as internally rotated [>3° of internal rotation (IR)], neutral rotation (equal or between −3° IR and 3° of ER), and externally rotated (>3° ER).
Results: There was no significant difference in terms of whole leg axis after TKA between both groups (MPA, 0.2° valgus ±3.4; LPA, 0.0° valgus ±3.5). Means of tibial component rotation were 2.7° ER ± 6.1 (MPA) and 7.6° ER ± 5.4 (LPA). Patients of group LPA presented a significantly less internally rotated (LPA, 18.4%; MPA, 48.8%) and more externally rotated (LPA, 52.6%; MPA, 22.8%) tibial component (p < 0.001). No significant differences were seen for the femoral component position, tibial valgus/varus, and tibial slope.
Conclusion: The type of approach (medial versus lateral) significantly influenced tibial TKA component rotation. It appears that a MPA tends to internally rotate the tibial TKA component and a LPA tends to externally rotate the tibial TKA. The anterior cortex should not be used as landmark for tibial TKA component placement when using the lateral approach with tibial tubercle osteotomy.
54.5 How to Write a Good Introduction
This is the part in which you introduce the topic to your reader. Typically, it consists of one page and guides the reader into the topic of your article. The introduction should answer the two key questions: What is the paper about? Why is it worth being read and published? Arrange your paper from basic to more complex. You should start with a paragraph giving very basic information on the background of your topic. Imagine the structure of the introduction as a funnel. The background information represents the broadest part at the top, which narrows down to the specific information of your research topic [2]. However, this first paragraph should not start from “Adam and Eve” but rather jump directly into the topic covered.
54.5.1 Practical Case Example
Prospective randomized controlled study investigating patients with stiffness after isolated early and late ACL reconstruction.
Checklist Introduction
• Background information from basic to complex
• What is known and what is unknown in this specific topic?
• Why is the study needed?
• Hypothesis (what did we want to know?)
• Study aim (how did we answer the research question?)
Don’t lose the red thread of your introduction!
54.5.2 Poor Introduction
“ACL tears are common injuries. ACL reconstruction is the most frequently performed surgery in orthopedics. There are many surgical methods described. The following ACL grafts can be used. The purpose of this study was…”.
54.5.3 Good Introduction
An unsolved problem in patients suffering from ACL insufficiency is the timing of ACL reconstruction. [Then describe what is known about it!]
Stiffness is an associated problem with early ACL reconstruction… It was our hypothesis that… [Finally end with “The purpose of this study was...”].
Proceed with explaining what is already known, and describe open questions with regard to this topic. The reader should be able to follow the red thread of your introduction.
Like in a good novel, you should guide the reader from basic to complex and climax with your study hypothesis and purpose of the study undertaken.
The last paragraph should include a clear description of your hypothesis and purpose of your study. This part needs to answer the question why you undertook this study and which open research question should be answered.
Please consider the following tips for writing a good introduction (Table 54.1):
1. Perform a detailed and thorough review of current literature.
2. Do not include unnecessary background information! Do not give well-known information! (“Do not start from Adam and Eve!”)
3. Keep it as short as possible and as long as needed. Be brief and concise!
4. Do not overstate the clinical value and importance of your work!
5. Ask yourself what you need to know to understand the research topic and purpose of your study. Try to obtain the reader’s perspective.
6. Explain to the reader what is novel in your study, which makes it worth being read or published. Therefore, you need to perform a thorough literature review.
7. End with a clear hypothesis and purpose of your study. The clearer and simpler it is, the better. One or at most two study questions are optimal.
8. Clearly separate the major from the minor research question.
9. Leave comparison with other studies for discussion [2, 20].
Ask yourself: “Will my introduction sell my paper to readers, reviewers, and editors?”
Table 54.1 What you write in your introduction and what the reader probably understands [12]
Introduction
What you write… | What the reader understands…
It has been known that… It is well known… To the best of our knowledge… | I haven’t been bothered to look up the original reference but…
Of great theoretical and practical importance… | Interesting to me…
While it has not been possible to provide definite answers to these questions… | The experiment didn’t work out, but I figured I could at least get a publication out of it…
Future studies need to clarify the meaning of these results… |
54.6 How to Write a Good Methods Section
This part of a scientific article could be written after having finished your study protocol (see Chap. 8). However, it might need to be adjusted once the study is finished.
Checklist Methods Section
• Study design
• Setting and subjects
• Data collection
• Data analyses
• Ethical approval
The methods section should meticulously describe the study design and ideally serve as an instruction for the readership to repeat the study. Consider your research study as a specific dish and your methods section as its recipe. You need to list all ingredients and give a detailed description of the cooking process. Only then can the dish be prepared repeatedly with a reproducible result [9].
Try to create a clear story line. The methods section links the introduction with the results. Hence, it should be structured from basic to more complex. Generally, one should start with the study design. Here, the authors need to clarify the type of study done (Table 54.2).
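To illustrate the level of detail that lets a reader repeat an analysis, the rotation grading described in the good abstract (Sect. 54.4.3) can be written out as explicit rules. The following is only a minimal sketch in Python and not the authors' analysis: the sign convention (positive values for external rotation, negative values for internal rotation), the function names, and the example measurements are assumptions made for illustration.

```python
from collections import Counter


def grade_tibial_rotation(er_deg: float) -> str:
    """Grade tibial component rotation.

    Assumption: er_deg is in degrees, positive for external rotation (ER).
    Thresholds as in the good abstract: <3 ER internal, 3 to 6 ER neutral, >6 ER external.
    """
    if er_deg < 3.0:
        return "internally rotated"
    if er_deg <= 6.0:
        return "neutral rotation"
    return "externally rotated"


def grade_femoral_rotation(er_deg: float) -> str:
    """Grade femoral component rotation.

    Assumption: negative values denote internal rotation (IR).
    Thresholds as in the good abstract: >3 IR internal, 3 IR to 3 ER neutral, >3 ER external.
    """
    if er_deg < -3.0:
        return "internally rotated"
    if er_deg <= 3.0:
        return "neutral rotation"
    return "externally rotated"


def summarize_grades(grades: list[str]) -> dict[str, str]:
    """Report each grade as count/n with a percentage, so the sample size stays visible."""
    n = len(grades)
    counts = Counter(grades)
    return {grade: f"{count}/{n} ({100 * count / n:.1f}%)" for grade, count in counts.items()}


# Hypothetical measurements, invented for illustration only (not study data).
example_tibial_er = [1.5, 4.0, 7.2, 8.1]
example_grades = [grade_tibial_rotation(value) for value in example_tibial_er]
print(summarize_grades(example_grades))
```

Reporting each percentage together with its count and denominator follows the abstract checklist item "Sample size if percentages are reported".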
Table 54.2 Possible study types for a scientific article [18]
Study type
Basic research
Then, you need to proceed with a clear description of your study sample. The study sample description should include the exact number of patients or subjects included. Patients’ basic demographics, such as mean age ± standard deviation, gender, body mass index, alignment information, and other important variables, should be given here [9].
54.6.1 Practical Case Example 1
Prospective study using 3D-CT investigating the influence of the approach in total knee arthroplasty (TKA) on TKA component rotation.
Poor description of study sample: A consecutive series of patients who underwent a computed tomography (CT) after primary TKA as clinical routine follow-up or because of knee pain in a university-affiliated hospital were analyzed. Patients were divided into two groups with regard to the used surgical approach.
Good description of study sample: A consecutive series of 200 patients who underwent computed tomography (CT) after primary TKA from 2013 to 2016 as clinical routine follow-up or because of knee pain in a university-affiliated hospital were prospectively collected and retrospectively analyzed. Patients were divided into two groups with regard to the used surgical approach. Group A (lateral peripatellar approach (LPA)) included 38 patients (male/female = 14:24; 67.5 ± 10.4 years), and Group B (medial parapatellar approach (MPA)) included 162 patients (male/female = 56:106; 67.2 ± 9.8 years).
A clear flow chart should allow the reader to understand how many patients were screened, how many were excluded, and how many were finally included in the study. It is important to give exact numbers here. Along with an exact description of inclusion and exclusion criteria, the article needs to allow the reader to judge a possible bias in patient selection.
54.6.2 Practical Case Example 2
Prospective study using 3D-CT investigating the influence of the approach in total knee arthroplasty (TKA) on TKA component rotation.
Poor description of inclusion and exclusion criteria: A consecutive series of patients who underwent a computed tomography (CT) after primary TKA as clinical routine follow-up or because of knee pain in a university-affiliated hospital were analyzed. Patients were divided into two groups with regard to the used surgical approach.
Good description of inclusion and exclusion criteria: A consecutive series of 200 patients who underwent a computed tomography (CT) after primary TKA from 2013 to 2016 as clinical routine follow-up or because of knee pain in a university-affiliated hospital were prospectively collected and retrospectively analyzed. Indication for TKA was end-stage osteoarthritis. Only primary TKA were included. Exclusion criteria were any history of infection, tumor. A team of two senior surgeons performed the surgeries using either cruciate retaining or posterior stabilized TKA. The decision to perform a CR or PS TKA was based on the integrity of the posterior cruciate ligament and was done independently from alignment (varus versus valgus knees). Patients were divided into two groups with regard to the used surgical approach.
As a general rule, this part should be sufficiently detailed so that an independent researcher could reproduce the results and thereby validate the study findings [20].
A common error of many authors is to describe the study sample in the results section, but it clearly belongs to material and methods and should be given here. This is also true for review papers [9].
The description of the study sample is followed by a detailed description of the tests and experiments done (for experimental studies) or of the outcome instruments used (for clinical studies). If a novel methodology is applied, a more detailed description is required. If standard methods such as well-established outcome instruments are applied, it is only necessary to refer to these [20].
For all measurements done, inter- and intra-observer reliability needs to be tested and presented. All measurements, in particular measurements on any image such as radiographs, CT, MRI, or other imaging modalities, should be
done by at least two independent blinded observers twice with an interval of 6 weeks [7].
Inter- and intra-observer reliability as well as accuracy values need to be presented. This could be done, for example, as intra-class correlation coefficients (ICCs), kappa values, or Bland-Altman plots [21].
A statement that approval was obtained from the local ethics committee or institutional review board should be included. An increasing number of journals require a document showing ethical approval to be included in your submission. For clinical studies, also state that informed consent was obtained from each patient or subject in the study [20].
Some journals require a statement that the study was done in agreement with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards [17].
Describe your study sample in as much detail as possible!
Standard methods should be referred to!
Only describe methods which are also presented in the results!
Table 54.3 What you write in your methods section and what the reader probably understands [12]
Methods
What you write… | What the reader understands…
Three of the samples were chosen for detailed study… | The results on the others didn’t make sense…
Accidentally strained during mounting… | Dropped on the floor…
Handled with extreme care throughout the experiment… | Not dropped on the floor…
report if this was done and the result. Every statistical test used needs to be mentioned here. The level of statistical significance needs to be reported as a p-value. Generally, the threshold is set at p < 0.05 [6].
A sample size calculation needs to be presented in all clinical studies. The sample size is an estimate of the number of subjects required to detect a significant difference in a study. If the sample size is too small, a true difference might fail to reach statistical significance. If the sample size is too large, scientific resources are wasted unnecessarily, and even small differences might become significant (Table 54.3) [5].
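The sample size calculation and the Bland-Altman agreement analysis mentioned in this section can be sketched in a few lines of Python. This is only a minimal illustration and not the method of any cited study. The function names, the normal approximation for two equally sized groups, and all input values are assumptions made for the example; a statistician should confirm the appropriate calculation for a real study.

```python
from math import ceil
from statistics import NormalDist, mean, stdev


def sample_size_per_group(delta, sd, alpha=0.05, power=0.80):
    """Patients needed per group to detect a mean difference `delta`
    between two equally sized groups with common standard deviation `sd`.

    Assumption: two-sided test, normal approximation
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


def bland_altman_limits(first, second):
    """Bias and 95% limits of agreement for paired measurements,
    e.g. two readings of the same images by one or two observers."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Illustrative values only: a 10-point score difference with SD 20.
print(sample_size_per_group(delta=10, sd=20))
# Illustrative paired readings (hypothetical angles in degrees).
print(bland_altman_limits([10, 12, 14, 16], [9, 13, 13, 17]))
```

Reporting the assumed difference, standard deviation, alpha, and power alongside the resulting number lets a reader check the calculation, in line with the reproducibility rule given for the methods section.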
2.8 ± 2.4, and the WOMAC daily activities (0–68) was 17.4 ± 16.4. The mean total score (0–96) was 25.4 ± 22.00 after HTO (Fig. 5). Less stiffness with regard to the WOMAC score correlated significantly with a higher decrease in SPECT/CT BTU (p < 0.05).
Higher postoperative bone tracer uptake significantly correlated with more pain (p < 0.05).
A Spearman correlation analysis revealed no statistically significant associations between SPECT/CT BTU and alignment correction by HTO.
It should be clear that all data are reported and not only the data supporting your hypothesis. Do not report data which have not been mentioned in material and methods. No referencing is allowed in the results section.
Sometimes it appears difficult to decide what exactly is considered a result. For example, if inter- and intra-observer variability is tested for measurements on radiographs to validate your measurement method, but the main study question concerns the radiological results, then the inter- and intra-observer reliability should be reported not in results but in material and methods.
Figures, tables, and graphs might help to make the results easier for the reader to understand. Do not duplicate results in text and figures, tables, or graphs. Often not much supporting text is needed here [10].
54.8 How to Write a Good Discussion
The discussion part should put your results into the context of the current literature. In contrast to the previous parts of a scientific article, here the findings need to be explained, interpreted, and debated.
The key questions to be answered are: What is similar and what is different to other studies published? How do your findings help to answer the study question posed? Generally, the discussion should start with a sentence such as “The most important findings of the present study were...”. However, a pure repetition of your results should be avoided. In the further course of the discussion, the current evidence with regard to the study question needs to be discussed. One common pitfall of the discussion part is to present a review of a considerable number of published studies lacking the interpretation of these papers with regard to your study question.
Finally, the strengths and, more importantly, the limitations of your study need to be discussed in detail [3].
Checklist Discussion
• Summary of main findings
• Comparison and interpretation with current literature
• Strength and limitations
54.8.1 Practical Case Example
One year clinical and MR imaging outcome after partial synthetic meniscal replacement in stabilized knees using a collagen meniscus implant.
Poor discussion: The purpose of the present study was to evaluate the clinical and radiological outcomes of patients who underwent a medial or lateral collagen meniscus implantation. We hypothesized that good functional results with maintained sport capacity and activity level could be achieved. We further hypothesized that MRI would show no changes in size and signal intensity of the CMI over time. The median preinjury Tegner score was 7 (range 2–10); it decreased preoperatively to 3 (range 0–123). At 1-year follow-up, the median Tegner score was 6 (range 2–10). The mean Lysholm score before surgery was 68 ± 20 and 93 ± 9 at 1-year follow-up. The mean flexion and extension ± standard deviation at 1-year follow-up was 140° ± 5° and 5° ± 1°, respectively. Meniscal substitution with the collagen meniscal implant showed excellent clinical 1-year results in a highly active patient group. Significant pain relief and functional
572 L. B. Moser and M. T. Hirschmann
improvement throughout all scores were noted at a minimum of 1-year follow-up. The collagen meniscus implant undergoes significant remodeling, degradation, and extrusion in most of the patients. No difference in outcomes between the medial and lateral CMI was observed.
Good discussion (shortened): The most important findings of this study were twofold: Firstly, the meniscal substitution with the collagen meniscal implant (CMI) showed excellent clinical 1-year results in this large patient series in which most patients also underwent ACL reconstruction. Significant pain relief and functional improvement throughout all scores were noted at a minimum of 1-year follow-up.
A variety of studies have been performed investigating the early to longer-term clinical experience using CMI [1–8]. In agreement with the present study, most of the authors reported that mean Tegner activity and Lysholm scores as well as pain values significantly improved from pre- to postoperatively [5, 6, 9]. There was no significant clinical difference whether a medial or lateral CMI was implanted. Monllau et al. investigated 25 non-consecutive patients at minimum 10 years after CMI implantation [2]. They included almost the same number of patients undergoing an ACL reconstruction and found significantly improved Lysholm and VAS pain scores at 1-year follow-up.
Secondly, with regard to the MRI results, the CMI undergoes significant remodeling, degradation, and extrusion. The new meniscus tissue was well integrated. The size of the meniscus tissue was reduced when compared to the normal meniscus which could be only partially explained by compressive joint loading forces. However, these findings are in agreement with Monllau et al. who found less meniscus volume than expected at minimum 10 years after CMI implantation [2].
One major limitation of this study is the lack of a control group. This study only presents the clinical and radiological outcomes of a consecutive series of patients undergoing collagen meniscus implantation in stable and unstable knees. However, it is one of the biggest case series of patients showing clinical and MRI results 1 year after CMI.
Another weakness is that the patients included also underwent a variety of concomitant surgeries. In this study, the majority of patients underwent CMI due to prophylactic reasons, which currently could be considered a dubious indication. This fact might have influenced the results of the study, although this was not shown statistically (Table 54.5).
Table 54.5 What you write in your discussion and what the reader probably understands [12]
Discussion
What you write… | What the reader understands…
It is generally believed that… | A couple of other fellows think so too.
Correct within an order of magnitude | Wrong!
It is clear that much additional work will be required before a complete understanding… | I don’t understand it!
Future studies are needed… | The results did not turn out to be good enough to make us understand the problem
54.9 How to Write a Good Conclusion
The conclusion should be no more than one or two sentences long and provide the reader with a summary of results and discussion. In particular, the clinical implications of the findings should be highlighted. Please explain why the study is significant to the research world. This part should provide the key message you want to convey with regard to your study. In addition, it is also possible to provide an outlook on future research directions. However, do not just write that further investigation is needed, and do not announce future studies that might not be completed [3].
54.9.1 Practical Case Example
A prospective, longitudinal, single-cohort study investigating the correlation of depression, control beliefs, anxiety, and a variety of other psychological factors with outcomes of patients undergoing total knee arthroplasty (TKA) in 104 consecutive patients.
Poor conclusion: Self-efficacy did not influence clinical scores. More depressed patients showed higher pre- and postoperative WOMAC scores, but no difference in amelioration.
Good conclusion: Depression, anxiety, a tendency to somatize, and psychological distress were identified as significant predictors for poorer clinical outcomes before and/or after TKA. Standardized preoperative screening and subsequent treatment should become part of the preoperative work-up in orthopedic practice.
54.10 Tips and Tricks for Tables
Tables allow a great amount of data to be presented in a listed way. They are particularly beneficial for large datasets, which can hardly be described in narrative form, and they help to keep the results section concise and brief. Ask a colleague to check whether your table is self-explanatory. Tables (See Tables 54.6 and 54.7).
54.11 Tips and Tricks for Figures
Always consider illustrating your article with high-quality figures, which include photographs or drawings. When including any figure, you should ask yourself whether this figure really adds to the article. Often one is too attached to one's own photographs to have an objective view on this matter. It might be beneficial to get a different, independent perspective.
The number of figures accepted depends on the journal. You will find the actual limits along with resolution and image quality requirements in the instructions for authors of each journal. Often figures need to be cropped or enlarged to highlight the important part of the figure. Arrows can also be used for this purpose.
Table 54.6 Bad example of a table included in a scientific article. Clinical outcome scoring preoperatively, 6 weeks, 4 months and 1 year after surgery
Clinical scoring | Pre-operative | p | 6 weeks p.o. | p | 4 months p.o. | p | 1-year p.o. | Overall p
WOMAC pain | 52 ± 24 49 | *** | 25 ± 14 22 | *** | 14 ± 16 9 | n.s. | 13 ± 17 6 | ***
WOMAC stiffness | 54 ± 26 50 | *** | 36 ± 20 30 | *** | 22 ± 20 15 | n.s. | 20 ± 20 15 | ***
WOMAC function | 48 ± 20 44 | *** | 27 ± 15 22 | *** | 16 ± 15 12 | n.s. | 16 ± 18 9 | ***
n.s., p > 0.05. *p < 0.05, **p < 0.01, ***p < 0.001
Table 54.7 Good example of a table included in a scientific article. Clinical outcome scoring preoperatively, 6 weeks, 4 months and 1 year after surgery
Clinical scoring | Preoperative M ± SD | Md | p | 6 weeks p.o. M ± SD | Md | p | 4 months p.o. M ± SD | Md | p | 1-year p.o. M ± SD | Md | Overall p
WOMAC pain | 52 ± 24 | 49 | *** | 25 ± 14 | 22 | *** | 14 ± 16 | 9 | n.s. | 13 ± 17 | 6 | ***
WOMAC stiffness | 54 ± 26 | 50 | *** | 36 ± 20 | 30 | *** | 22 ± 20 | 15 | n.s. | 20 ± 20 | 15 | ***
WOMAC function | 48 ± 20 | 44 | *** | 27 ± 15 | 22 | *** | 16 ± 15 | 12 | n.s. | 16 ± 18 | 9 | ***
Mean ± standard deviation (M ± SD), Median (Md), p between measuring points (Wilcoxon-test), and overall p (Friedman-test)
n.s., p > 0.05, *p < 0.05, **p < 0.01, ***p < 0.001
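The cell contents of a table such as Table 54.7 can be generated consistently from raw scores with a short Python sketch. The function names and the sample scores below are hypothetical and serve only to illustrate the labeled "M ± SD", "Md", and star notation; the thresholds follow the table footnote, and a p-value of exactly 0.05 is treated here as not significant, which is an assumption.

```python
from statistics import mean, median, stdev


def significance_label(p):
    """Map a p-value to the star notation used in the table footnote."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."


def summarize(scores):
    """Return the 'M ± SD' cell and the 'Md' cell for one measuring point."""
    m_sd = f"{mean(scores):.0f} ± {stdev(scores):.0f}"
    md = f"{median(scores):.0f}"
    return m_sd, md


# Hypothetical scores for one measuring point (not data from the chapter).
m_sd, md = summarize([40, 50, 60])
print(m_sd, "|", md, "|", significance_label(0.0004))
```

Generating every cell with the same helper keeps rounding and notation uniform across measuring points, which is part of what makes the good example easier to read than the bad one.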
[Figure: panels a and b; region diagrams labeled F and T with shaft, and plots with axis label "Max BTU"]
8. Try to cite relevant papers of the target journal. Editors appreciate it as an interest in their journal, and it might improve citation scores [4].
Checklist References
• Use reference manager software
• Use the requested output style
• Cite the original reference
• Recheck the final reference list
Take-Home Message
• Writing the first scientific paper appears to be a difficult step to master. Nevertheless, scientific writing follows a well-defined structure. Your first step is to complete a detailed review of current literature with regard to your topic. Afterward it is advisable to start with an outline. Now you can benefit from your previously written comprehensive study protocol.
• Accurate, clear, and unambiguous expression of findings is crucial and improves the quality of your manuscript. Even though you have exciting and original results, your manuscript may not be accepted for publication in a peer-reviewed journal if the presentation and illustration are of mediocre quality. Therefore, it is of utmost importance to acquire good scientific writing skills for your research armamentarium.
• Finally, you need to be certain that you meticulously follow the instructions stipulated by your target journal prior to submission.
References
1. Cals JW, Kotz D. Effective writing and publishing scientific papers, part II: title and abstract. J Clin Epidemiol. 2013;66:585.
2. Cals JW, Kotz D. Effective writing and publishing scientific papers, part III: introduction. J Clin Epidemiol. 2013;66:702.
3. Cals JW, Kotz D. Effective writing and publishing scientific papers, part VI: discussion. J Clin Epidemiol. 2013;66:1064.
4. Cals JW, Kotz D. Effective writing and publishing scientific papers, part VIII: references. J Clin Epidemiol. 2013;66:1198.
5. Guller U, Oertli D. Sample size matters: a guide for surgeons. World J Surg. 2005;29:601–5.
6. Harris JD, Brand JC, Cote MP, Faucett SC, Dhawan A. Research pearls: the significance of statistics and perils of pooling. Part 1: clinical versus statistical significance. Arthroscopy. 2017;33:1102–12.
7. Hassink G, Testa EA, Leumann A, Hugle T, Rasch H, Hirschmann MT. Intra- and inter-observer reliability of a new standardized diagnostic method using SPECT/CT in patients with osteochondral lesions of the ankle joint. BMC Med Imaging. 2016;16:67.
8. Kotz D, Cals JW. Effective writing and publishing scientific papers, part I: how to get started. J Clin Epidemiol. 2013;66:397.
9. Kotz D, Cals JW. Effective writing and publishing scientific papers, part IV: methods. J Clin Epidemiol. 2013;66:817.
10. Kotz D, Cals JW. Effective writing and publishing scientific papers, part V: results. J Clin Epidemiol. 2013;66:945.
11. Kotz D, Cals JW. Effective writing and publishing scientific papers, part VII: tables and figures. J Clin Epidemiol. 2013;66:1197.
12. Lacroix JR. A key to scientific research literature. Can Med Assoc J. 1971;104:1080.
13. Letchford A, Moat HS, Preis T. The advantage of short paper titles. R Soc Open Sci. 2015;2:150266.
14. Mathis DT, Hirschmann A, Falkowski AL, Kiekara T, Amsler F, Rasch H, et al. Increased bone tracer uptake in symptomatic patients with ACL graft insufficiency: a correlation of MRI and SPECT/CT findings. Knee Surg Sports Traumatol Arthrosc. 2018;26(2):563–73. https://doi.org/10.1007/s00167-017-4588-5.
15. Mathis DT, Kaelin R, Rasch H, Arnold MP, Hirschmann MT. Good clinical results but moderate osseointegration and defect filling of a cell-free multi-layered nano-composite scaffold for treatment of osteochondral lesions of the knee. Knee Surg Sports Traumatol Arthrosc. 2018;26(4):1273–80. https://doi.org/10.1007/s00167-017-4638-z.
16. Peh WC, Ng KH. Basic structure and types of scientific papers. Singap Med J. 2008;49:522–5.
17. Rickham PP. Human experimentation. Code of ethics of the World Medical Association. Declaration of Helsinki. Br Med J. 1964;2:177.
18. Rohrig B, du Prel JB, Wachtlin D, Blettner M. Types of study in medical research: part 3 of a series on evaluation of scientific publications. Dtsch Arztebl Int. 2009;106:262–8.
19. Schulz KF, Altman DG, Moher D, CONSORT Group. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomized trials. Open Med. 2010;4:e60–8.
20. Vitse CL, Poland GA. Writing a scientific paper: a brief guide for new investigators. Vaccine. 2017;35:722–8.
21. Watson PF, Petrie A. Method agreement analysis: a review of correct methodology. Theriogenology. 2010;73:1167–79.
55 Common Mistakes in Manuscript Writing and How to Avoid Them
Eleonor Svantesson, Eric Hamrin Senorski, Kristian Samuelsson, and Jón Karlsson
of the methods section, keep it simple and precise. The most important aspects to focus on are the inclusion and the exclusion criteria, the description of the intervention or the experiment, and a clear presentation of the outcome measurements. Consider using flow diagrams or tables to illustrate your test setup. There are a number of published reporting guidelines that can help you structure your methods section depending on the type of study. Examples of such guidelines are the CONSORT [4] and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [3]. The statistical analysis should be described under a separate subheading, where all calculations should be clearly reported. This may include power, sample size, sensitivity analysis, and drop-out analysis. In fact, a sample size calculation is always necessary in order to ensure that the statistical power is adequate. Statistics is really a separate science in itself; therefore, do not hesitate to ask for professional help in order to ensure that the statistical analysis is correctly presented. Two common mistakes are to leave out information about the sample size calculation and the IRB approval. It can hardly be stated enough that (almost) all studies need an IRB approval.
55.2.5 Results
Now you have reached the part where you finally are allowed to present the results of your study. The most important thing about this section is that the results are presented objectively. Preferably, you have prepared a thorough study protocol before conducting your trial, in which you have already prepared an outline for your results section. There should be no subjective influence in the results section whatsoever; save that for the discussion part. Take time to “get to know the data” and to decide how best to present it. There is no need to present all the findings in the text; choose the most important one(s) to present in detail in the text. The text is in turn complemented by tables and figures in order to display all data. Aim for a short results section where the results are not duplicated in text and in tables or figures. A good rule of thumb is that the results section should not be longer than one manuscript page.
55.2.6 Discussion
The way you start the discussion is important. In a few initial sentences, you should preferably summarize the most important findings of your study, which function as a foundation for the rest of your discussion. The discussion should be written based on your results, and these should be compared and contrasted with previous research, not the other way around.
Thus, the discussion is not a forum for a general discussion and presentation of other studies. Primarily ask yourself: What did your study show? Thereafter, discuss these findings in relation to previous findings. Is it likely that your results are true? What supports your findings compared with other studies and what findings of your study might be contradicted based on previous research? Also, focus on the clinical relevance of your findings in the discussion. This is especially important if you have conducted preclinical research and are aiming for a clinically oriented journal. Finally, all researchers know that no study is perfect. There are always limitations and confounding factors that could influence the result of a study. Being honest and humble about such potential factors increases the trustworthiness of a study. Furthermore, an understanding of the limitations of a study will generate new ideas for future studies and encourage honest research. Therefore, think through your study limitations, and clearly present and discuss them at the end of the section. All too often, limitations are not as well reported as they should be.
55.2.7 Conclusion
The conclusion should be based on statistically significant findings from your study and nothing else. There is room for some slight speculation; however, such speculations should mainly be included in the discussion and never in the conclusion. The conclusion should be a brief, true, and concrete statement of the evidence that
• Mentors that can share experiences and tips about how to prepare a manuscript are a highly valuable asset when aiming to develop a good skill for writing.
References
1. Katz MJ. From research to manuscript: a guide to scientific writing. New York: Springer; 2009.
2. MacArthur CA, Graham S, Fitzgerald J. Handbook of writing research. New York: Guilford; 2006.
3. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol. 2009;62(10):1006–12. https://doi.org/10.1016/j.jclinepi.2009.06.005.
4. Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c332. https://doi.org/10.1136/bmj.c332.
5. Swales JM, Feak CB. Academic writing for graduate students: essential tasks and skills. Ann Arbor: The University of Michigan Press; 2012.