
Volker Musahl · Jón Karlsson

Michael T. Hirschmann
Olufemi R. Ayeni · Robert G. Marx
Jason L. Koh
Norimasa Nakamura
Editors

Basic Methods
Handbook for Clinical
Orthopaedic Research
A Practical Guide and Case Based
Research Approach
Editors
Volker Musahl
UPMC Rooney Sports Complex
University of Pittsburgh
Pittsburgh, PA
USA

Jón Karlsson
Department of Orthopaedics
Sahlgrenska Academy
Gothenburg University
Sahlgrenska University Hospital
Gothenburg
Sweden

Michael T. Hirschmann
Department of Orthopaedic Surgery and Traumatology
Kantonsspital Baselland (Bruderholz, Laufen und Liestal)
Bruderholz
Switzerland

Olufemi R. Ayeni
McMaster University
Hamilton, ON
Canada

Robert G. Marx
Hospital for Special Surgery
New York, NY
USA

Jason L. Koh
Department of Orthopaedic Surgery
NorthShore University HealthSystem
Evanston, IL
USA

Norimasa Nakamura
Institute for Medical Science in Sports
Osaka Health Science University
Osaka
Japan

ISBN 978-3-662-58253-4    ISBN 978-3-662-58254-1 (eBook)


https://doi.org/10.1007/978-3-662-58254-1

Library of Congress Control Number: 2018966868

© ISAKOS 2019
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher,
whether the whole or part of the material is concerned, specifically the rights of translation,
reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any
other physical way, and transmission or information storage and retrieval, electronic adaptation,
computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are
exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in
this book are believed to be true and accurate at the date of publication. Neither the publisher nor
the authors or the editors give a warranty, express or implied, with respect to the material
contained herein or for any errors or omissions that may have been made. The publisher remains
neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer-Verlag GmbH, DE part
of Springer Nature.
The registered company address is: Heidelberger Platz 3, 14197 Berlin, Germany
Contents

Part I Evidence Based Medicine in Orthopaedics

1 What Is Evidence-Based Medicine?
Eleonor Svantesson, Eric Hamrin Senorski, Jón Karlsson,
Olufemi R. Ayeni, and Kristian Samuelsson
2 What Is the Hierarchy of Clinical Evidence?
Vishal S. Desai, Christopher L. Camp, and Aaron J. Krych
3 Bias and Confounding
Naomi Roselaar, Magaly Iñiguez Cuadra, and Stephen Lyman
4 Ethical Consideration in Orthopedic Research
Jason L. Koh and Diego Villacis
5 Conflict of Interest
Michael Hantes and Apostolos Fyllos
6 Ethics in Clinical Research
Naomi Roselaar, Niv Marom, and Robert G. Marx

Part II How to Get Started with Clinical Research?

7 How to Get Started: From Idea to Research Question
Lachlan M. Batty, Timothy Lording, and Eugene T. Ek
8 How to Write a Study Protocol
Lukas B. Moser and Michael T. Hirschmann
9 The Ethical Approval Process
Karren Takamura and Frank Petrigliano
10 How to Assess Patient’s Outcome?
Yuichi Hoshino and Alfonso Barnechea
11 Basics of Outcome Assessment in Clinical Research
Monique C. Chambers, Sarah M. Tepe,
Lorraine A. T. Boakye, and MaCalus V. Hogan
12 Types of Scoring Instruments Available
José F. Vega and Kurt P. Spindler
13 Health Measurement Development and Interpretation
Andrew Firth, Dianne Bryant, Jacques Menetrey, and
Alan Getgood
14 How to Document a Clinical Study and Avoid
Common Mistakes in Study Conduct?
Caroline Mouton, Laura De Girolamo, Daniel Theisen, and
Romain Seil
15 Framework for Selecting Clinical Outcomes
for Clinical Trials
Adam J. Popchak, Andrew D. Lynch, and James J. Irrgang
16 Advances in Measuring Patient-Reported Outcomes:
Use of Item Response Theory and Computer Adaptive Tests
Andrew D. Lynch, Adam J. Popchak, and James J. Irrgang

Part III Basics in Statistics: Statistics Made Simple!

17 Common Statistical Tests
Stephan Bodkin, Joe Hart, and Brian C. Werner
18 The Nature of Data
Clair Smith
19 Does No Difference Really Mean No Difference?
Carola F. van Eck, Marcio Bottene Villa Albers,
Andrew J. Sheean, and Freddie H. Fu
20 Power and Sample Size
Stephen Lyman
21 Visualizing Data
Stephen Lyman, Naomi Roselaar, and Chisa Hidaka

Part IV Basic Toolbox for the Young Clinical Researcher

22 How to Prepare an Abstract
Elmar Herbst, Brian Forsythe, Avinesh Agarwalla, and
Sebastian Kopf
23 How to Make a Good Poster Presentation
Baris Kocaoglu, Paulo Henrique Araujo, and
Carola Francisca van Eck
24 How to Prepare a Paper Presentation?
Timothy Lording and Jacques Menetrey
25 How to Write a Clinical Paper
Brendan Coleman
26 How to Write a Book Chapter
Thomas R. Pfeiffer and Daniel Guenther
27 How to Write a Winning Clinical Research Proposal?
Christian Lattermann and Janey D. Whalen
28 How to Review a Clinical Research Paper?
Neel K. Patel, Marco Yeung, Kanto Nagai, and Volker Musahl

Part V How to Perform a Clinical Study: A Case Based Approach

29 Level 1 Evidence: A Prospective Randomized
Controlled Study
Seper Ekhtiari, Raman Mundi, Vickas Khanna,
and Mohit Bhandari
30 Level 1 Evidence: Long-Term Clinical Results
Daisuke Araki and Ryosuke Kuroda
31 Level 2 Evidence: Prospective Cohort Study
Naomi Roselaar, Niv Marom, and Robert G. Marx
32 Level III Evidence: A Case-Control Study
Andrew D. Lynch, Adam J. Popchak, and James J. Irrgang
33 Level 4 Evidence: Clinical Case Series
Mitchell I. Kennedy and Robert F. LaPrade
34 How to Perform a Clinical Study: Level 4
Evidence—Case Report
Andrew J. Sheean, Gregory V. Gasbarro,
Nasef M. N. Abedelatif, and Volker Musahl
35 Level 5: Evidence
Seán Mc Auliffe and Pieter D’Hooghe

Part VI How to Perform a Review Article?

36 Type of Review and How to Get Started
Matthew Skelly, Andrew Duong, Nicole Simunovic, and
Olufemi R. Ayeni

Part VII How to Perform a Systematic Review or Meta-analysis?

37 What Is the Difference Between a Systematic Review
and a Meta-analysis?
Shakib Akhter, Thierry Pauyo, and Moin Khan
38 Reliability Studies and Surveys
Kelsey L. Wise, Brandon J. Kelly, Michael L. Knudsen, and
Jeffrey A. Macalena
39 Registries
R. Kyle Martin, Andreas Persson, Håvard Visnes,
and Lars Engebretsen

Part VIII How to Perform an Economic Health Care Study?

40 How to Perform an Economic Healthcare Study
Jonathan Edgington, Xander Kerman, Lewis Shi, and
Jason L. Koh

Part IX Multi-Center Study: How to Pull It Off?

41 Conducting Multicenter Cohort Studies:
Lessons from MOON
José F. Vega and Kurt P. Spindler
42 MARS: The Why and How of It
Rick W. Wright, Amanda K. Haas, and Laura J. Huston
43 Multicenter Study: How to Pull It Off? The PIVOT Trial
Eleonor Svantesson, Eric Hamrin Senorski, Alicia Oostdyk,
Yuichi Hoshino, Kristian Samuelsson, and Volker Musahl
44 Conducting a Multicenter Trial: Learning from the
JUPITER (Justifying Patellar Instability Treatment
by Early Results) Experience
Jason L. Koh, Shital Parikh, Beth Shubin Stein,
and The JUPITER Group
45 How to Organise an International Register in Compliance
with the European GDPR: Walking in the Footsteps
of the PAMI Project (Paediatric ACL Monitoring Initiative)
Daniel Theisen, Håvard Moksnes, Cyrille Hardy,
Lars Engebretsen, and Romain Seil

Part X Helpful Further Information

46 Common Scales and Checklists in Sports Medicine Research
Alberto Grassi, Luca Macchiarola, Marco Casali,
Ilaria Cucurnia, and Stefano Zaffagnini
47 A Practical Guide to Writing (and Understanding)
a Scientific Paper: Meta-Analyses
Alberto Grassi, Riccardo Compagnoni, Kristian Samuelsson,
Pietro Randelli, Corrado Bait, and Stefano Zaffagnini
48 A Practical Guide to Writing (and Understanding)
a Scientific Paper: Clinical Studies
Riccardo Compagnoni, Alberto Grassi, Stefano Zaffagnini,
Corrado Bait, Kristian Samuelsson, Alessandra Menon, and
Pietro Randelli
49 Reporting Complications in Orthopaedic Trials
S. Goldhahn, Norimasa Nakamura, and J. Goldhahn
50 Understanding and Addressing Regulatory Concerns
in Research
Jason L. Koh, Denise Gottfried, Daniel R. Lee,
and Sandra Navarrete
51 What Is Needed to Make Collaboration Work?
Richard E. Debski and Gerald A. Ferrer
52 A Clinical Practice Guideline
Aleksei Dingel, Jayson Murray, James Carey,
Deborah Cummins, and Kevin Shea
53 How to Navigate a Scientific Meeting and Make
It Worthwhile? A Guide for Young Orthopedic Surgeons
Darren de SA, Jayson Lian, Conor I. Murphy, Ravi Vaswani,
and Volker Musahl
54 How to Write a Scientific Article
Lukas B. Moser and Michael T. Hirschmann
55 Common Mistakes in Manuscript Writing
and How to Avoid Them
Eleonor Svantesson, Eric Hamrin Senorski,
Kristian Samuelsson, and Jón Karlsson

About the Editors

Editors

Volker Musahl, MD is Associate Professor and Chief of Sports Medicine in the Department of
Orthopaedic Surgery at the University of Pittsburgh and the Deputy Editor-in-chief of KSSTA.
He is one of the top 10 published ACL scientists worldwide and co-Principal investigator for
the multicenter STaR trial (Surgical Timing and Rehabilitation for multiple knee ligament
injury) and the POETT study (Exercise Therapy for rotator cuff tears). He is the co-head team
physician for the University of Pittsburgh Football and Fellowship Director for the Sports
Medicine and Shoulder Fellowship at the University of Pittsburgh.

Jón Karlsson, MD, PhD is Professor of Orthopaedic Surgery at the University of Gothenburg,
Sweden, and Senior Consultant at the Sahlgrenska University Hospital. He has published more
than 400 scientific papers and more than 100 book chapters and is the author/co-editor of 40
books on Orthopaedics and Sports Traumatology. He has mentored more than 50 PhD students in
their scientific work. He is currently Secretary of the ISAKOS Board of Directors and a former
chair of the ISAKOS Scientific Committee. He is currently Editor-in-Chief of KSSTA (Knee
Surgery Sports Traumatology Arthroscopy), one of the leading journals in the category of knee
surgery and sports trauma. He has published several papers on research methodology and
evidence-based medicine. He has been team physician for the soccer club IFK Göteborg for more
than 30 years.

Michael T. Hirschmann, MD is Professor of Orthopaedic Surgery and Traumatology at the
University of Basel, Switzerland. At Kantonsspital Baselland (Bruderholz, Liestal, Laufen) he
is Co-Chair of Orthopaedic Surgery and Traumatology, Head of knee surgery, and DKF Head of
knee research. He has published over 250 academic peer-reviewed articles and book chapters.
He has also edited numerous books dealing with knee surgery. In recent years, he and his team
have won more than 30 research awards for their contributions to knee surgery. His main
clinical and research interest is knee surgery, in particular sports injuries and the
treatment of the degenerative knee including osteotomy and primary and revision knee
arthroplasty. In addition, he serves as Deputy Editor-in-Chief of the Knee Surgery Sports
Traumatology Arthroscopy Journal (KSSTA), for which he also annually runs a Basic Science
Writing Course. He is also on the editorial board of numerous other journals.

Olufemi R. Ayeni, MD, PhD, FRCSC is an Associate Professor in the Division of Orthopaedic
Surgery at McMaster University. He has published over 230 academic peer-reviewed articles,
numerous book chapters, and secured over one million dollars in research funding. He has
presented internationally on sports medicine-related topics and evidence-based medicine. He
is currently the Medical Director of the Hamilton Tiger Cats of the Canadian Football League
and Fellowship Director for the Sports Medicine and Arthroscopy Fellowship at McMaster
University.

Robert G. Marx, MD, MSc is Professor of Orthopedic Surgery and vice-chair of Orthopedic
Surgery at Hospital for Special Surgery/Weill Cornell Medical College in New York City. He
has published four books and over 200 peer-reviewed scientific articles. Dr. Marx is Deputy
Editor for sports medicine and Associate Editor for Evidence-Based Orthopedics for the
Journal of Bone and Joint Surgery. He has previously served as chairman of the scientific
committee and also on the Board of Directors of ISAKOS.

Jason L. Koh, MD, MBA is Chairman of the Department of Orthopaedic Surgery and Director of
the Orthopaedic Institute at NorthShore University HealthSystem, Chicago. He is also Clinical
Professor of Surgery at the University of Chicago Pritzker School of Medicine and Adjunct
Professor at the Northwestern University McCormick School of Engineering. His research
regarding tissue engineering, cartilage, rotator cuff, and ligament has been recognized with
numerous honors and awards. Dr. Koh has held leadership positions with multiple orthopaedic
societies and has published over 100 papers and book chapters. He has served as team
physician for the Chicago Cubs, Chicago Fire Soccer, and the Joffrey Ballet.

Norimasa Nakamura, MD, PhD is an orthopaedic surgeon and Professor at the Institute for
Medical Science in Sports, Osaka Health Science University, Osaka City, Japan. Dr. Nakamura
is a Fellow of the Royal College of Surgeons (FRCS) (England). He is President of the
International Cartilage Repair Society (ICRS) and a former chair of the Scientific Committee
of ISAKOS. He is a member of the editorial boards of various leading journals in the field of
orthopaedics and author of more than 120 peer-reviewed papers.

Associate Editors

Neel K. Patel, MD is an orthopaedic surgery resident in the Clinician-Scientist track at the
University of Pittsburgh Medical Center. He completed his undergraduate education at the
Massachusetts Institute of Technology and his medical education at the Geisel School of
Medicine at Dartmouth. His research interests include shoulder and ankle biomechanics and
complex multi-ligament knee injuries.

Darren de SA, MD, FRCSC is a Canadian orthopaedic surgeon from Hamilton, Ontario, Canada,
whose clinical scope includes adult and paediatric sports medicine, arthroscopic surgery, and
trauma. He has a special interest in complex knee reconstruction and shoulder surgery. Darren
completed both his medical and orthopaedic training at McMaster University, and subsequently
completed Fellowships in Sports Medicine/Arthroscopy and Adolescent Trauma at the University
of Pittsburgh and Western University, respectively. Darren has contributed to a productive
research and medical education program focused on outcomes in athletic injuries and
non-arthritic conditions of the knee, shoulder, and hip. He is an Editorial Board Member for
Arthroscopy and is a manuscript reviewer for both Arthroscopy and KSSTA. His contributions
have been recognized at the institutional and national level.

Part I
Evidence Based Medicine in Orthopaedics
1 What Is Evidence-Based Medicine?

Eleonor Svantesson, Eric Hamrin Senorski, Jón Karlsson, Olufemi R. Ayeni,
and Kristian Samuelsson

E. Svantesson (*)
Department of Orthopaedics, Institute of Clinical Sciences, The Sahlgrenska Academy,
University of Gothenburg, Gothenburg, Sweden

E. H. Senorski
Department of Health and Rehabilitation, Institute of Neuroscience and Physiology,
The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden

J. Karlsson · K. Samuelsson
Department of Orthopaedics, Institute of Clinical Sciences, The Sahlgrenska Academy,
University of Gothenburg, Gothenburg, Sweden
Department of Orthopaedics, Sahlgrenska University Hospital, Mölndal, Sweden
e-mail: jon.karlsson@telia.com

O. R. Ayeni
Division of Orthopaedic Surgery, McMaster University, Hamilton, ON, Canada
e-mail: ayenif@mcmaster.ca

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_1

1.1 History of EBM

Throughout the history of medicine, certain treatment strategies have evidently been adopted
based on preference, belief, and rationalism. Some of this “knowledge” was conveyed by senior
clinicians to the next generations of clinicians and thereby remained the way to practice
medicine, without its performance being questioned. Gradually, some pioneers started to alter
this perspective by advocating that empirical evidence should be incorporated in the practice
of medicine, indicating that a scientific approach, including observation and critical
appraisal of the current evidence, was fundamental for a trustworthy conclusion on the optimal
care of patients. One of the most famous early clinical trials was James Lind’s study of
scurvy in the British navy, which was published in 1753 [16]. Despite this early recorded
“trial”, it took until 1962 for the Kefauver-Harris Amendment, enforced by the US Food and
Drug Administration, to make it a legal requirement to perform clinical trials involving human
beings before establishing claims regarding drug efficacy [18]. To clinicians and researchers
of today, it appears obvious that establishing such claims entails years of investigation. Any
scenario other than performing thorough preclinical and clinical studies beforehand would be
unthinkable for us. We should therefore remind ourselves to reflect on the relatively fast
development in the empirical assessment of evidence that our field has undergone. The question
arises: what is the reason for this development? What made us abandon the collection of
uncontrolled experiences as the major foundation of clinical decision-making and, instead,
strive toward the practice of evidence-based medicine (EBM)? Or has it really happened?

In the beginning of the twentieth century, the orthopedic surgeon Ernest A. Codman at the
Massachusetts General Hospital in Boston proposed the end result system [6], which could be
considered a groundbreaking step toward modern EBM. Dr. Codman held that only by evaluating
the outcome, “the end result” of a
treatment, was it possible to assess the clinical effectiveness of that treatment. He
advocated monitoring every patient who received treatment for a follow-up time long enough to
establish whether the treatment led to a satisfactory outcome. If a treatment failure was
determined based on the outcome records, he held that the reason for the failure should be
investigated and proper actions taken to prevent similar failures in the future. This was a
radical idea in the early twentieth century, and Dr. Codman’s work met with such strong
criticism that he lost his staff position at the Massachusetts General Hospital. By contrast,
we nowadays consider Dr. Codman’s approach a milestone for medicine based on empirical
evidence. The term EBM, which has been regarded as one of the most important paradigm shifts
in medical history [27], was defined in 1991 by Gordon Guyatt as a component of the medical
residency program at McMaster University in Hamilton, Ontario, Canada [4]. The concept aimed
to educate clinicians on how to interpret scientific evidence in terms of assessment of
credibility and critical appraisal of the results, and on how to integrate scientific evidence
into everyday work [10]. This concept was subsequently adopted worldwide, and several guides
that serve as a foundation for understanding the EBM concept have been published [13, 15, 26].
One contributing factor to the almost explosive introduction and integration of EBM was the
rapid evolution of modern technology during this time, which enabled the field of informatics
to grow tremendously with, for example, large online databases and scientific journals.
Therefore, once the concept was established, the conditions for implementing EBM in a variety
of specialties enabled the emergence of a new era of a scientific approach to the practice of
medicine.

1.2 What Defines the Practice of EBM?

The practice of EBM entails the integration of the current best evidence into the clinical
decision-making process for the care of every individual patient. Thus, it is about using
scientific evidence conscientiously and objectively for this purpose. To achieve this, a
sufficient number of clinically relevant sources of information must be acquired, which may
include published literature such as basic science research, clinical trials, diagnostic
testing, predictive factors, the efficacy of therapeutic interventions, etc. However, it could
also include individual observations and expert opinions. The expertise of a highly
experienced clinician should also be incorporated into the concept of EBM. EBM is about using
the accumulated scientific and statistical knowledge derived from several of the
aforementioned sources of information. Moreover, it involves critical appraisal of such
sources to evaluate: what does the strongest evidence suggest in terms of decision-making in
this clinical situation? Nonetheless, to rely solely on scientific evidence without clinical
expertise for clinical decision-making would be a misconception of the EBM concept. Another
important component is to incorporate patients’ perspectives and values, which is in line with
providing the best care for every individual patient. Establishing a good relationship between
the medical professional and the patient is important for this purpose, and the practice of
EBM should be characterized by the synergistic values between the brain and the heart [21].
Thus, EBM is characterized by three equally important fundamental principles: the best
available research, the clinical experience, and the patient’s perspective [24].

1.3 The Best Available Evidence

The large amount of available literature calls for a systematic approach to evaluating and
synthesizing data. Before the term EBM was even established, efforts to describe how to
systematically examine the scientific literature for “critical appraisal” and for extraction
of evidence were made by David Sackett of McMaster University [24]. The idea of summarizing
evidence was developed as a cornerstone of EBM, and creating such summaries has been
facilitated by the rapid development of venues for finding information, as well as
advanced degrees in library science to help extract data efficiently. Today, finding the best
possible evidence is easier than ever with various software programs and databases that can
generate information almost instantly. There are several valuable sources that provide
clinicians with the best available evidence, including systematic reviews and evidence-based
clinical guidelines. Perhaps the most extensive database providing in-depth systematic reviews
on numerous topics is the Cochrane Database, which also comprises a list of randomized
clinical trials in orthopedics and many other subspecialty areas.

1.3.1 The Hierarchy of Evidence

The hierarchy of evidence needs to be appreciated to facilitate interpretation of the large
amount of literature on a topic. Moreover, a systematic quality appraisal needs to be carried
out when searching for the best possible evidence because not all research is reliable
research. The hierarchy of evidence has been established as levels of evidence, which mainly
depend on the quality of study design and the expected risk of bias [5, 23, 28]. Although
there are several versions of the hierarchy of evidence, the most frequently used version is
the one available from the Oxford Centre for Evidence-Based Medicine website, www.cebm.net. In
this version, randomized controlled trials (RCTs) are regarded as the highest level of
evidence (level 1) among individual studies, while expert opinions and uncontrolled studies
are considered the lowest level (level 5). Controlled observational studies occupy a position
between these two, and their level further depends on whether a prospective or retrospective
approach to the study design has been undertaken. Moreover, the assessment of level of
evidence differs slightly depending on the study type, and specific criteria have been
established for each study type. According to the systematic approach to level of evidence
assessment proposed by the Oxford Centre for Evidence-Based Medicine, study types are
stratified into the following groups: therapeutic studies, prognostic studies, diagnostic
studies, prevalence studies, and economic/decision analyses. Thus, each study type has its own
system for determining the level of evidence, and subgroups within a certain level have also
been established (e.g., level 1a, 1b, etc.). Fact Box 1.1 summarizes the hierarchy of evidence
for therapeutic studies.

Fact Box 1.1: Levels of Evidence

Level  Study design
1      Meta-analyses
       Systematic reviews
       Randomized controlled trials
2      Cohort studies
3      Case-control studies
4      Retrospective case series
5      Expert opinions

The hierarchy of evidence is proposed to reflect the applicability, reproducibility, and
generalizability of a study [9], and a study with a high level of evidence should therefore
show superiority in terms of these factors. Grant et al. [11] conducted a systematic review to
evaluate the level of evidence among published articles in three major journals of sports
medicine over the past 15 years. It was concluded that the percentage of level 1 and 2 studies
had increased over time, and in 2010 nearly 25% of all studies were level 1 or 2. Although
level 4 and 5 studies had decreased over time, these were still the most common levels of
evidence in the sports medicine literature (53% in 2010) [11]. Similarly, another systematic
review, specifically investigating the level of evidence among literature related to anterior
cruciate ligament reconstruction, concluded that a minority of the literature published
between 1995 and 2011 was of level 1 evidence (approximately 10%) [25]. However, it is
important to point out that EBM does not rely solely on level 1 RCTs, which is a common
misunderstanding of EBM [3]. All types of study designs contribute to EBM, and the strengths
and disadvantages of every study design need to be considered when accumulating evidence. A
valuable instrument for rating evidence quality is the Grading of Recommendations
Assessment, Development, and Evaluation (GRADE) system [2]. The GRADE system has changed the
way the credibility of various aspects of studies is addressed and has provided a standardized
way to evaluate current evidence. The GRADE system includes not only appraisal of the study
design but also a stringent evaluation of risk of bias, precision, variability in results
between studies, applicability, effect size, and dose-response gradients. The GRADE rating
system is presented in Fact Box 1.2. When systematically synthesizing the results of studies,
the GRADE system ensures that an in-depth assessment is undertaken and enables information
from different types of study designs to be evaluated, regardless of each study’s level of
evidence. This is important since it precludes a myopic focus on the confidence in results
based solely on study design, for example, RCTs. RCTs remain the gold standard for determining
the efficacy of an intervention; however, RCTs are not without limitations. For example, the
generalizability of the results from RCTs must be critically evaluated since patient
enrollment is based on strict criteria and the trials are often performed in one part of the
world, perhaps in highly specialized centers. For evaluating the effectiveness of an
intervention in a “real-world population,” studies of observational design are instead an
important asset. Thus, we should acknowledge the strengths and limitations of each study
design, and evidence should be established based on the cumulative results from several types
of study designs.

Fact Box 1.2: The GRADE Rating System

Code  Quality of evidence  Definition
A     High                 It is unlikely that further research will change current
                           confidence in the estimate of effect. There are consistent
                           results in:
                            • Several studies of high quality
                            • One large high-quality multicenter trial
B     Moderate             Further research is likely to impact or change current
                           confidence in the estimate of effect based on:
                            • One high-quality study
                            • Several studies with some limitations
C     Low                  It is very likely that further research will impact or change
                           current confidence in the estimate of effect based on:
                            • One or more studies with severe limitations
D     Very low             The estimate of effect is very uncertain based on:
                            • Expert opinion
                            • No direct research evidence
                            • One or more studies with very severe limitations

1.3.2 Evaluation of Hypothesis Testing

The evaluation of study power is an important part of the EBM concept. An underpowered trial
is subject to beta errors (Type II errors), and it may have detrimental effects if such trials
are allowed to impact clinical practice. A Type II error occurs when a study concludes that
there is no difference between two interventions when such a difference in fact exists. When
conducting a study, the investigators should aim for a sample size large enough to minimize
the risk of Type II errors while maximizing the probability of detecting a difference between
two interventions when a real difference exists. This probability is commonly referred to as
the power of a study. Investigators generally accept a 20% probability of a Type II error
(β = 0.20), meaning that a real difference will be detected 80% of the time, corresponding to
a study power of 80% (1 − β). A power analysis is performed prior to study start to ensure
that a sufficiently large sample is enrolled in the trial. A power analysis also ensures that
the study is feasible, since too large a sample size can lead
to wasted resources, wasted time, and incomplete studies. Furthermore, enrollment of more patients than necessary in a trial would be ethically incorrect. A priori power analysis is the most valid method; however, such an analysis could also be performed after study completion to test the validity of the study results. Interestingly, Lochner et al. conducted a systematic review that assessed Type II error rates in randomized trials in orthopedic trauma and concluded that, among the 117 trials included, the Type II error rate was 91% [17]. The high risk of concluding falsely negative results among orthopedic trials emphasizes the importance of a critical appraisal of study power.

1.4 Let Evidence-Based Research Impact Clinical Work

A well-built research question is a fundamental component of EBM. However, a relevant research question can only be created through reflection on current practice and the identification of knowledge gaps. For instance, the research question may address the presentation of a specific condition, prognosis, treatment, or diagnosis. A valuable aid for this could be to systematically identify important components of the research question identified by the acronym PICO, which most commonly is used in randomized controlled trials. The acronym is defined by patient characteristics, intervention, comparison, and outcome. The features of the PICO are summarized in Fact Box 1.3.

Fact Box 1.3: The PICO to Formulate Research Questions
P: patient characteristics  Define the population for which the investigation is aimed. This includes demographics, the clinical characteristics (e.g., diagnoses and conditions), and the clinical setting
I: intervention             What is the exact intervention aimed to be investigated? This includes all types of treatments and diagnostic tests
C: comparison               With what should the intervention be compared? This could be another type of intervention, current standard practice, placebo, or no intervention at all
O: outcome                  Define what measurements to use for measuring the effects of the intervention and the comparison. The outcome could be a direct result of the intervention/comparison and may also include side effects. Define primary and perhaps secondary outcome measurements that are valid, feasible, and reproducible

A specific and feasible clinical question will facilitate the next step, which is to find the best available evidence.
In combination with a well-formulated question, a precise search strategy will facilitate finding relevant literature for your research questions, in an almost innumerable amount of available scientific articles. The Medical Literature Analysis and Retrieval System Online (MEDLINE) database is considered one of the most comprehensive databases and is an excellent primary choice for health-care providers since it provides both primary and secondary literature for medicine. The identified articles should thereafter be assessed based on study design and level of evidence. Especially valuable for a busy practicing clinician are the “filtered resources,” which are highly rated in the hierarchy of evidence. Examples of filtered resources are well-conducted systematic reviews, in which a synthesis of evidence has already been performed. A systematic review has been defined by the Cochrane Collaboration as “A review of a clearly formulated question that uses systematic and explicit methods to identify, select, and critically appraise relevant research, and to collect and analyze data from the studies that are included in the review. Statistical methods (meta-analysis) may or may not be used to analyze and summarize the results of the included studies” [12]. Although the information in the systematic
8 E. Svantesson et al.

review has already been scrutinized, it is the readers’ responsibility to critically evaluate the included studies and to relate them to the rationale of the systematic review and the specific question. A systematic review should be carried out based on strict, predefined inclusion and exclusion criteria for all eligible articles. The search strategy should be clearly illustrated, and it should be obvious and reproducible how the articles have been assessed for inclusion and critical quality appraisal [1, 14]. It is of importance that all articles found in the literature search are evaluated objectively and that all articles meeting eligibility criteria are included, regardless of what results the studies present. Otherwise, there is an evident risk of selection bias in the systematic review [20]. Finally, if a meta-analysis is performed, the reader should evaluate data extraction, pooling of data, and the statistical methodology for quantitative synthesis of data, including data heterogeneity assessment. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [19] has been developed to facilitate the conduct of a systematic review and could also be applied when evaluating a systematic review. In general, most journals require that the authors have applied the PRISMA statement to publish a systematic review. Other valuable resources for systematic reviews and meta-analyses are, for example, the Cochrane Database of Systematic Reviews (the Cochrane Collaboration) and the Database of Abstracts of Reviews of Effects (DARE; National Institute for Health Research). A disadvantage of filtered resources is that it takes time to conduct such a synthesis of evidence and that more recent research therefore is at risk of being missed if solely relying on such sources of information. Moreover, primary studies related to, for example, newer conditions, interventions, or technologies are often insufficient for the conduct of a systematic review. Unfiltered resources, or primary studies, provide the most up-to-date research and can include important conclusions that might have been published since the filtered resources were published. Finding the best available evidence among primary studies demands that the clinician undertakes an evidence-based approach toward the topic on his or her own. This requires skills of the reader that may take time and education to master. Nevertheless, to develop such skills is something that is nearly mandatory in the modern world of EBM, and increased understanding of how to independently determine study validity and applicability is encouraged. Another usable instrument in the arsenal of evaluating study quality is the strength of recommendation taxonomy (SORT) [8] system, which is useful when the clinician has failed to find the requested information via randomized controlled trials, meta-analyses, or systematic reviews. The SORT system could be used alone or as a complement to the level of evidence hierarchy for quality appraisal of individual studies and yields a code ranging from A to C where A represents the best possible evidence (Fact Box 1.4).

Fact Box 1.4: The SORT Rating System
Code  Definition
A     Consistent, high-quality, patient-oriented evidence
B     Inconsistent or limited-quality patient-oriented evidence
C     Consensus, disease-oriented evidence, usual practice, expert opinion, or case series for studies of diagnosis, treatment, prevention, or screening

After undertaking a thorough review of the literature and appreciating what is the best available evidence, the most important thing for a clinician is to subsequently incorporate this into clinical practice. This entails a combination of having the actual requisites to provide such health-care and also to ascertain that the patient’s needs and preferences are incorporated in the algorithm of treatment. A complete utilization of EBM relies on patient-centered health-care as well as a scientific approach to treatment. This means that the clinician has a responsibility to function as the expert of the area, presenting evidence-based alternatives to treatment, while the ultimate decision is characterized by shared decision-making between the clinician and the patient.

1.5 The Future for EBM

Gordon Guyatt and Benjamin Djulbegovic recently published an article in The Lancet that reviewed the historical progress in EBM and highlighted the future directions of EBM [7]. The authors emphasized that the EBM movement has led to several related initiatives, such as measurements of the quality of care, registration of trials, improved publishing standards, discontinuation of inaccurate interventions in clinical practice, and identification of over- or underdiagnosis and over- or undertreatment. However, they also emphasized that EBM still has several future challenges to face and that EBM needs to evolve [7]. One main challenge is to ensure rapid and efficient production of systematic reviews and practice guidelines, matching the pace at which new evidence is published today. This ambition requires experienced research teams dedicated to producing comprehensive summaries of evidence rapidly, which would be facilitated by electronic interventions such as electronic platforms and automated and text-mining software [22]. The authors concluded that EBM must be developed alongside the technological evolution in terms of the use of different types of electronic devices to access medical records and data, as well as social media to reach out to the patients and the society. Furthermore, an established theory of health-care decision-making is yet to be generated, and the authors stress that collaboration with other disciplines, such as cognitive and decision sciences, is necessary to reach this goal. Clinicians should be provided with practical tools that facilitate shared decision-making in terms of making it both efficient and a positive experience for both clinicians and patients [7].
Progress of EBM is certain, and it will continue to be a cornerstone for providing the best possible health-care worldwide. Thus, EBM is considered an essential part of medicine, which entails that it is required for clinicians and researchers to embrace the concept and to learn how to master it. Situations where EBM and clinical experience contrast might still occur, which stresses the importance of evaluating each case individually and to continuously reevaluate current practice and perform further research when knowledge gaps are identified.

Take-Home Message
• The practice of EBM is characterized by three equally important fundamental principles: the best available research, the clinical experience, and the patient’s perspective.
• The best available research should be evaluated while acknowledging the strengths and limitations of each study design, and evidence should be established based on the cumulative results from several types of study designs.
• There are several useful practical tools that could aid in critical appraisal of the literature.
• The concept of EBM is an essential part of modern medicine and a continuous process, including reevaluation of current practice, to identify knowledge gaps for further research.

References

1. Aromataris E, Fernandez R, Godfrey CM, Holly C, Khalil H, Tungpunkom P. Summarizing systematic reviews: methodological development, conduct and reporting of an umbrella review approach. Int J Evid Based Healthc. 2015;13(3):132–40. https://doi.org/10.1097/xeb.0000000000000055.
2. Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, Flottorp S, et al. Grading quality of evidence and strength of recommendations. BMJ. 2004;328(7454):1490. https://doi.org/10.1136/bmj.328.7454.1490.
3. Bhandari M, Giannoudis PV. Evidence-based medicine: what it is and what it is not. Injury. 2006;37(4):302–6. https://doi.org/10.1016/j.injury.2006.01.034.
4. Bhandari M, Jain AK. The need for evidence-based orthopedics. Indian J Orthop. 2007;41(1):3. https://doi.org/10.4103/0019-5413.30517.
5. Bhandari M, Swiontkowski MF, Einhorn TA, Tornetta P 3rd, Schemitsch EH, Leece P, et al. Interobserver agreement in the application of levels of evidence to scientific papers in the American volume of the Journal of Bone and Joint Surgery. J Bone Joint Surg Am. 2004;86-a(8):1717–20.
6. Codman EA. The classic: a study in hospital efficiency: as demonstrated by the case report of first five years of private hospital. Clin Orthop Relat Res. 2013;471(6):1778–83. https://doi.org/10.1007/s11999-012-2751-3.
7. Djulbegovic B, Guyatt GH. Progress in evidence-based medicine: a quarter century on. Lancet.

2017;390(10092):415–23. https://doi.org/10.1016/s0140-6736(16)31592-6.
8. Ebell MH, Siwek J, Weiss BD, Woolf SH, Susman J, Ewigman B, et al. Strength of recommendation taxonomy (SORT): a patient-centered approach to grading evidence in the medical literature. J Am Board Fam Pract. 2004;17(1):59–67.
9. Evans D. Hierarchy of evidence: a framework for ranking evidence evaluating healthcare interventions. J Clin Nurs. 2003;12(1):77–84.
10. Evidence-Based Medicine Working Group. Evidence-based medicine. A new approach to teaching the practice of medicine. JAMA. 1992;268(17):2420–5.
11. Grant HM, Tjoumakaris FP, Maltenfort MG, Freedman KB. Levels of evidence in the clinical sports medicine literature: are we getting better over time? Am J Sports Med. 2014;42(7):1738–42. https://doi.org/10.1177/0363546514530863.
12. Green SHJ. Glossary of terms in the cochrane collaboration. The cochrane collaboration. 2015. http://www.cochrane.org/sites/deafault/files/uploads/glossary.pdf.
13. Guyatt G, Rennie D, Meade OM, Cook JD. Users’ guide to the medical literature: a manual for evidence-based clinical practice. Boston, MA: McGraw-Hill; 2014.
14. Harris JD, Quatman CE, Manring MM, Siston RA, Flanigan DC. How to write a systematic review. Am J Sports Med. 2014;42(11):2761–8. https://doi.org/10.1177/0363546513497567.
15. Jackson R, Ameratunga S, Broad J, Connor J, Lethaby A, Robb G, et al. The GATE frame: critical appraisal with pictures. Evid Based Med. 2006;11(2):35–8. https://doi.org/10.1136/ebm.11.2.35.
16. Lind J. A treatise of scurvy. In three parts. Containing an enquiry into the nature, causes and cure, of that disease. Together with a critical and chronological view of what has been published on the subject. Edinburgh: Printed by Sands, Murray, and Cochran; 1753.
17. Lochner HV, Bhandari M, Tornetta P 3rd. Type-II error rates (beta errors) of randomized trials in orthopaedic trauma. J Bone Joint Surg Am. 2001;83-a(11):1650–5.
18. Matthews J. Quantification and quest for medical certainty. Princeton, NJ: Princeton University Press; 1995.
19. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol. 2009;62(10):1006–12. https://doi.org/10.1016/j.jclinepi.2009.06.005.
20. Murad MH, Montori VM, Ioannidis JP, Jaeschke R, Devereaux PJ, Prasad K, et al. How to read a systematic review and meta-analysis and apply the results to patient care: users’ guides to the medical literature. JAMA. 2014;312(2):171–9. https://doi.org/10.1001/jama.2014.5559.
21. Oxman AD, Sackett DL, Guyatt GH. Users’ guides to the medical literature. I. How to get started. The Evidence-Based Medicine Working Group. JAMA. 1993;270(17):2093–5.
22. Paynter R, Banez LL, Berliner E, Erinoff E, Lege-Matsuura J, Potter S, et al. EPC methods: an exploration of the use of text-mining software in systematic reviews. Rockville, MD: Agency for Healthcare Research and Quality (US); 2016.
23. Sackett DL. Rules of evidence and clinical recommendations on the use of antithrombotic agents. Chest. 1986;89(Suppl 2):2s–3s.
24. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn't. BMJ. 1996;312(7023):71–2.
25. Samuelsson K, Desai N, McNair E, van Eck CF, Petzold M, Fu FH, et al. Level of evidence in anterior cruciate ligament reconstruction research: a systematic review. Am J Sports Med. 2013;41(4):924–34. https://doi.org/10.1177/0363546512460647.
26. Straus ES, Glasziou P, Richardson W, Haynes R. Evidence-based medicine. How to practice and teach EBM. Edinburgh: Churchill Livingstone; 2005.
27. Watts G. Let’s pension off the “major breakthrough”. BMJ. 2007;334(Suppl 1):s4. https://doi.org/10.1136/bmj.39034.682778.94.
28. Wright JG, Swiontkowski MF, Heckman JD. Introducing levels of evidence to the journal. J Bone Joint Surg Am. 2003;85-a(1):1–3.
2 What Is the Hierarchy of Clinical Evidence?

Vishal S. Desai, Christopher L. Camp, and Aaron J. Krych

V. S. Desai · C. L. Camp · A. J. Krych (*)
Department of Orthopedic Surgery and Sports Medicine, Mayo Clinic, Rochester, MN, USA
e-mail: Desai.vishal@mayo.edu; Camp.christopher@mayo.edu; krych.aaron@mayo.edu

2.1 Introduction: Why Do We Need a Hierarchy?

Not all evidence is created equal. In the age of information, when data and results can be disseminated within seconds, it remains ever-important to ensure that evidence is accurately organized and graded. In order for clinical research to be translated into improving the quality of patient care and provide practice-changing evidence, a researcher/provider must be able to efficiently distinguish the quality and application of the information they are reading [22]. From the researcher’s perspective, an organized hierarchy allows him or her to select the optimal study design for their clinical question, as will be demonstrated later in this chapter. And from the reader’s perspective, it allows for results to be interpreted in a clinically appropriate context and be compared on the basis of strength relative to the results of other investigators [41]. Selection of the appropriate study type can serve to minimize study bias and improve quality, validity, and reliability. This minimization in bias ultimately has the potential to translate into improved clinical decision-making [21]. There is a growing consensus for the importance of effective quality and strength stratification of published literature. Many journals have now begun requiring authors to declare a level of evidence for their submitted work [48].

2.2 The Hierarchy of Evidence and Study Selection

The hierarchy of clinical evidence is traditionally introduced as a pyramid that allows the reader to organize study design by the level of evidence it yields. Figure 2.1 is a commonly seen rendering of this important principle. This basic framework has been presented and corroborated by several authors since initially appearing in the literature as early as 1979 [19, 23, 40]. Primary, or unfiltered, evidence involves the direct observation and interpretation of data. Secondary, or filtered, evidence relies on the reorganization and interpretation of data that has already been published in a primary study in an attempt to efficiently draw conclusions based on larger sums of data. While the hierarchy does serve to delineate “primary” from “secondary” evidence and visually illustrates the relative strength of study types [1], it does not address how to apply a particular study design to the clinical question of the investigator. Use of the “PICO framework” (which stands for Patient, Problem, Population; Intervention; Comparison, Control; Outcome) may guide development of an appropriate clinical

© ISAKOS 2019 11
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_2

question to which a study design can later be tailored. This model serves to clearly outline key variables of the question so that subsequent data collection and analysis can be performed in a focused manner [26, 43] (Fig. 2.2: Patient, Problem, Population; Intervention; Comparison, Control; Outcome). Focused around the topic of anterior cruciate ligament (ACL) reconstruction outcomes, we will introduce and parse out the hierarchy of clinical evidence as well as demonstrate how its components can be applied differently to answer clinical questions. Let us begin first with a discussion of primary (unfiltered) research including expert opinion, case reports, cross-sectional study, case-control study, cohort study, and randomized controlled trials.

2.3 Primary/Unfiltered Research

2.3.1 Expert Opinion

Prior to popularization of evidence-based medicine and peer review, expert opinion was the predominant form of “clinical evidence” used to communicate findings to a larger audience. In fact, a significant portion of orthopedic literature that has been influencing therapeutic decision-making of surgeons for many years is based in expert opinion and case series [18]. As it is reliant on the observations and experiences of an individual (or group) and may lack a reproducible or methodical design, some argue that it does not belong in the hierarchy of evidence at all [14]. To reduce the dependence of

Fig. 2.1 Pyramid of evidence-based medicine, ordered by quality of evidence from top to bottom. Filtered information: systematic reviews; critically appraised topics (evidence syntheses); critically appraised individual articles (article synopses). Unfiltered information: randomized controlled trials; cohort study; case-control study; cross-sectional study; case report/case series. Base: background information/expert opinion

Fig. 2.2 The PICO format allows the examiner to methodically develop all components of a complete clinical question (P: patient, population, problem; I: intervention; C: comparison; O: outcome)

the results on simply the observations of an individual, expert opinion can be strengthened by pooling together and appropriately summarizing the opinions of multiple, well-respected members of a scientific community. Consider the following by Zantop et al. [49] in which 20 panelists from around the world were surveyed on their preferential technique and rehabilitation protocol for ACL reconstruction. The purpose of the study was to summarize the collective practices of this panel of experts to inform clinicians of the current technical conventions predominantly used in ACL reconstruction. The authors defined “expert” as those who were identified in the specific subject matter to have contributed to the National Library of Medicine or present clinical evidence at international meetings [49]. Expert opinions provide an efficient way for the clinical community at large to be updated on the current trends and pitfalls of practice. However, based on the design of expert opinion articles, it becomes clear that panelist bias, small sample sizes, and lack of rigorous data collection protocols limit their overall impact and relative significance [14, 47, 49]. Despite their shortcomings, the information presented in high-quality, thorough expert opinions may have more practical utility than that of a poorly designed clinical study, and therefore, they will remain a crucial component of the medical literature [47].

2.3.2 Case Report

Case reports document events or outcomes of individual or small groups of patients that the author finds to be of significance. Whether the presented findings are positive or negative, they may be useful in dictating future patient care [14]. Huth outlined four major applications of case reports in his discussion on medical literature [27]:

1. A unique case that may represent a previously unknown syndrome or disease
2. A case with the previously unreported association of two distinct diseases, suggesting a possible relationship between them
3. An “outlier” with features strikingly outside the realm of what is usually seen with a particular disease
4. An unexpected response or course suggesting a previously unrecognized therapeutic or adverse effect of intervention

In the realm of orthopedic surgery, surgical complications may be of significance. Although surgical complications remain a traditionally underreported topic in orthopedic literature, case reports may serve to inform surgeons of potential risks and how to avoid them. For example, a case report by Heng et al. illustrated a 35-year-old man who developed a peri-ACL, distal femoral fracture running through the two femoral tunnels following double-bundle ACL reconstruction. Furthermore, the authors go on to identify this case as the first of its kind following a thorough literature review [24]. On the one hand, a clinician, after having read this, may refine surgical technique or be more cognizant of specific risk factors in his/her patients that may predispose them to a similar outcome. However, although the authors did allude to a possible mechanism of this complication, without an appropriately conducted scientific experiment, the basis of their commentary largely remains conjecture. Case reports may inspire a testable hypothesis for future studies to execute and be successful in providing short and succinct learning points for the reader [17]; however, they provide limited value when attempting to investigate and answer specific clinical questions. Further limitations of case reports include a lack of generalizability, the inability to establish cause-effect relationships, publication bias, and the risk of “overinterpreting” the findings beyond their simple anecdotal intent [38].

2.3.3 Cross-sectional Study

The cross-sectional study attempts to answer the question, “what is happening right now?” One of the most common applications of the cross-sectional study is in determining the prevalence of a condition or diagnosis at a particular time.
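As a minimal sketch of the prevalence calculation mentioned above, the following Python snippet computes a point prevalence from cross-sectional counts together with a 95% Wilson score interval. The counts (30 of 200) are hypothetical and are not taken from any study cited in this chapter, and the function name `point_prevalence` is an illustrative choice rather than an established API.

```python
import math


def point_prevalence(cases: int, sample_size: int, z: float = 1.96):
    """Return (prevalence, lower, upper) using a Wilson score interval.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if sample_size <= 0 or not 0 <= cases <= sample_size:
        raise ValueError("need sample_size > 0 and 0 <= cases <= sample_size")
    p = cases / sample_size
    denom = 1 + z**2 / sample_size
    center = (p + z**2 / (2 * sample_size)) / denom
    half_width = (
        z
        * math.sqrt(p * (1 - p) / sample_size + z**2 / (4 * sample_size**2))
        / denom
    )
    return p, center - half_width, center + half_width


# Hypothetical survey: 30 of 200 sampled individuals have the condition today.
prevalence, lower, upper = point_prevalence(30, 200)
print(f"prevalence = {prevalence:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The interval conveys how precisely a single cross-sectional sample estimates the prevalence; it says nothing about temporal or causal relationships.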

Let us return to our theme of ACL reconstruction and consider the following cross-sectional study. Farber et al. [16] investigated, “Which management strategies for ACL injury are currently the predominant forms of therapy applied by team physicians in Major League Soccer (MLS)?” The authors selected a focused question that attempts to answer what is currently happening rather than trying to establish a temporal, causative, or correlative relationship of any sort. In order to answer this question, the investigators sent out a survey which required respondents to provide their particular surgical technique when treating MLS players [16]. It is important to note that while this study may have employed a survey-based model just as the expert opinion example did, they can be differentiated by how participants were selected and the types of questions they are trying to answer. Rather than a more loosely defined group of “experts,” cross-sectional studies identify a specific group, in this case MLS team physicians, and select a specific question for them to answer. Their responses can then be statistically pooled and homogeneously formatted. The primary shortcoming of cross-sectional studies is their inability to generate temporal or causative relationships between exposures and outcomes. In our example case, although the study was able to identify which reconstruction techniques are currently prevalent among MLS team physicians, it does not begin to address the relative outcomes of the patients who underwent these interventions nor the risk factors that may have placed them at risk for the injury in the first place. Therefore, while they do not necessarily serve to show trends or relationships between multiple variables, cross-sectional studies are effective for gathering and presenting large volumes of generalizable data.

2.3.4 Case-Control Study

Case-control studies begin with a sample of patients in whom a selected outcome has already occurred and been identified. Patients with and without this outcome are then compared for differences in exposures and risk factors between the two groups. Because case-control studies begin with the outcome, they are always inherently retrospective in design. In addition, because the patient outcome is selected by the investigator, case-control studies serve as the ideal study design for analyzing root causes and risk factors of rare diseases. For example, you may have identified a group of your patients who failed single-bundle (SB) ACL reconstruction. And as you think about why this may have happened, the question arises, “what predisposed these patients to failing surgery?” A well-designed case-control study, such as the following, may be suited to answer this question. Parkinson et al. [39] identified 123 SB ACL reconstruction patients with long-term follow-up who met their inclusion criteria. Patients who demonstrated failure at 2-year follow-up had higher rates of medial and lateral meniscus deficiency, shallower femoral tunnel positioning, and younger patient age. The authors were able to conclude that medial meniscus deficiency was the most significant factor to predict graft failure in SB anatomic ACL reconstructions [39] (Fact Box 2.1: Odds Ratio). While they serve a very important role in clinical research, case-control studies are not without limitation. In particular, they are prone to recall bias in which there are discrepancies in the accuracy and thoroughness by which prior exposures are recalled among study participants [30]. For example, in a study of risk factors for chronic ACL deficiency, participants with ACL pathology are likely to more thoroughly search their memories for injuries and underlying causes compared to their unaffected counterparts, thus creating bias in the collected data. Another important shortcoming of case-control studies is their limited ability to demonstrate causation. As causation is typically best demonstrated in prospective studies, the retrospective nature of a case-control study makes it a poor indicator for causal inference between exposure and outcome. In the case of our above example, the study we highlighted more effectively showed a correlation between medial meniscus deficiency and ACL reconstruction failure rather than showing that medial meniscus deficiency caused a failure in ACL reconstruction.
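To make the odds ratio referenced for case-control studies (Fact Box 2.1) concrete, here is a minimal Python sketch that computes it from a 2×2 table, with an approximate 95% confidence interval from the standard log-odds (Woolf) method. The cell counts are hypothetical and do not come from Parkinson et al. or any other cited study; the function name `odds_ratio` and its parameter names are illustrative assumptions.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases,
               exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio for a 2x2 case-control table with a log-based (Woolf) CI.

    Cases are patients with the outcome; controls are patients without it.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimator")
    # Odds of exposure among cases divided by odds of exposure among controls.
    estimate = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical table: 20 of 100 cases and 10 of 100 controls were exposed.
estimate, lower, upper = odds_ratio(20, 80, 10, 90)
print(f"OR = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that includes 1, as in this hypothetical table, means the data are compatible with no association between exposure and outcome, which ties back to the importance of study power.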

Fact Box 2.1: Odds Ratio
Odds ratio (OR) is a commonly applied statistical tool in case-control studies that illustrates how strongly a given characteristic of a subject is associated with the presence or absence of another characteristic of that subject. In the case of a study examining risk factors for a disease, it answers the question, "what are the odds that those who developed the disease were exposed to the risk factor compared to the odds that those who did NOT develop the disease were exposed to the risk factor?"

2.3.5 Cohort Study

Whereas a case-control study seeks to answer the question of "what happened that caused this outcome?", cohort studies serve to answer the question of "what will happen?" Rather than controlling the intervention, the researcher observes the outcomes in the sample being studied. This makes cohort studies suitable for studying the natural progression of exposures, diseases, and risk factors [4]. Unlike case-control studies, cohorts are unique in that they can be executed both prospectively and retrospectively [4]. We have compared and contrasted two cohort studies to illustrate this principle.

First, let us consider the following prospective cohort in which the examiner collects patient exposure information at the beginning of the study and then analyzes the outcomes at a time after that [4]. In 2014, the Multicenter ACL Revision Study (MARS) Group enrolled 1205 patients to understand the effect that graft choice has on outcomes in patients undergoing revision ACL reconstruction (ACLR). They hypothesized that an autograft reconstruction would result in increased sports function and activity levels and decreased osteoarthritis symptoms and graft failure rates compared to allograft reconstruction. Baseline questionnaires and function scores were measured in all patients; after following these patients for a minimum of 2 years, these same questionnaires and function scores were readministered, and outcomes between those who underwent an autograft and allograft revision ACLR were compared [13]. In other words, the authors selected a group of patients with certain exposures (autograft vs allograft), followed them over a course of time, and finally evaluated them for how their long-term outcomes compared.

In retrospective cohorts, however, the examiner begins by collecting patient exposure information from some time in the past and then analyzes the outcomes that developed between that time and the present time [4]. Kim et al. compared functional outcomes and stability of ACL reconstructions of double-bundle versus single-bundle ACL remnant pullout suture techniques using this study design. The examiners selected 44 patients who underwent single-bundle reconstruction with remnant tensioning (Group 1) and 56 patients who underwent double-bundle reconstruction (Group 2) 5–8 years prior to the beginning of the study; then, outcomes were measured at a minimum of 3 years after the surgical date, and no statistically significant difference was identified between the two groups [29]. Their question sought to determine how two groups of patients, treated by different interventional techniques in the past (exposure), progressed over a period of time and how their outcomes compared; thus, a retrospective cohort proved to be the optimal choice.

Either of the above two orientations may serve to provide concrete outcome data between two groups of patients (Fact Box 2.2: Relative Risk). In the context of orthopedic surgery, cohorts are especially useful in providing strong, clinically validated support for the effectiveness of surgical interventions. Although cohort studies may be cumbersome and costly to execute, especially when prospective, they have the benefit of providing a temporal sequence of events between exposure and outcome and, as a result, render compelling, clinically useful findings including potential causation. In fact, it has been suggested that a rigorously designed cohort study has the potential to yield results similar to that of the more powerful randomized controlled trial [4].
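
As a rough numerical illustration of the odds ratio in Fact Box 2.1 and the relative risk referenced above (Fact Box 2.2), both measures can be computed from a 2×2 table of exposure against outcome. The sketch below uses Python; the function names and the patient counts are hypothetical, chosen only for illustration, and are not drawn from any of the cited studies.

```python
def odds_ratio(exp_cases, exp_noncases, unexp_cases, unexp_noncases):
    """Odds of exposure among cases divided by odds of exposure among non-cases.

    Algebraically this is the cross-product ratio (a*d)/(b*c) of the 2x2 table.
    """
    odds_exposure_cases = exp_cases / unexp_cases
    odds_exposure_noncases = exp_noncases / unexp_noncases
    return odds_exposure_cases / odds_exposure_noncases


def relative_risk(exp_cases, exp_noncases, unexp_cases, unexp_noncases):
    """Risk of the outcome in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exp_cases / (exp_cases + exp_noncases)
    risk_unexposed = unexp_cases / (unexp_cases + unexp_noncases)
    return risk_exposed / risk_unexposed


# Hypothetical counts (assumption): 100 exposed and 100 unexposed patients,
# of whom 20 and 10, respectively, developed the outcome.
table = dict(exp_cases=20, exp_noncases=80, unexp_cases=10, unexp_noncases=90)

print("OR:", round(odds_ratio(**table), 2))
print("RR:", round(relative_risk(**table), 2))
```

With these invented counts the OR is about 2.25 and the RR about 2.0, which shows that the two measures are not interchangeable: the OR approximates the RR only when the outcome is rare.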
16 V. S. Desai et al.

Fact Box 2.2: Relative Risk
Relative risk (RR), similar to the odds ratio, is a statistical tool more commonly applied in cohort studies and represents the compared probabilities of an outcome occurring in the exposed versus unexposed group. In the case of a study examining outcomes of certain risk factors, it answers the question, "what is the risk of developing a disease in the group who was exposed to a risk factor compared to the risk of developing the same outcome in a group who was NOT exposed to the risk factor?" Although the distinction between RR and OR may appear subtle, it is nonetheless a crucial one.

2.3.6 Randomized Controlled Trial (RCT)

RCTs are among the most robust experimental study designs and have been described in the past as the "gold standard" of evidence-based medicine [45]. Examiners place the study subjects in either a control or intervention group; thus, because the examiner is in the position to regulate the exposure or interventional strategy in question, RCTs are inherently prospective [14].

A quintessential RCT may be organized to answer the question of whether or not a difference exists between an exposure and control (or an alternative exposure) group when compared prospectively. Let us consider the following RCT by Mayr et al. [32]. The authors sought to answer the question: "if a group of patients in need of undergoing ACL reconstruction were randomized to receive either a single-bundle technique or double-bundle technique, how would their outcomes differ at long-term follow-up?" Sixty-four patients were randomly and evenly split to undergo one of the two surgical techniques. At a minimum of 2 years after the operation, patients were both subjectively and objectively evaluated for knee function. Their results showed no statistically significant difference in outcomes between the two interventional groups, implying a relative clinical equivalency between the two techniques [32]. Due to the deliberate and focused design of RCTs, it has been suggested that they can demonstrate causality between the exposure and outcome more so than other study designs can [33].

As the name implies, randomization is one of the most important features of RCTs. By "randomly" placing subjects in different experimental groups, the examiner is able to not only decrease potentially biasing factors but also improve the internal validity of the study [45]. When random allocation is performed properly, the participants that are assigned to the treatment and control arms of a study are placed there by chance and not by preference of the investigator; this minimizes the burden that known and unknown patient factors may have on the results of the study and provides the highest quality of data. Although there are many different protocols that can be applied to carry out effective randomization, as discussed by Kim et al., they all attempt to achieve this same basic objective of study design [28].

When discussing RCTs, an understanding of the "intention-to-treat (ITT)" principle is essential. Simply put, it states that regardless of patient noncompliance, deviations from the experimental protocol, or patient withdrawal from a study, a patient should remain a member of the original group, whether it be control or experimental, that he/she was assigned to [20, 36]. Why is this significant? ITT can serve to minimize bias and improve the validity of the data presented by the examiner in the event of patient noncompliance or patient withdrawal [20]. Let us illustrate this principle using our recurring theme of ACL reconstruction. Consider a group of 100 patients who have suffered ACL injury. Of these, 50 are randomly assigned to undergo surgical reconstruction (experimental) whereas the other 50 are assigned conservative treatment and physical therapy (control). Patients provide consent, and the plan is to evaluate the patients 2 years from now to see how both groups fared over that period of time. However, during the experimental period, ten patients from the control group feel that they are not improving and

decide to seek surgical intervention through another avenue. What now are we to do with these data, which have been altered by patient movement between our groups? The ITT principle contends that patients should be analyzed in the group to which they were initially randomized. If the ten patients who left the control group are excluded from its analysis, the remaining 40 patients in that group now represent a clinically healthier and skewed sample of patients, thus biasing the results. In other words, patients who are faring more poorly in the control group are more likely to abandon it. The Spine Patient Outcomes Research Trial (SPORT) of 2006 illustrates a real-life example of this dilemma. A group of 501 patients who were all surgical candidates for lumbar discectomy for intervertebral disk herniation were randomized to either undergo standard open discectomy or non-operative treatment for their persistent signs and symptoms. Although patients were initially randomized evenly between the two groups, only 50% of patients assigned to the surgery group had undergone surgery at 3 months; in this same 3-month period, 30% of those assigned to the non-operative arm of the study underwent surgery. The examiners used ITT analysis on the data, which showed a small and not statistically significant improvement in the surgical group over the nonsurgical group. However, the substantial crossover of patients between the two groups led the authors to conclude that superiority or equivalence of treatments could not be established and thus illustrated a key limitation of executing RCTs [46]. If all patients are analyzed in the groups to which they are initially assigned, as per ITT analysis, examiners can improve protocol adherence and decrease bias. As a result of implementing this principle, the examiner's findings tend to become more conservative as non-compliant and withdrawn patients potentially dilute positive data [20]. ITT is not without limitations, and critics have argued that it may fail to provide an accurate picture of the data if large portions of patients change groups [25, 44]. An alternative to ITT is "as-treated" analysis in which subjects are compared with the treatment regimen that they have received regardless of which group they were originally assigned to [15]. The SPORT trial illustrates side by side how these two analytical methods can yield drastically different results from the same patient set. In addition to the ITT analysis performed (as described above), the examiners also analyzed the data using the as-treated method, which showed strong, statistically significant advantages for surgery at all follow-up times through 2 years [46]. While this does go to show that ITT can underestimate the true effect of surgery in certain cases, one clear limitation of as-treated analysis is that it does not provide the same protection against confounding variables that can be achieved by following through with the original randomization protocol [46].

Despite representing the highest level of primary evidence, RCTs are not without shortfalls. A discussion of the limitations of RCTs would be incomplete without an understanding of "clinical equipoise." This principle states that the examiner and medical community at large genuinely cannot say definitively that one arm of an experiment is superior to the other. Ideally, RCTs performed under the assumption of clinical equipoise uphold a higher ethical benchmark as they prevent the deliberate withholding of a treatment from patients which is otherwise known to be superior to the control arm of the study [7, 37]. Equipoise is necessary to ethically justify the use of randomization to select which treatment (or lack thereof) a patient will receive in a situation where the optimal treatment method is not yet agreed upon by the expert community [12]. This concept, in the context of orthopedic surgery, can be well illustrated with a discussion of sham surgery. Sham surgeries are analogous to "placebo drugs" in that they create the illusion to a patient that he or she is being treated; however, they in fact omit the therapeutically significant step in surgery [37]. On the one hand, they allow us to accurately compare the efficacy of the treatment arm; but on the other hand, they raise the possibility of denying appropriate therapy to patients of the control arm. Current literature holds that within the confines of ethical practices, equipoise, and clear, informed consent, sham surgeries do have a place in orthopedic clinical trials [12, 34].
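
The difference between ITT and as-treated analysis discussed earlier in this section can be made concrete with the hypothetical ACL scenario of 100 patients, ten of whom cross over from conservative treatment to surgery. The following Python sketch is illustrative only: the outcome counts, field names, and helper functions are assumptions introduced here, not data from SPORT or any other cited trial.

```python
from collections import defaultdict


def make_patients(assigned, received, n, n_good):
    """Build n hypothetical patient records, n_good of which have a good outcome."""
    return [
        {"assigned": assigned, "received": received, "good_outcome": i < n_good}
        for i in range(n)
    ]


def success_rate_by(patients, key):
    """Proportion of good outcomes, grouping patients by the given field."""
    counts = defaultdict(lambda: [0, 0])  # group -> [good outcomes, total patients]
    for p in patients:
        counts[p[key]][0] += int(p["good_outcome"])
        counts[p[key]][1] += 1
    return {group: good / total for group, (good, total) in counts.items()}


# Invented outcome counts (assumption) for the scenario in the text:
# 50 randomized to surgery, 50 to conservative care, 10 of whom cross over.
patients = (
    make_patients("surgery", "surgery", n=50, n_good=40)
    + make_patients("conservative", "conservative", n=40, n_good=28)
    + make_patients("conservative", "surgery", n=10, n_good=6)  # crossovers
)

itt = success_rate_by(patients, "assigned")         # analyzed as randomized
as_treated = success_rate_by(patients, "received")  # analyzed by treatment received

print("ITT:", itt)
print("As-treated:", as_treated)
```

Grouping by `assigned` keeps the ten crossover patients in the conservative arm, preserving the randomization, whereas grouping by `received` moves them into the surgical group and so no longer compares the groups that chance created.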

A major challenge with RCTs lies in the expense, manpower, and resources needed to perform them. In the absence of ethically managed protocols and high reporting standards, RCTs may provide erroneous information which can subsequently compromise patient care. Despite being considered the "gold standard" of study design, RCTs are not the only option to generate reliable and evidence-based results. It is recommended that if clinical questions can be answered accurately and more efficiently with a so-called "lower-impact" study design (such as the ones discussed above), those avenues should be considered [6].

2.4 Secondary/Filtered Research

Secondary evidence primarily takes the form of two study designs: the systematic review and the meta-analysis. Rather than searching for and interpreting data directly, secondary evidence relies upon pooling of primary evidence that is already available and organizing it in a meaningful and novel fashion. Because it represents a summary of the best available evidence on a particular topic [21], secondary evidence is traditionally classified as the highest level on the hierarchy of evidence. Despite the perceived value of secondary evidence, critics have called into question the validity of its findings and suggest that the medical community may be overvaluing its contribution compared to well-designed, primary studies [2, 3, 5, 8]. The inclusion of methodologically flawed and nonrandomized trials into systematic reviews, due to the unavailability of high-quality RCTs in the orthopedic literature, has been a recent topic of discussion [5]. The findings from a systematic review or meta-analysis are only as strong as the studies that comprise it; thus, the inclusion of low-quality studies compromises the validity of reported results. The inclusion of clinically irrelevant studies and "old" information has weakened the quality of conclusions, if any, that can be drawn from many meta-analyses. As a result, they do little to advance the standard of care or best practices [8]. Audigé et al. reviewed the orthopedic literature and found a preponderance of systematic reviews substantiated by nonrandomized and heterogeneous studies; they recommended a degree of caution when interpreting findings of secondary evidence [2].

2.4.1 Systematic Review

Systematic reviews offer a comprehensive collection of all literature available on a particular topic followed by a discussion and analysis of findings. They allow the examiner to answer the question, "What does the current literature, as a whole, say about my question?" Returning to our theme of ACL reconstruction, consider the following. In assessing how outcomes of ACL reconstruction surgery differ between male and female patients, Ryan et al. performed a systematic review of literature and found 13 studies that met their inclusion criteria. Stratifying their findings based on "graft failure risk," "contralateral ACL injury risk," "knee laxity," and "patient-reported outcomes," their review found no evidence of clinically important difference in outcomes between male and female patients [42]. While systematic reviews do allow for a simplified, qualitative discussion of large volumes of data, what they generally lack is a head-to-head statistical analysis. Another limitation of systematic reviews is that while they do require an exhaustive search of literature, the inclusion criteria of studies that ultimately make it to the analysis stage can be manipulated to the point that the review portrays a biased perspective on an otherwise generalizable question. In other words, the author has the freedom to alter the inclusion criteria to include (or exclude) studies that may reinforce their argument.

2.4.2 Systematic Review with Meta-analysis

Like a systematic review, meta-analysis also begins with a comprehensive collection of literature pertinent to the examiner's question. However, in order to be classified as meta-analysis, the examiner must be able to pool statistics [14]. Therefore, meta-analysis requires a relative degree of homogeneity in data. When studies are combined in a meta-analysis, they are generally combined using either a "fixed effects" model or a "random effects" model. In the Mantel-Haenszel fixed effects model, the examiner assumes that one true effect size underlies all studies in the analysis and that any differences among the included studies are due to chance [31]. In the DerSimonian and Laird random effects model, the degree of heterogeneity between studies is considered to render a fixed effects model implausible; the random effects model is therefore the preferred selection when the included studies are heterogeneous [10].

Because data on the same subject matter may be presented using different metrics, units, outcome scores, etc., meta-analysis is more difficult to execute. Say one were interested in finding a comprehensive analysis comparing outcome scores and physical exam findings for all patients who underwent single-bundle versus double-bundle ACL reconstruction that are documented in the literature. A meta-analysis by Desai et al. identified 970 patients from 15 studies which met inclusion criteria and allowed the examiner to compare a homogenized data set. Their results showed a statistical improvement in knee kinematics of the double-bundle technique but no significant improvement over single-bundle reconstruction on clinically meaningful outcomes [11]. Just as in systematic reviews, meta-analysis is also limited by its vulnerability to bias based on the author's decision to include or exclude studies.

2.4.3 PRISMA

How does one go about conducting an exhaustive review of literature? In 2009, Moher et al. released the "preferred reporting items for systematic reviews and meta-analyses" (PRISMA) statement. The article outlines a meticulous methodology by which to design and execute systematic reviews and meta-analyses with the intent of improving and standardizing the quality of secondary evidence available in the literature [35]. Researchers preparing a manuscript of this design are highly encouraged to review and adhere to the PRISMA guidelines prior to beginning. To see an outline of how to systematically filter the literature to isolate the studies to more thoroughly examine for inclusion in a study, please refer to the Additional Resources section.

2.5 Levels of Evidence

Although the "levels of evidence" will be discussed at length in Chap. 5, we felt they merited a brief mention here as they are modeled on a structure similar to the hierarchy of evidence. Figure 2.3, adapted from an editorial by Wright et al. in the Journal of Bone and Joint Surgery, outlines the levels of evidence and the study designs that underlie them [48]. Not only do they demonstrate the relative strength of data generated by different types of studies; they serve to uniformly grade the quality of recommendations derived from that data.

2.6 Trends in the Evolution of Orthopedic Literature

Over the past several decades, orthopedic literature has continued to change in both quantity and quality. Cunningham et al. [9] published a 2013 study in which they reviewed literature in eight of the highest impact journals in orthopedic medicine over the past 10 years to identify trends in the volume and level of evidence that has been published. In short, they found that the annual volume of published studies continuously increased between the years 2000 and 2010 and the number of Level I and Level II studies significantly increased during that same period. However, despite these increases in volume, they reported that much of the literature still contains Level III and Level IV studies and advocated for researchers to publish the highest level of evidence available for a given question [9] (Fig. 2.4). These findings were later corroborated in a study by Grant et al. [18] who also identified an increase in the volume of Level I and Level II evidence studies. In addition, they highlighted that the

Levels of Evidence for Primary Research Question

Study types:
  Therapeutic Studies – Investigating the Results of Treatment
  Prognostic Studies – Investigating the Outcome of Disease
  Diagnostic Studies – Investigating a Diagnostic Test

Level I
  Therapeutic: 1. Randomized controlled trial (a. significant difference; b. no significant difference but narrow confidence intervals). 2. Systematic review of Level-I randomized controlled trials (studies were homogeneous)
  Prognostic: 1. Prospective study. 2. Systematic review of Level-I studies
  Diagnostic: 1. Testing of previously developed diagnostic criteria in series of consecutive patients (with universally applied reference "gold" standard). 2. Systematic review of Level-I studies

Level II
  Therapeutic: 1. Prospective cohort study. 2. Poor-quality randomized controlled trial (e.g., <80% follow-up). 3. Systematic review of (a) Level-II studies or (b) nonhomogeneous Level-I studies
  Prognostic: 1. Retrospective study. 2. Study of untreated controls from a previous randomized controlled trial. 3. Systematic review of Level-II studies
  Diagnostic: 1. Development of diagnostic criteria on basis of consecutive patients (with universally applied reference "gold" standard). 2. Systematic review of Level-II studies

Level III
  Therapeutic: 1. Case-control study. 2. Retrospective cohort study. 3. Systematic review of Level-III studies
  Prognostic: Case series
  Diagnostic: 1. Study of nonconsecutive patients (no consistently applied reference "gold" standard). 2. Systematic review of Level-III studies

Level IV
  Therapeutic: Case series (no, or historical, control group)
  Diagnostic: 1. Case-control study. 2. Poor reference standard

Level V
  Therapeutic: Expert opinion
  Prognostic: Expert opinion
  Diagnostic: Expert opinion
Figure adapted from The Journal of Bone & Joint Surgery: 85-A(1), 2003

Fig. 2.3  This table outlines the clinical levels of evidence and the study designs that underlie them in the context of
therapeutic, prognostic, and diagnostic studies [48]

[Figure 2.4 graphic: panels for AJSM, Arth, FAI, JHS, JoT, JPO, JSES, and Spine; axes labeled Proportion of Articles (%), Number of Articles, and Year (2000, 2005, 2010); legend LoE (I–IV)]
Fig. 2.4  Despite an increase in the volume of orthopedic literature published on an annual basis over the past 20 years,
a high proportion of the literature is still comprised of Level III and Level IV studies [9]

largest increase in volume has been seen in diagnostic studies, compared to more modest increases in prognostic and therapeutic ones [18].

Take-Home Message
• Despite this hierarchical construct that has been developed to organize evidence based on strength, the most important factor to consider when designing a study is "what is the question that I want to answer?"
• A seemingly "stronger" study may add no additional benefit to one's objective if it does a poor job in answering the specific question that the examiner sought to address in the first place.
• While RCTs may represent the gold standard of clinical evidence, they may not always be the ideal choice for a particular question.
• While we encourage publishing the highest quality of evidence feasible, we stress the importance of ensuring that every scientific study begins with a well-crafted clinical question and only then should a study design be implemented.
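
For readers who wish to see how the fixed effects and random effects pooling described in Sect. 2.4.2 differ in practice, the following Python sketch may help. It rests on stated assumptions: the fixed effects estimate uses simple inverse-variance weighting rather than the Mantel-Haenszel method named in the text, the random effects estimate uses the DerSimonian and Laird moment estimator of between-study variance, and the function name and study values are hypothetical.

```python
def pool_effects(effects, variances):
    """Pool study effect sizes; returns (fixed_estimate, random_estimate, tau2).

    effects   -- per-study effect sizes (e.g., log odds ratios)
    variances -- per-study within-study variances of those effects
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance weights
    sum_w = sum(w)

    # Fixed effects: one true effect is assumed; weight by precision only.
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures how far the studies scatter around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # DerSimonian-Laird estimate of between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random effects: add tau2 to each study's variance before weighting.
    w_star = [1.0 / (v + tau2) for v in variances]
    random_est = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return fixed, random_est, tau2


# Hypothetical log odds ratios and variances from three invented studies.
fixed, random_est, tau2 = pool_effects([0.10, 0.30, 0.50], [0.04, 0.09, 0.01])
print("fixed:", fixed, "random:", random_est, "tau^2:", tau2)
```

When the studies agree closely, the estimated between-study variance is zero and the two pooled estimates coincide; as heterogeneity grows, the random effects weights become more equal across studies and small studies gain influence.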

2.7 Additional Resources and Websites

• PRISMA Flow Diagram: http://prismastatement.org/documents/PRISMA%202009%20flow%20diagram.pdf.
• University of Central Florida College of Medicine Harriet F. Ginsburg Health Sciences Library Evidence-Based Medicine Guide: http://guides.med.ucf.edu/c.php?g=174363&p=1149202.

References

1. Audige L, Ayeni OR, Bhandari M, Boyle BW, Briggs KK, Chan K, Chaney-Barclay K. A practical guide to research: design, execution, and publication. Arthroscopy. 2011;27(4 Suppl):S1–112. https://doi.org/10.1016/j.arthro.2011.02.001.
2. Audige L, Bhandari M, Griffin D, Middleton P, Reeves BC. Systematic reviews of nonrandomized clinical studies in the orthopaedic literature. Clin Orthop Relat Res. 2004;427:249–57.
3. Bhandari M, Morrow F, Kulkarni AV, Tornetta P 3rd. Meta-analyses in orthopaedic surgery. A systematic review of their methodologies. J Bone Joint Surg Am. 2001;83-a(1):15–24.
4. Bryant DM, Willits K, Hanson BP. Principles of designing a cohort study in orthopaedics. J Bone Joint Surg Am. 2009;91(Suppl 3):10–4. https://doi.org/10.2106/jbjs.h.01597.
5. Chaudhry H, Mundi R, Singh I, Einhorn TA, Bhandari M. How good is the orthopaedic literature? Indian J Orthop. 2008;42(2):144–9. https://doi.org/10.4103/0019-5413.40250.
6. Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med. 2000;342(25):1887–92.
7. Cook C, Sheets C. Clinical equipoise and personal equipoise: two necessary ingredients for reducing bias in manual therapy trials. J Man Manip Ther. 2011;19(1):55–7. https://doi.org/10.1179/106698111x12899036752014.
8. Court-Brown CM, McQueen MM. How useful are meta-analyses in orthopedic trauma? J Trauma. 2011;71(5):1395–9. https://doi.org/10.1097/TA.0b013e318208f983.
9. Cunningham BP, Harmsen S, Kweon C, Patterson J, Waldrop R, McLaren A, McLemore R. Have levels of evidence improved the quality of orthopaedic research? Clin Orthop Relat Res. 2013;471(11):3679–86. https://doi.org/10.1007/s11999-013-3159-4.
10. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7(3):177–88.
11. Desai N, Bjornsson H, Musahl V, Bhandari M, Petzold M, Fu FH, Samuelsson K. Anatomic single- versus double-bundle ACL reconstruction: a meta-analysis. Knee Surg Sports Traumatol Arthrosc. 2014;22(5):1009–23. https://doi.org/10.1007/s00167-013-2811-6.
12. Dowrick AS, Bhandari M. Ethical issues in the design of randomized trials: to sham or not to sham. J Bone Joint Surg Am. 2012;94(Suppl 1):7–10. https://doi.org/10.2106/jbjs.l.00298.
13. MARS Group. Effect of graft choice on the outcome of revision anterior cruciate ligament reconstruction in the Multicenter ACL Revision Study (MARS) Cohort. Am J Sports Med. 2014;42(10):2301–10. https://doi.org/10.1177/0363546514549005.
14. Elamin MB, Montori VM. The hierarchy of evidence: from unsystematic clinical observations to systematic reviews. In: Burneo JG, Demaerschalk BM, Jenkins ME, editors. Neurology: an evidence-based approach. New York, NY: Springer; 2012. p. 11–24.
15. Ellenberg JH. Intent-to-treat analysis versus as-treated analysis. Drug Inf J. 1996;30(2):535–44. https://doi.org/10.1177/009286159603000229.
16. Farber J, Harris JD, Kolstad K, McCulloch PC. Treatment of anterior cruciate ligament injuries by major league soccer team physicians. Orthop J Sports Med. 2014;2(11):2325967114559892. https://doi.org/10.1177/2325967114559892.
17. Gopikrishna V. A report on case reports. J Conserv Dent. 2010;13(4):265–71. https://doi.org/10.4103/0972-0707.73375.
18. Grant HM, Tjoumakaris FP, Maltenfort MG, Freedman KB. Levels of evidence in the clinical sports medicine literature: are we getting better over time? Am J Sports Med. 2014;42(7):1738–42. https://doi.org/10.1177/0363546514530863.
19. Greenhalgh T. How to read a paper. Getting your bearings (deciding what the paper is about). BMJ. 1997;315(7102):243–6.
20. Gupta SK. Intention-to-treat concept: a review. Perspect Clin Res. 2011;2(3):109–12. https://doi.org/10.4103/2229-3485.83221.
21. Guyatt G, Rennie D, Meade M, Cook D. Users' guides to the medical literature: a manual for evidence-based clinical practice. Chicago, IL: AMA Press; 2002.
22. Guyatt GH, Haynes RB, Jaeschke RZ, Cook DJ, Green L, Naylor CD, Wilson MC. Users' Guides to the Medical Literature: XXV. Evidence-based medicine: principles for applying the Users' Guides to patient care. Evidence-Based Medicine Working Group. JAMA. 2000;284(10):1290–6.
23. Guyatt GH, Sackett DL, Sinclair JC, Hayward R, Cook DJ, Cook RJ. Users' guides to the medical literature. IX. A method for grading health care recommendations. Evidence-Based Medicine Working Group. JAMA. 1995;274(22):1800–4.
24. Heng CH, Wang Bde H, Chang PC. Distal femoral fracture after double-bundle anterior cruciate ligament reconstruction surgery. Am J Sports Med. 2015;43(4):953–6. https://doi.org/10.1177/0363546514563908.

25. Hollis S, Campbell F. What is meant by intention to treat analysis? Survey of published randomised controlled trials. BMJ (Clin Res ed). 1999;319(7211):670–4.
26. Huang X, Lin J, Demner-Fushman D. Evaluation of PICO as a knowledge representation for clinical questions. AMIA Ann Symp Proc. 2006;2006:359–63.
27. Huth EJ. Writing and publishing in medicine. Baltimore: Lippincott Williams and Wilkins; 1999. p. 103–10.
28. Kim J, Shin W. How to do random allocation (randomization). Clin Orthop Surg. 2014;6(1):103–9. https://doi.org/10.4055/cios.2014.6.1.103.
29. Kim SH, Jung YB, Song MK, Lee SH, Jung HJ, Lee HJ, Jung HS. Comparison of double-bundle anterior cruciate ligament (ACL) reconstruction and single-bundle reconstruction with remnant pull-out suture. Knee Surg Sports Traumatol Arthrosc. 2014;22(9):2085–93. https://doi.org/10.1007/s00167-013-2619-4.
30. Kopec JA, Esdaile JM. Bias in case-control studies. A review. J Epidemiol Community Health. 1990;44(3):179–86.
31. Mantel N, Brown C, Byar DP. Tests for homogeneity of effect in an epidemiologic investigation. Am J Epidemiol. 1977;106(2):125–9.
32. Mayr HO, Benecke P, Hoell A, Schmitt-Sody M, Bernstein A, Suedkamp NP, Stoehr A. Single-bundle versus double-bundle anterior cruciate ligament reconstruction: a comparative 2-year follow-up. Arthroscopy. 2016;32(1):34–42. https://doi.org/10.1016/j.arthro.2015.06.029.
33. McCarthy CM. Randomized controlled trials. Plast Reconstr Surg. 2011;127(4):1707–12. https://doi.org/10.1097/PRS.0b013e31820da3eb.
34. Mehta S, Myers TG, Lonner JH, Huffman GR, Sennett BJ. The ethics of sham surgery in clinical orthopaedic research. J Bone Joint Surg Am. 2007;89(7):1650–3. https://doi.org/10.2106/jbjs.f.00563.
35. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7):e1000097. https://doi.org/10.1371/journal.pmed.1000097.
36. Montori VM, Guyatt GH. Intention-to-treat principle. CMAJ. 2001;165(10):1339–41.
37. Mundi R. Design and execution of clinical trials in orthopaedic surgery. Bone Joint Res. 2014;3(5):161–8. https://doi.org/10.1302/2046-3758.35.2000280.
38. Nissen T, Wynn R. The clinical case report: a review of its merits and limitations. BMC Res Notes. 2014;7:264. https://doi.org/10.1186/1756-0500-7-264.
39. Parkinson B, Robb C, Thomas M, Thompson P, Spalding T. Factors that predict failure in anatomic single-bundle anterior cruciate ligament reconstruction. Am J Sports Med. 2017;45(7):1529–36. https://doi.org/10.1177/0363546517691961.
40. The periodic health examination. Canadian Task Force on the Periodic Health Examination. Can Med Assoc J. 1979;121(9):1193–254.
41. Röhrig B, du Prel JB, Blettner M. Study design in medical research: part 2 of a series on the evaluation of scientific publications. Dtsch Arztebl Int. 2009;106(11):184–9. https://doi.org/10.3238/arztebl.2009.0184.
42. Ryan J, Magnussen RA, Cox CL, Hurbanek JG, Flanigan DC, Kaeding CC. ACL reconstruction: do outcomes differ by sex? A systematic review. J Bone Joint Surg Am. 2014;96(6):507–12. https://doi.org/10.2106/jbjs.m.00299.
43. Schardt C, Adams MB, Owens T, Keitz S, Fontelo P. Utilization of the PICO framework to improve searching PubMed for clinical questions. BMC Med Inform Decis Mak. 2007;7:16. https://doi.org/10.1186/1472-6947-7-16.
44. Sheiner LB. Is intent-to-treat analysis always (ever) enough? Br J Clin Pharmacol. 2002;54(2):203–11.
45. Shore BJ, Nasreddine AY, Kocher MS. Overcoming the funding challenge: the cost of randomized controlled trials in the next decade. J Bone Joint Surg Am. 2012;94(Suppl 1):101–6. https://doi.org/10.2106/jbjs.l.00193.
46. Weinstein JN, Tosteson TD, Lurie JD, et al. Surgical vs nonoperative treatment for lumbar disk herniation: the spine patient outcomes research trial (sport): a randomized trial. JAMA. 2006;296(20):2441–50. https://doi.org/10.1001/jama.296.20.2441.
47. Wenger DR. Limitations of evidence-based medicine: the role of experience and expert opinion. J Pediatr Orthop. 2012;32(Suppl 2):S187–92. https://doi.org/10.1097/BPO.0b013e318259f2ed.
48. Wright JG, Swiontkowski MF, Heckman JD. Introducing levels of evidence to the journal. J Bone Joint Surg Am. 2003;85-a(1):1–3.
49. Zantop T, Kubo S, Petersen W, Musahl V, Fu FH. Current techniques in anatomic anterior cruciate ligament reconstruction. Arthroscopy. 2007;23(9):938–47. https://doi.org/10.1016/j.arthro.2007.04.009.
3 Bias and Confounding

Naomi Roselaar, Magaly Iñiguez Cuadra, and Stephen Lyman

N. Roselaar · S. Lyman (*)
Hospital for Special Surgery, New York, NY, USA
e-mail: LymanS@HSS.edu

M. I. Cuadra
Cirugía de Rodilla, Traumatología CLC, Santiago, Chile

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_3

3.1 Introduction: What Is Bias?

Bias in clinical research refers to any study limitation that precludes the unprejudiced consideration of a question [20]. Bias can be considered the biggest enemy of research, as it can distort findings, weaken true associations, or produce spurious associations, destroying the validity of a study. All bias can be categorized as either random or systematic [22]. Random bias occurs when the lack of precision is distributed randomly across the study. Systematic bias occurs when the lack of precision is concentrated among specific subsets of the study population (Fig. 3.1).

Fig. 3.1 A visual representation of how bias can lead to random or systematic error, diverging from the truth (panels: Random Error, Systematic Error, The Truth)

Bias can be introduced during many stages of research, including design of the study, sample selection, data collection, data analysis, and interpretation of research results [4]. Bias can even occur during publication: manuscripts reporting statistically significant results are published more often than those reporting negative results, which produces systematic bias [27]. Each phase of research is susceptible to common types of bias, and awareness of these tendencies can help researchers avoid them. The sections below address bias as it may arise during each phase of research.

3.2 Biases in the Study Design Phase

Fact Box 3.1
Conceptual bias and confounding are two of the most common types of bias that occur during the design phase of a clinical research study.

Two of the most common types of bias that occur during the design phase of a study are conceptual bias and confounding.

3.2.1 Conceptual Bias

In orthopedics, conceptual bias can occur during the identification and selection of procedures. When selecting procedures for a study, it is important to consider evidence in the literature, specifically support based on randomized controlled trials (RCTs). Bias can be introduced when procedures are selected on criteria based on personal perception rather than support from the literature. This is particularly frequent for orthopedic procedures, as discussed by Lim et al. [17], who investigated different types of orthopedic procedures with respect to their degree of RCT evidence and support. Of the orthopedic procedures assessed, only 37% were supported by at least one RCT according to the criteria used by the authors [17]. The selection of procedures when designing a study is therefore not always based on evidence from the literature.

3.2.2 Confounding Bias

The basis for confounding is the presence of one or more risk factors independently related to both the exposure and outcome of a study [24]. Known as confounding variables, confounding factors, or confounders, these variables primarily affect nonrandomized observational studies. When unseen, these risk factors can affect the association between the exposure and outcome in a study. Randomized controlled trials are affected by confounding factors to a lesser degree, because the confounding effects are greatly reduced or eliminated by the process of randomization. This is one of the most important reasons why randomized controlled trials are considered the gold standard in clinical research design [30].

3.2.2.1 Measuring and Controlling Confounding Factors

It is impossible to completely eliminate the effects of all confounding factors. However, steps can be taken to reduce the confounding effects of certain variables. If the confounding factor in question is measurable, the effects can be decreased or eliminated [23]. Mitigating the effects of confounding factors is possible only to the degree of precision that the factor is measurable [23]. Confounding bias can be measured using methods such as restriction and matching and can be controlled for using multivariate analysis [10]. Both restriction and matching methods can be used to mitigate confounding factors at the research design stage [4].

3.2.2.2 Restriction

The principle of restriction involves limiting the ability of a variable to vary in the study population [10]. As Gerhard explains, “A variable has to be able to vary to be associated with an exposure.” In a study of only non-smokers, smoking cannot act as a confounding variable.

Clinical Vignette
A basic orthopedic example of restriction used to negate a measurable confounding variable may be found in the study of ACL injuries in young athletes. Sport may be a confounding factor when considering the association between sex and suffering an ACL injury in young athletes. The type of sport is associated with both the exposure (sex) and the outcome (ACL injury frequency). Some sports, such as American football or rhythmic gymnastics, are predominantly male or predominantly female, while cutting and pivoting sports like soccer and lacrosse have the highest rates of ACL injury. Because the type of sport that an athlete plays can be precisely determined (measured), the study design can be modified to eliminate this confounding effect. Limiting the study to participants who play a specific sport (or sports), through the use of specific inclusion and exclusion criteria, is one way to control this confounding variable.
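The restriction vignette above can be sketched numerically. The example below uses invented counts, not data from any study: injury risk is assumed to depend only on the sport, while the sexes are unevenly distributed across sports. The sport names, the counts, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical counts, invented for illustration only:
# (sex, sport, number of athletes, number of ACL injuries)
COHORT = [
    ("F", "soccer", 800, 40),    # risk 0.05 in soccer for both sexes
    ("M", "soccer", 200, 10),
    ("F", "swimming", 200, 2),   # risk 0.01 in swimming for both sexes
    ("M", "swimming", 800, 8),
]

def risk_by_sex(rows):
    """Return {sex: injuries / athletes}, pooled over the given rows."""
    athletes = defaultdict(int)
    injuries = defaultdict(int)
    for sex, _sport, n_athletes, n_injuries in rows:
        athletes[sex] += n_athletes
        injuries[sex] += n_injuries
    return {sex: injuries[sex] / athletes[sex] for sex in athletes}

def risk_ratio(rows, exposed="F", reference="M"):
    """Risk of injury in the exposed group divided by risk in the reference group."""
    risks = risk_by_sex(rows)
    return risks[exposed] / risks[reference]

def restrict(rows, sports):
    """Apply an inclusion criterion: keep only athletes in the listed sports."""
    return [row for row in rows if row[1] in sports]

crude_rr = risk_ratio(COHORT)
restricted_rr = risk_ratio(restrict(COHORT, {"soccer"}))
print(f"crude risk ratio (all sports): {crude_rr:.2f}")
print(f"risk ratio restricted to soccer: {restricted_rr:.2f}")
```

In this constructed cohort the crude comparison suggests that female athletes are at more than twice the risk, yet within soccer the risk ratio is 1. The apparent sex effect comes entirely from the uneven distribution of sports, which is the situation restriction is meant to remove.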
3.2.2.3 Matching

Matching is another way to prevent confounding variables from affecting a study. Like restriction, matching is implemented during the design phase of the study. However, instead of restricting the study to certain groups, matching involves creating comparison groups with equal distributions of the specified confounders [10]. The goal of matching is to create two groups that are similar in every way except for the exposure of interest [6]. Note that matching becomes more difficult the more variables you attempt to match on, so it is easier to create a matched comparison group with a smaller number of confounding variables [10]. The trade-off, of course, is that your study will be more prone to residual confounding.

Clinical Vignette
There are many matched-control studies in the orthopedic literature. One such example is the article “Total Knee Replacement in Morbidly Obese Patients” from The Bone & Joint Journal [1]. In this 4-year assessment of outcomes following total knee replacement by Amin et al., body mass index (BMI) was the exposure of interest. The morbidly obese cohort underwent 41 primary total knee replacements (TKRs). To create a comparison group, the authors selected a group of 41 TKRs in nonobese patients. To limit the confounding effects of age, sex, diagnosis, type of prosthesis, laterality, and preoperative functional scores, the control group was matched to the morbidly obese group on all of these variables, isolating the effect of morbid obesity on TKR outcomes.

3.2.2.4 Avoiding Bias Due to Confounding Factors

When designing a study, bias due to confounding factors can be mitigated by controlling how patients will be selected and by collecting the appropriate data. For patient selection, strategies include randomization, restriction, or matching. Randomized trials are considered to have the highest level of evidence because randomization increases the likelihood that unknown confounders (i.e., ones that cannot be anticipated) are avoided. When randomization is not possible, restriction and matching (discussed above) are useful. Making sure that the study design includes ways to collect data on potential confounders will also help mitigate their effect during the analysis phase.

3.3 Biases in the Patient Recruitment Phase

Selection and/or retention bias can occur during the recruitment phase of a study. Selection bias occurs when recruitment methods introduce disproportionate differences between treatment and control (or comparison) groups. Biased selection compromises the internal validity of a study because the actual sample population will be significantly different from the intended population (Fig. 3.2a).

Retention bias, also known as transfer bias, occurs when patients from either the treatment or control (or comparison) group become more likely to leave or be lost from the study, based on some aspect of the study. This type of bias can compromise generalizability, as the study will only be representative of the patients who stayed in the study (Fig. 3.2b).

Fig. 3.2 (a) Bias during selection may result in a sample population that is different from the intended study population. (b) Retention bias may result in a sample population that is not generalizable to the intended target population (diagram labels, panel a: Target Population, All TKA Patients; Intended Sample, Female TKA Patients over age 65; Actual Sample. Panel b: Target Population, All TKA Patients; Intended Sample, Female TKA Patients over age 65; Cohort selection; Actual Sample, Consented; Compliant, completed surveys, not lost to follow-up)

3.3.1 Selection Bias

Selection bias, which can be random or systematic, occurs on the basis of how subjects are recruited into a study. It is obviously a risk in retrospective studies, where the interest in a known outcome drives the study question. However, selection bias can also occur in prospective studies. In a prospective study, the individuals in comparison groups may differ systematically [16]. For example, a study comparing outcomes of surgical or nonsurgical treatment of anterior cruciate ligament (ACL) injury may reflect differences in the patients who did or did not select surgery (such as level of activity, severity of injury, or age) that affect their outcome as much as or more than the treatment itself. Some of these confounding factors could be controlled by collecting relevant data and accounting for them in the analysis. However, other factors that may be systematically associated with either the treatment or control group, such as motivation to achieve pre-injury level of activity, may be difficult or impossible to detect or measure. Still other factors that result in patients choosing surgical versus nonsurgical treatment, and that affect outcome, may be unknown.

Regardless of the study design, choosing the correct control or comparison group is critical for mitigating selection bias. Whenever possible, historical controls should be avoided, as they always introduce some bias due to trends over time.

While more common in observational studies, selection bias can also occur in randomized trials. For example, if recruitment occurs through a tertiary care center, as is often the case for RCTs, patients who are recruited may have different disease or socioeconomic characteristics, which make their outcomes systematically different from those of patients who would seek care at a community hospital.

Consenting to randomization can be a significant hurdle to achieving unbiased selection in RCTs in orthopedics (and other surgical specialties). Not surprisingly, few patients are willing to consent to be randomized to receive either surgery or no surgery, or even a specific type of surgery. For example, in the Spine Patient Outcomes Research Trial (SPORT) [28], a much lower proportion of eligible patients enrolled in the randomized surgical/nonsurgical arm of the study than in the parallel observational cohort (Fig. 3.3). The small proportion of patients willing to be randomized suggests that this group may represent individuals with a specific personality, disease, or other characteristics that may affect outcome in a way that is different from other patients, so that the findings are not generalizable to the general population of patients with back pain. Furthermore, the low number of patients in the trial diminishes the likelihood that randomization can be effective in mitigating the effect of unknown confounders.
Fig. 3.3 The proportion of eligible patients in the Spine Patient Outcomes Research Trial (SPORT) study consenting to be randomized to surgical versus nonsurgical treatment was much lower than the proportion consenting to enter a parallel observational arm of the study, where patients could choose whether or not to undergo surgery (bar chart titled “Percent of Patients Consented in SPORT RCTs vs. Parallel Observational Cohort”; y-axis: Percent of Patients, 0 to 100; bars: Surgical/Non-surgical (SPORT) and Parallel Observational Cohort (SPORT); bar labels shown: 63% and 28%)

Narrowing eligibility criteria to limit heterogeneity and reduce confounding, and increasing the recruitment period and number of study sites, are ways to decrease selection bias in RCTs. Contrary to dogma, in orthopedics and other surgical specialties, where low patient enrollment obviates the benefits of randomization, appropriately controlled, prospective observational studies may ultimately represent a higher level of evidence than RCTs.

3.3.2 Retention Bias

Retention bias occurs when patients who drop out of a study differ from those not lost to follow-up. Examples include patients who die or leave the study due to poor health or because they are unsatisfied with the treatment. Patients may also be lost due to non-compliance. A disproportionate loss of patients in one group or another is particularly problematic, as it may introduce error in the analysis of the outcome. For example, if patients leave a study investigating the outcome of a surgical procedure because they undergo revision surgery at an outside hospital, the rate of complications will be underestimated. Documenting the reasons that patients leave a study can mitigate this type of bias.

Even a relatively equal loss of patients in both groups will compromise the generalizability of a study, as the loss may cause the study population to become significantly different from the target population. For example, the specific loss of patients of a particular race or socioeconomic status from a trial due to the distance of a study site from their home would skew the results even if both control and treatment groups were affected.

Guidelines for the follow-up rates required for publication of randomized controlled trials differ by publication. The Journal of Bone and Joint Surgery requires a level 1 randomized controlled trial to have a follow-up rate of at least 80% [14].

In the SPORT study, non-compliance and crossover resulted in a high degree of retention bias, which undermined the expectations of the study (Fig. 3.4).

Fig. 3.4 The similarity in the rate of surgery in the surgical and nonsurgical groups made differentiating the outcomes in the two groups impossible in the SPORT study. The proportion of patients in the surgical group who actually underwent surgery was unexpectedly low because of noncompliance. At the same time, the proportion of patients who started out in the nonsurgical group, but who crossed over and underwent surgery, was unexpectedly high (bar chart titled “Patients Undergoing Surgery in SPORT Trial”; y-axis: Percent of Patients, 0 to 100; bars: Surgical, Non-surgical, Parallel observational cohort; bar labels shown: 66%, 60%, 44%)

Mitigating retention bias may require implementation of measures to reduce loss, such as making follow-up easier by conducting studies at sites accessible to all patients or using outcome measures that are convenient, such as short-form surveys and surveys translated into the patient’s primary language. Limiting crossover may be required. Finally, documenting the reasons for loss of patients from the study may be helpful in accounting for dropout when interpreting the data.

3.4 Informational Biases in the Data Collection Phase

Informational biases can stem from detection or measurement bias by the observer or from response bias by the study subjects.

3.4.1 Detection or Measurement Bias

Detection bias can occur during the data collection phase of a study when outcomes assessment tools are inappropriate or used inappropriately [27]. Detection bias can arise from poor calibration of a measurement instrument, prejudice in the interviewer or grader, or misclassification of exposure or outcome. In orthopedics research, a common cause of detection bias is the failure to use a standardized method to diagnose the condition being studied. This type of bias is more common in retrospective studies, where researchers do not control which diagnostic tools are used for each patient in the study and may not have information on what data or process led to a specific diagnosis [27]. In prospective studies, detection bias can be mitigated by standardizing data collection protocols, including, but not limited to, protocols for how diagnoses are made.

In a study on the association of patellofemoral pain (PFP) and patellofemoral cartilage composition, researchers in the Netherlands compared patients with PFP to healthy controls [26]. All patients included in the study were diagnosed with PFP based on the presence of at least three criteria from a standardized list. Cartilage composition was assessed in all subjects by a single validated magnetic resonance imaging (MRI) method. In this case, researchers minimized the risk of detection bias by using a prospective study design. This allowed them to specify the diagnostic criteria for subject inclusion in the study and standardize the MRI protocol for assessment of cartilage composition in both patients with PFP and the healthy controls.

Blinding can be used to prevent interviewer or grader prejudice from introducing measurement bias. Using clear definitions, validating findings with multiple resources, and standardizing measurement protocols are other methods to avoid biases that stem from observer errors.

3.4.2 Response Bias

Subjects in a study may inadvertently give biased responses. The Hawthorne effect refers to a phenomenon where people tend to show bigger improvements simply on the basis of being observed [29]. Another example of response bias is the tendency for respondents to please the interviewer, especially if the interviewer is the doctor or someone associated with their doctor. Studies asking for responses on sensitive topics can also result in response bias if patients are too embarrassed to tell the truth.

During the data collection phase of retrospective observational study designs such as case series, case-control studies, and retrospective cohort studies, recall bias can be particularly problematic [30]. Recall bias occurs when surveys or interviews require subjects to recall their own history, which they may incorrectly remember and report, inadvertently introducing error into the study [25]. Prospective cohort studies avoid recall bias because subject-obtained data is collected in real time [12].

To investigate the magnitude of recall bias in a typical type of question that might be included in an orthopedic study, Gabbe et al. [9] compared retrospective, self-reported injury histories of 70 community-level Australian football players with prospective injury surveillance records for the same 12-month period and found a significant difference. They reported that recall bias increased with the level of detail that was requested [9].

The proportion of community football players who were able to accurately recall their injuries in the past 12 months declined as the level of detail requested about the injury history increased. All respondents were able to accurately recall the presence or absence of any injury, but fewer than 80% were able to correctly recall the number of injuries or the body region where the injury occurred. Accuracy decreased further when respondents were asked to recall the number of injuries, where on the body the injury occurred, and the diagnosis.

The study of Gabbe et al. [9] not only showed how much recall bias could affect research findings but also suggested that asking only simple questions could increase the accuracy of responses when researchers need to ask respondents for their medical history.

Other strategies for mitigating response bias include asking questions related to sensitive topics later in a questionnaire or interview, using validated survey instruments, and seeking responses as early as possible after the exposure or treatment being studied.

3.5 Biases in the Analysis and Data Interpretation Phase

The data analysis and interpretation phases offer opportunities to avoid confounding and other types of bias, which may inadvertently or unavoidably have been introduced in earlier phases. Stratification/subgroup analysis can be used to reveal confounding. Multivariable modeling can be used to adjust for known potential confounders. Time-to-event analyses can help mitigate bias due to loss to follow-up. Propensity score matching can be effective in mitigating selection bias.

3.5.1 Propensity Scores

Propensity scores are generated from logistic regression models that include all potentially important information that could influence a clinician’s treatment decision. The model therefore estimates the likelihood of a patient going down a treatment pathway, rather than an outcome. The coefficients are used to assign each patient a propensity score ranging from 0 to 1, representing the patient’s likelihood of receiving a reference treatment (usually the standard of care), here called treatment A. A score of 0 indicates no chance of receiving treatment A, while a score of 1 represents absolute certainty of receiving treatment A. A score of 0.5 would represent an unbiased random allocation.

Once propensity scores are assigned to each study subject, they can be used in a variety of ways. For example, a propensity score “match” could be performed, which simulates randomization. A patient with a propensity score of 0.576 for a specific surgery would be matched to a patient from the other treatment type with a propensity score nearest to 0.576. A number of methods exist to optimize this matching process, but when appropriately performed, relevant characteristics of the two groups will be balanced, minimizing potential selection bias. This can be visualized by plotting the standardized differences for each variable before and after matching (Fig. 3.5). A standardized difference is used to compare balance between treatment groups for each independent variable in the dataset. It was proposed by Cohen [3] as Cohen’s effect size index for the comparison of two sample mean values [8, 18]. A standardized difference of >0.1 is considered to represent an imbalance in the variable. As can be seen from the example, 14 variables were unbalanced before the match and all variables were balanced after the match. In fact, for all but two of the variables, the balance improved.
Fig. 3.5 Standardized differences were calculated for the 42 variables (along x-axis) for which patients in a sample cohort undergoing two different knee procedures were matched. The standardized difference (also known as Cohen’s effect size index, along the y-axis) compares two sample mean values in units of the pooled standard deviation so that a standardized difference of 0.1 or greater denotes meaningful imbalance in the variable. In this example, there is imbalance (standardized difference > 0.1) in 15 variables among patients in the two treatment groups before propensity score matching (dark blue diamonds). However, after propensity score matching (light blue circles), the likelihood these variables affected each patient’s assignment to a particular treatment became more balanced (plot legend: Before Propensity Score Matching, After Propensity Score Matching; axis: Standardized difference, 0 to 0.6; variables plotted: Race - Hispanics, Chronic pulmonary disease, Medicaid, Commercial, Charlson-Deyo comorbidity index - 3+, Lymphoma, Solid tumor w/out metastasis, Race - Black, Liver disease, Pulmonary circulation disease, Paralysis, Peripheral vascular disease, Hypertension, Charlson-Deyo comorbidity index - 1–2, Congestive heart failure, Charlson-Deyo comorbidity index - None, DMAG, Valvular disease, Coagulopathy, Fluid and electrolyte disorders, Race - Asian, Race - White, Renal failure, Performing Surgeon’s last TKA volume, Hypothyroidism, Year of surgery - 2009, ASA - 1-2, ASA - 3-4, Other neurological disorders, Race - All others, OA diagnosis, Year of surgery - 2007, Systemic inflammatory disease diagnosis, Year of surgery - 2008, Gender, BMI, Year of surgery - 2011, Selfpay, Year of surgery - 2010, Medicare, Age, Year of surgery - 2012)

Whereas randomization eliminates bias even from unknown confounders, propensity matching can, in a strict sense, balance only known factors. However, because the matching process balances all measured variables, the effects of unmeasured confounders are often also minimized. Despite this, it is incumbent on orthopedic researchers to review the existing literature to determine a standardized list of variables that best represents the most important information for creating a propensity score. Propensity score matching has been used in orthopedics, including arthroplasty and trauma [5, 15].

A drawback of propensity score matching is that a very large number of patients may be needed, especially in the standard of care group [11]. Moreover, matching frequently omits a large proportion of the available study population when comparison groups are being created. Therefore, inverse probability of treatment weighting (IPTW) is proposed as an alternative to matching to adjust for confounders [2, 13, 21]. With IPTW, each patient is weighted by the inverse of the probability of receiving the treatment that patient actually received, as derived from the propensity score, and these weights are used in a weighted regression. The weights restore balance in the clinical characteristics of the two treatment groups and allow for the use of the entire sample, rather than the subset of matched patients.

3.6 Publication Bias

Despite diligent attention to study design and execution, bias can be introduced at the stage of publication. Reviewers are also human, and studies have shown that prejudice can affect which studies are published. Emerson et al. (Arch Intern Med 2010) used fabricated manuscripts to show that those with positive outcomes were more likely to be accepted for publication than those with negative ones. This article also reported the presence of positive outcome bias in the Journal of Bone and Joint Surgery (JBJS) and Clinical Orthopaedics and Related Research (CORR) [7]. Another type of bias was reported by Okike et al. (JBJS 2008), who found that submissions to JBJS were more likely to be accepted if they were from the USA or Canada [19]. These studies indicate that readers should be aware of bias, even when reading peer-reviewed papers.

Take-Home Message
• When performing orthopedic research, the effects of confounding factors and bias must be acknowledged.
• Techniques such as randomization, matching, and choice of study design may help mitigate the effects of confounding and bias.

References

1. Amin AK, Clayton RAE, Patton JT, Gaston M, Cook RE, Brenkel IJ. Total knee replacement in morbidly obese patients: results of a prospective, matched study. J Bone Joint Surg [Br]. 2006;88-B(10):1321–6.
2. Cassel C, Sarndal C, Wretman J. Some uses of statistical models in connection with the nonresponse problem. In: Madow W, Olkin I, editors. Incomplete data in sample surveys, Symposium on incomplete data, proceedings, vol. 3. New York, NY: Academic; 1983.
3. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. New York: Routledge.
4. Delgado-Rodríguez M, Llorca J. Bias. J Epidemiol Community Health. 2004;58:635–41.
5. Duchman K, Gao Y, Pugely A, Martin C, Callaghan J. Differences in short-term complications between unicompartmental and total knee arthroplasty: a propensity score matched analysis. J Bone Joint Surg Am. 2014;96(16):1387–94.
6. Dunn WR, Lyman S, Marx R. ISAKOS scientific committee report research methodology. Arthroscopy. 2003;19(8):870–3.
7. Emerson GB, Warme WJ, Wolf FM, Heckman JD, Brand RA, Leopold SS. Testing for the presence of positive-outcome bias in peer review: a randomized controlled trial. Arch Intern Med. 2010;170(21):1934–9.
8. Flury B, Riedwyl H. Standard distance in univariate and multivariate analysis. Am Stat. 1986;40:249–51.
9. Gabbe BJ, Finch CF, Bennell KL, Wajswelner H. How valid is a self reported 12 month sports injury history? Br J Sports Med. 2003;37(6):545–7.
10. Gerhard T. Bias: considerations for research practice. Am J Health Pharm. 2008;65:2159–68.
11. Guo S, Fraser M. Propensity score analysis: statistical methods and applications. 2nd ed. Thousand Oaks, CA: Sage; 2015.
12. Hennekens C, Buring J. Epidemiology in medicine. 1st ed. Boston: Little, Brown and Company; 1987.
13. Hirano K, Imbens G. Estimation of causal effects using propensity score weighting: an application to data on right heart catheterization. Health Serv Outcomes Res Methodol. 2001;2:259–78.
14. JBJS. JBJS Inc. Journals Level of Evidence [Internet]. J Bone Joint Surg. 2015. https://journals.lww.com/jbjsjournal/Pages/Journals-Level-of-Evidence.aspx.
15. Jenkinson R, Kiss A, Johnson S, Stephen D, Kreder H. Delayed wound closure increases deep-infection rate associated with lower-grade open fractures: a propensity-matched cohort study. J Bone Joint Surg Am. 2014;96(5):380–6.
16. Jennings JM, Sibinga E. Understanding and identifying bias in research studies. Pediatr Rev. 2010;31(4):161–2.
17. Lim HC, Adie S, Naylor JM, Harris IA. Randomised trial support for orthopaedic surgical procedures. PLoS One. 2014;9(6):e96745.
18. Normand S, Landrum M, Guadagnoli E, Ayanian J, Ryan T, Cleary P, et al. Validating recommendations for coronary angiography following an acute myocardial infarction in the elderly: a matched analysis using propensity scores. J Clin Epidemiol. 2001;54:387–98.
19. Okike K, Kocher MS, Mehlman CT, Heckman JD, Bhandari M. Publication bias in orthopaedic
32 N. Roselaar et al.

research: an analysis of scientific factors associated biological plausibility. J Fam Plann Reprod Health
with publication in The Journal of Bone and Joint Care. 2008c;34(4):261–4.
Surgery (American Volume). J Bone Joint Surg Am. 25. Smith J, Noble H. Bias in research. Evid Based Nurs.
2008;90(3):595–601. 2014;17(4):100–1.
20.
Pannucci CJ, Wilkins EG.  Identifying and 26. van der Heijden RA, Oei EHG, Bron EE, van Tiel J,
avoiding bias in research. Plast Reconstr Surg. van Veldhoven PLJ, Klein S, et al. No difference on
2010;126(2):619–25. quantitative magnetic resonance imaging in patello-
21. Rosenbaum P. Model-based direct adjustment. J Am femoral cartilage composition between patients with
Stat Assoc. 1987;82:387–94. patellofemoral pain and healthy controls. Am J Sports
22. Shapiro S. Causation, bias and confounding: a
Med. 2016;44(5):1172–8.
hitchhiker’s guide to the epidemiological galaxy 27. Viswanathan M, Berkman ND, Dryden DM, Hartling
Part 1. Principles of causality in epidemiological L.  Assessing risk of bias and confounding in obser-
research: time order, specification of the study base vational studies of interventions or exposures: further
and specificity. J Fam Plann Reprod Health Care. development of the RTI item bank. Agency Healthc
2008a;34(2):83–7. Res Qual. 2013:1–22.
23. Shapiro S. Causation, bias and confounding: a hitch- 28. Weinstein JN, Tosteson TD, Lurie JD, Tosteson ANA,
hiker’s guide to the epidemiological galaxy: Part 2. Hanscom B, Skinner JS, et al. Surgical vs nonopera-
Principles of causality in epidemiological research: tive treatment for lumbar disk herniation the Spine
confounding, effect modification and strength of Patient Outcomes Research Trial (SPORT): a random-
association. J Fam Plann Reprod Health Care. ized trial. JAMA. 2006;296(20):2441–50.
2008b;34(3):185–90. 29. Wickstrom G, Bendix T.  The “Hawthorne effect”—
24. Shapiro S. Causation, bias and confounding: a hitch- what did the original Hawthorne studies actually show?
hiker’s guide to the epidemiological galaxy Part 3: Scand J Work Environ Health. 2000;26(4):363–7.
Principles of causality in epidemiological research: 30. Zlowodzki M, Jönsson A, Bhandari M. Common pit-
statistical stability, dose- and duration-response falls in the conduct of clinical research. Med Princ
effects, internal and external consistency, analogy and Pract. 2006;15:1–8.
4 Ethical Consideration in Orthopedic Research

Jason L. Koh and Diego Villacis

J. L. Koh ∙ D. Villacis (*)
Department of Orthopaedic Surgery, NorthShore University Health System, Evanston, IL, USA
Northshore Orthopedic Institute, Evanston, IL, USA
Pritzker School of Medicine, University of Chicago, Chicago, IL, USA
e-mail: Jkoh@northshore.org; dvillacis@northshore.org

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_4

4.1 Introduction

For thousands of years, science has sought answers to questions regarding health and the human body. From the start, there has been a dilemma in defining the boundary of ethical research while still pushing to advance scientific knowledge for the benefit of humanity. There came to be a universal understanding that protection of the human participants must take priority. Unfortunately, our society has paid heavy costs in the past when failing to ensure ethical research. From these early beginnings, principles arose to guide the conduct and review of human subject research. These principles were formalized by the development of regulations and governing bodies to review the creation of research protocols.

To begin, one must understand the purpose of health research. A study should be designed to develop new knowledge or confirm existing knowledge that promotes health, prevents disease and injuries, and improves the diagnosis and treatment of disease and injuries [1]. In order to justify the use of human subjects, the potential risks to the research participant must be reasonable in relation to the potential benefits to the participant or future patients and be sensible because of the importance of the knowledge which might be gained [1]. For any research study, the full benefits and risks can never truly be known ahead of time, nor can the effects of the study be determined until the study is completed. This quandary is a reality for researchers and highlights the importance of oversight and regulatory bodies in ensuring that human research studies stay true to guiding ethical principles.

4.2 History

The contemporary viewpoint, with its emphasis on the individual and the importance of protecting the research subject, was initiated long before the development of modern research ethics. Hippocrates (460 BC) is regarded by many as the best known of the early scientists to promote physician constraint. His beliefs are well quoted in the Hippocratic Oath, which requires physicians to practice medicine in such a way that ultimately benefits the patient while avoiding mischievous behavior or behavior that is not in the best interest of the patient, “Do no harm” [14, 25]. However,
for the purpose of this section, we will focus on developments from the modern era, starting with the aftermath of the atrocities suffered during World War II.

The Nuremberg Military Tribunal after World War II shed light on the sadistic and horrifying “research” conducted by Nazi German scientists. Nazi research included forced human exposure to the effects of freezing, incendiary devices, mustard gas, and other weaponized agents. The rulings handed down from the Nuremberg trial served as an outline for the required elements of conducting ethical human research.

Fact Box 4.1
Initially known as rules for “Permissible Medical Experiments,” this became known as the “Nuremberg Code.” The leading ethical principles that emerged were a requirement for voluntary consent, evidence of scientific merit, benefits outweighing the risks, and the ability of research subjects to terminate participation in the study at any time [23, 27].

Although not formally adopted into law or as any part of a professional ethical code, it served as a significant influence on the development of contemporary ethical guidelines.

Fact Box 4.2
Subsequently, the Declaration of Helsinki in 1964 was developed by the World Medical Association (comprised of almost all national medical associations) to provide guiding principles regarding human subject research. Key among these principles are a respect for the individual and their right to make informed choices regarding participation in research [9]. The individual’s interests are placed above those of “science and society.”

Multiple revisions have been made over the years; however, the principles set forth by the “Nuremberg Code” remain the foundation for the conduct of ethical research all over the world.

In the early period of modern human research ethics, public outcry was also growing in the United States. Ethical concern was mounting regarding the 1936 publication of a paper funded by the federal government entitled “The Tuskegee Study of Untreated Syphilis in the Negro Male.” The purpose of the study, which started in 1932, was to investigate the effects of untreated syphilis, essentially tracking its natural history. The study would remain in progress until 1972. What would make the study infamous was the continued withholding of treatment even after the acceptance of penicillin as the standard of care in 1945. Subjects were led to believe that they were being treated when in fact nothing was being done. It took nearly 40 years from the initial publication of the study until stories in the press created public outcry on a national level, and the study was concluded [13].

The subsequent attention gained by the shocking truths of the Tuskegee study led to congressional hearings to address the matter of ethical conduct in human subject research. As a direct response to these atrocities and other similar situations of ethical abuse, the National Research Act of 1974 was passed to address ethical concerns [7]. Congress called for the establishment of “The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research” and the establishment of institutional review boards (IRBs). The “National Commission” was tasked with identifying the basic ethical principles underlying human subject research and developing guidelines for ensuring that these principles are followed [3]. The commission met between 1975 and 1978, releasing a series of reports. Its last report, a summary of its deliberations regarding ethical principles, was released in 1979. This final report was entitled “Ethical Principles and Guidelines for the Protection of Human Subjects of Research” and would come to be known as the Belmont Report.

Fact Box 4.3
The Belmont Report established boundaries between clinical practice and unnecessary research, basic ethical principles, and fundamental applications of ethical research [11].

The three basic ethical principles identified by the Belmont Report include “respect for persons,” which asserts the importance of respecting autonomy; “beneficence,” which requires actions that do not cause harm and aim to maximize benefit while minimizing harm; and “justice,” which requires a shared burden of benefits and risk when choosing research subjects so as not to exploit a population. These principles set forth by the Belmont Report share importance and are to be utilized as a “framework” when designing human subject research protocols [8]. To ensure that research does meet the abovementioned criteria, IRBs were created (Table 4.1).

Table 4.1 Nuremberg Code 1947 [27]
1. The voluntary consent of the human subject is absolutely essential
2. The experiment should be such as to yield fruitful results for the good of society, unprocurable by other methods or means of study, and not random and unnecessary in nature
3. The experiments should be so designed and based on the results of animal experimentation and knowledge of the natural history of the disease or other problem under study that the anticipated results will justify the performance of the experiment
4. The experiment should be so conducted as to avoid all unnecessary physical and mental suffering and injury
5. No experiment should be conducted where there is an a priori reason to believe that death or disabling injury will occur, except perhaps, in those experiments where the experimental physicians also serve as subjects
6. The degree of risk to be taken should never exceed that determined by the humanitarian importance of the problem to be solved by the experiment
7. Proper preparations should be made and adequate facilities provided to protect the experimental subject against even remote possibilities of injury, disability, or death
8. The experiment should be conducted only by scientifically qualified persons. The highest degree of skill and care should be required through all stages of the experiment of those who conduct or engage in the experiment
9. During the course of the experiment, the human subject should be at liberty to bring the experiment to an end if he has reached the physical or mental state where continuation of the experiment seems to him to be impossible
10. During the course of the experiment, the scientist in charge must be prepared to terminate the experiment at any stage, if he has probable cause to believe, in the exercise of the good faith, superior skill, and careful judgment required of him, that a continuation of the experiment is likely to result in injury, disability, or death to the experimental subject

4.3 Institutional Review Board (IRB)

The US Department of Health and Human Services (HHS), under the guidance of the Belmont Report findings, set up requirements that form the basis for institutional review boards. The goal of an IRB is to regulate human subject research by advocating, upholding, and maintaining the rights of research participants [11]. IRBs have become widespread and are engaged in all HHS- and National Institutes of Health (NIH)-funded studies. The initial requirements proposed by HHS were formalized in 1991 by the US Federal Policy for the Protection of Human Subjects. This policy became better known as the “Common Rule,” which provided specific direction for the structure and organization of IRBs, the requirements for obtaining informed consent, and the requirement that institutions provide written assurance that they are compliant with federal regulations [26]. Medical and academic institutions had already in large part created their own IRBs to review clinical research projects. This was formalized with the “Common Rule” so as to further ensure that IRBs review all research protocols for compliance with ethical guidelines [4].
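
As a rough illustration of the risk-based triage that this section goes on to describe, the review pathway can be written as a small decision function. This is a minimal sketch under stated assumptions, not an official IRB procedure: the names `Risk`, `Review`, `Study`, `review_pathway`, and `needs_annual_review` are hypothetical, and checking for greater than minimal risk before any exempt category is an assumption made here for safety. In practice the determination is made formally by the IRB and never by the researcher alone.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LESS_THAN_MINIMAL = "less than minimal risk"
    MINIMAL = "minimal risk"
    GREATER_THAN_MINIMAL = "greater than minimal risk"


class Review(Enum):
    NOT_REQUIRED = "no IRB review required"
    EXEMPT = "exempt, pending formal determination"
    EXPEDITED = "may qualify for expedited review"
    FULL_BOARD = "full board review"


@dataclass
class Study:
    human_subjects: bool
    risk: Risk
    # Hypothetical flag: e.g. educational assessments, observation of public
    # behavior, or public data without information identifying the subject.
    in_exempt_category: bool = False


def review_pathway(study: Study) -> Review:
    """Map a study onto the review pathways described in the text."""
    if not study.human_subjects:
        return Review.NOT_REQUIRED
    # Assumption: greater than minimal risk always goes to the full committee.
    if study.risk is Risk.GREATER_THAN_MINIMAL:
        return Review.FULL_BOARD
    if study.risk is Risk.LESS_THAN_MINIMAL or study.in_exempt_category:
        return Review.EXEMPT
    return Review.EXPEDITED


def needs_annual_review(duration_months: int) -> bool:
    """Studies lasting longer than 1 year are reviewed annually."""
    return duration_months > 12


chart_review = Study(human_subjects=True, risk=Risk.MINIMAL)
print(review_pathway(chart_review).value)
print(needs_annual_review(18))
```

Returning a label such as "may qualify for expedited review" rather than a firm decision mirrors the text: the function only suggests which pathway a protocol is likely to follow.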

An important concept for IRB review is understanding when review is necessary. The necessity for review relates directly to determining the risk of a proposed study. For IRB purposes, risks are split into three broad categories: less than minimal risk, minimal risk, and greater than minimal risk. A study with less than minimal risk means there is no known physical, psychological, or economic risk to the subject [2]. Such a situation would deem the study exempt from IRB review. However, exempt status must be determined in a formal process with adequate documentation and not simply be determined by the researcher. Relevant to orthopedic research, a study may also be deemed exempt if it limits human subjects to one or more of the following categories: educational practices and assessments, interviews or observations of public behavior, and studies of public data or specimens without accompanying information that might permit subject identification [26]. When a study poses minimal risk, it may qualify for expedited review by select members of the IRB committee. Common examples of such studies include observational studies reviewing medical data that is collected as part of routine medical care, medical chart review with no unique identifiers included in the record, and questionnaire or survey studies where it is unlikely the questions would cause emotional distress in the patient [2]. Studies that are determined to pose “greater than minimal risk,” defined as risk beyond that typically encountered by the patient, require a full-committee review by the IRB [2]. Once a study is under review, the committee will assess a broad list of factors prior to acceptance, including study design, deception or withholding of information, risks and benefits, selection of subjects, identification of research participants, protection of patient privacy, process of obtaining informed consent, informed consent forms, qualifications of investigators, and potential conflicts of interest [19]. After acceptance, studies lasting longer than 1 year must be reviewed annually [19] (Fig. 4.1).

Fig. 4.1 Regulatory questions to ask prior to IRB [11]
[Flowchart. Decision nodes: Human subjects research? (No/Yes); Exempt from IRB review? (Yes/No); Minimal risk or more than minimal risk; Can review be expedited? (Yes/No). Outcomes: no IRB review required; IRB review can be expedited; full board review.]

4.4 Privacy Regulations: The United States and Europe

Additional regulatory guidelines for human subject research were enacted in the United States with the passing of the Health Insurance Portability and Accountability Act (HIPAA) in 1996.

Fact Box 4.4
The broader goal of HIPAA was “to improve portability and continuity of health insurance coverage in the group and individual markets, to combat waste, fraud, and abuse in health insurance and health care delivery, to promote the use of medical savings accounts, to improve access to long-term care services and coverage, to simplify the administration of health insurance, and for other purposes” [12].

Although HIPAA initially affected health-care providers and insurance carriers, it also went on to affect human subject research through its impact on the protection of patient privacy. This focus on protection of patient privacy was expanded with a provision in 2003 requiring all “covered entities,” essentially any personnel with access to patient information, to be compliant with the HIPAA privacy rule. The goal of the HIPAA privacy rule is to regulate the use and disclosure of specific “individually identifiable health information,” termed protected health information (PHI) [6]. A patient’s PHI is considered to be information pertaining to their past, present, or future physical or mental health, the provision of health care to an individual, and the past, present, or future payment for the provision of health care to the individual [17]. Similar legislation was enacted in the European Union in 2016 [24]. In order to help ensure the privacy of PHI during human subject research, the IRB is tasked with allowing the transmission of only the minimal amount of information necessary. The IRB must also make sure that this information is adequately protected when transmitted or stored electronically, with tools such as encryption or password protection. In the United States, violations of this act can be punishable by public disclosure of the breach, limitations on research, fines, and imprisonment (Fig. 4.2).

4.5 Informed Consent

The process of informed consent is a fundamental pillar of ethical human subject research. In essence, the participant must be able to both understand and clearly verbalize a study’s purpose, methods, risks, benefits, and alternatives to participation in order to be considered “informed.” The participant must be allowed to freely decide whether to participate in the study both at the beginning and throughout the study, with the option to discontinue at any time. For situations in which the potential subject is a minor or an adult of otherwise limited mental capacity, a proxy can be used with the same requirements of informed consent [22]. Informed consent is designed to demonstrate respect for the autonomy of potential and enrolled participants. Therefore, informed consent must be obtained in a voluntary and noncoercive manner. This goal of voluntary enrollment can be challenging when autonomy is potentially vulnerable, as in situations where there is a large inequality in authority between the researcher and the enrolled or potential participant (Fig. 4.3).

Fig. 4.2 List of PHI [24]
1. Names
2. Contact information (e.g., phone or fax #, website or internet protocol [IP] or electronic mail addresses, geographic address smaller than State, except first three digits of zip code)
3. Identifying dates (more detailed than year, e.g., birth, death, admission, discharge)
4. Age over 89 years (unless listed as 90 or older)
5. Social security, medical record, insurance identification number
6. Vehicle identification numbers (e.g., serial numbers, license plate numbers)
7. Device identification or serial numbers
8. Certificate or license numbers
9. Biometric identity (e.g., voice or retinal print, fingerprint, full face image)
10. Any other unique account numbers or material
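
To make the list in Fig. 4.2 more concrete, the sketch below shows one way a research record might be reduced to the minimal information necessary before it is shared. It is only an illustration under stated assumptions: the field names (`name`, `mrn`, `zip`, and so on) and the function `deidentify` are hypothetical, the set of identifier fields is not exhaustive, and running such code does not by itself make a data set compliant with HIPAA or any other regulation.

```python
from datetime import date

# Hypothetical field names standing in for the direct identifiers in Fig. 4.2.
DIRECT_IDENTIFIERS = {
    "name", "phone", "fax", "email", "website", "ip_address", "street_address",
    "ssn", "mrn", "insurance_id", "vehicle_id", "license_plate",
    "device_serial", "license_number", "voice_print", "fingerprint",
    "face_image", "account_number",
}


def deidentify(record: dict) -> dict:
    """Return a copy of the record with the listed identifiers removed or coarsened."""
    cleaned = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # drop direct identifiers entirely
        if isinstance(value, date):
            # Keep dates no more detailed than the year.
            cleaned[key + "_year"] = value.year
        elif key == "age":
            # Ages over 89 are reported only as a single top category.
            cleaned[key] = "90 or older" if value > 89 else value
        elif key == "zip":
            # Keep only the first three digits of the zip code.
            cleaned[key] = str(value)[:3]
        else:
            cleaned[key] = value
    return cleaned


patient = {
    "name": "Jane Doe",
    "mrn": "12345",
    "admission_date": date(2018, 2, 1),
    "age": 93,
    "zip": "60201",
    "diagnosis": "ACL tear",
}
print(deidentify(patient))
```

Dropping fields by default and coarsening only a few known ones keeps the sketch short. A real protocol would more likely start from an explicit list of fields that are allowed to leave the institution.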

Fig. 4.3 Paragraph VIII.A. of the American Academy of Orthopaedic Surgeons’ Code of Medical Ethics and Professionalism for Orthopaedic Surgeons [1]
1. The orthopaedic surgeon must explain to the patient in terms the patient can understand the proposed treatment, its likely effect on the patient, and purpose of the research. Orthopaedic surgeons must provide at least the degree of information that is required by applicable state and federal law, which will include at a minimum information on the purpose of the research, its potential side effects, alternatives and risks of the proposed treatment as well as the method, purpose, conditions of participation and the opportunity to withdraw from the research protocol without penalty.
2. The patient must understand for what they are providing consent. The orthopaedic surgeon must ensure that the patient has understood the basic information and has engaged in rational decision-making in deciding to participate in the research; and
3. The patient’s consent must be voluntary. Voluntary consent requires that the patient agreeing to participate in the project has a full understanding of all alternative treatments beyond the research protocol. The orthopaedic surgeon must believe that the patient’s consent is free from undue or overbearing influences, e.g., fear of the loss of care or medical benefits if the patient declines to participate.

4.6 Evaluation of Human Research Protocols

Fact Box 4.5
There are seven main principles that can be utilized to guide protocol design for human subject research: social and clinical value, scientific validity, fair subject selection, favorable risk-benefit ratio, independent review, informed consent, and respect for potential subjects [10].

The principles for evaluation of human research protocols are summarized by the NIH in its published recommendations for ethical study design [22]. Informed consent and respect for potential subjects were discussed in the previous section. Below is a brief description of the five remaining principles. Clinical and social value refers to the overriding concern of whether a study will potentially provide answers that lead to a new and significant discovery of information. Also, will this information lead to a tangible impact on patient care both for the individual and society as a whole? When looking for an answer, the study must be scientifically valid in order to justify its conclusions. Otherwise the study is a waste of resources and poses undue risk to the human subjects. During study design, the selection of subjects should take into account the burden of risk borne by participants and the potential for participants to also enjoy benefits from the research study [22]. As discussed throughout this chapter, a heavily weighted factor in any study design is creating a favorable risk-benefit ratio. Since by definition a research study is investigating the unknown, it is impossible to truly know all the risks and benefits of an intervention. However, by aiming to minimize risk and maximize benefits, a study can attempt to avoid foreseeable safety issues or potential harm to the participants. Finally, study protocols must undergo an independent review to ensure the abovementioned principles are upheld. This is the role of an institution’s IRB (Fig. 4.4).

Fig. 4.4 The seven ethical principles of human research protocol design [10]
1. Social and clinical value
2. Scientific validity
3. Fair subject selection
4. Favorable risk-benefit ratio
5. Independent review
6. Informed consent
7. Respect for potential and enrolled subjects

4.7 Responsibilities of Principal Investigator

Fact Box 4.6
The principal investigator (PI) of a research study is “responsible for proposing, designing, and reporting the research” [1].

If there is funding associated with the project, the PI is also responsible for the disbursement of funds. A key element brought forth by the American Academy of Orthopaedic Surgeons (AAOS) is that, although the PI may have an opinion on the clinical topic being investigated, the study can only be justified if there is debate in the medical community regarding the clinical question being pursued. Delegation of portions of the study to others involved in the study is permitted, but “delegation does not relieve the PI of responsibility for work conducted by other individuals” [1]. A frequent issue in research that is infrequently discussed is proper credit and authorship for work done. The PI is responsible for ensuring that any articles relevant to the research study include “appropriate credit for individuals contributing importantly to the research” [1]. Each author of a publication should be able to individually justify the content of the publication regardless of specific contribution (Fig. 4.5) [1].

Fig. 4.5 Mandatory Standards 10 and 12 from Standards of Professionalism on Research and Academic Responsibilities
10. An orthopaedic surgeon shall warrant that he or she has made significant contributions to the conception and design or analysis and interpretation of the data, drafting the manuscript or revising it critically for important intellectual content, and approving the version of the manuscript to be published.
12. An orthopaedic surgeon shall credit with authorship or acknowledge and not exclude those individuals who substantially contributed to the proposed research, the analysis and interpretation of the data, and the drafting and revising of the final article or report.

4.8 Sham Surgery

Utilization and discussion of sham surgery in orthopedic research are limited [20, 30]. The concern is obvious: it is difficult to design a study that can safely minimize risks while presenting a potential benefit to justify these risks. However, we know from robust data that subjective outcomes can be significantly altered by the placebo effect [15, 16]. Therefore, a study design that can account for the placebo effect would be ideal for determining the true effect of an intervention [16]. In addition, sham surgery allows for the blinding of the subject and the researcher collecting patient data, since it is otherwise difficult to conceal the scar or incision from both the patient and the individual collecting data. Also, there is evidence to support the notion that patients have a bias toward a favorable outcome with surgical intervention. Patients want to believe they chose the correct treatment option [28, 29]. This all leads to the conclusion that sham surgery can be acceptable, but only if done for an appropriate procedure and if the study is designed so as to minimize risk to the human subjects. A common application is the investigation of a clinical intervention that is strongly suspected of having little benefit beyond the placebo effect. An example of such a trial was conducted by Moseley et al., investigating arthroscopic knee surgery vs. placebo surgery for knee osteoarthritis that had failed 6 months of conservative treatment (Table 4.2) [21].

Table 4.2 Guidelines for sham surgery [20]
1. There is skepticism regarding the therapeutic merits of a particular treatment
2. There are disagreements about the perceived benefits of a particular procedure compared with the placebo
3. Benefits might be due to the “experience of surgery” and the postoperative care regimen
4. Risks are reduced as far as possible in the sham surgery arm without compromising trial design
5. There is a lack of a superior therapy

4.9 Funding and Potential Conflict of Interest (COI)

Traditionally, there was minimal support from industry for orthopedic research. Research projects came from academic institutions as well as research-active individuals or practices. However, in the past 3 decades, orthopedic surgery has undergone a transformation with the development and flourishing of an “orthopedic industry” [5]. Much of the industry-fueled innovation has come in terms of implant development. During the same period, there has been a shift in research funding from predominantly charitable or government funding to one where industry funding represents a majority [5]. This includes industry support for education, meetings, and conferences. Although increased industry support clearly has the power to drive positive advancements in the treatment of patients, it also produces a potential bias and situations of potential conflict. The potential for conflicts has led most societies and journals in the orthopedic community to require full statements of disclosure when presenting any research work. However, successfully navigating potential conflicts of interest in regard to industry funding is still an active challenge.

In order to better understand the arena, it is important to grasp the three parties who have distinct interests in industry-funded research: the researcher, the research institution, and the corporation funding the research [1]. These interests can vary drastically and do not necessarily run in parallel. Thankfully, many institutions have created structures that assist in negotiating research funding so as to help eliminate or minimize conflicts of interest. As may seem obvious, “ethical problems may arise when the researcher or the research institution have direct financial interest in the research program” [1]. However, for the researcher or research institution, financial bias may not be obvious and can often be subtle. The Academy (AAOS) references two ethical principles for researchers to fall back on when faced with economic conflicts of interest (refer to Table 4.3) [1]. One can conclude from these two principles set forth by AAOS that once someone has received funding from a corporation they should not buy or sell the corporation’s stock during the entirety of their involvement in the project. Also, someone who has developed an implant from which they will receive royalties should avoid doing research on that implant, leaving the research to a disinterested third party who has no potential financial interest from use of the device [1]. That does not exclude researchers from being able to serve as consultants for a corporation, given that the compensation is in line with their efforts.

Table 4.3 Ethical principles for economic conflict of interest [1]
1. A researcher ethically may share the economic rewards of his or her efforts. If a drug, device, or other product becomes financially remunerative, the researcher may receive profits that reasonably resulted from his or her contribution. The Academy’s Standards of Professionalism on Orthopaedic Surgeon-Industry Relationships and the Code of Medical Ethics and Professionalism for Orthopaedic Surgeons explicitly permit an orthopedic surgeon to receive royalties. However, ethically the researcher may not reap profits that are not justified by the value of his or her actual efforts
2. Potential sources of bias in research should be eliminated, particularly where there is a direct relationship between a researcher’s personal interests and potential outcomes of the research

Clinical Vignette
Below are examples of unethical behavior regarding conflict of interests as stated by the AAOS:

• Knowingly negotiating for more funding than is appropriate to support the project and related institutional and departmental overhead costs
• A researcher’s selling or purchasing stock in a company whose orthopedic device is being tested by that orthopedic surgeon-researcher
• A researcher accepting financial incentives to alter data
• A researcher accepting excessive remuneration by the funding corporation for evaluating or interpreting data about that corporation’s products
• A failure to disclose research or consulting arrangements with the funding corporation when reporting about research on devices manufactured by that corporation

Clinical Trials Registration
Another aspect of the ethical conduct of research is in reporting results accurately and completely. Concerns regarding selective publication of data (e.g., only positive information in a pharmaceutical trial) that could result in the reporting of misleading conclusions have prompted the International Committee of Medical Journal Editors (ICMJE) to require clinical trials registration of any study that assigns patients to an intervention [18]. For publication to be considered, these studies must be prospectively registered in a publicly accessible database, such as clinicaltrials.gov or the World Health Organization (WHO) International Clinical Trials Registry Platform [18].

Take-Home Messages

• The essence of ethics regarding human subject research is respect for the participant’s autonomy and well-being.
• The modern era began with the end of World War II and the guidelines put forth with the

methods, risks, benefits, and alternatives to participation.
• There are seven principles to guide human subject research protocol design: social and clinical value, scientific validity, fair subject selection, favorable risk-benefit ratio, independent review, informed consent, and respect for potential subjects.
• Although there is justified concern, there is still a role for sham surgery when a protocol can be designed that minimizes risk while presenting a potential robust benefit.
• There is a growing concern for potential conflict of interest related to industry funding, and therefore it is critical that a researcher be aware of several ethical principles pertinent to conflict of interest when navigating this area.
• Clinical trials registration is important ethically to reduce the risk of selective publication; it has also become a standard for publication in most major medical journals.

References

1. American Academy of Orthopaedic Surgeons, Opinion on Ethics and Professionalism. Ethics in health research in orthopaedic surgery. Adopted October 1994. Revised December 1995, May 2002, July 2003, September 2005, and September 2016.
2. Assessing Level of Risk and Type of IRB Review. Research compliance news. University of South Alabama. 2008. www.southalabama.edu/research-compliance/pdf/compliancenews0908.pdf. Accessed 1 Feb 2018.
3. Brody B. The ethics of biomedical research. New York: Oxford University Press; 1998.
4. Brown JG. Department of Health and Human Services. Office of Inspector General. Institutional review boards: their role in reviewing approved research. Office of Evaluations and Inspections; 1998.
subsequent Nuremberg Code of 1947. 5. Carr AJ. Which research is to be believed. J Bone Joint
• The Belmont Report identified three basic Surg (Br). 2005;87-B:1452–3.
ethical principles: respect for persons, benefi- 6. Clinical Research and the HIPAA Privacy Rule. NIH
Publication Number 04-5495. 2004. http://privacyru-
cence, and justice. leandresearch.nih.gov/clin_research.asp. Accessed 29
• Institutional review boards were established to Jan 2018.
regulate human subject research and ensure 7. Cobb WM.  The tuskegee syphilis study. J Natl Med
adherence ethical practices. Assoc. 1973;65:345–8.
8. Cohen J. History and ethics of human subjects research.
• Informed consent requires a patient to be able Collaborative Institutional Training Initiative. CITI
to understand and verbalize a study’s purpose, Program. 2017. Accessed 28 Jan 2018.
9. Declaration of Helsinki History Website. Ethical principles for medical research. JAMA Netw. Accessed 26 Feb 2018.
10. Emanuel EJ, Wendler D, Grady C. What makes clinical research ethical? JAMA. 2000;283:2701–11.
11. Friedman EA. Ethical issues in clinical research. In: Supino PG, Borer JS, editors. Principles of research methodology: a guide for clinical investigators. 1st ed. New York: Springer; 2012. p. 233–54.
12. Health Insurance Portability and Accountability Act of 1996. Public Law 104–191. 104th Congress. 1996. http://www.gpo.gov/fdsys/pkg/PLAW-104publ191/pdf/PLAW-104publ191.pdf. Accessed 30 Jan 2018.
13. Heller J. Syphilis victims in U.S. study went untreated for 40 years. New York Times (New York). 1972;1:8.
14. The Hippocratic oath: today. Doctors’ diaries. WGBH Educational Foundation. 1964. http://www.pbs.org/wgbh/nova/body/hippocratic-oath-today.html. Accessed 28 Jan 2018.
15. Hrobjartsson A, Gotzsche PC. Is the placebo powerless? An analysis of clinical trials comparing placebo with no treatment. N Engl J Med. 2001;344:1594–602. Erratum in: N Engl J Med;345:304.
16. Hrobjartsson A, Gotzsche PC. Placebo interventions for all clinical conditions. Cochrane Database Syst Rev. 2004;3:CD003974.
17. HSS. Summary of the HIPAA Privacy Rule. http://www.hhs.gov/ocr/privacy/hipaa/understanding/summary/privacysummary.pdf. Accessed 30 Jan 2018.
18. International Committee of Medical Journal Editors. Clinical Trials. http://www.icmje.org/recommendations/browse/publishing-and-editorial-issues/clinical-trial-registration.html. Accessed 27 Feb 2018.
19. Mazur DJ. Evaluating the science and ethics of research on humans: a guide for IRB members. Baltimore: The Johns Hopkins University Press; 2007.
20. Mehta S, Myers TG, Lonner JH, Huffman GR, Sennett BJ. The ethics of sham surgery in clinical orthopaedic research. J Bone Joint Surg Am. 2007;89:1650–3.
21. Moseley JB, O’Malley K, Petersen NJ, Menke TJ, Brody BA, Kuykendall DH, Hollingsworth JC, Ashton CM, Wray NP. A controlled trial of arthroscopic surgery for osteoarthritis of the knee. N Engl J Med. 2002;347:81–8.
22. NIH & Clinical Research. Ethics in Clinical Research. http://clinicalresearch.nih.gov/ethics_guides.html. Accessed 26 Jan 2018.
23. Nuremberg Code [from Trials of War Criminals Before the Nuremberg Military Tribunals Under Control Council Law No. 10. Nuremberg, October 1946–April 1949. Washington, DC: U.S. G.P.O, 1949–1953].
24. Official Journal of the European Union. Legislation L119. http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ%3AL%3A2016%3A119%3ATOC. Accessed 27 Feb 2018.
25. Owsei T, Temkin C. Ancient medicine. Selected papers of Ludwig Edelstein. Baltimore: Johns Hopkins University Press; 1987.
26. Public, Protection of Human Subjects, Basic HHS Policy for Protection of Human Research Subjects, Title 45 CFR Part 46, Subpart A. 2005. http://ohsr.od.nih.gov/guidelines/45cfr46.html. Accessed 28 Jan 2018.
27. Shuster E. Fifty years later: the significance of the Nuremberg Code. N Engl J Med. 1997;337:1436–40.
28. Sussman MD. Ethical requirements that must be met before the introduction of new procedures. Clin Orthop Relat Res. 2000;378:15–22.
29. Sussman MD. Ethical standards in the treatment of human subjects involved in clinical research. J Pediatr Orthop. 1998;18:701–2.
30. Wolf BR, Buckwalter JA. Randomized surgical trials and “sham” surgery: relevance to modern orthopaedics and minimally invasive surgery. Iowa Orthop J. 2006;26:107–11.

5 Conflict of Interest
Michael Hantes and Apostolos Fyllos

M. Hantes (*) · A. Fyllos
Department of Orthopaedic Surgery, Faculty of Medicine, University of Thessalia, University Hospital of Larissa, Larissa, Greece
e-mail: hantesmi@otenet.gr

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_5

5.1 Introduction

A conflict of interest is a set of circumstances that creates a risk that professional judgment or actions regarding a primary interest will be unduly influenced by a secondary interest.

This is the definition of conflict of interest in medical science given in 2009 by the Institute of Medicine of the National Academies [10]. With fewer government funds available for research and education worldwide, many researchers are turning to industry for research support. Industry relationships with academic biomedical researchers are extensive [3, 21]. In the USA, 71% of research and development funding comes from industry, followed by government (21%) and private foundations (4%) [2]. Despite their benefits, relationships with industry create conflicts of interest that can undermine the primary goals of medical research. Several systematic reviews and other studies provide substantial evidence that clinical trials with industry ties are more likely to have results that favor industry [13, 19]. Other sources of conflict of interest, other than financial, exist as well.

Fact Box 5.1
With fewer government funds available for research and education worldwide, many researchers are turning to industry for research support. In the USA, 71% of research and development funding comes from industry, followed by government (21%) and private foundations (4%).

5.2 Definition of Conflict of Interest

The three main elements of the said definition are the primary and secondary interest and their conflict. Primary interests include promoting and protecting the integrity of research method and results and the welfare of patients and research participants. Secondary interests may include not only financial gain but also the desire for professional advancement, recognition for personal achievement, and favors to friends and family or to students and colleagues. The secondary interests are objectionable only when they have greater weight than the primary interest in professional decision-making. For the researcher, financial interests should be subordinate to presenting objective unbiased scientific evidence. The third element, the actual conflict, exists whether or not a particular party is actually influenced by the secondary interest. Both experience and research indicate that under certain conditions
there is a risk that professional judgment may be influenced more by secondary interests than by primary interests [10].

Conflict of interest is a worthwhile subject considering the simple truth that financial ties between industry and academic researchers contain the inherent risk that a guideline development process or research results may be compromised. Conflict of interest policies typically and reasonably focus on financial gain and financial relationships. An “intellectual conflict of interest” is also possible and has been defined by Guyatt et al. [9]:

Academic activities that create the potential for an attachment to a specific point of view that could unduly affect an individual’s judgment about a specific recommendation.

Potential conflicts of interest can arise because of the professional position held by the author, such as being employed by a private clinic or sitting on advisory boards related to specific treatments. Researchers can feel pressured to obtain funding, publish, advance careers, or obtain tenure. Universities add to this pressure by pushing researchers to succeed [16]. These personal interests also have the potential to bias researchers when they conflict with research. Furthermore, intellectual COI is characterized as “important” when it includes authorship of original studies and peer-reviewed grant funding by the government or nonprofit organizations that directly relate to a recommendation; such conflicts should be frowned upon. It is characterized as “less important” when it involves participation in previous guideline panels, which must be acknowledged but does not necessarily preclude participation in developing new recommendations. The reason for the focus on financial COI is not that financial gains are necessarily more corrupting than the other interests but that they are relatively more objective and quantifiable [10].

Fact Box 5.2
The three main elements of the definition of a conflict of interest are the primary and secondary interest and their conflict. Secondary interests are objectionable only when they have greater weight than the primary interest in professional decision-making.

5.3 Recognition and Consequences of Conflict of Interest

Financial conflicts of interest and intellectual bias can influence recommendations for clinical practice. The issue of orthopedic surgeons having industry-related conflicts of interest that could potentially affect the scientific reports that are used to guide orthopedic patient care remains important for both patients and the community. To begin with, dissemination of orthopedic research can be affected by industry funding-related conflict of interest. Rates of citation may be influenced by characteristics other than the impact factor of the publishing journal. It has been found that high citation rates in medical literature are associated with industry funding and the reporting of an industry-favoring result, among other factors and in addition to journal impact factor [11]. Even more recently, in a study focused exclusively on orthopedic journals, high level of evidence, large sample size, representation from multiple institutions, and conflict of interest disclosure itself involving a nonprofit organization or for-profit company were associated with higher rates of citation [18]. A possible explanation offered is that researchers who secure external funding may be able to publish articles that are scientifically superior and therefore more likely to be cited.

Biases from financial and/or intellectual COIs may result from the ability of such conflicts to affect decision-making in a way that is completely hidden from the person making the decision. A related concern involves access to data. In industry-supported research, it is often the case that the investigator may lack full access to the study data. This creates a risk of unintentional bias on the researcher’s part. Industry sponsors may even go as far as preventing publication of findings not in their favor. Conflict of interest relates to people, not to companies or organizations. Most people
involved in research do not recognize the effect of these conflicts on their judgments [5, 23]. Declaring or acknowledging conflicts does not mitigate their effects. Moral licensing occurs when disclosure of a COI reduces feelings of guilt of the advisor, resulting in more biased advice because advisees have been warned [4, 15]. Even though a researcher could be unbiased and declare no conflict of interest whatsoever, one should not fail to mention the potential conflict of interest of editors and reviewers in relation to the review or publication of research.

Fact Box 5.3
Biases from financial and/or intellectual COIs may result from the ability of such conflicts to affect decision-making in a way that is completely hidden from the person making the decision.

A distinction exists between an actual conflict of interest, an apparent conflict of interest, and a potential conflict of interest. A perceived conflict of interest may be as important as an actual conflict of interest. Financial COI is an important factor impacting the public’s trust in research. A perceived conflict of interest can undermine the public’s opinion and trust in the quality of public sector services, institutional processes, and public institutions [1].

Fear of biased results or even fake results is legitimate. Some researchers argue that disclosure policies of scientific journals about researchers’ financial ties to their work do little to help with bias prevention or identification. Furthermore, valuable research can be discarded by readers if financial ties are declared, even though results were not affected by funding [6]. Another way of affecting readers’ perception is bias in control group choice in prospective randomized controlled trials. Such bias can favor products that are made by the company funding the research, and this might be due to the selection of inappropriate comparator products and publication bias. In a recent study looking into possible influences on the findings, protocols, and quality of drug trials financed by pharmaceutical companies, the methodological quality of trials funded by for-profit organizations was not found to be worse than that of trials funded through other more independent sources. It was however suggested that protocols were influenced to favor the trial product and results were interpreted more favorably [22]. Another way of looking into this subject is that bias lies in funding decisions made by companies toward studies that are likely to promote their interests and in manipulating the process of research [19]. Finally, as far as perceptions of financial ties to for-profit organizations by patients, professionals, and researchers are concerned, the quality of research evidence is thought to be compromised, and ties should be disclosed in order to enable readers to interpret results through their own scope [14, 17].

5.4 Management of Conflict of Interest

Institutions, professional organizations, and governments establish policies to address the problem of conflict of interest on behalf of the public in order to achieve and maintain high standards of care. Such policies work best when they aim at prevention or mitigation of its impact on research integrity rather than penalty after occurrence [20]. Although penalizing researchers who violate disclosure policies may also assist with prevention, the presence of a conflict of interest does not imply that any individual is improperly motivated. To avoid these and similar mistakes and to provide guidance for formulating and applying such policies, a framework for analyzing conflicts of interest is desirable. Conflict of interest policies should not only address concerns that financial relationships with industry may lead to bias or a loss of trust but should also consider the potential benefits of such relationships in specific situations.

There are three key steps in managing conflict of interest: (a) identifying relevant suspicious relationships, (b) resolving such existing relationships, and (c) disclosing relevant relationships
to learners [7]. Eliminating the conflict may not be possible for ethical or practical reasons.

Requirements for the registration of clinical trials are, in part, a response to concerns about conflict of interest in industry-sponsored research and research reporting. The registration of clinical trials ensures that basic methods for the conduct and analysis of the findings of a study as well as the primary clinical end points to be assessed and reported are specified before the trial begins and before data are analyzed [10]. In the USA, management of COI and objectivity in research is regulated federally by the Department of Health and Human Services [8].

Essentially all medical journals require disclosure of potential conflicts of interest from their authors. Unfortunately the format of that disclosure is not consistent among journals. The management of conflict of interest in academic publishing involves multiple parties: those who research, those who edit and publish, and those who read the research. Awareness of the possibility of conflict of interest is the first step to recognition, and the difficult part in managing this complex issue is declaration of conflict of interest. The responsibility for managing conflict of interest in publishing is equally spread across authors, reviewers, editors, and readers.

For researchers and authors, there is a need to be aware of those situations that have the potential to bias results or perspectives. Potential conflicts may include financial or in-kind support from vested interests. Most clearly and easily identified are research funding sources directly related to the publication, but this support could also include conference scholarships, paid speaking engagements, and prizes. Managing this potential conflict of interest in the first instance involves awareness and acknowledgement of potential or real conflict of interest and declaration on submission. Adherence to sound academic principles, such as being true to the data, not over-claiming, and not under-disclosing, is a protection that can help manage potential conflicts. In several countries, in addition to reporting funding, researchers must disclose COIs in presentations and publications, as well as to colleagues and, in some cases, research subjects [12].

In reviewing manuscripts, reviewers need to keep the potential for conflict of interest in mind and remain vigilant for it. Reviewers need to check that references used to support statements are being appropriately utilized, that the claims made are reasonable given the findings of the study, that negative findings are fairly reported, and that conclusions and recommendations are commensurate with the implications that can be drawn from the study. Editors also need to consider their own individual conflicts in a similar way to authors and reviewers. They have the job of developing policy and ensuring that authors’ and reviewers’ interests do not distort publication decisions. It is the readers, the healthcare professionals, that have the most difficult part in this sort of conflict, since they have to develop their own approach to the interpretation of the results. Awareness is the key, and the responsibility is theirs to question whether published material may be influenced by interests of authors, reviewers, or editors [17].

Fact Box 5.4
There are three key steps in managing conflict of interest: (a) identifying relevant suspicious relationships, (b) resolving such existing relationships, and (c) disclosing relevant relationships to learners. Eliminating the conflict may not be possible for ethical or practical reasons.

Take-Home Message
• It is important to understand that having a conflict does not mean wrongdoing.
• A conflict is a situation that increases the potential or risk for bias to occur, and steps should be taken to assure the integrity of the research.
• The goal is to increase transparency, primarily through disclosure and regulation, which allows consumers of the research to consider potential influences related to the conflict.
• Sometimes careful judgment is required based on the facts and the nature of the research.
References

1. Australian Research Council. ARC conflict of interests and confidentiality policy Version 2017.1. http://www.arc.gov.au/arc-conflict-interest-and-confidentiality-policy-version-20171.
2. Battelle. 2014 Global R&D funding forecast. R&D Magazine. 2013. https://www.battelle.org/docs/default-source/misc/2014-rd-funding-forecast.pdf?sfvrsn=2.
3. Bekelman JE, Li Y, Gross CP. Scope and impact of financial conflicts of interest in biomedical research: a systematic review. JAMA. 2003;289:454–65. https://doi.org/10.1001/jama.289.4.454.
4. Cain DM, Detsky AS. Everyone’s a little bit biased (even physicians). JAMA. 2008;299:2893–5. https://doi.org/10.1001/jama.299.24.2893.
5. Cosgrove L, Bursztajn HJ, Erlich DR, Wheeler EE, Shaughnessy AF. Conflicts of interest and the quality of recommendations in clinical guidelines. J Eval Clin Pract. 2013;19:674–81. https://doi.org/10.1111/jep.12016.
6. de Melo-Martín I, Intemann K. How do disclosure policies fail? Let us count the ways. FASEB J. 2009;23:1638–42. https://doi.org/10.1096/fj.08-125963.
7. Dickerson P, Chappell K. Content integrity, conflict of interest, and commercial support: defining and operationalizing the terms. J Nurses Prof Dev. 2015;31:225–30. https://doi.org/10.1097/NND.0000000000000182.
8. Electronic Code of Federal Regulations. PART 50—Policies of general applicability. Subpart F—Promoting objectivity in research. https://www.ecfr.gov/cgi-bin/text-idx?SID=b17a2b836d6343b5c359269f32ed5ccb&mc=true&node=pt42.1.50&rgn=div5#sp42.1.50.f.
9. Guyatt G, Akl EA, Hirsh J, Kearon C, Crowther M, Gutterman D, et al. The vexing problem of guidelines and conflict of interest: a potential solution. Ann Intern Med. 2010;152:738–41. https://doi.org/10.7326/0003-4819-152-11-201006010-00254.
10. IOM (Institute of Medicine). Conflict of interest in medical research, education, and practice. Washington, DC: The National Academies Press; 2009.
11. Kulkarni AV, Busse JW, Shams I. Characteristics associated with citation rate of the medical literature. PLoS One. 2007;2:e403.
12. Lach HW. Financial conflicts of interest in research: recognition and management. Nurs Res. 2014;63:228–32. https://doi.org/10.1097/NNR.0000000000000016.
13. Lee K, Bacchetti P, Sim I. Publication of clinical trials supporting successful new drug applications: a literature analysis. PLoS Med. 2008;5:e191. https://doi.org/10.1371/journal.pmed.0050191.
14. Licurse A, Barber E, Joffe S, Gross C. The impact of disclosing financial ties in research and clinical care: a systematic review. Arch Intern Med. 2010;170:675–82. https://doi.org/10.1001/archinternmed.2010.39.
15. Loewenstein G, Sah S, Cain DM. The unintended consequences of conflict of interest disclosure. JAMA. 2012;307:669–70. https://doi.org/10.1001/jama.2012.154.
16. Marcovitch H, Barbour V, Borrell C, Bosch F, Fernández E, Macdonald H, Marusić A, Nylenna M, Esteve Foundation Discussion Group. Conflict of interest in science communication: more than a financial issue. Report from Esteve Foundation Discussion Group, April 2009. Croat Med J. 2010;51:7–15.
17. O’Brien L, Lakeman R, O’Brien A. Managing potential conflict of interest in journal article publication. Int J Ment Health Nurs. 2013;22:368–73. https://doi.org/10.1111/j.1447-0349.2012.00869.x.
18. Okike K, Kocher MS, Torpey JL, Nwachukwu BU, Mehlman CT, Bhandari M. Level of evidence and conflict of interest disclosure associated with higher citation rates in orthopedics. J Clin Epidemiol. 2011;64:331–8. https://doi.org/10.1016/j.jclinepi.2010.03.019.
19. Resnik DB, Elliott KC. Taking financial relationships into account when assessing research. Account Res. 2013;20:184–205. https://doi.org/10.1080/08989621.2013.788383.
20. Resnik DB. Science and money: problems and solutions. J Microbiol Biol Educ. 2014;15:159–61. https://doi.org/10.1128/jmbe.v15i2.792.
21. Rockey SJ, Collins FS. Managing financial conflict of interest in biomedical research. JAMA. 2010;303:2400–2. https://doi.org/10.1001/jama.2010.774.
22. Schott G, Pachl H, Limbach U, Gundert-Remy U, Ludwig WD, Lieb K. The financing of drug trials by pharmaceutical companies and its consequences. Part 1: a qualitative, systematic review of the literature on possible influences on the findings, protocols, and quality of drug trials. Dtsch Arztebl Int. 2010;107:279–85. https://doi.org/10.3238/arztebl.2010.0279.
23. Thagard P. The moral psychology of conflicts of interest: insights from affective neuroscience. J Appl Philos. 2007;24:367–80. https://doi.org/10.1111/j.1468-5930.2007.00382.x.

6 Ethics in Clinical Research
Naomi Roselaar, Niv Marom, and Robert G. Marx

N. Roselaar · N. Marom · R. G. Marx (*)
Orthopedic Surgery, Hospital for Special Surgery, New York, NY, USA
e-mail: roselaarn@hss.edu; maromn@hss.edu; marxr@hss.edu

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_6

6.1 Introduction

6.1.1 Tuskegee Syphilis Experiment

In 1932, the US Public Health Service and Tuskegee Institute in Alabama began an observational study of syphilis in African American men [6]. Called the “Tuskegee Study of Untreated Syphilis in the Negro Male,” the study was intended to demonstrate the need for a syphilis treatment program [6]. Approximately 600 subjects, of whom 400 had syphilis, were told nothing of their disease. Despite the availability of bismuth, arsenic, mercury, and later penicillin, as therapy, the subjects were offered no treatment [11]. Subjects suspected of receiving injections of arsenic or mercury were immediately replaced [32]. As reported in a paper read before the 14th Annual Symposium on Recent Advances in the Study of Venereal Diseases in January 1964, “Fourteen young, untreated syphilitics were added to the study to compensate for this” [32]. Following media outrage, the study concluded in 1972 when a nine-person panel found that no information had been provided to subjects before they agreed to participate [6]. This 40-year experiment on non-consenting, medically neglected subjects became the longest nontherapeutic study of humans in medical history [11]. The Tuskegee Syphilis Experiment illustrated the exploitation of vulnerable patients, the need for informed consent, and the misrepresentation of minority populations in clinical studies. Since then, great care has been taken at all levels of clinical research to employ ethical guidelines and regulatory committees to oversee studies involving human subjects.

6.2 Regulatory and Ethical Guidelines in Clinical Research

When proposing and conducting experiments involving human subjects, researchers must comply with international, federal, and institutional guidelines to protect participants. The US Department of Health and Human Services’ “Common Rule,” the Institutional Review Boards (IRBs) at individual institutions, and the Health Insurance Portability and Accountability Act of 1996 (HIPAA) are three primary regulatory measures for ethics in clinical research. All three were enacted after the 1964 adoption of the World Medical Association’s Declaration of Helsinki: Ethical Principles for Medical Research Involving Human Subjects.

6.2.1 Declaration of Helsinki

The Declaration of Helsinki (DOH) has been considered the gold standard for ethics in clinical research [26]. In its current form, the DOH applies to human subjects, data, and material (WMA). Central tenets of the DOH protect the health and rights of all patients involved in clinical research and advocate for the continuous evaluation of safety, effectiveness, efficiency, accessibility, and quality of human subject research [46]. It was originally composed of 14 short statements outlining ethical guidelines for conducting human subject research [47]. Since its inception, the DOH has been revised seven times, in 1975 (Tokyo), 1983 (Venice), 1989 (Hong Kong), 1996 (Somerset, South Africa), 2000 (Edinburgh), 2008 (Seoul), and 2013 (Fortaleza, Brazil), and clarified twice [46]. By 2014, the DOH included 37 detailed principles [46]. The basis of the declaration stems from the Nuremberg Code [5]. This seminal code of ethics was established at the conclusion of the Nuremberg trials for Nazi war crimes, including horrifically violent human medical experiments on Holocaust victims [34].

6.2.2 Common Rule

In the United States, the Department of Health and Human Services (HHS) issues regulations on the ethical conduct of research on humans [22]. The HHS Code of Federal Regulations 45 Part 46 Protection of Human Subjects was developed in 1981 and updated in 2009 [40]. The policy is known colloquially as the “Common Rule” and protects human subjects in research conducted or supported by a federal department or agency [40]. It requires researchers to obtain informed, written consent and to provide full disclosure of the benefits and foreseeable risks of the proposed study and a statement addressing subject rights to refuse participation at any point [40]. Considered vulnerable populations, pregnant women, human fetuses (defined as the “product of conception from implantation until delivery”), neonates, prisoners, and children are offered additional protections under the Common Rule [40]. Revisions to the Common Rule were proposed in 2015 by HHS and 15 other federal departments and agencies [22]. The updates are designed to reflect changes in research over the 35 years since the inception of the Common Rule [22]. Goals include enhancing respect and strengthening informed consent, particularly for the long-term use of de-identified biospecimens; enhancing safeguards by specifying privacy and security measures concerning identifiable information; streamlining Institutional Review Board (IRB) review by clarifying levels of risk and the IRB process for multisite studies; and calibrating oversight [40]. The HHS offered the opportunity for public comments on revisions until January 2016 [42].

6.2.3 Institutional Review Board

The Common Rule also states that research conducted or supported by organizations outside a federal department or agency must be compliant with the host’s Institutional Review Board (IRB) [40]. Like the Common Rule, IRBs aim to provide ethical and regulatory oversight for research with human subjects. On an institutional level, they ensure compliance with external laws, policies, and regulations [27]. Both the Common Rule and IRBs operate to uphold ethical principles defined in the Belmont Report of 1979 [40, 27]. The primary principles include respect for persons, beneficence, and justice [27]. Research approved by an IRB is subject to annual continuing reviews.

6.2.4 HIPAA

In addition to the protection of subjects themselves, information that identifies subjects must also be protected. The Health Insurance Portability and Accountability Act of 1996 seeks to increase privacy protection for human research participants [10]. This includes the privacy and security of information that could be used to identify subjects in a particular study such as name, medical record number, birthdate, social
security number, address, or identifying photograph. As electronic medical records become more
prevalent, and security and privacy issues extend to the online storage of identifiable data,
regulations must also change. HIPAA was modified and extended in 2000, 2004, 2009, and 2013 to
reflect technological advances [41].
The HIPAA Privacy Rule, which protects identifiable health information, was introduced in 2000
and mandated nationally in 2003 [29]. By protecting ownership and transfer of specific protected
health information (PHI), the Privacy Rule aims to safeguard health information associated with
individuals while facilitating data flow to maximize the quality of health care [43]. PHI
comprises any information that can be used to identify a study participant [43]. The
implementation of the HIPAA Privacy Rule in clinical research has also provoked criticism. In
2007, JAMA published a study in which more than two-thirds of surveyed epidemiologists perceived
“substantial, negative influence on the conduct of human subjects health research” after the
implementation of the HIPAA Privacy Rule [29]. To encompass the protection of electronic PHI
(e-PHI), the HIPAA Security Rule (HSR) was enacted in 2003 [44].

6.3 Ethical Population Representation

The lack of diverse racial and ethnic representation in clinical research prevents the best
possible treatment for disease outcomes in heterogeneous populations [30]. To address this issue,
Congress passed the National Institutes of Health (NIH) Revitalization Act of 1993, intended to
catalyze the diversification of participants in clinical research [28]. With minimal exceptions,
the Revitalization Act requires NIH-funded clinical research to include women and members of
minority groups [28].
Twenty years after the introduction of regulatory laws to diversify participation in clinical
research, minority participation in cancer clinical trials remained minimal [7]. The lack of
minority representation persists across many other fields in clinical research including
cardiovascular disease, respiratory disease, mental health services, and substance abuse
[2, 3, 24, 33].
Despite legislation, barriers in inclusion criteria may prevent trials from including minority
populations. English language fluency is required for clinical trials more and more frequently
[16]. The mistrust of healthcare professionals and the lack of understanding of clinical research
also contribute to low rates of minority participation [2]. Decreased access to academic
institutions pursuing clinical research by minorities influences recruitment diversity [14].

6.4 Ethical Publishing Practices

6.4.1 Data Fraud and Misconduct

One of the most highly publicized fraudulent studies is Dr. Andrew Wakefield’s case series
suggesting an association between the MMR vaccine and autism [45]. Published in The Lancet, a
British medical journal, in 1998, Dr. Wakefield’s paper caught the attention of mainstream media
[31]. Consequently, the MMR vaccination rate among toddlers in the United Kingdom decreased from
83.1% in 1997 to 69.9% in 1998 [39]. A one-page commentary titled “Retraction of an
Interpretation” was published in 2004 by 10 of the 13 authors of the original article [25].
Simultaneously, editors at The Lancet acknowledged a lack of financial disclosures by
Dr. Wakefield et al. and reaffirmed “the paper’s suitability, credibility, and validity for
publication” [21]. In 2010, The Lancet retracted Dr. Wakefield’s article.
This example demonstrates many important facets of the ethics of clinical research. The
researchers failed to report accurate findings and drew speculative conclusions from a small,
non-representative case series [31]. These unethical actions by the authors were compounded by
irresponsible publishing practices at The Lancet. The publishers failed to require proper
disclosure of conflicts of interest, specifically those that revealed Dr. Wakefield’s financial
gains related to the research conclusions [12]. Furthermore, The Lancet only retracted the
fraudulent article
12 years after its initial publication [15]. After the retraction of the article, investigative
journalist Brian Deer published multiple articles in the BMJ revealing Dr. Wakefield’s connection
to lawyer Richard Barr [12]. Barr, who was working to file a lawsuit against vaccine
manufacturing companies, provided Dr. Wakefield with £400,000 through the Legal Aid Fund while
also representing the anti-vaccine organization Justice, Awareness, and Basic Support (JABS)
[13]. Barr used his connection to JABS to find patients for Dr. Wakefield’s study [12].
However, cases such as Dr. Wakefield’s are not common. Although data fraud is difficult to
monitor and likely underreported, confirmed cases of data fabrication, falsification, and
plagiarism are found among 0.01% of scientists according to the US Public Health Service [19].
Dr. Wakefield’s willful deceit through data falsification is classified as fraud, while
misconduct refers to honest errors in ethical research practices [19]. In addition to data fraud
and misconduct, there are many other aspects of clinical research for which good clinical
practices must be observed. They include conflict of interest disclosure, self-citation, and
recognizing predatory journals.

6.4.2 Conflict of Interest

Conflicts of interest in clinical orthopedic research are any instances of overlapping personal,
financial, or academic involvement that may bias or influence a participant’s work. Investigators
are required to submit conflict of interest statements for project proposals, manuscript
publications, and presentations at conferences. This benefits consumers of the research as it
provides context for the circumstances under which research is conducted. It is the
responsibility of authors, editors, peer reviewers, and any other staff members who play a role
in determining the publication or presentation of a study to disclose any relevant conflicts of
interest [23]. Internationally, many orthopedics journals follow the “Recommendations for the
Conduct, Reporting, Editing and Publication of Scholarly Work in Medical Journals” established by
the International Committee of Medical Journal Editors (ICMJE). The ICMJE conflict of interest
form covers financial activities related to the work as well as relevant financial activities
outside the submitted work. Relevant financial activities may include relationships with a
pertinent entity such as a government agency, foundation, academic institution, or commercial
sponsor; grants; personal fees; royalties; leadership positions; and nonfinancial support [23].

6.4.3 Self-Citation

Self-citation refers to referencing an article from the same journal [8]. The rate of
self-citation for a medical journal is defined as the number of self-citations divided by the
total number of references made by that journal in a specified time period [20]. In orthopedics
journals, many factors influence the differences in self-citation rates. Self-citation rates are
highest in sub-specialized journals due to their specificity [36]. Specialized orthopedics
journals including Spine, Arthroscopy, and FAI have self-citation rates two and three times
higher than the general orthopedics journals CORR and JBJS (Am), respectively [36]. Rates of
self-citation are categorized as “high” if they are at or above 20% (JCR).
Self-citation rates are relevant in the calculation of medical journal impact factors [17].
According to the Science Citation Index, the impact factor for medical journals “measures the
average number of citations received in a particular year by papers published in the journal
during the two preceding years” [8]. Practices surrounding the manipulation of self-citation
rates introduce bias into impact factors [20]. For journals in which self-citations dominate the
references, the true contribution to the journal’s discipline may be misrepresented [18].

6.4.4 Predatory Journals

An increase in open-access publications with minimal or no peer review has been linked to the
rise of spam emails [4]. Known
as “predatory journals” [1], these publications often require authors to pay high fees for
processing and peer review with no follow-through [4]. Jeffrey Beall, a former Scholarly
Initiatives Librarian at the University of Colorado Denver, coined the term and proposed the
first list of predatory journals [1]. He warned that these publishers “exploit the author-pays
model, damage scholarly publishing, and promote unethical behavior by scientists” [1]. In 2017,
Beall’s list of predatory journals, which had been used as a government standard, was removed
from his webpage [38]. However, institutions such as the Yale University Library system continue
to recommend it and other similar lists [48].
Articles submitted to predatory journals are published quickly due to the limited or nonexistent
review process [35]. Additionally, published articles are often non-indexed despite
advertisements to the contrary [9]. Non-indexed articles cannot be retrieved through an online
search [35].
Human behavioral scientists in Poland aimed to shed light on the issue of predatory journals
through a systematic study in which they created a profile for an imaginary scientist and applied
for editorial positions at 360 journals [37]. False online accounts, journal and book
publications, and faculty positions, none of which could be verified, were compiled into the fake
application. Of the 360 editorial positions to which the fake application was submitted, 120 were
for journals indexed by Journal Citation Reports (JCR), 120 were listed on the Directory of Open
Access Journals (DOAJ), and 120 were included in Beall’s list of predatory journals. Acceptances
came from 40 journals included on Beall’s list and 8 journals listed on the DOAJ [37]. Fittingly,
the fake editor’s name was Dr. Anna O. Szust, similar to Oszust, the Polish word for “fraud.” All
offers for editorial positions were declined [37].

Take-Home Message
• Ethical considerations are important at many levels and processes in clinical research.
• Investigators, ethical review board members, publishers, and peer reviewers all share
  responsibility for maintaining high ethical standards in clinical research.

References

1. Beall J. Predatory publishers are corrupting open access: journals that exploit the author-pays model damage scholarly publishing and promote unethical behaviour by scientists, argues Jeffrey Beall. Nature. 2012;489(7415):179.
2. Boden-Albala B, Carman H, Southwick L, Parikh NS, Roberts E, Waddy S, et al. Examining barriers and practices to recruitment and retention in stroke clinical trials. Stroke. 2015;46(8):2232–7.
3. Burlew K, Larios S, Suarez-Morales L, Holmes B, Venner K, Chavez R. Increasing ethnic minority participation in substance abuse clinical trials: lessons learned in the National Institute on Drug Abuse’s Clinical Trials Network. Cult Divers Ethn Minor Psychol. 2011;7(4):345–56.
4. Butler D. The dark side of publishing. Nature. 2013;495:433–5.
5. Carlson RV, Boyd KM, Webb DJ. The revision of the declaration of Helsinki: past, present and future. Br J Clin Pharmacol. 2004;57(6):695–713.
6. Centers for Disease Control and Prevention. The Tuskegee timeline [Internet]. U.S. Public Health Service Syphilis Study at Tuskegee. 2017 [cited 2017 Oct 21]. https://www.cdc.gov/tuskegee/timeline.htm.
7. Chen MS, Lara PN, Dang JHT, Paterniti DA, Kelly K. Twenty years post-NIH Revitalization Act: enhancing minority participation in clinical trials (EMPaCT): laying the groundwork for improving minority clinical trial accrual: renewing the case for enhancing minority participation in cancer clinical trials. Cancer. 2014;120(Suppl 7):1091–6.
8. Clarivate Analytics. Journal Impact Factor [Internet]. InCites Help. [cited 2017 Dec 19]. http://ipscience-help.thomsonreuters.com/inCites2Live/indicatorsGroup/aboutHandbook/usingCitationIndicatorsWisely/jif.html.
9. Clark J. How to avoid predatory journals: a five point plan. BMJ Opin. 2015. https://blogs.bmj.com/bmj/2015/01/19/jocalyn-clark-how-to-avoid-predatory-journals-a-five-point-plan/.
10. 104th Congress. Health Insurance Portability and Accountability Act of 1996 [Internet]. U.S. Department of Health and Human Services. 1996 [cited 2017 Oct 21]. https://aspe.hhs.gov/report/health-insurance-portability-and-accountability-act-1996.
11. Corbie-Smith G. The continuing legacy of the Tuskegee Syphilis Study: considerations for clinical investigation. Am J Med Sci. 1999;317(1):5–8.
12. Deer B. How the vaccine crisis was meant to make money. BMJ. 2011;342:c5258.
13. Deer B. Revealed: secret payments to MMR doctor Wakefield at heart of vaccine crusade. 2006. briandeer.com.
14. Durant RW, Wenzel JA, Scarinci IC, Paterniti DA, Fouad MN, Hurd TC, et al. Perspectives on barriers and facilitators to minority recruitment for clinical trials among cancer center leaders, investigators, research staff, and referring clinicians: enhancing minority participation in clinical trials (EMPaCT). Cancer. 2014;120(Suppl 7):1097–105.
15. Eggertson L. Lancet retracts 12-year-old article linking autism to MMR vaccines. CMAJ. 2010;182(4):e199–200.
16. Egleston BL, Pedraza O, Wong YN, Dunbrack RL, Griffin CL, Ross EA, et al. Characteristics of clinical trials that require participants to be fluent in English. Clin Trials. 2015;12(6):618–26.
17. Frandsen TF. Journal self-citations: analysing the JIF mechanism. J Informetr. 2007;1(1):47–58.
18. Garfield E. Journal self-citation in the Journal Citation Reports – Science Edition [Internet]. Clarivate Analytics. 2002 [cited 2018 Nov 13]. https://clarivate.com/essays/journal-self-citation-jcr/.
19. George SL, Buyse M. Data fraud in clinical trials. Clin Invest (Lond). 2015;5(2):161–73.
20. Hakkalamani S, Rawal A, Hennessy MS, Parkinson RW. The impact factor of seven orthopaedic journals. J Bone Joint Surg [Br]. 2006;88(2):159–62.
21. Horton R, Murch S, Walker-Smith J, Wakefield A, Hodgson H. A statement by the editors of the Lancet. Lancet. 2004;363:P820–1.
22. Hudson KL, Collins FS. Bringing the common rule into the 21st century. N Engl J Med. 2015;373(24):2293–6.
23. ICMJE. Recommendations for the conduct, reporting, editing, and publication of scholarly work in medical journals. 2017.
24. McGarry ME, McColley SA. Minorities are underrepresented in clinical trials of pharmaceutical agents for cystic fibrosis. Ann Am Thorac Soc. 2016;13(10):1721–5.
25. Murch S, Anthony A, Casson D, Malik M, Mark B, Dhillon A, et al. Retraction of an interpretation. Lancet. 2004;363:750.
26. Muthuswamy V. The new 2013 seventh version of the declaration of Helsinki: more old wine in a new bottle? Indian J Med Ethics. 2014;11(1):2–4.
27. National Institute of Environmental Health Sciences. Institutional Review Board [Internet]. National Institutes of Health. 2015 [cited 2017 Oct 21]. https://www.niehs.nih.gov/about/boards/irb/index.cfm.
28. National Institutes of Health. NIH policy and guidelines on the inclusion of women and minorities as subjects in clinical research [Internet]. Office of Extramural Research. 2017 [cited 2017 Dec 12]. https://grants.nih.gov/grants/funding/women_min/guidelines.htm.
29. Ness RB. Influence of the HIPAA privacy rule on Health Research. JAMA. 2007;298(18):2164–70.
30. Oh SS, Galanter J, Thakur N, Pino-Yanes M, Barcelo NE, White MJ, et al. Diversity in clinical and biomedical research: a promise yet to be fulfilled. PLoS Med. 2015;12(12):e1001918.
31. Rao T, Andrade C. The MMR vaccine and autism: sensation, refutation, retraction, and fraud. Indian J Psychiatry. 2011;53(2):95–6.
32. Rockwell DH, Yobs AR, Moore B Jr. The Tuskegee study of untreated syphilis: the 30th year of observation. Arch Intern Med. 1964;114(6):792–8.
33. Santiago CD, Miranda J. Progress in improving mental health services for racial-ethnic minority groups: a ten-year perspective. Psychiatr Serv. 2014;65(2):180–5.
34. Shuster E. Fifty years later: the significance of the Nuremberg code. N Engl J Med. 1997;337:1436–40.
35. Shyam A. Predatory journals: what are they? J Orthop Case Rep. 2015;5(4):1–2.
36. Siebelt M, Siebelt T, Pilot P, Bloem RM, Bhandari M, Poolman RW. Citation analysis of orthopaedic literature; 18 major orthopaedic journals compared for impact factor and SCImago. BMC Musculoskelet Disord. 2010;11(4):1–7.
37. Sorokowski P, Kulczycki E, Sorokowska A, Pisanski K. Predatory journals recruit fake editor. Nature. 2017;543:481–3.
38. Strielkowski W. Predatory journals: Beall’s list is missed. Nature. 2017;544:416.
39. Thomas DR, Salmon RL, King J. Rates of first measles-mumps-rubella immunisation in Wales (UK). Lancet. 1998;351:1927.
40. U.S. Department of Health and Human Services. Code of Federal Regulations Title 45 Part 46 [Internet]. Office for Human Research Protections. 2009 [cited 2017 Oct 21]. https://www.hhs.gov/ohrp/regulations-and-policy/regulations/45-cfr-46/index.html#subparta.
41. U.S. Department of Health and Human Services. HIPAA for Professionals [Internet]. Office for Civil Rights. 2017 [cited 2017 Oct 21]. https://www.hhs.gov/hipaa/for-professionals/index.html.
42. U.S. Department of Health and Human Services. NPRM for revisions to the common rule. Federal Register. 2015.
43. U.S. Department of Health and Human Services. Summary of the HIPAA Privacy Rule [Internet]. Office for Civil Rights. 2013 [cited 2017 Oct 21]. https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html.
44. U.S. Department of Health and Human Services. Summary of the HIPAA Security Rule [Internet]. Office for Civil Rights. 2013 [cited 2017 Oct 21]. https://www.hhs.gov/hipaa/for-professionals/security/laws-regulations/index.html.
45. Wakefield A, Murch S, Anthony A, Linnell J, Casson D, Malik M, et al. RETRACTED: ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children. Lancet. 1998;351:P637–41.
46. World Medical Association. WMA Declaration of Helsinki: ethical principles for medical research involving human subjects. Helsinki; 1964.
47. World Medical Organization. Declaration of Helsinki (1964). Br Med J. 1996;313(7070):1448–9.
48. Yale University Library. Choosing a journal for publication of an article: list of suspicious journals and publishers [Internet]. 2018 [cited 2017 Dec 19]. https://guides.library.yale.edu/c.php?g=296124&p=1973764.
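
Worked illustration for Sect. 6.4.3. The self-citation rate described there (self-citations divided by the total number of references made by a journal in a given period, with 20% as the cut-off for a “high” rate) can be sketched in a few lines of Python. This is only a minimal sketch: the function names and the counts used in the example are hypothetical and are not taken from the cited sources.

```python
# Minimal sketch of the self-citation rate from Sect. 6.4.3.
# Function names and example counts are hypothetical (illustration only).

HIGH_RATE_THRESHOLD = 0.20  # "high" if at or above 20%, the cut-off quoted in the text


def self_citation_rate(self_citations: int, total_references: int) -> float:
    """Return self-citations as a fraction of all references made by the journal."""
    if total_references <= 0:
        raise ValueError("total_references must be positive")
    if not 0 <= self_citations <= total_references:
        raise ValueError("self_citations must lie between 0 and total_references")
    return self_citations / total_references


def is_high_self_citation(rate: float) -> bool:
    """Apply the 20% cut-off to a rate expressed as a fraction."""
    return rate >= HIGH_RATE_THRESHOLD


# Hypothetical journal: 150 of its 600 references in the period cite the journal itself.
rate = self_citation_rate(150, 600)
print(f"{rate:.1%}", is_high_self_citation(rate))  # 25.0% True
```

Expressing the rate as a fraction rather than a percentage keeps the comparison with the cut-off unambiguous; the percentage form is produced only when the value is printed.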
Part II
How to Get Started with Clinical Research?

7 How to Get Started: From Idea to Research Question

Lachlan M. Batty, Timothy Lording, and Eugene T. Ek

L. M. Batty (*)
Department of Orthopaedics, St Vincent’s Hospital Melbourne, Melbourne, VIC, Australia
Department of Orthopaedics, Western Health, Melbourne, VIC, Australia

T. Lording
Melbourne Orthopaedic Group, Melbourne, VIC, Australia

E. T. Ek
Melbourne Orthopaedic Group, Melbourne, VIC, Australia
Department of Surgery, Monash Medical Centre, Monash University, Melbourne, VIC, Australia

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_7

7.1 Research Questions and the Decisions We Make Every Day

In trying to achieve the best possible outcomes for patients, orthopedic surgeons are faced with
multiple decisions every day. Evidence-based medicine (EBM), as described by Sackett, is “the
conscientious, explicit and judicious use of current best evidence in making decisions about the
care of individual patients” [17]. It is a systematic approach to optimize patient care and
dictates that our practice as surgeons is guided by data to support each of these decisions.
Orthopedic surgeons are an inherently inquisitive group, making observations, continually
refining practice, and adopting novel and innovative technologies and techniques. In this
context, many decisions made by surgeons rest on a limited evidence base or on extrapolation of
orthopedic principles. It is important for these decisions to be continually challenged and
objectively evaluated in the clinical environment to account for known and unknown variables.
Testing these advances and reforms as structured and answerable research questions is a
fundamental initial step in the process of conducting clinical research. With this approach the
surgeon-scientist uses scientific principles to guide practice and practice to guide science with
the ultimate aim of optimizing patient outcomes.

7.2 The Importance of a Research Question

Developing a clinically relevant, structured, and answerable research question that addresses a
knowledge gap is a prerequisite in conducting clinical research. The importance of the research
question cannot be overstated as it can impact the design, length, cost, and feasibility of the
study. The research question is critical in determining inclusion and exclusion criteria,
ultimately impacting the external validity of the study. For example, if a study investigates a
group of patients at high risk for a certain condition, subsequent extrapolation to a wider
population may not be appropriate. The dependent and independent variables are also shaped early
on by the research question. Hulley et al. [10] describe the “FINER criteria” to aid in the
development of a research question. By these criteria, a proposed study should be Feasible,
Interesting, Novel, Ethical, and Relevant. Each of these elements should be met when developing
an idea into a research question as failure to recognize any one element may impede progression
of any investigation.

7.3 Types of Research Questions

Research questions can take many forms; however, many authors divide them into four main
categories [9, 11]. These pertain to (1) the safety and efficacy of an intervention, (2) the
etiology of a pathological process, (3) the diagnosis of a condition or pathological process, or
(4) the prognosis of a condition.
Investigating an “intervention” may include an operation, modification to an operation, a
medication, or other forms of treatment, such as a rehabilitation program. Examples of research
questions assessing interventions include:

• Does combined lateral extra-articular tenodesis reduce graft rupture rates compared with
  isolated intra-articular ACL reconstruction in high-risk patients? [7]
• Do selective COX-2 inhibitors affect pain control and healing after arthroscopic rotator cuff
  repair? [15]
• Does operative management of acute Achilles tendon rupture result in a reduced re-rupture rate
  compared to non-operative management with an accelerated rehabilitation program? [21]

Investigating the etiology of a pathological process aims to determine the causes or risk factors
for a pathological process. Potential causes are multiple, including traumatic, degenerative, and
genetic. Unlike investigating an intervention, factors investigated in etiological studies are
not typically controlled by the surgeon, and therefore study designs generally differ. Examples
of research questions that investigate the etiology of a condition include:

• Is ACL rupture a risk factor for development of osteoarthritis of the knee in male soccer
  players? [20]
• What are the risk factors for infection after shoulder arthroscopy? [3]

Studies investigating the utility of diagnostic modalities of a condition or pathological process
may include evaluation of clinical examination techniques and biochemical, hematological,
pathological, or radiological investigations. Some examples include:

• Following medial opening wedge high tibial osteotomy, does high-resolution computed tomography
  have higher sensitivity in diagnosis of lateral hinge fracture compared to plain radiography?
  [13]
• What are the sensitivity and specificity of the Lachman, anterior drawer, and pivot shift tests
  for clinical diagnosis of anterior cruciate ligament rupture? [2]
• Can preoperative magnetic resonance imaging predict irreparable rotator cuff tears? [12]

Prognostication is important for allowing communication of expectations to the patient regarding
outcomes, whether it be following non-operative or operative intervention. In some cases, these
are similar to etiological investigations if the outcome of interest is a secondary pathology.
For example:

• What is the midterm risk for rotator cuff tear arthropathy progression in patients with an
  asymptomatic rotator cuff tear? [4]
• What are the return-to-sport rates after anterior cruciate ligament reconstructive surgery? [1]
• Is anterior cruciate ligament reconstruction effective in preventing secondary meniscal tears
  and osteoarthritis? [18]

7.4 Identifying Knowledge Gaps

When there are insufficient or inadequate data to guide the clinical decision-making process,
there is a “knowledge gap.” When developing an idea into a research question, it must first be
confirmed that the question has not previously been answered satisfactorily or entirely. This
confirms a knowledge gap. Before embarking on a lengthy and time-consuming investigation, it is
important for researchers to evaluate the existing literature. The depth of a literature review
can vary, ranging from an ad hoc database search and narrative review to undertaking a structured
systematic review or meta-analysis of a particular topic, which is in itself a type of
publishable study addressed in a later chapter in this book. Inherently, a systematic approach to
reviewing the literature will lead to a more complete and thorough evaluation of the available
data and therefore guide the prospective researcher’s questions to greater depth and completion.
Often, however, this goes beyond what is necessary in preparing a research question; indeed, to
undertake a systematic review, one must start out with a research question in order to home in
on relevant articles on the subject.
Often, a research question will gestate from a clinical experience or dilemma, and the aspiring
researcher will approach the relevant literature to see what information already exists. If it
does not already exist, then a knowledge gap can be established. Even if studies do exist for a
particular research question, there may still be substantial knowledge gaps or conflict among
reported results. The researcher should consider whether the studies could be improved upon, for
example, with greater patient numbers or more controlled methodology and reporting of outcomes.
One example, of many, that highlights the role of systematic review in this setting is the
ongoing STABILITY study [7, 8]. Such rigorous reviews of existing literature may also be
beneficial in obtaining funding for a research project and are frequently of publishable quality
as systematic reviews. Identification of planned and ongoing investigations is also important
within systematic reviews and meta-analyses. With recent requirements for investigators to
prospectively register clinical trials, searching trial registry databases [6] is a way of
identifying these. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses)
provides evidence-based “gold-standard” guidelines for reporting in systematic reviews and
meta-analyses.

7.5 The PICO Approach

The PICO (Patient, Intervention, Comparison, Outcome) approach is a well-accepted method of
developing a clinical research question [16, 19]. The structured approach guides consideration of
important elements of the research question individually. The framing of research questions using
this format has been shown to be independently associated with better overall reporting quality
in randomized trials [16]. The key is specificity in definitions. Some authors add the element of
Time or Timeframe to the acronym, making PICOT [16]. This approach lends itself particularly well
to investigating the safety and efficacy of an intervention; however, similar principles can be
applied when developing other types of research questions.

7.5.1 Patient

The patient population reflects the inclusion and exclusion criteria of a subsequent study. This
impacts the external validity of the study (i.e., who the study results can be applied to) and
will affect recruitment rates, event rates, and ethics applications. For example, liberal
inclusion criteria may accelerate recruitment rates, whereas focusing on a high-risk population
for a condition may increase the event rate. Defining terms such as “high risk” is important and
can be elucidated from the initial literature review identifying the knowledge gap. Inclusion of
minors may have implications for ethics approval or the informed consent process.
Factors to consider in defining the patient group may include demographic factors (e.g., age,
sex), skeletal maturity (skeletally immature or mature), nature of the intervention (revision or
primary procedure), and factors that may impact on treatment outcome such as past medical history
and other conditions (e.g., ligamentous laxity).
The patient population may be further characterized by the pathological process. In an
investigation assessing the efficacy of meniscal root repair, for example, patients with
traumatic and degenerative
tears may be considered differently. An investigation of the efficacy of a new biological agent
for chondral restoration may not be applicable to all patients with knee degeneration.
Distinguishing among underlying diagnoses such as osteoarthritis, inflammatory arthritis, or
post-traumatic arthritis may be warranted, as these have different underlying disease processes.
In a randomized controlled trial, the randomization process aims to evenly distribute known and
unknown variables between treatment groups, reducing confounding; however, defining the patient
population sets the boundaries for who is included in this process and therefore who the outcomes
can be applied to.

7.5.3 Comparison

The comparison should be described in the same level of detail as the intervention. The
comparator may be a placebo in what is known as a “placebo-controlled” study, or the comparator
may be an alternative intervention in what is known as a “head-to-head” study. An example of the
latter might include comparing different operative techniques, or operative versus non-operative
management such as physiotherapy, rehabilitation, or bracing.
Sometimes, the comparator can be an alternative patient population in itself, where
the intervention is kept constant, but the
patient population differs. An example of such
7.5.2 Intervention a study might be the assessment of rotator cuff
repair failure rates between patients under
The intervention to be investigated should be 70  years of age compared to adults over
defined in detail, adhering to the scientific prin- 70 years of age.
ciple of reproducibility. The aim is to allow oth-
ers to reproduce the investigation or apply the
intervention to their own patients. Required
details in the descriptions of an intervention will 7.5.4 Outcome
depend if it is a procedure, medication, or treat-
ment program. In terms of a procedure, it should The outcome of the trial represents the depen-
include a detailed description of the pre-, intra-, dent variable in the study. There can be more
and postoperative course. Specifics may include than one outcome; however, there should be at
things like the position of the limb for graft ten- least one primary outcome. This can take a
sioning and rehabilitation protocols. Detailing number of forms including radiological, clini-
who can perform the procedure may also be cal, or patient-­ reported outcome measures.
important; procedures performed by both resi- Terms such as “failure” need to be clearly
dents and attending surgeons may add a con- defined; does this constitute a revision proce-
founding variable that requires consideration dure, graft rupture, ongoing symptoms, or
when evaluating the intervention. On the other something less tangible, such as patient dissat-
hand, this may more accurately reflect real-world isfaction? The method for diagnosis or diagnos-
applicability and is a good example of the impor- tic criteria for the defined outcome should be
tance of defining what the researcher wishes to detailed (i.e., clinical examination, radiological
achieve in asking their research question. In assessment, or other assessment methods with
terms of medications, specific details must also strict definitions as to what defines a positive
be listed. For example, an investigation into result). A time frame should also be included in
platelet-rich plasma (PRP) should clearly detail the research question; for example, investigat-
the preparation of the formulation including if it ing anterior cruciate ligament failure rates will
is leukocyte rich or poor and the number of and yield different results at 6 months compared to
interval between injections. 6 years.
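The four PICO elements described above, together with the time frame, can be written down as a small structured record so that a missing element is noticed before the protocol is drafted. The following is a minimal sketch only: the class name PicoQuestion, its field names, and the example values are hypothetical illustrations and are not taken from any study or software cited in this chapter.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class PicoQuestion:
    """Hypothetical container for the elements of a PICO(T) research question."""
    patient: str
    intervention: str
    comparison: str
    outcome: str
    time_frame: str

    def missing_elements(self) -> List[str]:
        """Return the names of elements that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def as_sentence(self) -> str:
        """Render the elements as a single answerable question."""
        return (
            f"In {self.patient}, does {self.intervention} compared with "
            f"{self.comparison} change {self.outcome} at {self.time_frame}?"
        )


# Illustrative values only; they do not describe a real trial.
question = PicoQuestion(
    patient="skeletally mature patients undergoing primary ACL reconstruction",
    intervention="hamstring autograft",
    comparison="patellar tendon autograft",
    outcome="the rate of graft failure requiring revision surgery",
    time_frame="2 years",
)

print(question.missing_elements())
print(question.as_sentence())
```

A question with a blank time frame would be flagged by missing_elements(), which mirrors the point made above that the time frame belongs in the research question itself.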
7.6 Hypothesis Testing

Following development of a research question, hypotheses are defined that the authors can aim to accept or reject after conducting a rigorous investigation.

Hypotheses should be stated prior to the start of the investigation. The null hypothesis states that there is no difference between the groups and that the intervention is not superior to the control condition. A hypothesis can be one-sided or two-sided. A one-sided hypothesis states the direction of the difference between groups [5]. For example, a one-sided hypothesis may state that patient-reported outcome scores are higher in patients undergoing hamstring ACL reconstruction compared to those having patellar tendon graft. A two-sided hypothesis would be that there is a difference in patient-reported outcomes between hamstring and patellar tendon ACL reconstruction. In this latter case, the patient-reported outcome measures in the hamstring group could be either higher or lower.

Clinical Vignettes/Case Examples

Case Example: The STABILITY Study

Clinical Scenario
Graft re-rupture following ACL reconstruction remains a concern, especially in young patients returning to sports. Augmentation of ACL reconstruction with a lateral extra-articular tenodesis (LET) procedure is one potential strategy to reduce the failure rate.

Identification of the Knowledge Gap
A review of the literature using a systematic approach confirmed a statistically significant reduction in pivot shift in favor of the combined lateral extra-articular tenodesis in conjunction with ACL reconstruction [8]. The available data was inconclusive, however, due to inadequate internal validity, sample size, methodological consistency, and variable standardization of protocols and outcomes. The investigators designed a study to investigate whether or not combining extra-articular tenodesis reduces the risk of graft failure in a high-risk population [14].

PICO Approach
• Patient
–– Skeletally mature male and female patients aged 14–25 with an ACL deficient knee. High risk for graft rupture was defined as two or more of the following: competitive pivoting sports, grade two pivot shift or greater, and generalized ligament laxity (Beighton score of four or greater). Exclusion criteria include previous ACL reconstruction on either knee, multi-ligament injury (two or more ligaments requiring surgical attention), symptomatic articular cartilage defect requiring treatment other than debridement, greater than 3° of asymmetric varus, or inability to complete outcome questionnaires
• Intervention
–– Lateral extra-articular tenodesis in addition to standard ACL reconstruction as described in the Comparison section. The LET is a modified Lemaire procedure where a 1-cm-wide × 8-cm-long strip of iliotibial band is fashioned, leaving Gerdy’s tubercle attachment intact. The graft is tunneled under the fibular collateral ligament (FCL) and attached to the femur with a staple distal to the intermuscular septum and just proximal to the femoral insertion of the FCL. Fixation is performed with the knee at 70° flexion, neutral rotation. Minimal tension is applied to the graft. The free end is then looped back onto itself and sutured using No. 1 Vicryl
• Comparison
–– Anatomic ACL reconstruction using a four-strand autologous hamstring graft. If the diameter of the graft is found to be less than 7.5 mm, semitendinosus will be tripled (five-strand graft) providing a greater graft diameter. Femoral tunnels will be drilled using an anteromedial portal technique, with suspensory femoral fixation. Tibial fixation will be provided by interference screw
• Outcome
–– Primary: graft failure at 24 months. This is defined as symptomatic instability requiring revision ACL surgery or a positive pivot shift or asymmetrical pivot shift greater than that of the contralateral side
–– Secondary: patient-reported, radiological, and clinical examination findings are listed

Fact Box 7.1: FINER Criteria for a Good Research Question
F  Feasible
   • Adequate number of subjects
   • Adequate technical expertise
   • Affordable in time and money
   • Manageable in scope
I  Interesting
   • Getting the answer intrigues investigator, peers, and community
N  Novel
   • Confirms, refutes, or extends previous findings
E  Ethical
   • Amenable to a study that institutional review board will approve
R  Relevant
   • To scientific knowledge
   • To clinical and health policy
   • To future research
Reproduced with permission from: Farrugia, P., et al., Research questions, hypotheses and objectives. Canadian Journal of Surgery, 2010. 53(4): p. 278

Fact Box 7.2: PICOT Criteria
P  Patient/population
   • What specific patient population are you interested in?
I  Intervention
   • What is your investigational intervention?
C  Comparison
   • What is the main alternative to compare to the intervention?
O  Outcome
   • What do you intend to accomplish, measure, improve, or affect?
T  Time
   • What is the appropriate follow-up time to assess the outcome?
Reproduced with permission from: Farrugia, P., et al., Research questions, hypotheses and objectives. Canadian Journal of Surgery, 2010. 53(4): p. 278

Take-Home Messages
• A structured and answerable research question is an essential prerequisite to the development of a research project.
• Confirmation of a knowledge gap is the initial step in the process from idea to research question and requires reviewing the available literature and ongoing research.
• The PICO approach provides a basic structure for a well-formed clinical question.
• Attention to detail and clear definitions are required for each element of the PICO model.
• Time spent developing a well-considered research question before commencing the research trial can prevent problems down the track.

7.7 Resources/Websites

https://clinicaltrials.gov: a database of privately and publicly funded clinical studies conducted around the world.
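The difference between the one-sided and two-sided hypotheses described in Sect. 7.6 can be made concrete with a small calculation. The sketch below assumes a simple normal approximation (a two-sample z-test) for the difference in mean patient-reported scores between two groups; the function name and all numbers are invented for illustration, and a real trial would use the test specified in its statistical analysis plan.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_values(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Return (z, one-sided p, two-sided p) for the difference mean_a - mean_b.

    The one-sided p-value tests the directional alternative "group A scores
    higher than group B"; the two-sided p-value tests "the groups differ".
    """
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / standard_error
    upper_tail = 1.0 - NormalDist().cdf(z)
    one_sided = upper_tail
    two_sided = 2.0 * min(upper_tail, 1.0 - upper_tail)
    return z, one_sided, two_sided


# Hypothetical summary data: mean score 85 vs 80, SD 10, 50 patients per group.
z, p_one_sided, p_two_sided = z_test_p_values(85, 80, 10, 10, 50, 50)
print(f"z = {z:.2f}, one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
```

When the observed difference lies in the hypothesized direction, the two-sided p-value is twice the one-sided value. This is one reason the choice between the two should be fixed before the investigation starts rather than after the data are seen.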
References

1. Ardern CL, Taylor NF, Feller JA, Webster KE. Return-to-sport outcomes at 2 to 7 years after anterior cruciate ligament reconstruction surgery. Am J Sports Med. 2012;40(1):41–8.
2. Benjaminse A, Gokeler A, van der Schans CP. Clinical diagnosis of an anterior cruciate ligament rupture: a meta-analysis. J Orthop Sports Phys Ther. 2006;36(5):267–88.
3. Cancienne JM, Brockmeier SF, Carson EW, Werner BC. Risk factors for infection after shoulder arthroscopy in a large medicare population. Am J Sports Med. 2018;46(4):809–14. 0363546517749212.
4. Chalmers PN, Salazar DH, Steger-May K, Chamberlain AM, Stobbs-Cucchi G, Yamaguchi K, et al. Radiographic progression of arthritic changes in shoulders with degenerative rotator cuff tears. J Shoulder Elb Surg. 2016;25(11):1749–55.
5. Farrugia P, Petrisor BA, Farrokhyar F, Bhandari M. Research questions, hypotheses and objectives. Can J Surg. 2010;53(4):278.
6. US National Institutes of Health. ClinicalTrials.gov. 2012.
7. Hewison CE. STABILITY Study: a multicentre RCT comparing ACL reconstruction with and without lateral extra-articular tenodesis for individuals at high risk of graft failure. Electronic Thesis and Dissertation Repository, 3199. 2015. https://ir.lib.uwo.ca/etd/3199.
8. Hewison CE, Tran MN, Kaniki N, Remtulla A, Bryant D, Getgood AM. Lateral extra-articular tenodesis reduces rotational laxity when combined with anterior cruciate ligament reconstruction: a systematic review of the literature. Arthroscopy. 2015;31(10):2022–34.
9. Hoffmann T, Bennett S, Del Mar C. Evidence-based practice across the health professions-e-Book. Amsterdam: Elsevier Health Sciences; 2013.
10. Hulley SB, Cummings SR, Browner WS, Grady DG, Newman TB. Designing clinical research. Philadelphia: Lippincott Williams & Wilkins; 2013.
11. Kaura A. Crash course evidence-based medicine: reading and writing medical papers-e-Book. Amsterdam: Elsevier Health Sciences; 2013.
12. Kim JY, Park JS, Rhee YG. Can preoperative magnetic resonance imaging predict the reparability of massive rotator cuff tears? Am J Sports Med. 2017;45(7):1654–63.
13. Lee O-S, Lee YS. Diagnostic value of computed tomography and risk factors for lateral hinge fracture in the open wedge high tibial osteotomy. Arthroscopy. 2018;34(4):1032–43.
14. Multicenter randomized clinical trial comparing anterior cruciate ligament reconstruction with and without lateral extra-articular tenodesis in individuals who are at high risk of graft failure 2018. https://clinicaltrials.gov/ct2/show/NCT02018354.
15. Oh JH, Seo HJ, Lee Y-H, Choi H-Y, Joung HY, Kim SH. Do selective COX-2 inhibitors affect pain control and healing after arthroscopic rotator cuff repair? A preliminary study. Am J Sports Med. 2018;46(3):679–86. 0363546517744219.
16. Rios LP, Ye C, Thabane L. Association between framing of the research question using the PICOT format and reporting quality of randomized controlled trials. BMC Med Res Methodol. 2010;10(1):11.
17. Sackett DL, Rosenberg WM, Gray JM, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn’t. BMJ. 1996;312(7023):71–2.
18. Sanders TL, Kremers HM, Bryan AJ, Fruth KM, Larson DR, Pareek A, et al. Is anterior cruciate ligament reconstruction effective in preventing secondary meniscal tears and osteoarthritis? Am J Sports Med. 2016;44(7):1699–707.
19. Thabane L, Thomas T, Ye C, Paul J. Posing the research question: not so simple. Can J Anesth. 2009;56(1):71.
20. Von Porat A, Roos E, Roos H. High prevalence of osteoarthritis 14 years after an anterior cruciate ligament tear in male soccer players: a study of radiographic and patient relevant outcomes. Ann Rheum Dis. 2004;63(3):269–73.
21. Willits K, Amendola A, Bryant D, Mohtadi NG, Giffin JR, Fowler P, et al. Operative versus nonoperative treatment of acute Achilles tendon ruptures: a multicenter randomized trial using accelerated functional rehabilitation. JBJS. 2010;92(17):2767–75.

8 How to Write a Study Protocol

Lukas B. Moser and Michael T. Hirschmann

L. B. Moser · M. T. Hirschmann (*)
Department of Orthopaedic Surgery and Traumatology, Kantonsspital Baselland (Bruderholz, Liestal, Laufen), Bruderholz, Switzerland
University of Basel, Basel, Switzerland
e-mail: michael.hirschmann@ksbl.ch; michael.hirschmann@unibas.ch

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_8

8.1 Introduction

This first step of getting the research idea organized may be difficult for the beginner. However, it is also an important step. In the further course of the research, the young scientist will find out that writing a brief, concise, and comprehensive study protocol saves time and prevents problems in the course of the study. If done properly it is definitely one of the most rewarding parts of the research project as it facilitates later execution of the study and the subsequent writing process (see Chapter 11.9).

Many junior researchers have difficulties starting their first research project. This might be due to a number of reasons.

Firstly, most of the scientific project ideas evolve from problems more senior orthopedic surgeons encounter during their daily practice. The seniors pass the study idea over to the younger researchers, who often do not fully understand it at first sight as they do not have the necessary broad and deep orthopedic knowledge. Hence, good mentorship from senior orthopedic researchers is of utmost importance and will facilitate the first steps into the unknown territory of research [4].

Secondly, in many countries young orthopedic trainees may not be prepared with a basic scientific skillset at medical school. Specific courses may be rare and often do not focus enough on the challenges of junior researchers [3].

Thirdly, from a young researcher’s perspective, it appears to be a waste of valuable time to spend it on preparing a study protocol. It is the time of publish or perish in which the young researcher is driven by the ultimate goal to get the study done, to present at a conference, and finally to get it published.

Fourthly, there is always a next conference and submission deadline to meet. This might mislead young researchers into gathering data hastily in order to write their abstract in time. Please do not feel rushed by upcoming deadlines. Rushing inevitably decreases the quality of the scientific work, and the abstract might even get rejected [3].

Clearly, there is no alternative to writing a proper study protocol. Writing a good study protocol is therefore time well spent, as it facilitates the further study. It is also mandatory for ethical committee and grant applications.

8.2 What Is a Study Protocol?

A study protocol is the central documentation of the research project including scientific, ethical, and regulatory considerations. It should provide
detailed information on planning and execution of the research project. The study protocol serves as a comprehensive guide and also represents the main document for external evaluation of the study (e.g., ethical committee, grant authorities).

However, the purpose of the study protocol is to give a concise description of the study idea, plan, and further analysis. The writing style should be brief and concise. It needs to be easy to understand even for laypersons without a medical background.

Think of a study protocol as a recipe, which should enable the cook/reader to prepare an identical meal [9].

As a study is often the result of an interdisciplinary collaboration and involves a variety of different professions such as clinicians, scientists, or statisticians, all groups should be involved at this early stage. Significant input from all groups involved is necessary.

A well-written study protocol serves several purposes:

• It is required for the application for ethical approval at the local ethical committee.
• It is required for the application for a scientific grant at a grant authority.
• It helps to structure and define the study idea in the discussion with the collaborators.
• It prompts rethinking the study plan and reveals possible problems and obstacles at an early stage.
• It clearly defines the responsibility of each researcher involved.
• It defines a budget and funding plan for the study.
• It defines a clear timeline for each step of the study.
• It helps to monitor the milestones and study progress.
• It facilitates writing of the later scientific article [14].

Every research protocol follows a similar basic structure. However, it is important to know that the structure of a research protocol might vary from one study to another. This is due to the fact that study protocols written for ethical committee approval might be slightly different from protocols written for grant authorities. It is also mandatory to follow exactly the instructions given by the grant authority or local ethical committee [9].

Generally, the study protocol can be divided into the following parts:

• Title page
• Background
• Study objectives (aims)
• Hypothesis
• Material and methods
–– Study design
–– Subjects
–– Sample size
–– Study procedures
–– Outcome instruments
–– Data collection
• Data management
• Statistical analysis
• Ethical considerations
• Time points and timeline
• Conflict of interest
• Insurance
• Reference section
• Annexes

To help with getting started, the researcher should consider the following questions:

• What is the clinical problem to be addressed by this study?
• What is already known about the problem and topic?
• What is the study design?
• What are the inclusion and exclusion criteria?
• What study subjects are investigated?
• What are the outcome instruments chosen?
• What are the primary or secondary endpoints of the study?
• What are the interventions?
• What is the experimental setup?
• How is data collection organized?
• How is the data processed and analyzed?
• What are the statistical methods?
• Are there any ethical problems to be considered?
• What is the timeline?
• What milestones are set?
• How is funding organized [22]?

Having specific answers to the aforementioned questions helps to fill every part of the study protocol. Hence, we can move on and get it started step by step.

8.3 Title Page

The title page contains the most important information at a single glance. Here, the researcher should give the full title of the study along with the names and affiliations of all people involved [4].

The project title needs to be wisely chosen as it should closely reflect the content of the study. Keep it brief and concise. The title should not pose a question, but summarize the main objective including the study type such as “randomized controlled trial.” It might also mention the subjects to be investigated [14].

The affiliations of each study team member need to be complete. For every member of the study team, the contact information including e-mail address and telephone and fax number should be given here.

If applicable, state the study sponsor in case the study is supported by a company or grant authority [4].

8.4 Background

The background of the topic should be clearly described. Meticulously guide the reader into the topic while avoiding irrelevant information. Do not spend more than two pages on the background information. As a rule of thumb, also limit the number of cited articles to fewer than 20–30. Only describe the most important ones here. Always ask whether a reference is really needed to lead the reader to the purpose of the study. This also helps to stay focused [14].

It is of utmost importance to clarify why the researcher is conducting the study. The researcher should make the reader understand why the research project is planned and what the original study idea is. As this requires a good understanding of the current knowledge in this field, a systematic review of the current literature should be performed. It facilitates the summary of current evidence and allows the study idea to be placed in the broader scientific context [16]. The common thread of the background section is to start from a bird’s-eye view and come closer to the topic with every sentence written [9].

Defining the research question is definitely the heart of the protocol. The research question proposed should be able to fill an existing gap of knowledge. Ideally, it should represent the logical consequence of the background explained. The question asked here should be as precise as possible [16].

This part decides the fate of the study. Clearly, if the researcher is not able to give a proper study question and explain the hypothesis here, it is better to reconsider than to waste personal resources and time on this study [14] (Tables 8.1 and 8.2).

Table 8.1 The mnemonic FINER will help the researcher with identification of a good research question and study idea [7]
F  Feasible in terms of recruitment, expertise, funding, resources, and scope
I  Interesting to the scientific community
N  Novel, providing new findings and confirming or rejecting previous findings
E  Ethical approval should be possible
R  Relevant to scientists, clinicians, and future research

Table 8.2 The PICO acronym helps to formulate a testable question [15]
P  Patient, problem
I  Intervention or exposure
C  Comparison
O  Outcome

A major pitfall of study protocols is that many do not have a single study question and others just have too many. There might be several study
questions of interest with regard to the research project that need to be answered. However, the researcher should stay focused on the most important research question. Secondary research questions are often of an explorative nature, because sample size calculation is based on the primary research question. Do not overload the project with too many research questions. The researcher will then lose the common thread and the reader will be confused. The researcher also has to consider that each research question requires a hypothesis. Defining the research questions and objectives in advance helps the researcher to avoid reporting significant results in preference to negative results (outcome reporting bias) [16].

8.5 Study Objectives (Aims)

After performing the literature review with a focus on the research question, the researcher needs to define the objectives, outcomes, and hypothesis. A clear definition of these factors is recommended [9] (Table 8.3).

Table 8.3 It helps when the researcher defines the objectives of the study with regard to the SMART criteria [6]
S  Specific
M  Measurable
A  Achievable
R  Realistic
T  Time-related

Choose the study objectives wisely and restrictively. If the researcher decides to use several objectives, (s)he should distinguish between primary and secondary objectives or outcomes. Altogether, do not choose more than three to five objectives [4].

Practical Case Example
A clinical study investigating if the femoral prosthetic design in total knee arthroplasty (TKA) influences the patellar loading and functional scores in TKA with unresurfaced patellae
The primary purpose of the study is to investigate if the femoral shape of the different TKA models, (group P) and (group A), influences the BTU distribution pattern at SPECT-CT in TKA with unresurfaced patellae. Since femoral TKA malrotation, patellar height, and TT-TG are known to be possible causes of patellar overstuffing inducing a higher patellar BTU, these possible bias parameters will be calculated and compared between the two groups.
The secondary purpose is to compare postoperative functional results measured at 1 and 2 years postoperatively.

An exact definition of outcomes is mandatory to standardize the outcome measures. Note the time of assessment and the unit of measurement. If previous definitions or validations are used, do not forget to cite the references. If the researcher compares outcomes, (s)he needs to state an overall goal of the comparison in the protocol. The reader should understand whether the goal is to show superiority, equivalence, or non-inferiority [16].

Sometimes, outcomes are not easy to determine. Some outcomes might be subjective (e.g., pain); others are more objectively measurable (e.g., range of motion).

Surrogate endpoints are alternative endpoints that are faster to assess than long-term clinical endpoints [8]. The effect of the intervention on the surrogate endpoint has to correspond to the effect on clinical outcome. However, this effect is often difficult to predict, and therefore surrogate endpoints should be used with care [12].

Furthermore, the researcher needs to identify all potential confounders. Confounders are defined as additional factors that distort the effect of the treatment or exposure on a predicted outcome. They have an impact on both the exposure and the outcome, while not being part of the causal pathway. Confounding factors might lead to a situation in which a false correlation between treatment and outcome is seen or a true relationship is hidden. Therefore, confounders should be carefully considered. Ideally, confounders are equally distributed in randomized controlled studies. Be aware that this is not the case in observational data [18].

Clinical Vignette
Age being a confounder in a study investigating the association of physical activity and knee pain
Younger individuals tend to be more active and have a lower risk of knee pain. If the proportion of young people being more active with less pain is not equally distributed, the association between activity and knee pain might be overestimated [21].
Vavken et al. investigated the consideration of confounders in 126 controlled studies published in journals with a high impact factor. Although 16% of the studies discussed confounding factors, they did not adjust their subsequent analysis. Only 1/3 of the authors controlled for confounding factors [21]. However, confounders need to be identified in the study protocol. Only then can the later analysis be adjusted [2].

8.6 Hypothesis

The hypothesis is based on the supposed relation between variables. State the hypothesis in the null form. The null hypothesis says that there is no relationship between variables, and the researcher is going to challenge that statement. It represents the opposite of what the researcher expects. Statistical testing estimates how probable the observed data, or more extreme data, would be if the null hypothesis were true. The alternative hypothesis is accepted when the null hypothesis is rejected. It represents the outcome the researcher would expect [14].

8.7 Material and Methods

The methods section needs to explain how the hypothesis is tested. It is of utmost importance to find the optimal methodological approach for the study. After a first draft, the researcher often has to critically revise the methodology considering input from all collaborators [9]. An exact and precise description of the methods used is the most important part of the study protocol. At the beginning state the study design and then continue with the description of the study subjects. The reader must be able to understand how the researcher identified the subjects or patients, what the inclusion and exclusion criteria were (study population), and what is investigated (study procedure). Explain each step of the study procedure meticulously. Only then is an independent reader able to follow and reproduce the study [4].

The methods should include the following aspects: design and setting, subjects, sample size, description of the study procedure, data collection, data management, and finally the statistical tests to be used.

8.8 Study Design

Precisely define the study type and design. One can differentiate between research on primary and secondary data. Whereas research on primary data means the researcher is performing the actual study (e.g., clinical, experimental, and epidemiological studies), research on secondary data describes the process of analyzing studies that have already been performed (e.g., systematic review, meta-analysis).

Chapter 11.9 provides a comprehensive list of all possible study types. Although this is often difficult in practice, the researcher should always aim for the study type providing the best scientific evidence and quality level.

8.9 Subjects

The study subjects or patients included need to be characterized in detail. Including a flow chart showing the recruitment, screening, inclusion, and exclusion criteria will help the reader to understand this process (Fig. 8.1).

The eligibility criteria need to be mentioned along with all inclusion and exclusion criteria. Inclusion criteria determine which subjects or patients are going to be included in the study. All other criteria limiting eligibility are considered exclusion criteria [19].

Explain how and why the researcher decided to choose the sample size used in the study. This
needs to be based on a sample size calculation. Keep in mind that the inclusion of vulnerable populations (children, cognitively impaired adults) requires special justification [5].

Fig. 8.1 Flowchart showing enrollment process. Steps shown: “Does subject meet eligibility?” (Yes), “Invite subject to participate in study”, “Consent process successful?” (Yes), “Subjects enrolled?”

Provide a detailed and unambiguous list of the eligibility criteria as in the example below:

Enclose a statement that the participation of the patient is entirely voluntary and that withdrawal without explanation is possible at any time and has no impact on patient management [16].

Practical Case Example
Any subject is entitled to withdraw from this clinical investigation for any reason without obligation or prejudice to further treatment.

Declare how the patient consented to participate in the study.

Practical Case Example
The patient shall give written consent to participate in the clinical documentation. The signed consent form remains within the clinical study case file.

8.10 Sample Size

Choosing the appropriate study sample is paramount for successful research. The study sample ensures sufficient power of the study and might allow conclusive inferences [16]. Consider that a study with a small sample size might not be powerful enough to demonstrate a difference, although a true difference actually exists. However, too large a sample size might be a waste of resources [14].

8.11 Study Procedures

In this section, the researcher should clearly describe what is going to be done with the subjects, how, and when. Provide a detailed description of the study procedure with the planned interventions.

8.12 Data Collection

Describe clearly who obtains the data and how, how it is recorded, and how long it will be stored [4]. Name the locations where the data is going to be collected and the relevant events regarding recruitment, exposure, or follow-up [20].

State the data-collecting instruments (e.g., electronic forms or paper forms). Explain the validation procedure for the instruments used. For outcome instruments, provide original references. If the researcher is going to develop a novel type of data-collecting instrument, such as a checklist or a questionnaire, the researcher needs to provide all information. Do not forget to annex the original document to the study protocol.

Practical Case Example
This is a retrospective case series study. No additional examinations other than the clinical routine will be conducted. All data is stored in a centralized electronic database at the orthopedic research unit (…). The data has already been collected in our ethical committee-approved registry.

8.13 Data Management

The researcher needs to make a comprehensive statement on the data management including data entry and monitoring. Describe how the researcher plans to ensure privacy protection (e.g., anonymization) and how long the data is stored. Declare who is able to access the data, and state that all people involved are bound by medical confidentiality. Add the statement by the IEC and regulatory authorities for permitting data access, if applicable [16].

Mention how the data is entered into the database. Describe what programs the researcher will use and how the data analysis is performed. The best approach is to make a data analysis plan
where the researcher describes which measurements were done with which variable and which statistical tests to apply. Also, record how to deal with missing data [9].

8.14 Statistical Analysis

A researcher should have basic knowledge of statistics [11]. However, consulting a professional statistician might help to improve the quality of the research methodology. At best the statistician is already involved in the process of study planning, as the researcher needs to translate the research question into a statistical problem. All statistically relevant information must be included in sufficient detail: exploratory or descriptive statistics, level of significance, type of outcome, effect measures including confidence intervals, type of sample (unpaired or paired), data distribution (not normally or normally distributed data), and the statistical software used [16].

Just naming the tests that were used is not enough. The researcher needs to state why a test was chosen and which test for which purpose [4].

Practical Case Example
A clinical study investigating if the femoral prosthetic design in total knee arthroplasty (TKA) influences the patellar loading and functional scores in TKA with unresurfaced patellae

Descriptive statistics (means, medians, quartiles, ranges, standard deviations) will be performed to assess the demographics of the patient population. Alignment and TKA component position in all planes (sagittal, coronal, and rotational) are noted in degrees. Mean and absolute relative BTU of corresponding quadrants of the two groups will be compared with a t-test.

Nonparametric Spearman’s correlation coefficients will be used to correlate the patella height, the lateral tilt, and the components’ alignment measurements with the intensity of the tracer uptake in each area of interest. Postoperative KSS scores of the two groups at 1 and 2 years postoperative will be compared with a t-test.

All data will be analyzed by an independent professional statistician.

8.15 Ethical Considerations

This part is particularly important for the ethical committee. It will be carefully read; in particular the ethical risks will be discussed between the members of the ethical board. Present a risk-benefit assessment providing information on potential benefits, risks, or inconveniences for the individual study subjects. If necessary, justify the inclusion of vulnerable populations. Include a statement that the researcher conducts the study according to the study protocol and good clinical practice (GCP) [16].

Keep in mind that any changes to the study protocol need to be submitted as amendments to the ethical committee and the regulatory authorities.

Practical Case Example
The study will be performed in accordance with the declaration of Helsinki and the directives for good clinical practice (GCP) standards. Ethical approval will be obtained from the local ethical committee in accordance with national law.

8.16 Time Points and Timeline

Creating a table including the time points and timeline is useful for several reasons. It forces the researcher to set a time frame and think about important deadlines and targets. The reader understands the sequence of events at a single glance, and the timetable shows the reviewer that the project is expected to be completed in the foreseeable future [13] (Tables 8.4 and 8.5).

Table 8.4  Practical case example of time points
Writing the study protocol                     January
Data collection                                March–August
Data analysis                                  September
Discussion of results among research staff     October
Writing of scientific article                  November
Submission to peer-reviewed journal            December
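The sample size reasoning in Sect. 8.10 and the Spearman correlation named in the Sect. 8.14 case example can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the sample size function uses the common normal-approximation formula for a two-sided comparison of two means, which is not taken from this chapter, and the function names, the effect size of 0.5, and the measurement values are hypothetical. A real protocol should have these calculations confirmed by a statistician.

```python
import math
from statistics import NormalDist, mean


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Patients per group for a two-sided comparison of two means.

    Normal-approximation formula (an assumption of this sketch):
    n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2,
    where d is the standardized effect size (difference / standard deviation).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical planning value: medium standardized effect (d = 0.5).
n_per_group = sample_size_per_group(0.5)

# Hypothetical measurements, invented purely for illustration.
patella_height = [0.8, 0.9, 1.0, 1.1, 1.2]
tracer_uptake = [2.1, 2.0, 2.6, 2.9, 3.4]
rho = spearman_rho(patella_height, tracer_uptake)

print(n_per_group, round(rho, 2))
```

With these assumed inputs the sketch gives 63 patients per group and a correlation of 0.9. Halving the assumed effect size roughly quadruples the required sample, which is why the expected effect should be justified in the protocol rather than chosen for convenience.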
72 L. B. Moser and M. T. Hirschmann

Table 8.5  Practical case example of a timeline
Columns (months): Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec
Rows (tasks): Writing the study protocol; Data collection; Data analysis; Discussion of results among research staff; Writing of scientific article; Submission to peer-reviewed journal

8.17 Conflict of Interest

Do not forget to disclose conflicts of interest, such as nonfinancial or financial relationships with the industry. If applicable, state the study sponsor. Declare how the sponsor contributed and what benefits the researcher will obtain by this agreement [4].

These disclosures facilitate the identification of possible sponsor bias. Most journals require a disclosure form from all contributing authors. A short statement of relevant information will be listed on the paper in abbreviated form [17].

Practical Case Example 1
One of the authors (XY) is a consultant for (company).

Practical Case Example 2
The study was supported by a financial grant from (company). The external funding source did not have any influence on the investigation.

Practical Case Example 3
The authors of this paper declare no conflict of interest.

8.18 Insurance

The sponsor of the study is obliged to provide clinical trial insurance. If the researcher is conducting a clinical study, (s)he needs to compensate for serious adverse events (such as injuries) that patients suffer during their participation. However, rules and requirements of trial insurance vary from country to country. Therefore, the researcher needs to check the local insurance policy [10].

Practical Case Example 1
The sponsor will secure and maintain in full force and effect, throughout the duration of the clinical investigation, a clinical trial insurance which was required in line with national regulations and which will be evidenced by an insurance certificate.

Practical Case Example 2
No subject insurance is required as it is a retrospective study and the patient is not put at any risk and does not have to show up for follow-up.

8.19 Reference Section

The reference section should be organized as in a scientific paper (see Chapter 11.9). Provide original references, and number them consecutively or in the order requested. For every statement made, a citation should be given. It is important that only articles are cited which really add to the study protocol. There should not be more than 30–40 references in the reference list [4].
8  How to Write a Study Protocol 73

8.20 Annexes

In particular for major grant applications, annexes could be added at the end. These might include informed consent sheet, study questionnaire or CRF, approval of ethical committee if already obtained, and CV of the principal and co-investigator [1].

Take-Home Message

• Writing a brief and comprehensive study protocol is always the first step of your research project.
• It facilitates later execution of your study and the subsequent writing process.
• Make sure that your study protocol includes all relevant information on your study idea, plan, and further analysis.
• Furthermore, as a comprehensive guide, it represents the main document for external evaluation of your study.
• The idea of the study protocol is to convey to any individual the purpose of the study and the probable impact of the project.

References

1. Al-Jundi A, Sakka S. Protocol writing in clinical research. J Clin Diagn Res. 2016;10:ZE10–3.
2. Babyak MA. What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosom Med. 2004;66:411–21.
3. Bando K, Sato T. Did you write a protocol before starting your project? Gen Thorac Cardiovasc Surg. 2015;63:71–7.
4. Barletta JF. Conducting a successful residency research project. Am J Pharm Educ. 2008;72:92.
5. Brody BA, McCullough LB, Sharp RR. Consensus and controversy in clinical research ethics. JAMA. 2005;294:1411–4.
6. Doran G. There’s a S.M.A.R.T. way to write management’s goals and objectives. Manag Rev. 1981;70:35–6.
7. Doran PC. Understanding dentures: instructions to patients. J Colo Dent Assoc. 1981;59:4–8.
8. Fleming TR, DeMets DL. Surrogate end points in clinical trials: are we being misled? Ann Intern Med. 1996;125:605–13.
9. Fronteira I. How to design a (good) epidemiological observational study: epidemiological research protocol at a glance. Acta Medica Port. 2013;26:731–6.
10. Ghooi RB, Divekar D. Insurance in clinical research. Perspect Clin Res. 2014;5:145–50.
11. Gilmore SJ. Evaluating statistics in clinical trials: making the unintelligible intelligible. Australas J Dermatol. 2008;49:177–84; quiz 185–176.
12. Huang Y, Gilbert PB. Comparing biomarkers as principal surrogate endpoints. Biometrics. 2011;67:1442–51.
13. Kanji S. Turning your research idea into a proposal worth funding. Can J Hosp Pharm. 2015;68:458–64.
14. O’Brien K, Wright J. How to write a protocol. J Orthod. 2002;29:58–61.
15. Richardson WS, Wilson MC, Nishikawa J, Hayward RS. The well-built clinical question: a key to evidence-based decisions. ACP J Club. 1995;123:A12–3.
16. Rosenthal R, Schafer J, Briel M, Bucher HC, Oertli D, Dell-Kuster S. How to write a surgical clinical research protocol: literature review and practical guide. Am J Surg. 2014;207:299–312.
17. Sade RM, Akins CW, Weisel RD. Managing conflicts of interest. J Thorac Cardiovasc Surg. 2015;149:971–2.
18. Skelly AC, Dettori JR, Brodt ED. Assessing bias: the importance of considering confounding. Evid Based Spine Care J. 2012;3:9–12.
19. Van Spall HG, Toren A, Kiss A, Fowler RA. Eligibility criteria of randomized controlled trials published in high-impact general medical journals: a systematic sampling review. JAMA. 2007;297:1233–40.
20. Vandenbroucke JP, von Elm E, Altman DG, Gotzsche PC, Mulrow CD, Pocock SJ, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. PLoS Med. 2007;4:e297.
21. Vavken P, Culen G, Dorotka R. Management of confounding in controlled orthopaedic trials: a cross-sectional study. Clin Orthop Relat Res. 2008;466:985–9.
22. Warren MD. Aide-memoire for preparing a protocol. Br Med J. 1978;1:1195–6.
The Ethical Approval Process
9
Karren Takamura and Frank Petrigliano

K. Takamura · F. Petrigliano (*)
Department of Orthopaedic Surgery, University of California, Los Angeles, USA
e-mail: KTakamura@mednet.ucla.edu; fpetrigliano@mednet.ucla.edu

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_9

9.1 Important Documents in Biomedical Ethics

Prior to the Nuremberg Code, issued by the Nuremberg Military Tribunal during the Nuremberg trials in 1947 (also known as the “Doctors’ trial”), there was no generally accepted code of ethics for human research. The Nuremberg Code is a ten-point statement of ethics to prevent abuse of human research subjects and establishes that participation in research must be voluntary (Fact Box 9.1). While the Nuremberg Code was never formally adopted by any state or international agency, it is considered one of the most influential documents in human medical research and served as the basis for documents that later followed [5].

Fact Box 9.1: Nuremberg Code
1. The voluntary consent of the human subject is absolutely essential. This means that the person involved should have legal capacity to give consent; should be so situated as to be able to exercise free power of choice, without the intervention of any element of force, fraud, deceit, duress, over-reaching, or other ulterior form of constraint or coercion; and should have sufficient knowledge and comprehension of the elements of the subject matter involved, as to enable him to make an understanding and enlightened decision. This latter element requires that, before the acceptance of an affirmative decision by the experimental subject, there should be made known to him the nature, duration, and purpose of the experiment; the method and means by which it is to be conducted; all inconveniences and hazards reasonably to be expected; and the effects upon his health or person, which may possibly come from his participation in the experiment. The duty and responsibility for ascertaining the quality of the consent rests upon each individual who initiates, directs or engages in the experiment. It is a personal duty and responsibility which may not be delegated to another with impunity.
2. The experiment should be such as to yield fruitful results for the good of society, unprocurable by other methods or means of study, and not random and unnecessary in nature.
3. The experiment should be so designed and based on the results of animal experimentation and a knowledge of the natural history of the disease or other problem under study, that the anticipated results will justify the performance of the experiment.
4. The experiment should be so conducted as to avoid all unnecessary physical and mental suffering and injury.
5. No experiment should be conducted, where there is an a priori reason to believe that death or disabling injury will occur; except, perhaps, in those experiments where the experimental physicians also serve as subjects.
6. The degree of risk to be taken should never exceed that determined by the humanitarian importance of the problem to be solved by the experiment.
7. Proper preparations should be made and adequate facilities provided to protect the experimental subject against even remote possibilities of injury, disability, or death.
8. The experiment should be conducted only by scientifically qualified persons. The highest degree of skill and care should be required through all stages of the experiment of those who conduct or engage in the experiment.
9. During the course of the experiment, the human subject should be at liberty to bring the experiment to an end, if he has reached the physical or mental state, where continuation of the experiment seemed to him to be impossible.
10. During the course of the experiment, the scientist in charge must be prepared to terminate the experiment at any stage, if he has probable cause to believe, in the exercise of the good faith, superior skill and careful judgment required of him, that a continuation of the experiment is likely to result in injury, disability, or death to the experimental subject.

Reference: reproduced from “Trials of War Criminals before the Nuremberg Military Tribunals under Control Council Law No. 10”, Vol. 2, pp. 181–182. Washington, D.C.: U.S. Government Printing Office, 1949.

The Declaration of Helsinki was originally developed by the World Medical Association (WMA) in 1964 in order to establish a set of ethical principles regarding research involving human subjects. This document is widely regarded as the cornerstone for medical research involving human subjects, including identifiable human material and data [19]. The declaration is primarily written for physicians, though it is intended as a guideline for the broader research community. The fundamental principles of the Declaration of Helsinki include respect for the individual, right to self-determination, protection of privacy, and the right to make informed decisions [19]. The document proclaims that the physician’s duty is solely to the participant; the participant’s welfare supersedes the interest and benefit of science and society [19]. This document has undergone multiple revisions, most recently in 2013, addressing issues more relevant to countries with limited resources, such as post-trial access to interventions, compensation and treatment for individuals harmed during participation in research, access to clinical trials for underrepresented groups, and need for dissemination of research results [12, 19].

9.2 History of Biomedical Human Research in the United States

In the United States, the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research was created by the
National Research Act of 1974 to identify ethical principles and guidelines for conducting human research in response to the Tuskegee syphilis experiment [5]. The Tuskegee syphilis experiment was a prospective clinical study conducted between 1932 and 1972 by the US Public Health Service to study the natural history of untreated syphilis in rural African-American males in Alabama. The men were never told of their diagnosis, and although penicillin was found to be an effective cure for the disease, treatment was withheld [16]. A whistle-blower by the name of Peter Buxtun revealed the story to the press, leading to public outrage and major changes in US law and regulation on how human research is conducted [2, 16]. Other controversial research projects in the United States during this era include the Stanford prison experiment (1971), where the participants were unable to withdraw from the study [20], and Project MKUltra (1950s–1973), where subjects were not informed of their participation in the studies [14].

In 1978, the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research issued the Belmont Report, which established three core principles (Fact Box 9.2) for research involving human studies: respect for persons, beneficence, and justice [18]. The applications of these principles led to the consideration of the following requirements: informed consent, risk/benefit analysis, and patient selection [17].

Fact Box 9.2: Belmont Report, Three Core Principles for Research Involving Human Studies [18]
1. Respect for persons
2. Beneficence
3. Justice

9.3 Common Rule (Part 46, Protection of Human Subjects, of Title 45, Public Welfare, in Code of Federal Regulations)

The National Research Act of 1974 established a set of guidelines for research involving human subjects, introducing the concept of the Institutional Review Board (IRB). The basic provisions of the IRB are outlined by the Federal Policy for the Protection of Human Subjects, or “Common Rule,” which was published and codified by 15 federal departments and agencies [3]. The Common Rule also outlines the basic provisions for IRB, informed consent, and assurances of compliance [3]. Any research that is conducted by or for these federal agencies must abide by the “basic policy for protection of human research subjects” (also known as Subpart A, Part 46, Protection of Human Subjects, of Title 45, Public Welfare, in Code of Federal Regulations (45 CFR 46)).

The IRB is an independent, administrative group tasked with the responsibility of reviewing and approving research on human subjects, with the purpose of protecting the rights and welfare of human subjects. IRBs, and human subject research in general, are regulated by the Office for Human Research Protections (OHRP), an organization within the Department of Health and Human Services (HHS). The goal of the IRB is to ensure that proper informed consent is obtained and documented, risks to subjects are minimized, research design is sound and does not unnecessarily expose subjects to risk, patient selection is equitable, appropriate data monitoring provisions are in place, and privacy and confidentiality of the subjects are maintained [3]. The IRB also has the power to terminate or suspend any research that is not in accordance with the policy [3]. The IRB is comprised of scientists, lay community members, physicians, and lawyers [3]. The average size of the IRB is 14 members. One study found internal medicine to be the most commonly represented and orthopedic surgery to be the least represented among physician members of IRBs [10].

The Common Rule also includes regulations for addressing vulnerable populations, such as pregnant women, fetuses, in vitro fertilization, prisoners, and children. Subpart B of 45 CFR 46 affords special protections to pregnant women, fetuses, and neonates of uncertain viability or nonviable neonates. Research must directly benefit the mother and/or the fetus; if not, risks to the fetus must be minimal, and the purpose of the study must be “the development of important
biomedical knowledge that cannot be obtained by any other means.” Subpart C of 45 CFR 46 affords special protections to prisoners, to ensure that they are not exploited, but at the same time given equal opportunity to participate in research studies.

9.4 Research in Children and Adolescents

Children and adolescents comprise an important group of study participants in orthopedics. Importantly, this is a vulnerable population that warrants additional protections. The risks and benefits must be carefully evaluated, to prioritize the welfare of the patient, while recognizing the potential benefits of research. This risk/benefit evaluation often impacts participant inclusion/exclusion criteria and consideration for how the standard of care may be affected by research activities [8].

One of the major differences in terms of research in children is that, by definition, they are unable to provide informed consent [5]. Instead, children can provide assent, which is defined in the policy as “a child’s affirmative agreement to participate in research” (§46.402[b]), and parents or legal guardians can grant permission for their child to participate (§46.402[c]). The process of obtaining assent should address the developmental stage of the child and provide opportunities for the child to discuss their willingness or unwillingness to participate, the degree to which the child has control over the participation decision, and whether certain information will or will not be shared with the parents [8]. Typically, local IRBs will provide guidelines for the age and conditions where a child’s assent is required.

If the research involves acute illnesses or injuries, the investigators and IRB should provide for an “ongoing process for permission and assent” to accommodate the evolving understanding of changes in the child’s medical and mental condition [8]. Waivers of parental permission for adolescent participation should be considered by the IRB when the “research is important to the health and well-being of adolescents and cannot reasonably or practically be carried out without the waiver” or if the research involves treatments that adolescents can receive without parental permission (may differ by state law) [3, 8]. Additionally, the investigators need to present evidence that the adolescents have the ability to understand the research and their rights as participants, and the research protocol must contain safeguards to protect the interests of the adolescent, consistent with the risk presented to them [8].

9.5 IRB Submission and Approval Process

The definition of research involving human subjects set forth by the Common Rule [45 CFR 46.102(d)] states that it is “[a] systematic investigation, including research development, testing and evaluation, designed to develop or contribute to generalizable knowledge” [3]. A majority of research involving human subjects adheres to this definition and, consequently, requires IRB approval, but there is a subset of research that appears to adhere to this definition yet does not require IRB oversight. This includes certain quality improvement and quality assurance initiatives, case reports, and case reviews [13]. The Food and Drug Administration (FDA) provides two general guidelines indicating that a quality improvement or quality assurance initiative constitutes human research if the investigators will seek publication in a scientific/national journal or presentation at a national meeting or if the results will be applicable in a wider setting [13]. With regard to case series or case reports, there is no regulatory guidance on this particular issue [13]. If there is any question of whether IRB approval is required for a particular study, it is advisable to consult with the IRB prior to starting the study.

The IRB submission process can be an arduous and time-consuming task that may involve multiple modifications and revisions. It can be especially cumbersome in multicenter trials involving multiple IRBs, and the variability between institutions can hinder multi-institutional
research [4]. One study in the United Kingdom found that only 24% of studies submitted were approved without modifications [11]. Common reasons for proposal rejection were improperly designed consent form, poor study design, unacceptable risk to subjects, and ethical and legal reasons (Fact Box 9.3) [10]. Some suggestions to the young investigator navigating the IRB include collaborating with an experienced mentor, familiarizing oneself with the IRB guidelines and procedures of the research site(s), and discussing the protocol with the IRB prior to submission [10].

Fact Box 9.3: Common Reasons for Proposal Rejection [10]
1. Improperly designed consent form
2. Poor study design
3. Unacceptable risk to subjects
4. Ethical and legal reasons

There are three levels of review that are identified by federal regulation: expedited review, full or convened review, and exemption from review. If the study poses only “minimal risk” to the subject, the study may be suitable for expedited review. “Minimal risk” is defined as such “that the probability and magnitude of harm or discomfort anticipated in the research are not greater in and of themselves than those ordinarily encountered in daily life or during the performance of routine physical or psychological examination or tests” [45 CFR 46.102(i)]. Studies that may be suitable for expedited review based on the aforementioned definition are shown in Table 9.1. For minimal risk studies, there is considerable variability in the IRB process [7]. These studies are not reviewed by the full IRB and are typically reviewed by a subcommittee or administratively within the office [13].

Table 9.1  OHRP expedited review categories [3]
1. Clinical studies of drugs and medical devices only if one of the following conditions is met:
 (a) Research on drugs for which an investigational new drug application is not required
 (b) Research on medical devices for which an investigational device exemption application is not required or the medical device is cleared and approved for marketing and is being used for the purpose for which it has been cleared and approved
2. Collection of blood samples (finger stick, heel stick, ear stick, venipuncture)
3. Prospective collection of biological samples for research purposes by noninvasive means
4. Collection of data through noninvasive measures (excluding X-rays and microwaves)
5. Research involving materials that have been collected or will be collected for non-research purposes
6. Collection of voice, video, digital, or image recordings for research
7. Research on individual or group characteristics or behavior or research involving interviews, surveys, etc.
8. Continuing review of research previously approved by IRB if:
 (a) Enrollment of new subjects is closed or if all subjects have completed the research-related interventions or if the research remains active only for long-term follow-ups
 (b) No subjects have been enrolled and no additional risks have been identified
 (c) Remaining research activities are limited to data analysis
9. Continuing review of research that is not conducted under an investigational new drug application or investigational device exemption and does not fit under items 2 through 8, for which the IRB has determined that the research involves no greater than minimal risk and no additional risks have been identified
* This concise summary is modified and abbreviated from the OHRP Expedited Review Categories [3]. Refer to the OHRP for complete information

A study may be exempt from review if it meets one of the six federally defined exempt categories. Examples include research conducted in established or commonly accepted educational settings, retrospective review of existing data, documents, specimens, and taste and food quality evaluation (for full list, see 45 CFR 46.101(b)). However, the determination that a study is exempt must be made by the IRB; no further notification is required from the IRB once that status is granted. A study requires full review if it poses “greater than minimal risk.” Examples include Phase I, II, and III clinical trials, studies involving vulnerable populations, and studies including investigational devices.
9.6 Informed Consent

The Common Rule sets forth the components of informed consent in 45 CFR 46.116 [3]:

• A statement that the study involves research, its purpose, its duration, description of procedures, and identification if the research is experimental
• Description of any “reasonably foreseeable” risks and discomforts
• Description of any possible benefits to the participant or others that may be reasonably expected from the study
• Disclosure of appropriate alternative interventions (if any) that may be advantageous to the participant
• Statement describing the extent (if any) to which confidentiality of subject data is maintained
• If the research involves more than minimal risk, an explanation of any compensation and of whether medical treatments are available if an injury occurs
• Information on the contact person for questions about the research and participants’ rights and the contact person when an injury occurs

Additionally, the IRB may request additional elements when appropriate:

• Risks that may be “unforeseeable” (e.g., to the embryo or fetus if the participant becomes pregnant)
• Anticipated circumstances where the investigator will terminate the participant’s involvement in the study without their consent
• Additional costs that the participant may incur
• Consequences if a participant decides to withdraw from the study and procedures for “orderly termination”
• A statement that “significant new findings” which may affect the participant’s willingness to continue during the course of the study will be provided to the participants
• An approximate number of participants in the study

The informed consent process may be expedited or waived if the research is considered “minimal risk,” if the waiver or alteration of the consent does not adversely affect the participant’s rights and welfare, if the research cannot be practically achieved without the waiver/alteration, and if pertinent information will be given to the patients after the study if appropriate.

Studies have demonstrated that participants’ understanding of the informed consent is oftentimes inaccurate or incomplete [1, 9]. Additionally, one of the most frequently requested changes required for study approval by the IRB is modification of the consent form [10]. A systematic review found that use of multimedia and enhanced consent forms had limited success in improving participants’ understanding, but having a study team member or a neutral educator spend one-on-one time with the participant was found to be the most effective way of improving their understanding [6]. It is important to keep in mind that the investigator’s obligations regarding the consent process do not end once the participant signs the form; if the investigator believes this to be true, they may be committing a “serious disservice to the participant by not observing the ethical standards” [13]. For example, the investigator or a member of the research team should be available to answer questions regarding the study at any point in the study.

Take-Home Messages
• Institutional regulations and laws exist to protect human subjects in research.
• The IRB process may be difficult, oftentimes with several modifications, but working with the IRB prior to submission is helpful and recommended.
• Obtain proper informed consent (following the requirements set forth by 45 CFR 46.116), and keep in mind that the investigator’s obligations do not end as soon as the participant provides their signature on the form.

References

1. Barrett R. Quality of informed consent: measuring understanding among participants in oncology clinical trials. Oncol Nurs Forum. 2005;32:751–5. https://doi.org/10.1188/05.ONF.751-755.
2. Corbie-Smith G. The continuing legacy of the Tuskegee Syphilis Study: considerations for clinical investigation. Am J Med Sci. 1999;317:5–8.
3. Department of Health and Human Services NIoH, Office for Protection for Research Risks. Code of Federal Regulations - Title 45 Public Welfare CFR 46.
4. Dyrbye LN, et al. Medical education research and IRB review: an analysis and comparison of the IRB review process at six institutions. Acad Med. 2007;82:654–60. https://doi.org/10.1097/ACM.0b013e318065be1e.
5. Fischer BA IV. A summary of important documents in the field of research ethics. Schizophr Bull. 2006;32:69–80. https://doi.org/10.1093/schbul/sbj005.
6. Flory J, Emanuel E. Interventions to improve research participants’ understanding in informed consent for research: a systematic review. JAMA. 2004;292:1593–601. https://doi.org/10.1001/jama.292.13.1593.
7. Hirshon JM, Krugman SD, Witting MD, Furuno JP, Limcangco MR, Perisse AR, Rasch EK. Variability in institutional review board assessment of minimal-risk research. Acad Emerg Med. 2002;9:1417–20.
8. Institute of Medicine BoHSP, Committee on Clinical Research Involving Children. Ethical conduct of clinical research involving children. Washington, DC: National Academies Press; 2004.
9. Joffe S, Cook EF, Cleary PD, Clark JW, Weeks JC. Quality of informed consent in cancer clinical trials: a cross-sectional survey. Lancet. 2001;358:1772–7. https://doi.org/10.1016/S0140-6736(01)06805-2.
10. Jones JS, White LJ, Pool LC, Dougherty JM. Structure and practice of institutional review boards in the United States. Acad Emerg Med. 1996;3:804–9.
11. Kent G. Responses by four Local Research Ethics Committees to submitted proposals. J Med Ethics. 1999;25:274–7.
12. Ndebele P. The Declaration of Helsinki, 50 years later. JAMA. 2013;310:2145–6. https://doi.org/10.1001/jama.2013.281316.
13. Parvizi J, Tarity TD, Conner K, Smith JB. Institutional review board approval: why it matters. J Bone Joint Surg Am. 2007;89:418–26. https://doi.org/10.2106/JBJS.F.00362.
14. Project MK Ultra, the Central Intelligence Agency’s Program of Research into Behavioral Modification (1977), First Session ed. Washington: U.S. Government Printing Office (copy hosted at the New York Times website); 1977.
15. Trials of War Criminals before the Nuremberg Military Tribunals under Control Council Law No. 10, Vol. 2. Washington, D.C.: U.S. Government Printing Office; 1949. p. 181–2.
16. Reverby SM. Examining Tuskegee: the infamous syphilis study and its legacy. Chapel Hill: The University of North Carolina Press; 2009.
17. United States. National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The Belmont report: ethical principles and guidelines for the protection of human subjects of research. DHEW Publication no (OS) 78-0012. The Commission; for sale by the Supt. of Docs. Bethesda: U.S. Government Printing Office; 1978.
18. United States. National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. Appendix, the Belmont report: ethical principles and guidelines for the protection of human subjects of research. DHEW publication no (OS) 78-0013. The Commission; for sale by the Supt. of Docs. Bethesda: U.S. Government Printing Office; 1978.
19. World Medical Association. World Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects. JAMA. 2013;310:2191–4. https://doi.org/10.1001/jama.2013.281053.
20. Zimbardo PG. On the ethics of intervention in human psychological research: with special reference to the Stanford prison experiment. In: Pimple KD, editor. Research ethics. New York: Routledge; 2016.

10 How to Assess Patient’s Outcome?

Yuichi Hoshino and Alfonso Barnechea

Y. Hoshino (*)
Department of Orthopaedic Surgery, Graduate School of Medicine, Kobe University, Kobe, Japan

A. Barnechea
Department of Orthopedic Surgery, Edgardo Rebagliati Martins National Hospital, Lima, Peru

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_10

10.1 Introduction

In orthopedic surgery, we need to be able to report the results of whatever procedure we do, whether within the context of a specific pathology or in a broader sense when analyzing a given surgical technique. The clinical results after any kind of treatment, whether non-operative or operative, expressed through specific criteria, are defined as outcomes. For example, possible outcomes to be measured after arthroscopic Bankart repair could be the postoperative, clinician-measured range of motion or the ability of the patient to resume his/her previous sports level.

In probability theory, an outcome is a predefined result of a given event. An outcome is also understood as what is ultimately achieved by an activity. An outcome is useful for the determination and evaluation of the results of what we do. For example, the event is our intervention (e.g., an arthroscopic procedure), and the outcome would be the measured result (e.g., the level of return to sports a given time after the procedure).

Note that one intervention or event can have multiple outcomes, but, within an outcome, the results are mutually exclusive (a patient cannot have both a grade-II laxity and a grade-III laxity 3 months after primary ACL reconstruction).

Outcomes need to be predefined; that is, if one wants to report a result, one needs to know how to define it in terms that are both understandable by peers and reproducible. In addition, an outcome should be measurable. Typically, an outcome instrument is a measurable scale, score, or rating, which allows assessment of outcomes in a standardized manner and comparison within the patient group or with other groups. Outcome instruments allow reliable and reproducible documentation of a patient’s health or function at the same or different time points.

10.2 What Types of Outcome Instruments Exist and What Are They Used for?

Outcome instruments can be quantitative or qualitative. (Quantitative instruments measure outcomes with numeric values, which can be continuous (e.g., the measurement of tibial displacement in millimeters on a KT-1000 test or the number of degrees of passive shoulder external rotation) or discontinuous (the number of stairs a person can climb). Qualitative instruments measure more subjective results, such as the tenderness of a
joint on palpation or on motion or the overall degree of satisfaction after a procedure).

A further differentiation can be made based on the area of evaluation. One can differentiate outcome instruments that assess health status globally, that are disease specific, or that are specific to a joint, organ, or region.

Another way of separating outcome instruments is based on their source of reporting. Clinician- or independent observer-based instruments can be distinguished from patient-reported instruments.

Clinician-reported outcome instruments: Within these outcome instruments, outcomes are measured objectively by someone other than the patient, such as a physician, technician, or researcher. Typical examples are:

• Anterior tibial displacement in millimeters measured by KT-1000 testing
• Degrees of shoulder abduction by manual examination
• Degrees of knee varus by X-ray measurements

Patient-reported outcome instruments: These outcome instruments are subjective and are reported by the patient, such as:

• Degree of shoulder pain at rest measured by the visual analogue scale
• Ability to perform activities such as opening a door knob, twisting a jar cap, or walking to the bathroom from the bedroom
• Degree of return to the patient’s previous sports level

Also, patient-reported experience measures (PREMs) are instruments for real-time feedback about patients’ experience of the quality of care. Three domains have been described: patient safety (includes hygiene and physical safety), clinical effectiveness (results of procedures and aftercare), and patient experience (compassion, dignity, respect, etc.). There is only a weak association between PREM and PROM results: in an English study, this was shown after hip and knee surgery using institutional PREMs and well-validated PROMs (Oxford hip and knee scores) [5].

Patient-reported outcome instruments (PROMs) are important because they show how satisfied patients are with what we do. Sometimes these outcomes are referred to as “subjective,” but this term should generally be avoided because it can be misinterpreted as “not objective” [14]. However, why are PROMs so important for clinical outcome measurements?

Firstly, they can measure not only the “objective” goals of a given procedure or treatment but also the real impact on the patient’s quality of life, which is the most important reason for every treatment.

Secondly, health policy focuses more and more frequently on the patients’ well-being. PROMs are a valuable method for obtaining information directly from the ultimate target of health interventions, the patients, and thus not only measure the impact of these interventions but also guide us toward new solutions [30].

As stated earlier, patient-reported outcome measurements can also be classified into two groups:

–– General health status: They report the patient’s overall well-being and functionality; they can be applied to multiple medical etiologies and across multiple patients with different cultural and educational backgrounds [14]
–– Disease/joint/region specific: They measure a condition’s effect on the patient. Disease-specific measurements focus on a subgroup of patients affected by a condition and can measure the effect of changes in it. They have the advantage of measuring general effects of a disease other than those related to its location. Region-specific (or joint-specific) measurements capture the effect of any condition that affects a specific part of the body; these measurements can be applied to diverse etiologies that affect a specific region. The disadvantage sometimes is that they cannot differentiate the etiology if more than one is present (e.g., the ASES self-evaluation form score can be affected if the patient has both a rotator cuff injury and a secondary frozen shoulder). They have the advantage that they can measure small changes in a specific region and are relatively easy to apply [14]

Fact Box 10.1
Types of PROMs (the patient’s own subjective measurement of his/her status).
– General health status
  For example, Medical Outcomes Study Short Form (SF) 36 [31]
– Disease specific
  For example, Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) [4] (hip/knee osteoarthritis)
– Joint/region specific
  • Upper limb: DASH (disabilities of the arm, shoulder, and hand) index [3]
  • Shoulder: ASES (American Shoulder and Elbow Surgeons) self-evaluation form score [26]; Shoulder activity level [6]
  • Elbow: Oxford elbow score [8]
  • Hand/wrist: PRWHE (patient-rated wrist/hand evaluation questionnaire) [19]
  • Hip: Hip disability and osteoarthritis outcome score (HOOS) [24]; Hip outcome score (HOS) [20]
  • Knee: WOMAC [4]; Knee injury and Osteoarthritis Outcome Score (KOOS) [17, 27]; IKDC (subjective form) [12]; Lysholm score [18]; Tegner activity scale [29]
  • Foot and ankle: Foot and Ankle Ability Measure (FAAM) [21]; Foot Function Index (FFI) [7]; Ankle Osteoarthritis Scale (AOS) [9]
  • Lumbar: Oswestry Low Back Pain Disability Questionnaire [10]

10.3 How to Select Appropriate Patients’ Outcome Instruments for My Study?

A variety of outcome instruments are frequently used in orthopedic research. When a research study is designed, selecting the best and most appropriate outcome instrument is an important step. When the wrong outcome instrument is chosen, the scientific value of the study is reduced or, at worst, lost.

First, the researcher should determine specifically what is to be evaluated for his/her research purpose. Outcome measures can be selected based on the region, such as the knee, shoulder, or others, and on the parameters of interest, such as pain, subjective function, and/or satisfaction.

A first idea of which outcome instruments should be used can be taken from previous papers on a similar research topic.

The performance of each outcome measure can be evaluated by its psychometric properties, including its reliability, validity, and responsiveness. Reliability is commonly appraised by the consistency and accuracy of the evaluation. Repeatability of the measurement can be estimated by the intraclass correlation coefficient (ICC) for continuous measures or Cohen’s kappa coefficient for discrete measures, and values greater than 0.8 are generally considered acceptable. The standard error of measurement (SEM) is used to estimate the precision of the evaluation. The magnitude of the ICC defines a measure’s ability to discriminate among subjects, and the SEM quantifies error in the same units as the original measurement.

Validity refers to how exactly the outcome measure evaluates what it is expected to assess. Each outcome measure has a specific area of evaluation, and the researcher should consider how well the purpose of his/her research would be addressed by the selected outcome measure. For example, the KOOS score has great validity for evaluating subjective symptoms and function of the knee related to osteoarthritis [17], whereas it demonstrates poor validity for sport-related knee injury and treatment.

Researchers should also consider the responsiveness of the outcome measure. Responsiveness is determined by how well the outcome measure can detect clinically important change over time and/or after an intervention. If the selected outcome measure has only poor responsiveness, a clinically significant effect would be missed. Minimal detectable change (MDC) and minimal clinically important difference (MCID) are generally used to indicate the responsiveness of an outcome measure.
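
The reliability and responsiveness statistics named in this section can be made concrete with a small calculation. The following is a minimal sketch, not a method taken from this chapter: it assumes the commonly used conventions SEM = SD × √(1 − ICC) and MDC at the 95% level = 1.96 × √2 × SEM, and it computes Cohen’s kappa for two raters directly from its definition. All function names and the example numbers are hypothetical and chosen only for illustration; the ICC, SD, and MCID would in practice be taken from a validation study of the instrument.

```python
import math
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters grading the same subjects (categorical data)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must grade the same non-empty set of subjects")
    n = len(rater_a)
    # Observed proportion of subjects on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def standard_error_of_measurement(sd, icc):
    """SEM in the units of the score (assumed convention: SD * sqrt(1 - ICC))."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sd, icc, z=1.96):
    """MDC at the confidence level implied by z (assumed: z * sqrt(2) * SEM)."""
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


def interpret_change(change, mdc, mcid):
    """Classify an observed score change against the MDC and the MCID."""
    magnitude = abs(change)
    if magnitude < mdc:
        return "within measurement error"
    if magnitude < mcid:
        return "detectable, clinical importance undetermined"
    return "likely clinically important"


if __name__ == "__main__":
    # Hypothetical instrument: SD of 10 points and ICC of 0.84, MCID of 15 points.
    mdc = minimal_detectable_change(sd=10.0, icc=0.84)
    print(round(standard_error_of_measurement(10.0, 0.84), 2), round(mdc, 2))
    print(interpret_change(12.0, mdc, mcid=15.0))
    # Hypothetical laxity grades assigned by two examiners to four knees.
    print(cohen_kappa(["I", "I", "II", "II"], ["I", "I", "II", "I"]))
```

Keeping the three quantities in separate functions mirrors how they are used: kappa and the ICC describe agreement, the SEM expresses error in score units, and the MDC and MCID are thresholds against which an individual change is read.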

The MDC is calculated from the ICC and the standard deviation of the outcome measure and indicates the range of possible measurement error. If the change in an outcome measure is smaller than its MDC, the change cannot be distinguished from measurement error. Even if the change in an outcome measure exceeds the MDC, it is still undetermined whether the patients perceive improvement or deterioration. A difference greater than the MCID is highly likely to affect the patient’s impression significantly. Each outcome measure has its own MDC and MCID [1, 2, 11, 13, 15–17, 22, 23, 25, 28], but they are not always available in the literature, especially for some new outcome measures. Researchers should be aware of the responsiveness of their selected outcome measure, and it is better to select measures whose MDC and MCID are available.

The selection process then considers practical administration. Factors to consider include usability, applicability, and affordability. Questionnaires should be easy to use and understandable. Only then can lay persons such as patients answer them without any trouble. Language problems might arise in some non-English-speaking countries. Because the majority of outcome instrument questionnaires are written in English, proper translation and cross-cultural adaptation of the questionnaires are necessary. A properly translated version of the questionnaire is very helpful, but one is not always available for every language. Mistranslation or misunderstanding by the patients might impair the quality of the outcome instrument. If no translated version of the questionnaire is available, the researcher should make his/her own translation and take it through a validation process. Also, the scoring should be simple so that the results can be interpreted with minimal effort.

When you design a clinical study, at least one PROM and one clinician-based outcome instrument that are widely accepted in your field of interest should be used. Otherwise, the results cannot be compared to those reported in previous papers. Rarely used outcome measures might need to be accompanied by additional outcome measures to assess the clinical and/or scientific impact of the study results.

Most clinical outcome measures do not involve administration costs, but in rare cases, e.g., the SF-36, there is an additional cost for collecting and analyzing the data, which limits the application of the outcome measure.

Fact Box 10.2
Here are the issues that should be considered to find the best selection of outcome instruments for your study. A technically solid and practically usable PROM can be selected through these items.

• Technical issues
  –– Is it reliable?
     ICC and SEM should be checked for continuous data, and Cohen’s kappa coefficient for discrete data.
  –– Is it valid for your research purpose?
  –– Is it responsive enough to answer your research question?
     MDC and MCID of the outcome measure should be checked.
• Practical issues
  –– Is it commonly used in the same field of research?
  –– Is it simple to use and analyze?
  –– Is it readable and understandable for your patients?
  –– Is it available and affordable for you?

Clinical Vignette
A physician developed a new postoperative rehabilitation protocol for use after ACL reconstruction. He wanted to demonstrate the clinical advantage of the newly developed rehabilitation method and planned a clinical study. The purpose of this study was to compare the short-term, i.e., 2-year, clinical outcome after ACL reconstruction between conventional and
newly developed postoperative rehabilitation protocols in young athletes. In addition to the clinical evaluations of knee stability, including KT measurement, the IKDC score was collected as a physician-reported outcome. Since the patient population was young and the follow-up was relatively short, Lysholm and IKDC subjective scores were selected for this study as patient-reported outcomes. In this case, the KOOS score was assessed as a supplementary measure but not reported due to a lack of relevance to the study purpose.

Take-Home Message

• Patient outcome measures are requisite for every clinical research study.
• There are a variety of choices for patient outcome measures, and researchers should select just the right set of them to address their study purpose.
• Appropriate selection of the outcome measure goes through clarifying the study targets, listing the practically usable measures, and deciding on one or more depending on their reliability, validity, and responsiveness.

References

1. Ansari NN, Naghdi S, Hasanvand S, Fakhari Z, Kordi R, Nilsson-Helander K. Cross-cultural adaptation and validation of Persian Achilles tendon Total Rupture Score. Knee Surg Sports Traumatol Arthrosc. 2016;24(4):1372–80.
2. Barber-Westin SD, Noyes FR, McCloskey JW. Rigorous statistical reliability, validity, and responsiveness testing of the Cincinnati Knee Rating System in 350 subjects with uninjured, injured, or anterior cruciate ligament-reconstructed knees. Am J Sports Med. 1999;27:402–16.
3. Beaton DE, Katz JN, Fossel AH, Wright JG, Tarasuk V, Bombardier C. Measuring the whole or the parts? Validity, reliability, and responsiveness of the Disabilities of the Arm, Shoulder and Hand outcome measure in different regions of the upper extremity. J Hand Ther. 2001;14:128–46.
4. Bellamy N, Watson Buchanan W, Goldsmith CH, Campbell J, Stitt LW. Validation study of WOMAC: A health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol. 1988;15:1833–40.
5. Black N, Varaganum M, Hutchings A. Relationship between patient reported experience (PREMs) and patient reported outcomes (PROMs) in elective surgery. BMJ Qual Saf. 2014;23:534–42.
6. Brophy RH, Beauvais RL, Jones EC, Cordasco FA, Marx RG. Measurement of shoulder activity level. Clin Orthop Relat Res. 2005;439:101–8.
7. Budiman-Mak E, Conrad KJ, Roach KE. The Foot Function Index: a measure of foot pain and disability. J Clin Epidemiol. 1991;44:561–70.
8. Dawson J, Doll H, Boller I, Fitzpatrick R, Little C, Rees J, et al. The development and validation of a patient-reported questionnaire to assess outcomes of elbow surgery. J Bone Joint Surg Br. 2008;90(4):466–73.
9. Domsic RT, Saltzman CL. Ankle Osteoarthritis Scale. Foot Ankle Int. 1998;19:466–71.
10. Fairbank JCT, Couper J, Davies JB, O’Brien JP. The Oswestry low back pain disability questionnaire. Physiotherapy. 1980;66:271–3.
11. Greco NJ, Anderson AF, Mann BJ, Cole BJ, Farr J, Nissen CW, et al. Responsiveness of the International Knee Documentation Committee Subjective Knee Form in comparison to the Western Ontario and McMaster Universities Osteoarthritis Index, modified Cincinnati Knee Rating System, and Short Form 36 in patients with focal articular cartilage defects. Am J Sports Med. 2010;38(5):891–902.
12. Irrgang JJ, Anderson AF, Boland AL, Harner CD, Kurosaka M, Neyret P, et al. Development and validation of the International Knee Documentation Committee Subjective Knee Form. Am J Sports Med. 2001;29:600–13.
13. Irrgang JJ, Anderson AF, Boland AL, Harner CD, Neyret P, Richmond JC, et al. International Knee Documentation Committee. Responsiveness of the International Knee Documentation Committee Subjective Knee Form. Am J Sports Med. 2006;34(10):1567–73.
14. Irrgang JJ, Lubowitz JH. Measuring arthroscopic outcome. Arthroscopy. 2008;24(6):718–22.
15. Kemp JL, Collins NJ, Roos EM, Crossley KM. Psychometric properties of patient-reported outcome measures for hip arthroscopic surgery. Am J Sports Med. 2013;41(9):2065–73.
16. Kocher MS, Steadman JR, Briggs KK, Sterett WI, Hawkins RJ. Reliability, validity, and responsiveness of the Lysholm knee scale for various chondral disorders of the knee. J Bone Joint Surg Am. 2004;86-A(6):1139–45.
17. Lyman S, Lee YY, Franklin PD, Li W, Cross MB, Padgett DE. Validation of the KOOS, JR: a short-form knee arthroplasty outcomes survey. Clin Orthop Relat Res. 2016;474(6):1461–71.
18. Lysholm J, Gillquist J. Evaluation of knee ligament surgery results with special emphasis on use of a scoring scale. Am J Sports Med. 1982;10:150–4.
19. MacDermid JC, Turgeon T, Richards RS, Beadle M, Roth JH. Patient rating of wrist pain and disability: a reliable and valid measurement tool. J Orthop Trauma. 1998;12(8):577–86.
20. Martin RL, Philippon MJ. Evidence of validity for the Hip Outcome Score in hip arthroscopy. Arthroscopy. 2007;23:822–6.
21. Martin RL, Irrgang JJ, Burdett RG, Conti SF, Van Swearingen JM. Evidence of validity for the Foot and Ankle Ability Measure (FAAM). Foot Ankle Int. 2005;26:968–83.
22. Marx RG, Jones EC, Allen AA, Altchek DW, O’Brien SJ, Rodeo SA, et al. Reliability, validity, and responsiveness of four knee outcome scales for athletic patients. J Bone Joint Surg Am. 2001;83-A(10):1459–69.
23. Mohtadi NG, Griffin DR, Pedersen ME, Chan D, Safran MR, Parsons N, et al. Multicenter Arthroscopy of the Hip Outcomes Research Network. The Development and validation of a self-administered quality-of-life outcome measure for young, active patients with symptomatic hip disease: the International Hip Outcome Tool (iHOT-33). Arthroscopy. 2012;28(5):595–605.
24. Nilsdotter AK, Lohmander LS, Klässbo M, Roos EM. Hip disability and osteoarthritis outcome score (HOOS) - validity and responsiveness in total hip replacement. BMC Musculoskelet Disord. 2003;4:10.
25. Papou A, Hussain S, McWilliams D, Zhang W, Doherty M. Responsiveness of SF-36 Health Survey and Patient Generated Index in people with chronic knee pain commenced on oral analgesia: analysis of data from a randomised controlled clinical trial. Qual Life Res. 2017;26(3):761–6.
26. Richards RR, An KN, Bigliani LU, Friedman RJ, Gartsman GM, Gristina AG, et al. A standardized method for the assessment of shoulder function. J Shoulder Elb Surg. 1994;3:347–52.
27. Roos EM, Lohmander LS. The Knee injury and Osteoarthritis Outcome Score (KOOS): from joint injury to osteoarthritis. Health Qual Life Outcomes. 2003;1:64.
28. SooHoo NF, Li Z, Chenok KE, Bozic KJ. Responsiveness of patient reported outcome measures in total joint arthroplasty patients. J Arthroplast. 2015;30(2):176–91.
29. Tegner Y, Lysholm J. Rating systems in the evaluation of knee ligament injuries. Clin Orthop Relat Res. 1985;198:43–9.
30. Walton MK, Powers JH III, Hobart J, Patrick D, Marquis P, Vamvakas S, et al. International Society for Pharmacoeconomics and Outcomes Research Task Force for Clinical Outcomes Assessment. Clinical Outcome Assessments: Conceptual Foundation-Report of the ISPOR Clinical Outcomes Assessment - Emerging Good Practices for Outcomes Research Task Force. Value Health. 2015;18(6):741–52.
31. Ware JE Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30:473–83.

11 Basics of Outcome Assessment in Clinical Research

Monique C. Chambers, Sarah M. Tepe, Lorraine A. T. Boakye, and MaCalus V. Hogan

M. C. Chambers
Foot and Ankle Injury Research [F.A.I.R] Group, Department of Orthopaedic Surgery, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA

S. M. Tepe · L. A. T. Boakye · M. V. Hogan (*)
Department of Orthopaedic Surgery, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
e-mail: hoganmv@upmc.edu

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_11

11.1 Introduction

The World Health Organization defines an outcome measure as the “change in the health of an individual, group of people, or population that is attributable to an intervention or series of interventions” [13]. Measuring clinical outcomes is essential to track progress as healthcare institutions continue to shift toward more value-based policies, programs, and practice. Outcome measures are largely motivated by national standards of care and financial incentives to minimize waste and the cost burden associated with care delivery. Clinical outcome measures may include health-related factors, health system-related factors that serve as a proxy for medical resource utilization, and/or the subjective perspective of patients. Tracking clinical outcomes allows healthcare organizations to identify variations in care delivery in order to properly align healthcare practices within an organization. Additionally, it provides a model to assess evidence-based practices and determine which medical interventions are best suited for the local patient population. Finally, transparency in clinical outcomes empowers healthcare institutions to make healthcare decisions that help to improve patient care and ultimately drive down cost.

There has been an increased focus on the various outcomes that result from clinical interventions. Much of the focus in clinical and translational research is on satisfying the scientific curiosity of the clinician and/or scientist. However, as research guides clinical management, it is important to apply the appropriate scientific methods to accurately assess how medical interventions impact patient outcomes. In this chapter, we will discuss these outcome measures to guide clinicians who are pursuing research and to equip young investigators with the necessary tools to adequately design studies that utilize the appropriate measure to assess outcomes based on the clinical problem.

11.2 Principles of Outcome Measures

An outcome measure is the result of a test that is used objectively to determine the baseline function of a patient at the beginning of treatment [13]. Once treatment has commenced, the same metric can be used to determine progress and
treatment efficacy. Appropriate outcome measures allow one to assess the quality of medical interventions. The most useful outcomes should be measurable based on discrete parameters and distinct time points. The measurement tool should be accurate within a small range of error. The outcome measure should also be reliable, with the ability to achieve the same result if applied again by a well-trained provider. The tool should be validated to ensure consistency with other measures used to track outcomes [10]. As such, the goal of measuring outcomes is to provide a baseline assessment for the improvement of patient outcomes, patient satisfaction, and the cost burden associated with healthcare.

11.3 Considering Various Data Sources

Regulatory agencies seek to streamline the collection and utilization of data sources; therefore, clinical investigators should understand which data sources are best suited to answer their clinical question. Mandates related to the meaningful use of electronic medical records led the Food and Drug Administration (FDA) to establish aims for electronic data sources. It is also important to understand how various data sources are generated. Common data sources are generated from administrative claims and health insurance enrollment, clinical encounters with providers, patient surveys, and clinical trials, among other sources. Each of these sources has advantages and disadvantages (Table 11.1). The appropriate collection, maintenance, and utilization of data sources will result in the accurate assessment of medical treatments and outcomes that are specific to the patient population of interest. Clinical investigators should have a critical eye for the data source used and should seek to constantly improve the thorough collection of medical data and outcomes assessment.

Table 11.1 Understanding data sources [14]

Administrative data
  Advantages:
  • Standardized coding systems
  • Electronic accessibility
  • Easy to access when de-identified
  Disadvantages:
  • Intent of data is for billing, rather than quality reporting/care
  • Concerns with accuracy

Medical records
  Advantages:
  • Most comprehensive view of medical picture
  • Electronically available across most practices
  Disadvantages:
  • Costly to obtain and maintain data protection
  • Risk of data breach

Patient surveys
  Advantages:
  • Focuses on patient experience as a measure for outcomes/satisfaction
  • Questionnaires, such as PROs, that have been validated to reflect quality outcomes
  • Results may be better understood by patients AND providers
  Disadvantages:
  • Patients may misunderstand the questions
  • Responses may differ based on the time within the episode of care
  • Need for questionnaire implementation and administration, which may be costly
  • Sampling bias of activated patients

Standardized clinical data
  Advantages:
  • More comprehensive data set
  • Data set already exists
  • Assess outcomes across multiple domains of care
  Disadvantages:
  • May have incomplete data
  • May reflect different medical practices based on regional population standards

Shared data/national databases [16]
  Advantages:
  • May enhance understanding of results of individual trials
  • Allows pooling of multiple studies to expand the scientific rigor and implications
  • Improves medical accuracy
  Disadvantages:
  • May lead to inaccurate conclusions
  • Lack of homogeneity in the populations across studies
  • Endpoints may be defined differently across studies

Fact Box 11.1: The FDA Aims for Electronic Data Sources [15]
• Eliminate unnecessary duplication of data
• Reduce the possibility for transcription errors
• Encourage entering source data during a subject’s visit
• Facilitate remote monitoring of data
• Promote real-time access for data review
• Facilitate the collection of accurate and complete data

11.4 Qualitative Methods

A primary area of interest with data sources is the use of medical data to assess qualitative outcomes. An increase in qualitative studies came about following the development of the Patient-Centered Outcomes Research Institute (PCORI) in 2010. PCORI was funded by Congress to help patients make informed decisions regarding their healthcare. Qualitative research methods prioritize patient needs, experiences, and perspectives. Qualitative research provides clarity to clinical practice in ways that are seldom revealed through quantitative methodologies [12]. Qualitative studies may be conducted through one-on-one interviews, focus group studies, or surveys.

11.4.1 Clinical Quality Measures (CQMs)

Qualitative research also seeks to address inadequacies in quality of care that cannot be directly explained by large data pools. A focus on quality improvement projects promotes accountability and transparency in healthcare outcomes. Clinical quality outcome measures help establish the standard of care by measuring or quantifying healthcare processes, outcomes, and organizational structure. Healthcare organizations and clinical researchers use parameters such as mortality, readmission, and complications as measures to advance both quality-of-care and cost goals. These goals include care that is effective, safe, efficient, patient-centered, equitable, and timely [6]. Additionally, the Centers for Medicare and Medicaid Services (CMS) uses several clinical quality measures to calculate overall hospital quality performance.

Fact Box 11.3: Clinical Quality Measures (CQMs) of Hospital Performance [6]
• Mortality (22%)
• Safety of care (22%)
• Readmissions (22%)
• Patient experience (22%)
• Effectiveness of care (4%)
• Timeliness of care (4%)
• Efficient use of medical imaging (4%)
• Other MSK complications

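The percentages in Fact Box 11.3 lend themselves to a short worked example. The sketch below combines seven measure-group scores into one summary value using those percentages as weights. The group scores are invented for illustration, the 0 to 1 scale and the name `weighted_summary` are assumptions, and the calculation is a plain weighted average rather than the methodology CMS itself applies. "Other MSK complications" carries no percentage in the fact box and is therefore left out.

```python
# Weights taken from Fact Box 11.3, written as fractions of the overall score.
CQM_WEIGHTS = {
    "mortality": 0.22,
    "safety_of_care": 0.22,
    "readmissions": 0.22,
    "patient_experience": 0.22,
    "effectiveness_of_care": 0.04,
    "timeliness_of_care": 0.04,
    "efficient_use_of_imaging": 0.04,
}


def weighted_summary(group_scores, weights=CQM_WEIGHTS):
    """Return the weighted average of measure-group scores.

    group_scores maps a group name to a score on a common scale
    (assumed here to be 0 to 1, higher is better). Groups without a
    reported score are skipped, and the remaining weights are
    renormalized so that they still sum to 1 (an assumption made
    for this sketch).
    """
    reported = {name: w for name, w in weights.items() if name in group_scores}
    if not reported:
        raise ValueError("no scored measure groups")
    total_weight = sum(reported.values())
    return sum(group_scores[name] * w for name, w in reported.items()) / total_weight


# Hypothetical hospital: strong on safety, weaker on readmissions.
example_scores = {
    "mortality": 0.8,
    "safety_of_care": 0.9,
    "readmissions": 0.6,
    "patient_experience": 0.7,
    "effectiveness_of_care": 0.5,
    "timeliness_of_care": 0.5,
    "efficient_use_of_imaging": 0.5,
}
summary = weighted_summary(example_scores)
print(f"weighted summary score: {summary:.2f}")
```

Because the four 22% groups account for 88% of the total, a change in any one of them moves the summary far more than a change in effectiveness, timeliness, or imaging.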
Fact Box 11.2: Qualitative Research
• Seeks to ask a question
• Emphasizes people and process
• Acknowledges that how clinical practice occurs is more important than doing a task
• Results should be placed within the context of the qualitative research question

11.5 Performance-Based Outcome Measures

Performance-based outcome measures have been used for decades in orthopedics, including in biomechanical and kinematic studies. Performance-based outcome measures usually involve observation of the patient in some capacity to assess functional status. The advantage of collecting objective data is that the results should be reproducible and without bias [16].

Despite the widespread use of performance-based outcomes, many have not been validated or correlated with clinical outcomes. Clinical outcome measures may be applied inappropriately to a population that the outcome measures were not designed or validated to assess. Such measures should be modified and validated for the current population of interest prior to use in a clinical investigation. Nonetheless, performance-based outcomes present an objective method to determine a patient’s functional capacity.

11.6 Self-Reported Outcome Measures

Outcomes in healthcare are not only representative of the quality of care provided but also reflect the service as experienced by the patient [1]. Self-reported outcome measures were developed as a means of gathering information based on patient experience and perception. Many of these measures were designed to assess treatment effectiveness and to emphasize the importance of the individual’s own perspective when making the evaluation of care [2, 8]. These outcomes may include pain scores, patient satisfaction scores, as well as patient-reported outcomes (PROs). The challenge is to encourage healthcare professionals (HCPs) to focus on individual patient perceptions of disease impact as well as disease activity measures, with the aims of providing patient-centered care through shared decision-making and improving outcomes considered important by patients themselves [3].

11.7 Patient-Reported Outcome Measures

11.7.1 Development of PROs

PROs are often a hybrid of performance-based and self-reported measures that provide a more holistic picture of the patient’s activity and satisfaction. PROs are standardized, validated questionnaires that are completed by patients to measure their perceptions of their own functional status and well-being [2]. The collection of PROs has become more prevalent in physician practices across the United States. The use of PROs is a valuable tool for physicians, patients, and clinical investigators. PROs have been collected by physicians for many years; however, they have historically been used only to determine the efficacy of various treatments [2].

In the past few years, the focus of a patient’s self-assessment has shifted towards the idea that the patient perspective and input are an integral aspect of clinical assessment and progress. Patients can actively participate in medical decision-making to help guide their treatment and understand improvement or changes that may occur. Each patient responds to treatment in a unique way. Therefore, it is important that, while standardized, the questionnaires are answered accurately by the patient. While physicians remain the educators, patients can maintain a certain degree of autonomy by self-reporting symptoms, progress, and health-related quality of life [3].

Medical professionals, researchers, hospital administrators, and even insurers still use the information as a guide to what treatments are most clinically and financially effective. The constant progression toward value-based policies and implementation will lead to increased value of PRO utilization. Orthopaedic surgery will likely be impacted, especially given the higher rate of elective procedures within our field [4].

11.7.2 Collecting PROs Within a Systematic Performance Platform

The collection of PROs has traditionally been managed via telephone calls or clinical visits. Data were recorded in paper charts and then later manually abstracted or transcribed into a clinical database. This approach made it inefficient for healthcare systems to track the completeness and accuracy of outcomes collection. The meaningful use mandate within the Patient Protection and Affordable Care Act (ACA) incentivized healthcare systems to not only implement an appropriate electronic medical record (EMR) but to also

establish systems designed to optimize data collection, analysis, and interpretation. The use of the EMR affords the opportunity to optimize PRO data collection.

The development of a system-wide performance platform allows for uniformity in the collection process. To optimize PRO collection, there should be a standardized system in place for patients to complete specific PRO questionnaires at distinct time periods within the disease process and an episode of care. The platform can be built into the EMR database with predetermined or “red flag” reminders to alert healthcare providers when patients are due for the next set of surveys. This information is then stored within the EMR and can be pulled as part of a large data set to assess the impact on the patient population. This information can also be retrieved and viewed as part of the individual patient’s EMR to track the patient’s progress over time. Access to the PRO results over time allows the physician to use the patient’s daily activities and overall health status in the shared decision-making process for treatment.

Despite the vast benefits of collecting and utilizing PROs within the clinical and research settings, there remain many challenges that prevent PROs from being implemented across the healthcare system. In order to ensure the accurate assessment of PRO data, the information should be thorough, complete, and collected at key stages within the disease state [7]. Any variations within the data set create limitations to both the use and the validity of outcomes data.

11.7.3 Utilization of PROs

Patient-reported outcomes create an environment that fosters shared decision-making. Decisions made with patient input often make that patient feel more at ease because he or she can take on a more active role in guiding treatment. Although tracking PROs has become more commonplace in orthopaedic research, there has been difficulty in translating the results into changes within clinical practice. To best serve as a guide for clinical decision-making, the collection and assessment of PROs should take clinical relevance into account. To accomplish this goal, specific PROs should be validated based on the specific orthopedic disorder or disease process, particularly for PROs that measure general health status, which may or may not be greatly impacted by the orthopedic condition.
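A small numerical sketch may help show why a general health PRO can understate the effect of an orthopedic treatment while a region-specific PRO captures it. Everything in the example is an illustrative assumption: the instrument labels are generic, both scales are taken to run from 0 to 100 with higher values meaning better status, the scores are invented, and the helper `percent_of_scale_change` is not part of any published scoring manual.

```python
from dataclasses import dataclass


@dataclass
class ProScore:
    instrument: str   # descriptive label only; no official scoring rules implied
    scale_min: float
    scale_max: float
    baseline: float
    follow_up: float

    def change(self) -> float:
        """Raw change from baseline (follow-up minus baseline)."""
        return self.follow_up - self.baseline

    def percent_of_scale_change(self) -> float:
        """Change expressed as a percentage of the instrument's full range."""
        return 100.0 * self.change() / (self.scale_max - self.scale_min)


# Hypothetical patient: poor general health at baseline, treated for an ankle injury.
general = ProScore("general health PRO", 0, 100, baseline=38, follow_up=41)
ankle = ProScore("ankle-specific PRO", 0, 100, baseline=45, follow_up=78)

for score in (general, ankle):
    print(
        f"{score.instrument}: change {score.change():+.0f} "
        f"({score.percent_of_scale_change():+.1f}% of scale)"
    )
```

In this made-up case the general measure moves by 3 points while the ankle-specific measure moves by 33, so a study that reported only the general score would miss most of the treatment effect.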
Clinical Vignette
A patient with an acute ankle injury may have SF-12 and PROMIS-10 scores that reflect poor health status at baseline. If this patient were to undergo surgical intervention, there may be a limited change in their SF-12 or PROMIS-10 scores. This patient would be more accurately assessed based on foot- and ankle-specific outcome measures, such as the AOFAS, GRC, and FAAM scores. Additionally, if this patient were to be included in a study that assessed scores at 3-, 6-, and 12-month follow-up, there would be a large selection bias. Therefore, the change in PRO measures as the result of a medical intervention is at least as important to evaluate as the raw pre- and post-operative scores.

While this example is not the direct result of a research study, the continuous use of PROs in the clinical decision-making process can help to shape clinical guidelines and impact the results of larger population studies as it relates to the utilization of PROs. This type of clinical relevance should be within the aims of clinical investigations related to all clinical outcome measures. With clinical implications at the forefront of outcome research study design, researchers are better equipped to develop the right questions and collect the appropriate data to address the root cause of outcome discrepancies.

11.7.4 Linking PROs to Comparative Effectiveness Research

To improve the efficiency of PRO collection, the EMR provides new opportunities to have meaningful use of the data collected and pooled in ways that

are innovative and necessary for the shared decision-making process. Comparative effectiveness research (CER) attempts to synthesize current medical investigations on various medical interventions with the goal of identifying the optimal management plan for several orthopedic injuries [11]. As such, the purpose of CER is multifaceted and aligns well with the utilization of PRO data. Linking PRO results provides an evidence-based approach to the diagnosis, treatment, prevention, and monitoring of orthopaedic conditions [5]. A comprehensive approach must be adopted to ensure that the diverse patient population has been accounted for in medical research. Using PROs that account for diversity within our population is the only way to effectively educate patients on which treatment has the most appropriate outcomes for their specific condition and demographics. Unfortunately, many patient populations that would be impacted by such medical interventions are excluded from or opt out of participation in scientific studies. Scientific investigators and clinical practitioners should consider the clinical implications that can be applied based on the results of any given study. A better understanding of the added value of PROs for CER will help to develop best practices and define more formally when PROs have demonstrated utility [9].

Clinical Vignette
A 54-year-old patient who is moderately active and has developed ankle arthritis following an ankle arthroscopy for a prior osteochondral lesion of the talus may be considering an ankle fusion. With the appropriate tracking of regionally specific PRO scores, such as the Foot and Ankle Ability Measure (FAAM), American Orthopedic Foot and Ankle Society Score (AOFAS), and Global Rating of Change (GRC), results may reveal a drastic improvement in the patient’s functional status as it relates to ankle performance. Additionally, general health-related outcome scores, such as the Short Form-12 (SF-12) or PROMIS-10, may also suggest that the patient is functionally improved and perhaps within a range in which surgical intervention may yield little to no improvement in his/her functional status or well-being. For this patient, an ankle fusion or arthroplasty may be the wrong choice and lead to greater morbidity and higher medical resource utilization without adequate clinical benefit.

Take-Home Message
• The shift from volume-based, physician-focused care to value-based, patient-centered models has sparked a change in the way treatment effectiveness is assessed.
• The use of clinical outcome measures that reflect the true value of high-quality and cost-efficient care should be considered when evaluating medical interventions and determining if treatment modifications are necessary in distinct patient populations.
• These principles should serve as a guide to the basis of outcome assessment and study design for clinical outcome investigations.

Fact Box 11.4: Patient-Reported Outcomes
• PRO data should be thorough, complete, and collected at key stages within the disease state
• Linking PRO results provides an evidence-based approach to the diagnosis, treatment, prevention, and monitoring of orthopedic conditions
• Better understanding of the added value of PRO for CER will help to develop best practices

11.8 Useful Resources

• Kane, RL. Jones & Bartlett Learning’s Understanding Health Care Outcomes Research, 2nd Edition. 2006.
• https://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/QualityMeasures/index.html.
• Keith D. Baldwin; Rachel L. Slotcavage; G. Russell Huffman. Orthoepidemiology 101: The Basics of the Art of Outcomes Research in Orthopaedic Surgery. UPOJ. Vol 19.

References

1. AAOS. Principles of patient reported outcome measures (PROMs) reporting: information statement. 2015. https://aaos.org/uploadedFiles/PreProduction/About/Opinion_Statements/advistmt/1044%20Principles%20of%20Patient%20Reported%20Outcome%20Measures%20(PROMs)%20Reporting.pdf. Accessed 19 Nov 2017.
2. Dawson J, Doll H, Fitzpatrick R, Jenkinson C, Carr AJ. The routine use of patient reported outcome measures in healthcare settings. BMJ. 2010;340:c186. https://doi.org/10.1136/bmj.c186.
3. Fautrel B, Alten R, Kirkham B, de la Torre I, Durand F, Barry J, et al. Call for action: how to improve use of patient-reported outcomes to guide clinical decision making in rheumatoid arthritis. Rheumatol Int. 2018. https://doi.org/10.1007/s00296-018-4005-5.
4. Ferguson C, Rocha JL, Lalli T, Irrgang JJ, Hurwitz S, Hogan MV. Developing performance and assessment platforms in foot and ankle surgery. Foot Ankle Int. 2016;37(6):670–9.
5. Health Services Research Information Central. Comparative effectiveness research. https://www.nlm.nih.gov/hsrinfo/cer.html. Accessed 21 Mar 2018.
6. https://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/QualityMeasures/index.html. Accessed 18 Feb 2018.
7. MOTION Group. Current concepts review: patient-reported outcomes in orthopaedics. J Bone Joint Surg Am. 2018;100:436–42.
8. Rolfson O, Rothwell A, Sedrakyan A, Chenok KE, Bohm E, Bozic KJ, et al. Use of patient-reported outcomes in the context of different levels of data. J Bone Joint Surg Am. 2011;93(Suppl 3):66–71. https://doi.org/10.2106/JBJS.K.01021.
9. Segal C, Holve E, Sabharwal R. Collecting and using patient-reported outcomes (PRO) for comparative effectiveness research (CER) and patient-centered outcomes research (PCOR): challenges and opportunities. 2013. http://www.academyhealth.org/files/Collecting%20and%20Using%20PRO%20for%20CER%20and%20PCOR.pdf.
10. Selecting the Right Measures for Your Report. Agency for Healthcare Research and Quality. https://www.ahrq.gov/professionals/quality-patient-safety/talkingquality/create/gather/index.html. Accessed 16 Mar 2018.
11. Slutsky J, Atkins D, Chang S, et al. Comparing medical interventions: AHRQ and the effective health care program. In: Methods guide for effectiveness and comparative effectiveness reviews [Internet]. Rockville: Agency for Healthcare Research and Quality; 2008.
12. Smythe L, Giddings LS. From experience to definition: addressing the question: “what is qualitative research?”. Nurs Prax N Z. 2007;23:37–57.
13. Tinker A. The top 7 outcome measures and 3 measurement essentials. HealthCatalyst. https://www.healthcatalyst.com/The-Top-7-Outcome-Measures-and-3-Measurement-Essentials. Accessed 17 Feb 2008.
14. Understanding Data Sources. Agency for Healthcare Research and Quality. https://www.ahrq.gov/professionals/quality-patientsafety/talkingquality/create/understand.html. Accessed 21 Mar 2018.
15. US Food and Drug Administration. Guidance for industry: electronic source data in clinical investigations. http://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm328691.pdf. Accessed 17 Feb 2018.
16. Wang S, Hsu CJ, Trent L, Ryan T, Kearns NT, Civillico EF, et al. Evaluation of performance-based outcome measures for the upper limb: a systematic review. PM&R. 2018;10(9):951–962.e3.

12 Types of Scoring Instruments Available

José F. Vega and Kurt P. Spindler

J. F. Vega
Cleveland Clinic Lerner College of Medicine, Cleveland, OH, USA
e-mail: vegaj@ccf.org

K. P. Spindler (*)
Cleveland Clinic Sports Health Center, Garfield Heights, OH, USA
e-mail: spindlk@ccf.org, stojsab@ccf.org

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_12

12.1 Introduction

Research, of any kind, requires the measurement of an outcome. Possibly the earliest recorded account of “research” dates back to 550 BC. According to the “Book of Daniel” in the Bible’s Old Testament, the Babylonian King Nebuchadnezzar mandated that all Babylonians consume a diet exclusively of meat and wine, as this would keep them in excellent health. A handful of young men of royal lineage with herbivorous inclinations objected to this diet. The king decreed that these young men would eat only beans and water for 10 days, after which he would assess their level of nourishment. To the king’s surprise, he found the legume lovers to be better nourished than those sustained on meat and wine, and, thus, he allowed the diet to continue [7, 14, 69, 81].

Research and outcome measurement have certainly come a long way from legumes and nourishment, but rigorous clinical research is still in its early years, relatively speaking. More than two millennia would pass between King Nebuchadnezzar’s legume experiment and the earliest examples of clinical research. It was not until the turn of the twentieth century that Dr. Ernest Amory Codman, considered by many to be the father of outcomes research, began advocating his “end result” system, which he famously described as “The common sense notion that every hospital should follow every patient it treats, long enough to determine whether or not the treatment has been successful, and then to inquire, ‘If not, why not?’ with a view to preventing similar failures in the future [9].” This idea (which was received quite poorly by his peers and ultimately cost him his job at the Massachusetts General Hospital) led Codman to develop the first registry, which he prophetically envisioned as a means to implement his “end result” idea as standard of care on a national level [13].

It would take another 40 years from Codman’s development of the “end result” to the first randomized controlled trial, in which streptomycin was compared with placebo in the treatment of pulmonary tuberculosis, which took place in 1946 [7]. Nearly another 50 years would pass before Gordon Guyatt, a young internist from McMaster University in Ontario, Canada, coined the term “evidence-based medicine,” and modern orthopedic research, which has come to rely more and more heavily on PROMs, is younger still [16, 81].

Fact Box 12.1
The first mention of the phrase “evidence-based medicine” in a major American medical journal occurred in the November 1992 edition of the Journal of the American Medical Association (JAMA). In 2001, a mere 9 years after its debut, the phrase appeared in >2500 publications.

12.2 Types of Scoring Instruments: Patient-Reported or Observer-Recorded?

Two major categories of scoring instruments exist for use in clinical research: PROMs and observer-recorded outcome measures (OROMs). The former rely exclusively on patient input (e.g., chronicity, frequency, severity, and impact of symptoms, instability, limitations, etc.), while the latter take into account that which an observer can measure (e.g., knee laxity, degree of articular cartilage damage, functional ability, etc.). Neither is superior or inferior to the other, but instead, the two work hand in hand to provide a complete picture of a patient’s status.

Clinical Vignette 1
To illustrate the utility of PROMs in conjunction with OROMs, consider a 2006 study conducted by Svensson et al. The authors compared PROMs, instrumented knee laxity, and results of functional testing in female patients who had undergone anterior cruciate ligament reconstruction (ACL-R) using either bone-patellar tendon-bone (BTB) or four-strand semitendinosus/gracilis (ST/G) autografts. At 2-year follow-up, the investigators found no significant differences between the groups with regard to PROMs, instrumented laxity (using the KT-1000 arthrometer), or function (one-leg-hop test) [70]. The use of one outcome measure alone would not have provided such a complete picture demonstrating no difference between the BTB and ST/G grafts in this population. The importance of utilizing multiple outcome measures of both varieties is underscored by the use of both types of data to perform large-scale systematic reviews and meta-analyses [62, 68].

12.3 Patient-Reported Outcome Measures: General, Joint-Specific, and Generic Options

Orthopedics as a whole, and sports medicine in particular, is a field in which traditional outcome measures, such as mortality, are of little utility. This is because orthopedics (and, again, orthopedic sports medicine in particular) radically impacts quality of life, more so than quantity of life. Unfortunately, quality of life is often much more difficult to measure and is an outcome that hinges on many dimensions (e.g., pain, function, limitations, etc.). This challenge is not unique to orthopedics, and, thus, research into the development and use of PROMs has grown considerably over the last two decades [25].

PROMs are validated questionnaires completed by patients to generate a score that can be tracked over time (to observe a change in a specific patient) and/or compared with scores from other patients. While initially designed for use in clinical research, they are being included more and more frequently as a routine part of patient care, as the results are readily interpreted by health-care providers and can be used to help direct patient care.

PROMs come in a multitude of varieties, including general health measures, joint-specific measures, and more generic measures. The list of available PROMs is long and continues to grow almost every day. Rather than attempt to provide a comprehensive review of all PROMs, the coming pages will review the PROMs most commonly used in the current literature.

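Because a PROM yields a number, the two uses described in Sect. 12.3 (following one patient over time and comparing that patient with others) reduce to simple arithmetic. The sketch below is illustrative only: the time points, the scores, and the function names are made up, and a higher score is assumed to mean a better outcome.

```python
from statistics import mean

# Hypothetical scores on a single PROM (0-100, higher is better), keyed by
# months since treatment. Month 0 is the pre-treatment baseline.
patient = {0: 52, 6: 70, 12: 81, 24: 84}
cohort = {
    0: [50, 55, 61, 48],
    6: [68, 72, 75, 60],
    12: [78, 80, 85, 70],
    24: [80, 83, 88, 74],
}


def change_from_baseline(series):
    """Return {time_point: score minus baseline score} for one patient."""
    baseline = series[0]
    return {t: score - baseline for t, score in sorted(series.items())}


def difference_from_cohort(series, cohort_scores):
    """Return {time_point: patient score minus cohort mean} at shared time points."""
    return {
        t: series[t] - mean(cohort_scores[t])
        for t in sorted(series)
        if t in cohort_scores
    }


print(change_from_baseline(patient))
print(difference_from_cohort(patient, cohort))
```

The first result tracks the individual's progress, and the second places each of that patient's scores relative to peers measured at the same time point.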
Fact Box 12.2
Over the past decade, PROMs have become increasingly present as not only a research tool but also a metric by which performance can be measured. The use of PROMs is steadily becoming a “standard of care” at multiple institutions across the globe, as PROM data from individuals is used to guide treatment decisions, while aggregate PROM data is used to keep providers accountable and to measure their performance. In 2015, the AAOS Board of Directors convened the Quality Outcomes Data (QOD) Work Group to further investigate and evaluate PROMs.

12.4 General Health PROMs

General health PROMs are designed to quantify a patient’s overall state of health including both physical health and mental health. Three of the most commonly used general health PROMs include the Veterans RAND 12 (VR-12), the PROMIS Global 10, and the EuroQol-5D (EQ-5D).

12.4.1 The VR-12

The VR-12 was developed by the RAND Corporation (the name is a shortening of research and development) as a by-product of the Medical Outcomes Study, a cross-sectional study involving over 22,000 patients with the goal of developing practical tools to monitor patient outcomes [71]. The Medical Outcomes Study resulted in the development of the 36-item Short-Form Health Survey (SF-36). The SF-36, which features generic quality-of-life measures, was developed with the goal of using the score to explain variations in patient outcomes (e.g., lower SF-36 scores indicate “less healthy” populations that might do poorly following a certain intervention due to their poor health status rather than the intervention itself). A few small changes were made to the SF-36, and this “new” questionnaire was then administered to nearly 2500 US military veterans to develop the Veterans RAND-36 (VR-36) [37, 38]. The VR-12 is derived from the VR-36 and, despite being only a third of the length of its parent survey, yields accurate estimates of the VR-36. The VR-12 measures health-related quality of life across seven domains: physical functioning, role limitations due to physical and emotional problems, bodily pain, energy/fatigue, social functioning, mental health, and general health. The results of the VR-12 are reported as two scores: a physical component score (PCS) and a mental component score (MCS) [44].

Another commonly used general health PROM is the SF-12 (a condensed version of the SF-36). The benefit of the VR-12 over the SF-12 is that it is readily available (the rights to the SF-12 are owned by a private corporation currently), utilizes a five-point rating scale based on item response theory (compared with the SF-12, which utilizes “yes/no” options; this helps to mitigate floor and ceiling effects), and also includes two questions to assess patient perception of change in health status over time, which is very important. Like the VR-12, the SF-36 is also available in the public domain but, again, is lengthier and does not provide substantially more information than the VR-12.

Clinical Vignette 2
The utility and importance of general health PROMs such as the VR-12 are illustrated by their use in large, randomized controlled trials. A recent publication in the New England Journal of Medicine utilized the VR-12 to assess the impact of aggressive blood pressure control on quality of life (utilizing data from the Systolic Blood Pressure Intervention Trial [SPRINT]). The authors concluded that aggressive blood pressure control did not negatively impact PROMs as measured by the VR-12 [6].

12.4.2 PROMIS Global 10

The Patient-Reported Outcomes Measurement Information System (PROMIS) Global 10 is another general health PROM and was developed by the NIH in 2004. It is a ten-question survey that measures physical functioning, fatigue, pain, emotional distress, and social health [29]. Bridges have been developed to allow comparisons between the PROMIS Global 10 and VR-12 PCS and MCS [63]. These are algorithms that allow for the conversion of the VR-12 PCS and MCS scores to the PROMIS Global 10 scale so that they can be directly compared.

12.4.3 EuroQol-5D

The EuroQol-5D (EQ-5D) was developed in the late 1980s by an interdisciplinary five-country group that set out to develop a brief, general health measurement tool. They ultimately settled on five questions, each of which addresses a single dimension: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each question has five levels [19].

Fact Box 12.3
As of 2009, the use of the EQ-5D is considered a standard of care by England’s National Health Service (NHS) and is being administered pre- and postoperatively to all patients undergoing elective total hip and total knee arthroplasty (alongside a joint-specific PROM) [11].

12.5 Joint-Specific Patient-Reported Outcome Measures

While useful and informative, general health PROMs are not used as primary end points in most orthopedic research because, when used in isolation, they lack the responsiveness needed to assess the true impact of an orthopedic intervention.

For example, the VR-12, PROMIS Global 10, and EuroQol-5D have been compared with one another in patients undergoing knee arthroscopy, and all three were found to have equal responsiveness. However, none of these general health outcome measures were as responsive as the Knee injury and Osteoarthritis Outcome Score (KOOS), a joint-specific PROM described below [54]. This supports the notion that, in orthopedics, general health outcome measures should always be accompanied by a joint-specific PROM.

What follows is a brief discussion of joint-specific PROMs. As mentioned previously, this list is not meant to be exhaustive but rather features the joint-specific PROMs that can be used to collect quality orthopedic data.

12.5.1 Oswestry Disability Index (ODI; Spine)

The Oswestry Disability Index (ODI) is a ten-question survey developed in 1980 for use in patients with back pain. The ten questions are scored from 0 to 5 and assess intensity of pain, ability to care for oneself, ability to lift, ability to walk, ability to sit, ability to stand, ability to travel, sexual function, social life, and sleep quality [23]. Lower scores indicate less disability. The ODI is considered by many to be the gold standard for assessing the impact of back pain. This notion is supported by the use of the ODI in large randomized controlled trials and in meta-analyses [64, 80].

12.5.2 Neck Disability Index (NDI; Spine)

The Neck Disability Index is a modified version of the ODI that was designed for use in patients with cervical spine complaints and shares many of its characteristics. It has the same length (ten questions) and is scored in the same fashion. Like the ODI, the NDI is one of the most widely used PROMs in its area, here neck health. It is commonly used in

large randomized trials and as a primary outcome in meta-analysis [31, 58]. Also like the ODI, the NDI has been demonstrated to have acceptable psychometric properties [27, 73].

12.5.3  Disabilities of the Arm, Shoulder, and Hand Score (DASH) and Quick-DASH (Hand, Wrist, and Elbow)

The Disabilities of the Arm, Shoulder, and Hand Score (DASH) was developed by the Upper Extremity Collaborative Group (UECG), a joint initiative made up of members from AAOS, the Council of Musculoskeletal Specialty Societies (COMSS), and the Institute for Work and Health. It debuted in 1996 and was designed to evaluate disorders of the upper limb, from the shoulder to the hand [2, 32]. The DASH consists of 38 questions, each of which is scored on a five-point Likert scale. Lower scores indicate less impairment. What makes the DASH rather unique is that patients are assessed on their ability to complete a task regardless of which limb is needed to perform that action. This is considered both a strength and a limitation of the instrument.

The Quick-DASH is an abbreviated version that was developed and released in 2005. The Quick-DASH includes only 11 questions but still maintains excellent correlation with its parent survey [4].

12.5.4  American Shoulder and Elbow Surgeons Standardized Shoulder Outcome Score (ASES; Shoulder)

The American Shoulder and Elbow Surgeons Standardized Shoulder Outcome Score (ASES) is the second half of a dual outcome measure. The first half of the assessment includes a set of physician-rated questions. These are typically not reported in the literature. The second half, which is completed by patients, consists of ten questions scored from 0 to 3 and assesses pain, instability, and ability to perform activities of daily living with the affected shoulder. The total score is converted to a 100-point scale, with higher scores indicating better outcomes [59]. The ASES shoulder outcome score has been validated for use in patients with a wide range of shoulder diagnoses, including shoulder instability [50].

12.5.5  Western Ontario Shoulder Instability Index (WOSI; Shoulder)

The Western Ontario Shoulder Instability Index (WOSI) was developed in 1998 for use in patients with complaints related to shoulder instability. It consists of 21 questions that evaluate physical symptoms, sports/recreation/work, lifestyle, and emotions. Each question is scored on a 100-mm visual analog scale (VAS), making the maximum total score 2100. Lower scores indicate less impairment and fewer symptoms [40, 66]. Because this instrument was designed with a rather narrow focus (for use in shoulder instability only), it is reasonable to report the minimal clinically important difference (a concept that will be discussed shortly), which has been reported as 220 (10.4%) in the literature [39].

12.5.6  Oxford Shoulder Score (OSS; Shoulder)

The Oxford Shoulder Score (OSS) is a 12-item PROM developed in 1996 by a group of researchers at Oxford University with the intention of measuring the effect of surgical intervention for a variety of shoulder diagnoses, except instability [17]. Each item is scored on a five-point scale. The sum of the 12 questions is then converted such that higher scores indicate improved outcomes [18]. The OSS has been shown to be valid, reliable, and sensitive to changes over time after shoulder surgery [79].
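
As a rough illustration of the WOSI arithmetic in Sect. 12.5.5, the sketch below sums 21 visual analog scale readings (0 to 100 mm each) into a total out of 2100 and checks whether the change between two visits reaches the reported minimal clinically important difference of 220 points. The function names and the input format (a plain list of millimetre readings) are assumptions made for illustration; they are not part of the published instrument.

```python
WOSI_ITEMS = 21                        # number of questions (from the text)
VAS_MAX_MM = 100                       # each item is a 100-mm visual analog scale
WOSI_MAX = WOSI_ITEMS * VAS_MAX_MM     # maximum total score, 2100
WOSI_MCID = 220                        # reported MCID [39]


def wosi_total(responses_mm):
    """Sum 21 VAS readings (in mm); lower totals mean less impairment."""
    if len(responses_mm) != WOSI_ITEMS:
        raise ValueError(f"expected {WOSI_ITEMS} responses, got {len(responses_mm)}")
    for value in responses_mm:
        if not 0 <= value <= VAS_MAX_MM:
            raise ValueError(f"VAS reading out of range: {value}")
    return sum(responses_mm)


def improved_by_mcid(baseline_total, followup_total):
    """True when the total fell (improved) by at least the MCID."""
    return (baseline_total - followup_total) >= WOSI_MCID


if __name__ == "__main__":
    # Hypothetical patient: every item marked at 60 mm, then at 45 mm.
    baseline = wosi_total([60] * 21)
    followup = wosi_total([45] * 21)
    print(baseline, followup, improved_by_mcid(baseline, followup))
```

Because lower WOSI totals are better, improvement is computed as baseline minus follow-up rather than the other way around.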
12.5.7  Hip Disability and Osteoarthritis Outcome Score (HOOS) and HOOS Joint Replacement (HOOS JR; Hip)

The Hip Disability and Osteoarthritis Outcome Score (HOOS) is a 40-item outcome measure developed by Roos and colleagues in the early 2000s. It includes the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) in its entirety and adds two dimensions that assess sports/recreation and hip-related quality of life, thus allowing the HOOS to capture five total domains (including pain, other symptoms, and activities of daily living, which are captured by the WOMAC) [52].

One aspect that makes the HOOS (and the KOOS) unique is that a total score is not calculated. Rather than combined scores, each domain is scored separately, producing five subscale scores that range from 0 to 100, with higher scores indicating better outcomes. This is both a strength and a limitation of the HOOS (and KOOS), as the subscales allow for more granular assessment of changes to particular domains, while the inability to calculate a total score leads to more significant floor/ceiling effects and limits one’s ability to compare HOOS scores to other PROMs that generate a single score. The HOOS has been shown to have excellent psychometric properties [41]. Another advantage of the HOOS is that it is responsive enough for use in both short- and long-term studies.

The HOOS JR is a six-question short-form version of the HOOS that was developed by Lyman and colleagues at the Hospital for Special Surgery. It utilizes questions from the “pain” and “activities of daily living” domains and has been validated for use in patients undergoing total hip arthroplasty (THA) [46].

12.5.8  Knee Injury and Osteoarthritis Outcome Score (KOOS) and KOOS Joint Replacement (KOOS JR; Knee)

Like the HOOS, the Knee Injury and Osteoarthritis Outcome Score (KOOS) was developed by Roos et al. in the late 1990s. It too includes the WOMAC in its entirety and thereafter adds the same two domains that were developed to create the HOOS (sports/recreation and knee-related quality of life) [61]. In some ways, the inclusion of the WOMAC represents a limitation of the KOOS (and the HOOS), as two of the most commonly performed knee procedures (anterior cruciate ligament reconstruction and arthroscopic partial meniscectomy) are typically done on younger populations that do not have osteoarthritis. However, the benefit of having the WOMAC built into the questionnaire is that the KOOS (or the HOOS) can be used in prospective cohort studies or longitudinal databases involving populations in which osteoarthritis (whether primary or posttraumatic) is expected to occur.

The KOOS has been validated for use in a variety of knee diagnoses ranging from osteoarthritis to ACL rupture [5, 22, 36]. The KOOS subscales are also kept separate, although a “global” score, dubbed the KOOS4, has been utilized in large randomized trials (despite a lack of evidence supporting the use of the KOOS subscales in this fashion) [24]. The KOOS4 represents an average of four KOOS subscales (likewise, the KOOS5 represents an average of all five KOOS subscales) and is useful for assessing outcomes for interventions that are expected to affect multiple KOOS subscales simultaneously. The KOOS has been demonstrated to have excellent psychometric properties as well [15]. Like the HOOS, the KOOS is suitable for use in both short- and long-term studies.

The KOOS JR is a seven-question short-form version of the KOOS that was developed by the same HSS group that pioneered the HOOS JR and has been validated for use in total knee arthroplasty (TKA) [45].

12.5.9  International Knee Documentation Committee Subjective Knee Form (IKDC-SKF; Knee)

The International Knee Documentation Committee Subjective Knee Form (IKDC-SKF) debuted in the early 2000s, after years of development by Irrgang et al., and features ten questions that address symptoms, sports activities, and function [34]. It was intended to be a PROM that could be used for any condition involving the knee and has been shown to be valid, reliable, and responsive across multiple diagnoses [26, 30, 35]. The raw score is transformed to a 100-point scale, with higher scores indicating better outcomes.

12.5.10  Marx Activity Rating Scale (Knee)

The Marx Activity Rating Scale was developed by Marx and colleagues and debuted in 2001. It is a four-item questionnaire that assesses the peak frequency of patients’ participation in running, cutting, deceleration, and pivoting activities over the last 12 months [49]. Answers are on a 0–4 scale, with a maximum score of 16. Higher scores indicate more frequent participation in the aforementioned activities.

12.5.11  Tegner Activity Level (Knee)

The Tegner Activity Scale is an alternative to the Marx. It asks patients to indicate the highest level of activity that they participated in prior to their knee injury and the highest level at which they are currently able to participate. Scores range from 0 (unable to work because of knee problems) to 10 (competitive sports such as soccer, football, rugby on a national/elite level) [10, 72].

Clinical Vignette 3
One of the largest and best-known studies in sports medicine is the Multicenter Orthopaedics Outcome Network (MOON) ACL reconstruction prospective cohort study. The MOON cohort now includes nearly 3500 ACL reconstructions and has achieved >80% follow-up at 2, 6, and 10 years post-op. The MOON group was one of the first to use PROMs as a primary outcome in an ACL reconstruction cohort. The co-investigators of MOON chose to utilize three PROMs: KOOS, IKDC, and Marx [67].

12.5.12  Foot and Ankle Ability Measure (FAAM; Foot and Ankle)

The Foot and Ankle Ability Measure (FAAM) is a region-specific PROM designed by Martin et al. in 2005 for use in patients with leg, ankle, and foot disorders. It contains two subscales (“sports” and “activities of daily living”), which, combined, have 29 questions that are scored on a five-point Likert scale. The raw scores of each subscale are kept separate and converted to a percentage, with higher scores indicating better outcomes. The FAAM has been shown to be valid, reliable, and responsive across a spectrum of diagnoses involving the leg, foot, and ankle [12, 20, 47, 48].

12.5.13  Foot and Ankle Disability Index (FADI; Foot and Ankle)

The Foot and Ankle Disability Index (FADI) was also designed by Martin et al. and includes the FAAM in its entirety along with an additional five questions (four regarding pain and one regarding sleep). It is scored in the same fashion as the FAAM and interpreted in the same way as well. Like the FAAM, the FADI has been shown to have acceptable psychometric properties [20, 28].

12.5.14  Foot and Ankle Outcome Score (FAOS; Foot and Ankle)

The Foot and Ankle Outcome Score (FAOS) is a lower leg adaptation of the KOOS and features the same number of questions in the same subscale format [60]. Like the KOOS, the five FAOS subscales are scored separately and range from 0 to 100, with higher scores indicating superior outcomes.
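
Because the KOOS, HOOS, and FAOS each report five separate subscales on a 0 to 100 scale, a composite such as the KOOS4 or KOOS5 described in Sect. 12.5.8 reduces to a simple mean of the chosen subscales. The sketch below shows one minimal way to compute it. The function name and the subscale keys are hypothetical, and because the text does not say which four subscales make up the KOOS4, the caller has to name them explicitly; the four used in the example are an arbitrary choice for illustration.

```python
from statistics import mean


def subscale_composite(scores, subscales):
    """Return the mean of the named subscale scores (each on 0-100)."""
    if not subscales:
        raise ValueError("at least one subscale must be named")
    for name in subscales:
        if name not in scores:
            raise KeyError(f"missing subscale: {name}")
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} score out of range: {scores[name]}")
    return mean(scores[name] for name in subscales)


# Hypothetical patient; the keys are illustrative labels, not official names.
example_scores = {
    "pain": 80.0,
    "symptoms": 70.0,
    "adl": 90.0,
    "sport_rec": 50.0,
    "qol": 60.0,
}

# Average of all five subscales (the KOOS5 idea).
koos5 = subscale_composite(example_scores, list(example_scores))

# Average of four subscales (the KOOS4 idea); this particular selection
# of four is an assumption made for the example.
koos4 = subscale_composite(example_scores, ["pain", "symptoms", "sport_rec", "qol"])

print(koos5, koos4)
```

Keeping the subscale selection as an explicit argument makes it obvious in an analysis script which domains entered the composite, which matters given the caveat above about the limited evidence for pooling subscales.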
12.5.15  Achilles Tendon Total Rupture Score (ATRS; Foot and Ankle)

The Achilles Tendon Total Rupture Score (ATRS) was developed to assess outcomes of patients having suffered total Achilles tendon ruptures. It features ten questions and has shown satisfactory psychometric properties [53].

12.6  Single-Item Measures of Outcome

While PROMs such as those mentioned above provide significant detail in terms of symptoms and function from the patient’s perspective, they are often difficult to interpret clinically, particularly at baseline. Single-item outcome measures such as the Single Assessment Numeric Evaluation (SANE) and the Patient Acceptable Symptom State (PASS) are more readily interpreted in the clinic, are very brief to administer, and can provide additional information when used in conjunction with lengthier PROMs.

12.6.1  Single Assessment Numeric Evaluation (SANE)

The Single Assessment Numeric Evaluation (SANE) is a single question that asks, “How would you rate your [joint] today as a percentage of normal (0 to 100% scale, with 100% being normal)?” Despite its simplicity, the SANE correlates well with longer PROMs such as the IKDC (for use in knee-related diagnoses), the ASES (for use in shoulder-related diagnoses), and other PROMs [56, 57, 65, 74, 75, 77]. The advantages of the SANE include its brevity (one question) and interpretability (from the patients’ perspectives).

However, the major shortcoming of the SANE (and other single-item outcome measures) is its multidimensional nature, which inhibits interpretability when patients report low SANE scores, as the clinician or researcher is unable to identify the driver of the poor outcome (e.g., the patient could have pain, poor function, additional symptoms, or a combination of the three).

The SANE is typically administered as an adjunct to other PROMs such as the IKDC or the ASES rather than as a standalone PROM.

12.6.2  Patient Acceptable Symptom State (PASS)

The Patient Acceptable Symptom State (PASS) is another single-item outcome measure that has grown in popularity over the past decade. The PASS is believed to be a threshold beyond which the patient “feels well” [43]. Although there are multiple versions of the PASS question, one of the most common versions is “taking into account all the activity you have during your daily life, your level of pain, and also your activity limitations and participation restrictions, do you consider the current state of your [joint] satisfactory?” Much of the current research involving the PASS aims to identify thresholds of other commonly used PROMs (such as the KOOS or the IKDC) that correlate with PASS [21, 33, 51, 76]. It remains unclear whether achievement of PASS is driven by a degree of change in symptoms or by the realization of a symptom threshold. This is an area currently under investigation.

Like the SANE, the PASS is typically administered alongside other PROMs rather than as a standalone measure.

12.7  Observer-Reported Outcome Measures (OROMs)

Patient-reported outcome measures provide a wealth of information regarding the impact of disease on patient symptoms, quality of life, and function from the patient’s perspective. However, they fail to capture certain variables that are more relevant to physicians than to patients (e.g., laxity measurement following an ACL reconstruction). Thus, observer-reported
outcome measures (OROMs) are also important tools for use in clinical research. These include physical exam measurements, functional tests, and imaging classification schemes. The list of available OROMs is even longer than the list of commonly used PROMs, and, thus, this portion of the chapter will review important characteristics of OROMs to consider when choosing one rather than attempting to review a handful of OROMs.

12.7.1  Physical Examination Measurements

Physical exam measurements can be separated into two major categories: traditional “hands-on” techniques and instrumented techniques.

12.7.2  Traditional “Hands-On” Techniques

Traditional “hands-on” techniques include basic physical exam variables such as range of motion (ROM) and strength, as well as more nuanced examination techniques or special tests (e.g., Lachman exam, meniscal testing, special examination maneuvers of the shoulder, etc.). When choosing which, if any, of these OROMs to include when designing a study, it is important to consider characteristics such as reliability (both inter- and intra-rater), repeatability, and reproducibility [3].

12.7.3  Instrumented Techniques

Instrumented techniques, such as the use of the KT-1000 arthrometer or the Telos SD 900 for the measurement of knee laxity, are used to minimize measurement error (thus increasing the reliability of a measurement) [8]. While the increase in reliability is beneficial, instrumented techniques typically increase the cost of conducting a study considerably when compared to using “hands-on” techniques.

12.7.4  Functional Tests

Another category of OROMs commonly used in orthopedic research is functional testing. One example of a commonly used functional test is the hop test following ACL reconstruction [1]. In an ideal world, a functional test bridges the gap between a PROM and a pure OROM; however, functional tests tend to be limited by significant issues in terms of reliability and reproducibility.

12.7.5  Imaging Measurement Techniques

Imaging measurement techniques are another commonly used category of OROMs. Examples of such measures include the Kellgren-Lawrence grading scheme and the Osteoarthritis Research Society International (OARSI) classification score, both of which are used to assess radiographic changes associated with knee osteoarthritis [42]. Additional examples include the modified Outerbridge scale, the Whole-Organ Magnetic Resonance Imaging Score (WORMS), and the Knee Osteoarthritis Scoring System (KOSS), all of which can be used to evaluate magnetic resonance images in a way that is standardized and reliable [55, 78].

Take-Home Message
• Numerous scoring instruments are available for use in clinical research.
• It is important to recognize the strengths and limitations of the instrument(s) that will be used when designing a clinical research study.
• No perfect scoring instrument exists, and the soundest clinical research typically utilizes a combination of multiple instruments including both PROMs (general and joint-specific) and OROMs.

References

1. Abrams GD, Harris JD, Gupta AK, et al. Functional performance testing after anterior cruciate ligament reconstruction. Orthop J Sports Med. 2014;2(1):2325967113518305. https://doi.org/10.1177/2325967113518305.
2. Angst F, Schwyzer H-K, Aeschlimann A, Simmen BR, Goldhahn J. Measures of adult shoulder function: Disabilities of the Arm, Shoulder, and Hand Questionnaire (DASH) and its short version (QuickDASH), Shoulder Pain and Disability Index (SPADI), American Shoulder and Elbow Surgeons (ASES) Society Standardized Shoulder Assessment Form, Constant (Murley) Score (CS), Simple Shoulder Test (SST), Oxford Shoulder Score (OSS), Shoulder Disability Questionnaire (SDQ), and Western Ontario Shoulder Instability Index (WOSI). Arthritis Care Res. 2011;63(S11):S174–88. https://doi.org/10.1002/acr.20630.
3. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol. 2008;31(4):466–75. https://doi.org/10.1002/uog.5256.
4. Beaton DE, Wright JG, Katz JN, Upper Extremity Collaborative Group. Development of the QuickDASH: comparison of three item-reduction approaches. J Bone Joint Surg Am. 2005;87(5):1038–46. https://doi.org/10.2106/JBJS.D.02060.
5. Bekkers JEJ, de Windt TS, Raijmakers NJH, Dhert WJA, Saris DBF. Validation of the Knee Injury and Osteoarthritis Outcome Score (KOOS) for the treatment of focal cartilage lesions. Osteoarthr Cartil. 2009;17(11):1434–9. https://doi.org/10.1016/j.joca.2009.04.019.
6. Berlowitz DR, Foy CG, Kazis LE, et al. Effect of intensive blood-pressure treatment on patient-reported outcomes. N Engl J Med. 2017;377(8):733–44. https://doi.org/10.1056/NEJMoa1611179.
7. Bhatt A. Evolution of clinical research: a history before and beyond James Lind. Perspect Clin Res. 2010;1(1):6–10.
8. Boyer P, Djian P, Christel P, Paoletti X, Degeorges R. [Reliability of the KT-1000 arthrometer (Medmetric) for measuring anterior knee laxity: comparison with Telos in 147 knees]. Rev Chir Orthop Reparatrice Appar Mot. 2004;90(8):757–64.
9. Brand RA. Ernest Amory Codman, MD, 1869–1940. Clin Orthop. 2009;467(11):2763–5. https://doi.org/10.1007/s11999-009-1047-8.
10. Briggs KK, Steadman JR, Hay CJ, Hines SL. Lysholm score and Tegner activity level in individuals with normal knees. Am J Sports Med. 2009;37(5):898–901. https://doi.org/10.1177/0363546508330149.
11. Browne JP, Cano SJ, Smith S. Using patient-reported outcome measures to improve health care: time for a new approach. Med Care. 2017;55(10):901. https://doi.org/10.1097/MLR.0000000000000792.
12. Carcia CR, Martin RL, Drouin JM. Validity of the foot and ankle ability measure in athletes with chronic ankle instability. J Athl Train. 2008;43(2):179–83.
13. Codman EA. The classic: the registry of bone sarcomas as an example of the end-result idea in hospital organization. Clin Orthop. 2009;467(11):2766–70. https://doi.org/10.1007/s11999-009-1048-7.
14. Collier R. Legumes, lemons and streptomycin: a short history of the clinical trial. Can Med Assoc J. 2009;180(1):23–4. https://doi.org/10.1503/cmaj.081879.
15. Collins NJ, Prinsen CA, Christensen R, Bartels EM, Terwee CB, Roos EM. Knee Injury and Osteoarthritis Outcome Score (KOOS): systematic review and meta-analysis of measurement properties. Osteoarthr Cartil. 2016;24(8):1317–29. https://doi.org/10.1016/j.joca.2016.03.010.
16. Davis JC, Bryan S. Patient Reported Outcome Measures (PROMs) have arrived in sports and exercise medicine: why do they matter? Br J Sports Med. 2015;49(24):1545–6. https://doi.org/10.1136/bjsports-2014-093707.
17. Dawson J, Fitzpatrick R, Carr A. Questionnaire on the perceptions of patients about shoulder surgery. J Bone Joint Surg Br. 1996;78(4):593–600.
18. Dawson J, Rogers K, Fitzpatrick R, Carr A. The Oxford shoulder score revisited. Arch Orthop Trauma Surg. 2009;129(1):119–23. https://doi.org/10.1007/s00402-007-0549-7.
19. Devlin NJ, Brooks R. EQ-5D and the EuroQol group: past, present and future. Appl Health Econ Health Policy. 2017;15(2):127–37. https://doi.org/10.1007/s40258-017-0310-5.
20. Eechaute C, Vaes P, Van Aerschot L, Asman S, Duquet W. The clinimetric qualities of patient-assessed instruments for measuring chronic ankle instability: a systematic review. BMC Musculoskelet Disord. 2007;8:6. https://doi.org/10.1186/1471-2474-8-6.
21. Emerson Kavchak AJ, Cook C, Hegedus EJ, Wright AA. Identification of cut-points in commonly used hip osteoarthritis-related outcome measures that define the patient acceptable symptom state (PASS). Rheumatol Int. 2013;33(11):2773–82. https://doi.org/10.1007/s00296-013-2813-1.
22. Engelhart L, Nelson L, Lewis S, et al. Validation of the knee injury and osteoarthritis outcome score subscales for patients with articular cartilage lesions of the knee. Am J Sports Med. 2012;40(10):2264–72. https://doi.org/10.1177/0363546512457646.
23. Fairbank JC, Couper J, Davies JB, O’Brien JP. The Oswestry low back pain disability questionnaire. Physiotherapy. 1980;66(8):271–3.
24. Frobell RB, Roos EM, Roos HP, Ranstam J, Lohmander LS. A randomized trial of treatment for acute anterior cruciate ligament tears. N Engl J Med. 2010;363(4):331–42. https://doi.org/10.1056/NEJMoa0907797.
25. Garratt A, Schmidt L, Mackintosh A, Fitzpatrick R. Quality of life measurement: bibliographic study of patient assessed health outcome measures. BMJ. 2002;324(7351):1417.
26. Greco NJ, Anderson AF, Mann BJ, et al. Responsiveness of the International Knee Documentation Committee Subjective Knee Form in comparison to the Western Ontario and McMaster Universities Osteoarthritis Index, modified Cincinnati Knee Rating System, and Short Form
36 in patients with focal articular cartilage defects. Am J Sports Med. 2010;38(5):891–902. https://doi.org/10.1177/0363546509354163.
27. Hains F, Waalen J, Mior S. Psychometric properties of the neck disability index. J Manip Physiol Ther. 1998;21(2):75–80.
28. Hale SA, Hertel J. Reliability and sensitivity of the foot and ankle disability index in subjects with chronic ankle instability. J Athl Train. 2005;40(1):35–40.
29. Hays RD, Bjorner JB, Revicki DA, Spritzer KL, Cella D. Development of physical and mental health summary scores from the patient-reported outcomes measurement information system (PROMIS) global items. Qual Life Res. 2009;18(7):873–80. https://doi.org/10.1007/s11136-009-9496-9.
30. Higgins LD, Taylor MK, Park D, et al. Reliability and validity of the International Knee Documentation Committee (IKDC) Subjective Knee Form. Joint Bone Spine. 2007;74(6):594–9. https://doi.org/10.1016/j.jbspin.2007.01.036.
31. Hu Y, Lv G, Ren S, Johansen D. Mid- to long-term outcomes of cervical disc arthroplasty versus anterior cervical discectomy and fusion for treatment of symptomatic cervical disc disease: a systematic review and meta-analysis of eight prospective randomized controlled trials. PLoS One. 2016;11(2):e0149312. https://doi.org/10.1371/journal.pone.0149312.
32. Hudak PL, Amadio PC, Bombardier C. Development of an upper extremity outcome measure: the DASH (disabilities of the arm, shoulder and hand) [corrected]. The Upper Extremity Collaborative Group (UECG). Am J Ind Med. 1996;29(6):602–8. https://doi.org/10.1002/(SICI)1097-0274(199606)29:6<602::AID-AJIM4>3.0.CO;2-L.
33. Ingelsrud LH, Granan L-P, Terwee CB, Engebretsen L, Roos EM. Proportion of patients reporting acceptable symptoms or treatment failure and their associated KOOS values at 6 to 24 months after anterior cruciate ligament reconstruction: a study from the Norwegian Knee Ligament Registry. Am J Sports Med. 2015;43(8):1902–7. https://doi.org/10.1177/0363546515584041.
34. Irrgang JJ, Anderson AF, Boland AL, et al. Development and validation of the international knee documentation committee subjective knee form. Am J Sports Med. 2001;29(5):600–13. https://doi.org/10.1177/03635465010290051301.
35. Irrgang JJ, Anderson AF, Boland AL, et al. Responsiveness of the International Knee Documentation Committee Subjective Knee Form. Am J Sports Med. 2006;34(10):1567–73. https://doi.org/10.1177/0363546506288855.
36. Johnson DS, Smith RB. Outcome measurement in the ACL deficient knee - what’s the score? Knee. 2001;8(1):51–7.
37. Jones D, Kazis L, Lee A, et al. Health status assessments using the veterans SF-12 and SF-36: methods for evaluating outcomes in the Veterans Health Administration. J Ambul Care Manage. 2001;24(3):68–86.
38. Kazis LE, Ren XS, Lee A, et al. Health status in VA patients: results from the Veterans Health Study. Am J Med Qual. 1999;14(1):28–38. https://doi.org/10.1177/106286069901400105.
39. Kirkley A, Griffin S, Dainty K. Scoring systems for the functional assessment of the shoulder. Arthroscopy. 2003;19(10):1109–20. https://doi.org/10.1016/j.arthro.2003.10.030.
40. Kirkley A, Griffin S, McLintock H, Ng L. The development and evaluation of a disease-specific quality of life measurement tool for shoulder instability. The Western Ontario Shoulder Instability Index (WOSI). Am J Sports Med. 1998;26(6):764–72. https://doi.org/10.1177/03635465980260060501.
41. Klässbo M, Larsson E, Mannevik E. Hip disability and osteoarthritis outcome score. An extension of the Western Ontario and McMaster Universities Osteoarthritis Index. Scand J Rheumatol. 2003;32(1):46–51.
42. Kohn MD, Sassoon AA, Fernando ND. Classifications in brief: Kellgren-Lawrence classification of osteoarthritis. Clin Orthop. 2016;474(8):1886–93. https://doi.org/10.1007/s11999-016-4732-4.
43. Kvien TK, Heiberg T, Hagen KB. Minimal clinically important improvement/difference (MCII/MCID) and patient acceptable symptom state (PASS): what do these concepts mean? Ann Rheum Dis. 2007;66(Suppl 3):iii40–1. https://doi.org/10.1136/ard.2007.079798.
44. Laucis NC, Hays RD, Bhattacharyya T. Scoring the SF-36 in orthopaedics: a brief guide. J Bone Joint Surg Am. 2015;97(19):1628–34. https://doi.org/10.2106/JBJS.O.00030.
45. Lyman S, Lee Y-Y, Franklin PD, Li W, Cross MB, Padgett DE. Validation of the KOOS, JR: a short-form knee arthroplasty outcomes survey. Clin Orthop. 2016;474(6):1461–71. https://doi.org/10.1007/s11999-016-4719-1.
46. Lyman S, Lee Y-Y, Franklin PD, Li W, Mayman DJ, Padgett DE. Validation of the HOOS, JR: a short-form hip replacement survey. Clin Orthop. 2016;474(6):1472–82. https://doi.org/10.1007/s11999-016-4718-2.
47. Martin RL, Irrgang JJ. A survey of self-reported outcome instruments for the foot and ankle. J Orthop Sports Phys Ther. 2007;37(2):72–84. https://doi.org/10.2519/jospt.2007.2403.
48. Martin RL, Irrgang JJ, Burdett RG, Conti SF, Van Swearingen JM. Evidence of validity for the foot and ankle ability measure (FAAM). Foot Ankle Int. 2005;26(11):968–83. https://doi.org/10.1177/107110070502601113.
49. Marx RG, Stump TJ, Jones EC, Wickiewicz TL, Warren RF. Development and evaluation of an activity rating scale for disorders of the knee. Am J Sports Med. 2001;29(2):213–8.
50. Michener LA, McClure PW, Sennett BJ. American Shoulder and Elbow Surgeons Standardized Shoulder Assessment Form, patient self-report section: reliability, validity, and responsiveness. J Shoulder Elb
Surg. 2002;11(6):587–94. https://doi.org/10.1067/mse.2002.127096.
51. Muller B, Yabroudi MA, Lynch A, et al. Defining thresholds for the patient acceptable symptom state for the IKDC Subjective Knee Form and KOOS for patients who underwent ACL reconstruction. Am J Sports Med. 2016;44(11):2820–6. https://doi.org/10.1177/0363546516652888.
52. Nilsdotter AK, Lohmander LS, Klässbo M, Roos EM. Hip disability and osteoarthritis outcome score (HOOS) - validity and responsiveness in total hip replacement. BMC Musculoskelet Disord. 2003;4:10. https://doi.org/10.1186/1471-2474-4-10.
53. Nilsson-Helander K, Thomeé R, Silbernagel KG, et al. The Achilles tendon Total Rupture Score (ATRS): development and validation. Am J Sports Med. 2007;35(3):421–6. https://doi.org/10.1177/0363546506294856.
54. Oak SR, Strnad GJ, Bena J, et al. Responsiveness comparison of the EQ-5D, PROMIS Global Health, and VR-12 questionnaires in knee arthroscopy. Orthop J Sports Med. 2016;4(12):2325967116674714. https://doi.org/10.1177/2325967116674714.
55. Peterfy CG, Guermazi A, Zaim S, et al. Whole-organ magnetic resonance imaging score (WORMS) of the knee in osteoarthritis. Osteoarthr Cartil. 2004;12(3):177–90. https://doi.org/10.1016/j.joca.2003.11.003.
56. Pietrosimone B, Luc BA, Duncan A, Saliba SA, Hart JM, Ingersoll CD. Association between the single assessment numeric evaluation and the Western Ontario and McMaster Universities Osteoarthritis Index. J Athl Train. 2017;52(6):526–33. https://doi.org/10.4085/1062-6050-52.5.07.
57. Provencher MT, Frank RM, Macian D, et al. An analysis of shoulder outcomes scores in 275 consecutive patients: disease-specific correlation across multiple shoulder conditions. Mil Med. 2012;177(8):975–82. https://doi.org/10.7205/MILMED-D-11-00234.
58. Radcliff K, Davis RJ, Hisey MS, et al. Long-term evaluation of cervical disc arthroplasty with the Mobi-C© cervical disc: a randomized, prospective, multicenter clinical trial with seven-year follow-up. Int J Spine Surg. 2017;11:31. https://doi.org/10.14444/4031.
59. Richards RR, An KN, Bigliani LU, et al. A standardized method for the assessment of shoulder function. J Shoulder Elb Surg. 1994;3(6):347–52. https://doi.org/10.1016/S1058-2746(09)80019-0.
60. Roos EM, Brandsson S, Karlsson J. Validation of the foot and ankle outcome score for ankle ligament reconstruction. Foot Ankle Int. 2001;22(10):788–94. https://doi.org/10.1177/107110070102201004.
61. Roos EM, Roos HP, Lohmander LS, Ekdahl C, Beynnon BD. Knee Injury and Osteoarthritis Outcome Score (KOOS) - development of a self-administered outcome measure. J Orthop Sports Phys Ther. 1998;28(2):88–96. https://doi.org/10.2519/jospt.1998.28.2.88.
62. Samuelsen BT, Webster KE, Johnson NR, Hewett TE, Krych AJ. Hamstring autograft versus patellar tendon autograft for ACL reconstruction: is there a difference in graft failure rate? A meta-analysis of 47,613 patients. Clin Orthop. 2017;475(10):2459–68. https://doi.org/10.1007/s11999-017-5278-9.
63. Schalet BD, Rothrock NE, Hays RD, et al. Linking physical and mental health summary scores from the veterans RAND 12-Item Health Survey (VR-12) to the PROMIS® Global Health Scale. J Gen Intern Med. 2015;30(10):1524–30. https://doi.org/10.1007/s11606-015-3453-9.
64. Schmidt S, Franke J, Rauschmann M, Adelt D, Bonsanto MM, Sola S. Prospective, randomized, multicenter study with 2-year follow-up to compare the performance of decompression with and without interlaminar stabilization. J Neurosurg Spine. 2018;28:1–10. https://doi.org/10.3171/2017.11.SPINE17643.
65. Shelbourne KD, Barnes AF, Gray T. Correlation of a single assessment numeric evaluation (SANE) rating with modified Cincinnati knee rating system and IKDC subjective total scores for patients after ACL reconstruction or knee arthroscopy. Am J Sports Med. 2012;40(11):2487–91. https://doi.org/10.1177/0363546512458576.
66. Smith MV, Calfee RP, Baumgarten KM, Brophy RH, Wright RW. Upper extremity-specific measures of disability and outcomes in orthopaedic surgery. J Bone Joint Surg Am. 2012;94(3):277–85. https://doi.org/10.2106/JBJS.J.01744.
67. Spindler KP, Huston LJ. O’Donoghue Sports Injury Award 10 year outcomes and risk factors after ACL reconstruction: a multicenter cohort study. Orthop J Sports Med. 2017;5(7 Suppl 6):2325967117S00247. https://doi.org/10.1177/2325967117S00247.
68. Spindler KP, Kuhn JE, Freedman KB, Matthews CE, Dittus RS, Harrell FE. Anterior cruciate ligament reconstruction autograft choice: bone-tendon-bone versus hamstring: does it really matter? A systematic review. Am J Sports Med. 2004;32(8):1986–95.
69. Sur RL, Dahm P. History of evidence-based medicine. Indian J Urol. 2011;27(4):487–9. https://doi.org/10.4103/0970-1591.91438.
70. Svensson M, Sernert N, Ejerhed L, Karlsson J, Kartus JT. A prospective comparison of bone-patellar tendon-bone and hamstring grafts for anterior cruciate ligament reconstruction in female patients. Knee Surg Sports Traumatol Arthrosc. 2006;14(3):278–86. https://doi.org/10.1007/s00167-005-0708-8.
71. Tarlov AR, Ware JE, Greenfield S, Nelson EC, Perrin E, Zubkoff M. The Medical Outcomes Study. An application of methods for monitoring the results of medical care. JAMA. 1989;262(7):925–30.
72. Tegner Y, Lysholm J. Rating systems in the evaluation of knee ligament injuries. Clin Orthop. 1985;198:43–9.
73. Vernon H, Mior S. The Neck Disability Index: a study of reliability and validity. J Manip Physiol Ther. 1991;14(7):409–15.
74. Williams GN, Gangel TJ, Arciero RA, Uhorchak JM, Taylor DC. Comparison of the single assessment
12  Types of Scoring Instruments Available 109

numeric evaluation method and two shoulder rating arthroscopic correlation. J Bone Joint Surg Am.
scales. Outcomes measures after shoulder surgery. 2014;96(14):1145–51. https://doi.org/10.2106/
Am J Sports Med. 1999;27(2):214–21. https://doi.org JBJS.M.00929.
/10.1177/03635465990270021701. 79. Younis F, Sultan J, Dix S, Hughes PJ. The range of the
75. Williams GN, Taylor DC, Gangel TJ, Uhorchak JM, Oxford Shoulder Score in the asymptomatic popula-
Arciero RA.  Comparison of the single assessment tion: a marker for post-operative improvement. Ann R
numeric evaluation method and the Lysholm score. Coll Surg Engl. 2011;93(8):629–33. https://doi.org/1
Clin Orthop. 2000;373:184–92. 0.1308/003588411X13165261994193.
76. Wright AA, Hensley CP, Gilbertson J, Leland JM III, 80. Zaina F, Tomkins-Lane C, Carragee E, Negrini

Jackson S. Defining patient acceptable symptom state S.  Surgical versus non-surgical treatment for
thresholds for commonly used patient reported out- lumbar spinal stenosis. Cochrane Database
comes measures in general orthopedic practice. Man Syst Rev. 2016;(1):CD010264. https://doi.
Ther. 2015;20(6):814–9. https://doi.org/10.1016/j. org/10.1002/14651858.CD010264.pub2.
math.2015.03.011. 81. Zimerman AL. Evidence-based medicine: a short his-
77.
Wright RW, Baumgarten KM.  Shoulder out- tory of a modern medical movement. Virtual Mentor.
comes measures. J Am Acad Orthop Surg. 2013;15(1):71. https://doi.org/10.1001/virtualmentor.
2010;18(7):436–44. 2013.15.1.mhst1-1301.
78. Wright RW, Ross JR, Haas AK, et  al. Osteoarthritis
classification scales: interobserver reliability and
13 Health Measurement Development and Interpretation

Andrew Firth, Dianne Bryant, Jacques Menetrey, and Alan Getgood

A. Firth · A. Getgood (*)
Fowler Kennedy Sport Medicine Clinic, 3M Centre, University of Western Ontario, London, ON, Canada
e-mail: afirth5@uwo.ca; alan.getgood@uwo.ca

D. Bryant
Faculty of Health Sciences, Elborn College, University of Western Ontario, London, ON, Canada
e-mail: dianne.bryant@uwo.ca

J. Menetrey
Centre de medicine du sport et de l’exercice, Hirslanden Clinique La Colline, University Hospital of Geneva, Geneva, Switzerland
e-mail: Jacques.menetrey@lacolline.ch

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_13

13.1 What Is an Outcome Measure, and Why Measure Health Outcomes?

In 1948, the World Health Organization defined health as ‘a state of complete physical, mental, and social well-being’ [28]. This idealistic concept of health as a comprehensive, theoretical framework is problematic for those attempting to quantify health within an individual to inform decision-making or within or between groups to inform broader clinical recommendations. Despite the complexity of health measurement, outcome measures are designed with one of three purposes in mind: discrimination, evaluation, or prediction. In addition, outcome measures may be surrogates for important outcomes or themselves measure outcomes that are of direct importance to patients.

13.1.1 Discriminative, Predictive, and Evaluative Outcomes

To select an appropriate outcome measure for research, the objective of the measurement must be clear and aligned with the objectives of the study, the population being investigated, and the study methodology [7]. For example, an instrument used to discriminate between individuals might be used to determine a patient’s eligibility for study participation or may be a diagnostic tool designed to classify individuals as negative or positive for disease. For example, the Kellgren and Lawrence system is used to classify the severity of radiographic knee osteoarthritis (OA) [10] and is used in conjunction with clinical symptoms to discriminate between individuals with advanced OA and those with mild OA.

A predictive instrument, on the other hand, is used to predict future events. For example, there is some evidence that a test of a patient’s landing biomechanics post-anterior cruciate ligament (ACL) reconstruction can predict reinjury [6, 18] and is therefore used by some clinicians as a guide for making recommendations regarding return to sport.

Finally, an evaluative measurement tool is used to assess change and is most commonly used to measure the effect of a therapy. For example, if an investigator wishes to determine whether a regimen of aerobic exercise will improve the quality of life, pain, and mobility in
patients with osteoarthritis of the knee, they will administer each of the evaluative measures before the intervention and then again after the intervention and assess whether scores have improved. Similarly, if a researcher’s objective is to compare the effectiveness of two or more therapies, each group of patients will complete the evaluative measure at the endpoint of interest so that between-group comparisons can be made.

13.1.2 Surrogate- and Patient-Important Outcomes

Surrogate outcomes are not necessarily meaningful to patients but are believed to proxy outcomes that are directly important to patients [2]. Early orthopaedic research often relied solely on surrogate outcomes like imaging, performance-based tests, or physiological tests to provide evidence of a treatment’s success. Today, surrogate outcomes are most appropriate for smaller, explanatory, or proof-of-concept studies prior to investing in larger, more pragmatic studies that will apply to broader populations. For example, shoulder range of motion is a proxy for shoulder function but is not a direct measure of function, since an individual may not require full motion to complete desired tasks or can find a way to compensate for a loss of motion and still accomplish desired tasks in a fulfilling way. Other examples of common surrogate outcomes are radiographic imaging of joint space narrowing, a surrogate for pain and impaired function; bone mineral density, a surrogate for risk of fragility fracture; and strength, a surrogate for functional ability.

Although surrogate measures are important determinants of health, and help to provide explanations for impairments or predict future patient-important health issues, pragmatic studies that aim to make recommendations for practice change should measure effectiveness using a measure of direct importance to patients, like the incidence of adverse outcomes (e.g. death, myocardial infarction, stroke, revision surgery, etc.) and patient-reported outcome measures (PROMs).

A PROM is a subjective assessment completed by the patient, in which they are asked to score aspects of their own perception of their health [20]. Some PROMs include only questions related to functional ability and are called patient-reported functional ability questionnaires (e.g. Lower Extremity Functional Scale (LEFS), American Shoulder and Elbow Surgeons (ASES) Score), while others attempt to measure health from a more comprehensive perspective and include items querying physical, mental, and social well-being (e.g. 12-Item Short-Form Survey (SF-12), Western Ontario Rotator Cuff (WORC) Index). The latter are referred to as health-related quality-of-life (HRQOL) instruments.

HRQOL measures often fall into one of two categories: disease-specific or generic. Disease-specific measures are designed to ask patients about constructs of health directly affected by the disease in question (e.g. the Western Ontario Rotator Cuff (WORC) Index is specific to patients with rotator cuff pathology). Generic measures, on the other hand, ask questions about health from a non-specific, broadly applicable perspective. Thus, while disease-specific measures are more sensitive to changes in health [27], generic measures have a larger scope and can therefore be interpreted across different health states, including in those who are healthy [7].

Often, the measurement properties of the most widely accepted outcome measures are extensively published and provide evidence that the measure is accurate, precise, and able to detect change in the population of interest. If this evidence is not available, those hoping to use the instrument may elect to first assess its measurement properties before implementing its use in clinic or as part of a research study; otherwise, they run the risk of collecting uninterpretable data. In the next section, we describe some of the most common measurement properties.
13.2 Measurement Properties of an Outcome Measure

13.2.1 Reliability

Reliability is about the precision of an outcome measure and evaluates whether it produces consistent results when repeatedly administered in a sample from a population with a stable health status [23]. Every outcome score consists of two components: the patient’s true score and random error. An outcome measure is considered more reliable when the random error is small, because a patient’s test performance will more closely reflect their true score on the outcome measure with less variability over repeated testing.

We can think of reliability in two different ways: relative, or the extent to which the instrument can differentiate between individuals, and absolute, or the extent to which repeated administrations of the test produce consistent results. The most useful estimate of an instrument’s relative reliability is the standard error of measurement (SEM).

The SEM is presented in units consistent with the original tool and is an estimate of the error associated with an individual’s score. If the magnitude of the error is known, it can be used to communicate the accuracy of a single patient’s score on an outcome measure and can also be used to create a threshold to determine whether a change in score represents real change over and above measurement error. Specifically, the SEM is the square root of the within-patient variance (calculated from the difference in score within a stable group of individuals over repeated measures) for the sample.

Clinical Vignette

Consider the following: your patient is 1 year post ACL reconstruction. One of the criteria that you use to determine readiness to return to sport is whether the patient has a limb symmetry index (LSI) of at least 90% for each of four different hop tests (forward, triple, and crossover hops for distance and 6-m timed hop). Your patient completes the forward hop for distance for both limbs, and the LSI is 80%. You want to understand the error associated with this measurement. In a 2007 study evaluating the measurement properties of the LSI for all four hop tests, Reid et al. [19] reported the SEM for the LSI for the single leg hop test to be 4.94%. To place a 95% confidence interval around the patient’s LSI of 80%, you would multiply the SEM by 1.96 (the corresponding z-value for the desired confidence level) to find the lower and upper limit of the possible range of results. In this case, 1.96 × 4.94% gives a 95% CI of ±9.68%. In other words, the individual’s LSI may be as low as 70% and as high as 90%. As a clinician making decisions about return to sport, you might evaluate the risk of returning to sport if the LSI is actually 70%.

In terms of absolute reliability, the most common methods are test-retest and inter- and intra-rater reliability. Test-retest reliability assumes that the health of the patient remains stable but that there may be differences in the results related to errors in how the test is performed. For example, the drop vertical jump test requires the individual to drop straight down from a box and, upon landing, jump up as high as they can. As you might imagine, even though the status of the individual’s knee stability is unlikely to change over three consecutive tests, the individual’s landing biomechanics may be quite different each time. Regular instrument calibration is also part of achieving good test-retest reliability.

Fact Box 13.1: Interpreting ICC Scores

ICC scores range from 0 to 1, with values closer to 1 indicating better reliability. One commonly used interpretation of ICC scores was suggested by Shrout and Fleiss in 1979 [22]:

ICC < 0.4 = poor reliability
0.4 ≤ ICC ≤ 0.75 = fair to good reliability
ICC > 0.75 = excellent reliability
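
The two calculations described in this section, estimating the SEM from repeated measurements in a stable group and placing a confidence interval around one patient's score, can be sketched in Python. This is a minimal illustration, not the method used by Reid et al.: the function names are hypothetical, the repeated-measures data are invented for demonstration, and the SEM estimate assumes exactly two measurements per patient (within-patient variance taken as the sum of squared differences divided by twice the number of patients).

```python
import math


def sem_from_repeated_measures(first, second):
    """Estimate the SEM from two measurements per patient in a stable group.

    Assumption: with two measurements per patient, the within-patient
    variance is sum(d_i ** 2) / (2 * n), where d_i is the difference
    between a patient's two scores. The SEM is its square root.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty score lists")
    squared_diffs = [(a - b) ** 2 for a, b in zip(first, second)]
    within_patient_variance = sum(squared_diffs) / (2 * len(first))
    return math.sqrt(within_patient_variance)


def score_confidence_interval(score, sem, z=1.96):
    """Return (lower, upper) limits of the CI around one observed score.

    z = 1.96 corresponds to a 95% confidence level.
    """
    half_width = z * sem
    return score - half_width, score + half_width


if __name__ == "__main__":
    # Hypothetical test-retest LSI values (%) for three stable patients.
    test = [80.0, 85.0, 90.0]
    retest = [82.0, 83.0, 94.0]
    print("Estimated SEM:", sem_from_repeated_measures(test, retest))

    # Inputs from the clinical vignette: LSI of 80% and SEM of 4.94%.
    lower, upper = score_confidence_interval(80.0, 4.94)
    print(f"95% CI: {lower:.1f}% to {upper:.1f}%")
```

Keeping the z-value as a parameter allows the same function to be reused for other confidence levels (for example, 1.645 for a 90% interval).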
Intra-rater reliability is an assessment of the agreement between multiple testing sessions scored by the same rater [14]. Intra-rater reliability is an important measurement property when there is some subjectivity about how the test is performed or interpreted. For example, using a goniometer to measure range of motion will carry some error, since the location of anatomical landmarks and the placement of the instrument may vary between repeated administrations. Inter-rater reliability measures the agreement between assessments scored by more than one independent rater [14] and is important to understand if more than one person is responsible for measuring patient outcomes. For example, there may be more than one research assistant (RA) working at your clinic who will contribute measurement data to your surveillance database or multiple RAs working at different sites participating in a multicentre research study. Both intra- and inter-rater reliability are usually improved with standardized protocols and training opportunities for personnel responsible for outcome measurement.

Although there is no consensus regarding the accepted timeframe between testing sessions for any evaluation of reliability, the sessions should be close enough together that true change is unexpected but not so close together that agreement is overestimated because the individual or rater remembers their previous score or rating [1, 15]. Statistics used to express the level of absolute reliability are the Kappa for nominal measures (e.g. lift-off test for the presence or absence of a subscapularis tear), a weighted Kappa for ordered outcomes (e.g. Kellgren and Lawrence Grade 0, 1, 2, 3, or 4 osteoarthritis), or an intraclass correlation coefficient (ICC) for continuous outcome measures like most PROMs.

Although it is important that the results provided by an outcome measure are reproducible, reproducibility does not guarantee validity. For example, a risk assessment for ACL rupture may be highly reliable when completed by a patient with knee osteoarthritis, yet it provides no useful information regarding disease progression, as the questionnaire lacks context and meaning in this population. Reliability is important, but we require validity to provide meaning to outcome measures.

13.2.2 Validity

Evaluating the validity of an outcome measure is all about providing evidence that an instrument is able to measure what it was intended to measure [17]. It’s important that assessments of validity take place within the population of interest even if there is sufficient evidence of validity in other populations. There are several methods to evaluate validity. In this chapter, we will describe face, content, criterion, and construct validity.

Face validity is a superficial and insufficient means to assess validity. The basic premise is that persons with perhaps insufficient expertise, using an unsystematic and non-comprehensive approach, comment as to whether the instrument appears to measure what is intended [16].

On the other hand, when developing an instrument, content validity is approached by systematically operationalizing the content that comprehensively represents the construct. For example, when an instrument is being developed, content validity is likely to be achieved if items are generated by a large sample of individuals who have experience with the disease, including patients, their caregivers and family members, and expert clinicians [17]. Post development, proper evaluation of content validity involves large patient groups and expert clinicians who use set criteria to determine whether the construct of health as it relates to a specific health issue has been captured comprehensively. For example, a new questionnaire meant to measure quality of life in patients suffering from pathology related to the foot and ankle may begin by organizing items into broad categories of health (physical, social, mental) and then into domains that might include such topics as pain, functional ability for activities of daily living (ADLs), work expectations, participation in family and community roles, anxiety, depression, etc.

Criterion validity compares the results of the new instrument to the best available method (often termed the ‘gold standard’) to evaluate their association with one another; the larger the association, the more evidence there is for criterion validity [17]. For instance, an open surgical procedure was at one time the only means to evaluate the integrity of the rotator cuff. When
advances in imaging became available, like magnetic resonance imaging and ultrasound, these imaging methods were compared to observations during surgery to determine their accuracy. In this example, the observations made during surgery served as the gold standard. Statistics like sensitivity and specificity are often used to communicate accuracy.

True gold standards are common for measures of structures, physiology, physical function, and performance; however, they rarely exist for constructs like HRQOL. For this more abstract paradigm, we evaluate validity using a theoretical framework of health against which we make assumptions about how the instrument should function if it’s accurately capturing the construct. This type of validity is referred to as construct validity [9].

To assess construct validity, we hypothesize the magnitude and direction of the correlation between the new outcome measure and related aspects of health to determine whether the new test behaves as expected. For example, we may hypothesize that as the severity of disease increases, the quality of life score will decrease (directional hypothesis) and that the association between severity and quality of life will be small (expected magnitude). As part of constructing these hypotheses, it’s common to compare the performance of the new instrument to outcome measures designed to assess a similar attribute (convergent validity) [17, 23]. For example, we may hypothesize that as the score on our new disease-specific quality of life questionnaire increases, so too will the physical component score (PCS) of the generic health-related quality of life instrument, the 12-Item Short-Form Survey (SF-12).

Conversely, discriminant validity assesses whether the outcome measure performs better than an instrument designed to assess a general or unrelated construct when measuring the outcome of interest [17, 23]. For instance, the investigator could compare a new knee OA-specific questionnaire to a generic measure of mental health, like the mental component score (MCS) of the SF-12, and expect that mental health is only very weakly associated with OA-related quality of life.

Both criterion and construct validity can be evaluated at a single point in time as just described, called cross-sectional validity, or over time, to evaluate whether changes in the new instrument are associated with changes in existing measures [3, 11, 13, 25]. If we return to our example of a new knee OA-specific questionnaire, we could hypothesize that, over time, the MCS of the SF-12 may remain relatively unchanged despite small to moderate changes in the knee OA-specific questionnaire. Selecting outcome measures that have demonstrated longitudinal construct validity within the relevant study population is important when designing clinical trials where instruments are used to assess health status prior to and following treatment.

Although important, validity is insufficient for an investigator wanting to make clinical recommendations based on the results of an evaluative outcome measure. While validity shows that the outcome measure is associated with changes in health, it lacks interpretability. Interpretability of changes in health states begins with our final two measurement properties, sensitivity to change and responsiveness. Sensitivity to change refers to the ability of an outcome measure to detect changes in health when change has occurred but does not provide any insight as to the significance of the change to those being treated [13]. Conversely, responsiveness describes the outcome measure’s ability to detect meaningful change from the patient’s perspective, no matter how small the change is [13].

13.2.3 Responsiveness

Assessing responsiveness requires a marker of clinical importance. Traditional methods of determining responsiveness use either an anchor-based (patient defined) or distribution-based (statistical) approach. The goal of either method is to define the minimal clinically important difference (MCID). The MCID, originally proposed by Jaeschke et al. [8] in 1989, is the smallest change in the outcome of interest that informed parties (patient/clinician) consider important, whether
favourable or harmful, which would cause those involved to consider changing treatment [21].

Using the anchor-based approach requires the patient to complete the new questionnaire before and after change is expected and to complete a global rating of change (GRC) questionnaire. A GRC requires the patient to indicate whether they have improved, remained unchanged, or worsened; if they have changed, they next indicate by how much. An estimate of the MCID is made by calculating the average change score of patients who indicated that they changed by a ‘small but important amount’ on the GRC.

When determining responsiveness from a distribution-based perspective, investigators may construct two normal distributions: the first constructed from the change scores of patients who should have remained stable over the two measurements and the second constructed from the change scores of patients who are expected to have changed over the two measurements. The value of the MCID is said to be greater than the majority of scores in the first distribution and less than the majority of scores in the second distribution.

13.3 Interpreting the Results: Bias and Precision

Once the appropriate outcome measures have been used to collect data, the challenge becomes presenting the data in a meaningful way. Simply presenting the results of statistical tests of the data (i.e. p-values) or the summary measures (e.g. mean change pre- to post- or mean difference between groups) is often meaningless to those without extensive experience using the same outcome measure, which is extremely rare outside of intensive research programmes.

The problem with only presenting p-values (the probability of obtaining a result at least as extreme as the one observed, given that the null hypothesis is true) [26] is that p-values are highly sample-specific because they are influenced by the sample’s variability and sample size. A p-value does not express the magnitude of effect, the reproducibility of the result, or the importance of the difference (i.e. is the difference sufficient to change practice?). Specifically, a study result may achieve statistical significance even though the difference is small and unimportant if the sample is highly homogeneous and/or quite large. On the other hand, an important difference may never reach statistical significance in a study with a heterogeneous sample and/or small sample size. Clinicians scrutinizing or reporting results should place little weight on the p-value and instead consider indicators of precision, like sample size and confidence intervals. Understanding the play of chance, or random sampling error, is also extremely important in deciding how much weight to place on the results of a study.

13.3.1 Random Sampling Error and Sample Size

Producing a study with results that are applicable to the population is the overarching goal of clinical research. When we conduct a study, the participants are only a sample of the population. The smaller our sample, the more likely we are, just by chance, to recruit a nonrepresentative sample. The more representative your sample, the more likely that estimates of treatment effect observed with the sample will also apply to the population. As the sample size becomes larger (closer to the size of the population) or the results of repeated studies are pooled together (e.g. in a meta-analysis), the probability of a random sampling error decreases. A sample size estimate takes into consideration the variability between individuals in the population and the size of the expected difference and provides an estimate of the size of the sample required to overcome random sampling error. Although a larger sample size will help to overcome the risk of random sampling error, it cannot overcome sampling bias in the results that arises from purposefully recruiting an unrepresentative sample (e.g. selecting only those most likely to be responsive, adherent, etc.).

In the next section, we introduce the importance of including estimates of precision, like confidence intervals (CIs), when reporting or interpreting the results of a study. But even confidence intervals cannot overcome random sampling error; thus, one should always be wary of the applicability of studies with a small sample size (usually defended by a sample size calculation with an unreasonably large expected effect size) or a highly selective method of sampling.

13.3.2 Confidence Intervals (CIs)

A CI represents the range in which the true score is expected to lie. Selecting a 95% CI means the investigator can be 95% confident that the true population mean would be contained within the interval if repeated representative samples were studied from the population. CIs should be reported no matter the summary measure being used to present the results (e.g. relative risk, odds ratio, mean difference, etc.). Then, instead of depending on the p-value to reach conclusions about the results of a study, one should interpret the lower and upper boundaries provided by the CI. If the lower and upper boundaries of the CI imply similar conclusions, then the study may make definitive conclusions, assuming the study sample is sufficiently large and representative of the population.

For example, suppose the 95% CI around the relative risk of revision surgery is 3.5–8.6 (where p < 0.05); both sides of the CI relay the same message, that the risk of surgery is much greater in one group compared to the other. In this situation, the study can make definitive conclusions; in a large study with a representative sample, this study convincingly demonstrates the benefit of one intervention over the other. If, on the other hand, the 95% CI around the relative risk of revision surgery is 1.01–10.5, the lower boundary of the CI implies that the risk of revision surgery is almost the same in both groups, whereas the upper boundary of the CI implies that the risk of revision surgery is roughly 10 times greater in one group compared to the other. In this situation, even though p < 0.05, the message is unclear; the study cannot make definitive conclusions.

Fact Box 13.2: Confidence Intervals

Confidence intervals allow the reader to interpret the clinical meaningfulness of study results instead of just statistical significance. When interpreting statistical significance and CIs, consider:

If p < 0.05 (statistically significant), does the lower limit of the CI include differences that are not important? If it doesn’t, the results are definitive in favour of treatment, while if it does, the results are uncertain.

If p > 0.05 (not statistically significant), does the upper limit of the CI include important differences? If it does, the results are uncertain, while if it doesn’t, the treatment is definitively ineffective.

Interpreting the CIs around the results of studies that use PROMs presents an additional challenge. Specifically, how do we know whether the difference being presented represents an important finding?

13.4 Reporting and Interpreting the Results

13.4.1 Interpreting Change in Individual Patients

To determine whether true change, as opposed to error, is the cause for different scores over time for an individual patient, we can calculate a minimally detectable change (MDC). The MDC is the threshold that defines true change over error and is calculated as MDC = SEM × z × √2, where z is the z-score for the desired confidence level. Consider the previous example where the SEM for the hop test LSI was 4.94%. Given the results of the first hop test (LSI = 80%), you have asked your patient to undergo another 6 weeks of physiotherapy, this time concentrating on training landing biomechanics and sport-specific exercises. Upon completion of this additional rehabilitation, the new value for the LSI is 95%. For an MDC(95), the z-score is 1.96, which means that the hop test
118 A. Firth et al.

LSI needs to change by approximately 14 percentage points for the clinician to be certain that the change is real and not measurement error. In our scenario, the difference in LSI between time one and time two is (95 − 80) 15 percentage points, and thus, the clinician can be certain that true change has occurred. Further, the lower boundary of the 95% CI around the second LSI is 88%, which is much closer to the desired 90% in terms of recommending that the patient returns to sport.

13.4.2 Interpreting Change Within a Group of Patients

To report the results of a within-group change, one could report the proportion of patients who have changed by an amount greater than the MDC threshold or, more meaningfully, could present the mean change from pre- to post- and use the MCID to interpret the 95% CI around the difference. For example, if the MCID for a PROM is 10 points and the average pre- to post-change score is 15 points (95% CI 5 points to 25 points), then the lower boundary of the CI implies that not all patients will achieve an important change and, thus, the study cannot conclude that the intervention will provide important improvements for all patients. On the other hand, if the 95% CI around the pre- to post-change score is 11–19 points, then (if the study is sufficiently large) the study can confidently conclude that the intervention is highly likely to provide important improvements for most patients.

13.4.3 Interpreting Change Between Groups of Patients

To report the results of a between-group comparison, one could compare the proportion of patients within each group who change by an amount greater than the MDC threshold or, more meaningfully, could present the mean between-group difference and use the MCID to interpret the 95% CI around this difference. However, the latter presents a problem because the value of the MCID available in the literature for most PROMs will have been predominantly determined using within-group change (as described in our reliability section), which is appropriate for interpreting pre-post-intervention studies, but not appropriate for interpreting between-group studies. Here’s why: consider the amount of change that will be observed from pre- to postoperative total knee replacement. Following recovery from a TKA, we expect quite remarkable improvements. For example, in a study by Kahn et al. [9] that included 172 patients who underwent TKA, the preoperative total WOMAC score was around 35 points, and the postoperative score was around 12 points; the 95% CI around the mean change is approximately 20–24 points. If the MCID is approximately 15 points [4], then the study can conclude that patients undergoing a TKA will experience an important improvement.
However, what is the average difference between groups when both groups are undergoing TKA? Perhaps the comparison is between two different types of implants. In this situation, all patients are expected to benefit from TKA by an important amount. However, the research question is about measuring the difference between groups. Surely we do not expect the difference between groups to be as large as the change within groups, especially when both groups are receiving active treatment (versus no treatment). A 1993 study by Goldsmith et al. [5] determined that the magnitude of the between-group MCID is significantly smaller than the within-group MCID (between 20 and 40%). Thus, if the MCID for a between-group comparison is around 5 points, then in the study by Victor et al. [24] that compared two different implants for TKA (n = 131) where the mean difference between groups was 3 points (95% CI −10 points to 15 points), we can determine that the study is grossly underpowered because both the upper and lower boundaries of the CI lie beyond 5 points (the possibility of an important difference); unfortunately, each boundary concludes in favour of a different implant. To make a definitive conclusion that the two implants offer a similar outcome (as measured by the WOMAC), both boundaries of the CI would have to exclude five WOMAC points.
13.4.3.1 Number Needed to Treat (NNT)

One final, and perhaps the most intuitive, method to present the results of a between-group comparison using PROMs is to provide the number needed to treat (NNT). The NNT is the average number of patients who need to be treated with the experimental treatment to prevent one additional bad outcome (e.g. the number of patients that need to be treated for one of them to benefit compared with a control in a clinical trial). It is defined as the inverse of the absolute risk reduction (1/RD). Thus, it is possible to determine the proportion of patients within each treatment group who surpass the MDC threshold and present the results as an NNT or the number of patients that must be treated with the intervention to achieve a clinically meaningful improvement in one patient (compared to the control) [12]. Thus, an NNT of five indicates that one can expect that 20% more patients will experience an improvement if they receive the experimental treatment compared to the control. The NNT is more easily interpreted because the reader does not need to understand MDC, MCID, or how to use them to interpret CIs; this work has already been done by the researcher [1].
The clinical significance of the data presented at the end of a clinical research study is dependent on the established measurement properties of the outcome measure in the sample of interest. Group differences and statistical significance are convenient ways to present data; however, investigators presenting or analysing the results should include more clinically meaningful presentations of their data to improve their readability and potential to change practice.

Take-Home Message
• The development of outcome measures has allowed clinicians to better evaluate the complex framework of health, and these instruments have become an integral part of health research; they are utilized to demonstrate patient-important changes in health and inform clinical decision-making regarding the effectiveness of treatment.
• Reliability, validity, and responsiveness are measurement properties that provide evidence for the clinician to evaluate the suitability of an outcome measure specific to the sample of interest.
• Data collected from outcome measures should be presented using statistics that are easy to interpret and relay the clinical significance of the findings.

Checklist for Interpreting Health Outcome Measures
1. What is the objective of the study?
   (a) Was the outcome measure selected appropriate for the study objective/design?
2. Has the outcome measure been shown to be valid and reliable in the population of interest?
3. If the objective is to evaluate change, has the outcome measure been shown able to detect important change in the study population?
   (a) Is there a reported SEM to interpret individual change?
   (b) Is there a reported MCID to interpret group changes?
4. Are group differences presented with confidence intervals?
   (a) Does the interpretation of the lower boundary of the 95% confidence interval offer the same conclusion as the upper boundary of the 95% CI?
   (b) When considering the risk of random sampling error (as it relates to sample size) and sampling bias (as it relates to the representativeness of the population), what is the degree of certainty in the conclusions?
5. Has the study made appropriate conclusions and, if reasonable to do so, presented the results in terms of clinical relevance?

References

1. Bryant D, Guyatt G. Patient reported outcome measures. In: Arnold R, editor. Pharmacoeconomics. Boca Raton: CRC Press; 2009.
2. Cochrane Collaboration glossary page [cited 2018 Jan 16]. http://community.cochrane.org/glossary#letter-S.
3. Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures. Statistics and strategies for evaluation. Control Clin Trials. 1991;12(4 Suppl):142S–58S.
4. Escobar A, Quintana JM, Bilbao A, Aróstegui I, Lafuente I, Vidaurreta I. Responsiveness and clinically important differences for the WOMAC and SF-36 after total knee replacement. Osteoarthr Cartil. 2007;15(3):273–80.
5. Goldsmith CH, Boers M, Bombardier C, Tugwell P. Criteria for clinically important changes in outcomes: development, scoring and evaluation of rheumatoid arthritis patient and trial profiles. OMERACT Committee. J Rheumatol. 1993;20(3):561–5.
6. Hewett TE, Myer GD, Ford KR, Heidt RS, Colosimo AJ, McLean SG, et al. Biomechanical measures of neuromuscular control and valgus loading of the knee predict anterior cruciate ligament injury risk in female athletes: a prospective study. Am J Sports Med. 2005;33(4):492–501.
7. Jackowski D, Guyatt G. A guide to health measurement. Clin Orthop Relat Res. 2003;413:80–9.
8. Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10(4):407–15.
9. Kahn TL, Soheili A, Schwarzkopf R. Outcomes of total knee arthroplasty in relation to preoperative patient-reported and radiographic measures: data from the osteoarthritis initiative. Geriatr Orthop Surg Rehabil. 2013;4:117–26.
10. Kellgren JH, Lawrence JS. Radiological assessment of osteo-arthrosis. Ann Rheum Dis. 1957;16(4):494–502.
11. Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chronic Dis. 1985;38(1):27–36.
12. Laupacis A, Sackett DL, Roberts RS. An assessment of clinically useful measures of the consequences of treatment. N Engl J Med. 1988;318(26):1728–33.
13. Liang MH. Longitudinal construct validity: establishment of clinical meaning in patient evaluative instruments. Med Care. 2000;38(9 Suppl):II84–90.
14. Marshall G, Hays R, Nicholas R. Evaluating agreement between clinical assessment methods. Int J Methods Psychiatric Res. 1994;4:249–57.
15. Marx RG, Menezes A, Horovitz L, Jones EC, Warren RF. A comparison of two time intervals for test-retest reliability of health status instruments. J Clin Epidemiol. 2003;56(8):730–5.
16. McDowell I, Newell C. Measuring health: a guide to rating scales and questionnaires. 2nd ed. New York: Oxford University Press; 1996.
17. Messick S. Validity. In: Linn R, editor. Educational measurement. Phoenix: Oryx Press; 1993. p. 13–103.
18. Paterno MV, Schmitt LC, Ford KR, Rauh MJ, Myer GD, Huang B, et al. Biomechanical measures during landing and postural stability predict second anterior cruciate ligament injury after anterior cruciate ligament reconstruction and return to sport. Am J Sports Med. 2010;38(10):1968–78.
19. Reid A, Birmingham TB, Stratford PW, Alcock GK, Giffin JR. Hop testing provides a reliable and valid outcome measure during rehabilitation after anterior cruciate ligament reconstruction. Phys Ther. 2007;87(3):337–49.
20. Rothman ML, Beltran P, Cappelleri JC, Lipscomb J, Teschendorf B, Group MFP-ROCM. Patient-reported outcomes: conceptual issues. Value Health. 2007;10(Suppl 2):S66–75.
21. Schünemann HJ, Guyatt GH. Commentary—goodbye M(C)ID! Hello MID, where do you come from? Health Serv Res. 2005;40(2):593–7.
22. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86(2):420–8.
23. Streiner D, Norman G. Health measurement scales: a practical guide to their development and use. Oxford: Oxford University Press; 1995.
24. Victor J, Ghijselings S, Tajdar F, Van Damme G, Deprez P, Arnout N, et al. Total knee arthroplasty at 15-17 years: does implant design affect outcome? Int Orthop. 2014;38(2):235–41.
25. Ware J, Kosinski M, Keller SD. A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care. 1996;34(3):220–33.
26. Wasserstein R, Lazar N. The ASA’s statement on p-values: context, process and purpose. Am Stat. 2016;70(2):129–33.
27. Wiebe S, Guyatt G, Weaver B, Matijevic S, Sidwell C. Comparative responsiveness of generic and specific quality-of-life instruments. J Clin Epidemiol. 2003;56(1):52–60.
28. World Health Organization. Preamble to the constitution of the World Health Organization as adopted by the international health conference, Geneva; 1948.
14 How to Document a Clinical Study and Avoid Common Mistakes in Study Conduct?

Caroline Mouton, Laura De Girolamo, Daniel Theisen, and Romain Seil

C. Mouton (*)
Department of Orthopaedic Surgery, Centre Hospitalier de Luxembourg—Clinique d’Eich, Luxembourg, Luxembourg
e-mail: mouton.caroline@chl.lu

L. De Girolamo
Orthopaedic Biotechnology Laboratory, Galeazzi Orthopaedic Institute, Milan, Italy

D. Theisen
Sports Medicine Research Laboratory, Luxembourg Institute of Health, Luxembourg, Luxembourg

R. Seil
Department of Orthopaedic Surgery, Centre Hospitalier de Luxembourg—Clinique d’Eich, Luxembourg, Luxembourg
Sports Medicine Research Laboratory, Luxembourg Institute of Health, Luxembourg, Luxembourg

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_14

14.1 Introduction

Over the last two decades, the complexity of organizing a clinical study has drastically increased. Between 1999 and 2005, the number and frequency of study procedures (measurements, questionnaires, visits, etc.) have been reported to grow at an annual rate of up to 8.7% [18]. During the same period, the number of eligibility criteria has increased by more than 12% annually [18]. As a consequence, more time is now needed to enroll a sufficient number of patients, and a lower retention rate of patients is observed [18]. The additional constraints to the study protocol, as well as the emergence of guidelines of best clinical practice, have thus inevitably led to a greater administrative burden and longer study durations.
From an orthopedic surgeon’s perspective, these changes are often perceived as a significant barrier to conducting clinical research, because this evolution is in direct opposition to the current evolution of their clinical reality, which is often dominated by increasing economic pressure and by clinical productivity. As a consequence, the time and resources available for clinical research are decreasing. Clinical research is conducted in many centers by students or residents who often only have a limited time frame to conduct a study, although it may help them to advance in their clinical careers. Likewise, dedicated and professional staff members to support them are often not available. Nevertheless, orthopedic surgeons need to be familiar with and comply with these international requirements in order to continue producing high-quality research and be able to compete with other medical specialties.
Clinical studies should be conducted in accordance with good clinical practice (GCP). A prospective study, be it an observational cohort or a controlled trial, requires several authorizations, approvals, and/or notifications from local health authorities, Independent Ethics Committee (IEC)/Institutional Review Board (IRB), data protection authorities, insurances, and/or hospitals.
Although these authorizations may differ from one country or hospital to another, a study should not be started until all legal requirements are fulfilled.
Regulatory compliance can sometimes be laborious and time-consuming, with the consequence of delaying the start of a clinical study. However, this waiting time may be efficiently used by the investigator to establish the organizational dimension of the study. Combined with efficient communication, this will limit time loss and help the team stay on track to complete and publish the study in a timely manner. Several unintended consequences may occur during the study conduct in case of insufficient preparation: an early discontinuation of the study, a delay in the recruitment of subjects, a high rate of patients lost to follow-up, low data quality, or a delay in data analysis. Greater awareness of the entire process of the study conduct and its common mistakes is therefore the key to success for a high-quality study, which may be considered for publication in a high-ranked journal.
The aim of this chapter is (1) to give an overview of the legal requirements to be respected during a study conduct as well as (2) to provide practical advice to make good use of the legally required on-site documents, to organize the study, and to avoid common mistakes in the monitoring of the study or data handling. The link between each phase of the study conduct and how it may prevent the investigator from publishing will also be discussed.

14.2 On-Site Documents: Definitions and Use

On-site documents should demonstrate that the principal investigator (PI) is conducting the study according to standards and applicable regulatory requirements. In case of an audit, they can be reviewed by the legal authorities or the sponsor(s). The minimum list of essential documents to be available on-site during the study conduct is reported here [1]. This list mainly includes documents which have been previously submitted to authorities. Guidance on how to properly use these documents for the study conduct is provided in this chapter [2].

14.2.1 Protocol, Case Report Form (CRF), Information to the Patient, Informed Consent Form (ICF), and Amendments

This section should include the initial study protocol as submitted to the IEC/IRB as well as all available subsequent amendments. The latter are documented changes to the protocol initially approved by the IEC/IRB.
The CRF is a printed or electronic document used to record all data required by the study protocol such as patient demographic data, study interventions, outcomes, and adverse events. It should not be confused with source documents, which will be described later in this chapter. A blank copy of the case report form (CRF) as well as a blank copy of the patient-reported outcome measures (PROMs) and adverse event (AE) report must be available on-site. This section of the binder should also include the latest version of all documents used for the recruitment of participants, that is to say the information provided to the patient, the informed consent form (ICF), and all advertisements used to recruit participants (newspaper, radio, posters, and flyers) [16].
As for the protocol, the CRF or the information to the patients may be subjected to changes during the study conduct (i.e., following an amendment in the protocol). To avoid time loss, it is important to evaluate properly how a possible change in the study protocol may affect the other study documents (i.e., CRF, PROM, and AE) and submit all new versions together.

14.2.2 Approval/Favorable Opinion of Independent Ethics Committee (IEC)/Institutional Review Board (IRB) and Annual Reports

On-site binders must include study interim or annual reports as well as all documents received from the IEC/IRB to certify that the study protocol and its amendments have been reviewed and approved by legal authorities.
14.2.3 Signed Informed Consent Forms (ICF)

Participation in a study must be on a voluntary basis; the patient’s decision must not be influenced by the investigator/staff. If the patient is willing to participate in a study, the informed consent is documented by means of a written, signed, and dated ICF. The latter attests that the study was explained to the patient and understood and that his/her consent was freely given. When a subject is not capable of giving informed consent, the permission of a legally authorized representative should be obtained in accordance with the applicable law (i.e., for minors, the parents must sign on his/her behalf).

14.2.4 Source Documents

Source documents are original documents, data, and records (i.e., hospital/medical records, laboratory notes, X-rays, etc.). They should be attributable, legible, contemporaneous, original, and accurate (ALCOA) [6] and distinguished from the CRF. For each data item collected, the source documentation (medical file) should be explicitly mentioned to enable the PI or an independent observer to reconfirm the data (even years after completion of the study).

14.2.5 Signed, Dated, and Completed CRF

If study data are recorded on paper during study visits, the documentation must be retained and documented in this section. For an electronic CRF (eCRF), results can be entered directly into the eCRF by an authorized staff member to avoid paper transcription. In this case, the eCRF is the source. Guidance has been made available for electronic source data by the FDA to clarify when the CRF is considered as the source document or not [4].

14.2.6 Adverse Events (AE) and Notification of Serious Adverse Events (SAEs)

An adverse event (AE) is “any untoward medical occurrence, unintended disease or injury, or untoward clinical signs (including abnormal laboratory findings) in subjects, users or other persons, whether or not related to the investigational medical device” [1]. All AEs should be reported during the study conduct. If AEs are integrated into an eCRF, the same rules as for the CRF apply. Furthermore, serious adverse events (SAEs) must be reported to the IRB/IEC and to the hospital within a given time frame from the occurrence of the adverse event, as well as to sponsors if applicable. They include any event related or not to the participation in the study (i.e., related or not to the study device/technique/drug) that has either led to the death of the patient or that has resulted in a life-threatening illness or injury, in a permanent impairment of a body structure or function, or in a hospital admission. Any medical intervention performed to prevent illness, injury, or impairment is also considered to be an SAE.

Clinical Vignette 1: Discussion on Adverse Events
Ms. X comes for a medical visit 3 months after total knee replacement. Before the surgery, she had also consented to participate in a post-marketing study. At the physical examination, the doctor detects knee stiffness that requires manipulation under anesthesia. The clinical coordinator is present, and the doctor asks for the adverse event (AE) form. After discussion, both the doctor and coordinator agree to report a serious adverse event (SAE). Although knee stiffness is a known complication after total knee arthroplasty, it led to the patient’s hospitalization, and this intervention is useful to prevent further impairment. The SAE is reported both to the sponsor and the hospital although the event may not be directly related to the medical device (knee arthroplasty).
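The seriousness criteria listed in Sect. 14.2.6 can be written down as a small decision rule, which may help a study team apply them consistently. The sketch below is an illustration only, not a regulatory tool: the `AdverseEvent` class, its field names, and the `is_serious` function are hypothetical, and local regulations decide what must actually be notified and when.

```python
from dataclasses import dataclass


@dataclass
class AdverseEvent:
    """Hypothetical record of one adverse event (field names are assumptions)."""
    description: str
    led_to_death: bool = False
    life_threatening: bool = False
    permanent_impairment: bool = False
    hospital_admission: bool = False
    intervention_to_prevent_harm: bool = False
    related_to_study: bool = False  # recorded, but not part of the seriousness rule


def is_serious(event: AdverseEvent) -> bool:
    """Return True if any seriousness criterion from Sect. 14.2.6 is met.

    Relatedness to the study device/technique/drug is deliberately ignored,
    because the text counts events "related or not" to study participation.
    """
    return any([
        event.led_to_death,
        event.life_threatening,
        event.permanent_impairment,
        event.hospital_admission,
        event.intervention_to_prevent_harm,
    ])


# Knee stiffness requiring manipulation under anesthesia, as in the vignette.
stiffness = AdverseEvent(
    description="Knee stiffness 3 months after total knee replacement",
    hospital_admission=True,
    intervention_to_prevent_harm=True,
)
# An invented non-serious event, for contrast.
bruising = AdverseEvent(description="Mild bruising at the blood sampling site")

print(is_serious(stiffness), is_serious(bruising))
```

Keeping `related_to_study` out of the rule mirrors the text: an event is reported as serious because of its consequences, whether or not it is attributed to the study.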
14.2.7 Subject Screening, Enrollment, and Dropout Log

The screening and enrollment logs report all patients screened and informed about the study. They are used to report the outcome of the screening (as screening failures can occur). They usually include the following items: investigator name, site, patient initials, date informed consent signed, version of consent, date of screening, reasons for screening failure, reason for withdrawal/exclusion, study code assigned, and staff initials. Any withdrawal or loss to follow-up during the study conduct will be reported separately on the dropout log.
Logs can help to conduct the study in a more efficient and organized manner. Documentation of the reasons why the screening has failed (i.e., patient refused to participate in the study, patient older than the upper limit of age considered) can provide information regarding the ability of the study team to enroll patients and retain them. This information may be useful to adapt the recruitment period.

14.2.8 Subject Identification Code List

This log enables the PI to track the correspondence between enrolled subject name and allocated study code. This list is kept confidential, well secured, and/or protected by password if it is digital. Only the PI and delegated staff should have access to this list.

14.2.9 Study Staff and Training Log

The PI can delegate tasks to his/her staff through the signature sheet. The latter documents signatures and initials of all persons authorized (co-investigators) to screen subjects, assess inclusion/exclusion criteria, gather patient consent, make entries and/or corrections on CRFs, perform measurements, etc. It can be supplemented with a training log to prove that all staff members have been trained for the study procedures.

14.2.10 Curriculum Vitae of Principal and Co-investigators

These documents are helpful to document qualifications and eligibility of investigators to conduct a study and/or provide medical supervision of subjects.

14.2.11 Other On-Site Documents According to the Study

In addition to the previously cited documents, if the protocol includes known technical procedures, it is recommended to report any update that may occur during the study conduct in testing procedures, including updates on normal value(s), certification, accreditation, or quality control. From a scientific point of view, it may help to trace any outlier/abnormal value when analyzing the data and help to correctly interpret the data.
If biological samples are collected and stored during the study, it is important to document the location and identification process of retained samples in case a test has to be repeated.
If the study includes the investigation of a product (i.e., medical device), the investigator’s brochure and updates, instructions for handling the investigational product, marketing authorization (i.e., CE mark), and procedures for shipment and storage must be available on-site.
Finally, if the study is being sponsored, other documents may apply, such as up-to-date insurance, financial agreements between involved parties, monitoring visit reports to document visits and findings of the monitor, and any relevant communication other than site visits including letters, meeting notes, notes of telephone calls, etc.

14.3 Monitoring the Study

Once the regulatory binders contain all required documents, the study can theoretically start. Beyond GCP requirements, it may be recommended to conduct the study according to the
quality of manuscript foreseen. For example, the PI must be aware that a loss to follow-up greater than 20% will inevitably lead to a lower level of evidence of the results. He/she should thus ensure that the follow-up of patients as well as all aspects considered during the review process of a manuscript are correctly addressed. Existing checklists and statements to report studies may therefore be consulted before the study starts to properly plan the latter [9, 10, 25].

Fact Box 14.1
Clinical studies should be conducted in accordance with good clinical practice (GCP). Any substantial change to the protocol during the study conduct requires approval from the Independent Ethics Committee (IEC)/Institutional Review Board (IRB).
Consent is documented by means of a written, signed, and dated informed consent form (ICF) which attests that the information was explained and understood and that consent was freely given.
Source documents are original documents, data, and records from the hospital, while the case report form (CRF) is a printed or electronic document used to report pseudonymized data according to the study protocol (i.e., the name of the patient does not appear on the document, only the study code allocated to the patient for his/her participation). A list to track the correspondence between enrolled subject name and allocated study code must be kept by the PI separately.
Subject screening, enrollment, and dropout logs must be used to report screening of patients, outcomes of screening, and any withdrawal or loss to follow-up.
All adverse events (AE) should be reported (all abnormal findings during the patient follow-up). Serious adverse events (SAEs) must be reported to the IRB/IEC and to the hospital within a given time frame (usually within the first 24 h) from the occurrence of the adverse event.
The PI can delegate tasks to his/her staff, which should be documented by a signature sheet and training log.

14.3.1 Responsibilities of the PI

The PI assumes the responsibility for proper conduct of the study. He/she is responsible for protecting the rights, safety, and welfare of subjects under his/her care during a clinical study. During the study conduct, whether tasks are delegated or not, the PI should ensure that:

– The conduct of the study is in compliance with the protocol.
– No deviation is allowed from the protocol without prior favorable opinion from the IRB/IEC (protocol amendment).
– Any deviation from the protocol is documented and explained (i.e., a deviation may occur and be unavoidable if the PI wants to eliminate immediate hazard(s) to the participant).
– Medical devices or drugs, if investigated, are used in accordance with the approved protocol.
– Codes in randomized controlled trials are broken only in accordance with the protocol (unblinding).
– Obtaining and documenting patient consent are performed according to the applicable regulatory requirement.
– Data in the CRFs and in all required reports are accurate, complete, and legible.
– A summary of the study status is reported annually to the IRB/IEC.
– All SAEs are reported immediately to the regulatory authorities.

The above list is not exhaustive but includes the main responsibilities of the PI during study conduct. Further guidance on investigator responsibilities has been made available by the FDA [5].
14.3.2 Communication and Delegation

Communicating with the research team (especially when delegating) before and during the study conduct is critical. On-field employees can be a tremendous source of feedback (protocol deviations, difficulties, experiences, and adverse events) and ideas. Ideally, the PI should first inform all persons involved in patient care (nurses, physios, radiologists, secretaries, etc.) that a study is ongoing. This can be achieved by transmitting the synopsis of the study and/or by organizing an introduction session.
The PI may also consider informing the general practitioner (GP) of the participant through a doctor-to-doctor referral letter. This is a common practice to inform the GP about the treatment his/her patient underwent and thus to detect any possible adverse events at an early stage.
During a study, dedicated research staff may be required to support clinical staff and patients. To comply with all activities induced by the clinical study, the PI can delegate tasks provided that the study staff log explicitly mentions it and that delegates are trained. The delegates should have full knowledge of the protocol and should be familiar with the condition studied. The PI should furthermore encourage his/her staff to take advantage of the numerous online GCP training courses to obtain the GCP certificate. The PI may also plan training sessions for new procedures.

14.3.3 Enrollment of Subjects into the Study: Screening, Recruitment, Eligibility, and Informed Consent

Informed consent is the process by which a subject voluntarily confirms his/her willingness to participate in a study. It must be obtained from each participant prior to performing any specific study procedures. The latter include any activity not included in the standard treatment.
To plan enrollment, several questions may be asked:

14.3.3.1 How Will Eligible Patients Be Identified?
The prescreening phase helps to identify potentially eligible patients. If the study includes a surgical procedure (i.e., anterior cruciate ligament reconstruction or total knee replacement), given that time between indication and surgery is sufficient for the patient to consider his/her participation, the easiest way may be to identify patients through the surgical planning. If only the PI can identify patients with the pathology of interest, only he/she may provide the information to the patient or inform the study staff team to enroll the patient. In any case, inclusion/exclusion criteria should be applied strictly.

14.3.3.2 What May Limit the Recruitment?
Enrollment during the first month is a strong predictor of study completion [20]. It is thus critical to correctly plan how eligible patients will be contacted and recruited. From the clinician’s point of view, barriers to recruitment may include underestimation of the prevalence of the condition studied requiring longer enrollment time to reach the desired sample size [21], time constraints, lack of staff and training, and difficulties with the consent procedure [26]. From the patient’s perspective, additional demands of the study compared with the routine care and concerns about information, data privacy, and consent may constitute some factors that limit their willingness to participate in a study.

14.3.3.3 How Will Eligible Patients Be Contacted?
For nonintervention studies (i.e., observational studies), an “opt-out” recruitment strategy (contact the patient instead of waiting for the patient to express his/her willingness to participate) is advised [22]. Compared to emails and letters, direct phone calls by the investigator(s) are recognized as the most effective method for recruitment [27].
A letter signed by the PI together with the information may also be sent to the patient beforehand (or given during the medical visit by the PI). The letter could ask the patient to contact the dedicated person of the team with any further questions on the study and/or inform the patient that he/she will be contacted by a team member (naming a specific person may here be useful so that the patient is waiting for this phone call).

14.3.3.4 How to Report Prescreened Patients?
Screening logs should remain restricted in content but should document as many eligibility criteria as possible [15]. Since there is no informed consent from the patient, research procedures or interventions should not yet take place.
Practically, the site may create a prescreening sheet to follow track for screened/contacted

– The participation to the study: participation on a voluntary basis, the possibility to refuse to participate and to withdraw from the study at any time without penalty or loss of benefits, any compensation and amount if applicable, reasons for possible early termination of participation/study (i.e., patient not compliant with visits, medical condition interfering with study protocol), and new information if new findings become available that may be relevant to the subject’s willingness to continue participation in the study.
– Data protection: confidentiality of personal information recorded and records identifying the subject, protection of privacy, and access
patients. This sheet should include screening to pseudonymized data from third parties if
date, eligibility criteria that can be assessed applicable.
using the medical record (i.e., patients planned –– Additional medication administered if needed,
for ACL reconstruction, age, type of graft procedure, and insurance if a damage occurs.
planned), and contact date. Only procedures that –– Approval of the study by EC/IRB.
are performed as part of the routine clinical prac- –– Principal investigator/funder of study/contact
tice may be looked at, and only results used for person for any question related to the study
determining study eligibility should be screened and in case of adverse event
before obtaining consent [17].
14.3.3.6 H  ow to Record Patient
14.3.3.5 H  ow to Instruct the Patient Consent?
About the Study? Informed consent is documented by means of a
Participants should receive information both in written, signed, and dated ICF.  Each consent
written and oral formats (call, visit) in a nontech- form (usually a minimum of 2) must be dated and
nical language. The PI or the delegated person signed both by the patient and the investigator.
should capture the patient’s perspective and take The PI will be responsible for any misconduct on
time to answer his/her questions. Patients are the process of ICF (wrong date, falsified signa-
indeed less likely to enter studies that they find ture, missing consent). One consent is then kept
difficult to understand and that require multiple for the regulatory binders, and one is given to the
follow-ups [29]. It may be worth mentioning participant. The hospital may also require a copy
whether visits are part of the clinical routine or for the electronic health record.
not and explicitly mention additional visits and
procedures. Patients may indeed not realize that 14.3.3.7 W  hat If New Information
some procedures will solely be performed for About the Medical Device/
research purposes and are not required for their Drug Becomes Available
medical care. During the Study or If
Information given to patients should include the Protocol Procedure
information about: Changes?
If new information on risk and/or benefits arises
–– The study: its purpose, participation duration or if major protocol amendments occur during
and number of visits, subject’s involvement the study, the PI should ensure that subjects are
and responsibilities, side effects, risks and informed and re-consent to participate in the
benefits, and alternatives to treatment. study.
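
The prescreening sheet described in Sect. 14.3.3.4 can be kept as a small structured log, which also gives the PI simple counts to review recruitment. The following is only a minimal sketch in Python: the class name, field names, and criteria labels are hypothetical and not prescribed by the chapter or by any guideline, and a real sheet would follow the site's own procedures.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class PrescreeningEntry:
    """One row of a hypothetical prescreening sheet (no identifying data)."""
    screening_date: date
    # Criteria that can be judged from the medical record alone.
    criteria: Dict[str, bool] = field(default_factory=dict)
    contact_date: Optional[date] = None
    consent_date: Optional[date] = None

    def meets_prescreening_criteria(self) -> bool:
        # Eligible only if at least one criterion was checked and all are met.
        return bool(self.criteria) and all(self.criteria.values())

    def study_procedures_allowed(self) -> bool:
        # Research procedures must wait until consent is documented.
        return self.consent_date is not None


def recruitment_summary(log: List[PrescreeningEntry]) -> Dict[str, int]:
    """Counts the PI could review regularly to follow recruitment."""
    return {
        "screened": len(log),
        "eligible": sum(e.meets_prescreening_criteria() for e in log),
        "contacted": sum(e.contact_date is not None for e in log),
        "consented": sum(e.consent_date is not None for e in log),
    }


# Illustrative entries only; dates and criteria are made up.
log = [
    PrescreeningEntry(
        screening_date=date(2019, 1, 7),
        criteria={"planned ACL reconstruction": True, "age in range": True},
        contact_date=date(2019, 1, 9),
        consent_date=date(2019, 1, 15),
    ),
    PrescreeningEntry(
        screening_date=date(2019, 1, 8),
        criteria={"planned ACL reconstruction": True, "age in range": False},
    ),
]
summary = recruitment_summary(log)
print(summary)
```

Keeping the consent date on the same sheet makes it easy to verify that no study-specific procedure is scheduled for a patient who has not yet consented.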
14.3.3.8 What to Do If the Recruitment Is Slower Than Expected?
Poor subject recruitment and retention is one of the 15 common reasons for failure in clinical research [11]. The PI should be able to predict the recruitment rate according to his/her facilities, patients, and protocol. He/she should also be able to identify any factors that could prevent the team from recruiting properly for the study. During the study conduct, the PI should thus review the recruitment on a regular basis (e.g., with the help of the screening log). If the recruitment rate is lower than expected, a discussion with the study staff may help to identify the difficulties met in practice. The study team may also consider adapting the protocol and the information given to the patient if they are too complex or lead to confusion (e.g., too many visits, divergence from routine care) or increasing the duration of the recruitment period [29].

14.3.4 Study Visits: Compliance with the Approved Protocol and Protocol Amendments

14.3.4.1 Study Visits
The PI should consider providing the study staff with a brief worksheet/checklist indicating procedures to be performed at each visit. It can serve as a reminder and ensure that procedures are completed in a timely manner without any missing data.

To efficiently organize the study visits, the study team should consider the following aspects: visit windows (range of days in which a subject visit can occur according to the study protocol), room and staff availability, availability of support departments (e.g., X-ray), and whether procedures are part of the clinical routine or not.

The study staff should be careful to avoid long waiting times for study visits. Ideally, the visits should coincide with routine visits if applicable and be short [23]. The follow-up visits may also be proposed at a location convenient for the patient and/or outside working hours. The appointments should be scheduled as soon as possible (e.g., at discharge after surgery), and the team should keep track of visits and windows in a sheet/file. Questionnaires to be filled in by the patient, if applicable, may be sent some time before the visit, together with a reminder of the appointment. If the patient does not come to the visit, the study staff should make all efforts possible to contact the patient to avoid a loss to follow-up. These efforts should be documented. Any definitive loss to follow-up should be reported in the dropout log.

14.3.4.2 Compliance with Protocol and Protocol Amendments
Adherence to the study protocol is essential. Any deviation from or violation of the protocol should be documented [8, 28]. Deviations from the protocol are defined as changes or noncompliance with the study protocol that do not have a significant effect on the participant's rights, safety, or welfare (e.g., missing visits or data). Protocol violations may affect the participant's rights, safety, or welfare (e.g., inclusion/exclusion criteria not met, failure to obtain valid informed consent).

If substantial protocol modifications become necessary during the study conduct (either to avoid protocol deviation/violation or for other reasons), amendments to the protocol must first be approved by the IEC/IRB. Overall, about two-thirds of protocols are reported to require one or more amendments [14, 19] although the latter have an additional impact on study costs, timeline, and resources [19, 24]. One-third of these amendments are considered to be avoidable if inconsistencies/errors in the protocol and difficulties in recruiting study volunteers are better anticipated.

The definition of "substantial" may vary according to legal authorities but generally includes all modifications that may impact the safety or physical or mental integrity of the subjects or the scientific value of the study. According to the EU guidance (Sects. 14.3.3 and 14.3.4) [12], substantial amendments include (non-exhaustive list):

–– Amendments to the protocol: Any changes in the population studied, procedures, and monitoring visits including changes of the main objective or endpoints or changes in the recruitment procedure (inclusion/exclusion criteria, additional group of patients, etc.) are considered as substantial changes. This does not include modification of the title, addition/deletion of tertiary endpoints, minor increase in the duration of the study (<10% of the overall time), or increase of >10% of the overall time of the study provided that monitoring visits are unchanged (i.e., no additional visit).
–– Amendments concerning the product studied: any new information on the study product or any new information made available by the manufacturer.
–– Amendments to other documents: any changes of sponsor/principal investigator or revocation of the product marketing authorization.

Substantial amendments must be approved by the IEC/IRB. As for non-substantial amendments, it is possible to record them and submit them simultaneously with the notification of a substantial amendment or at least inform the IEC/IRB.

For each new protocol version, date and version identifier should be properly reported, modifications should be highlighted, and a detailed summary of protocol changes including old text, new text, and rationale for change should be provided (SPIRIT guidelines, Item 3: date and version identifier) [10]. It is usually recommended to maintain both a tracked-changes version and a clean version of the protocol for the IRB/IEC, including a document history.

14.3.5 Safety Management and Reporting
The PI should ensure that the risk-benefit balance for the patient participating in the study remains constantly favorable. He/she should ensure communication with the study staff, who are often the first to observe unanticipated risks. All AE should be followed, and detailed written reports should be provided until the risk is eliminated or the AE resolved. If judged necessary by the authorities, the protocol may be amended and the ICF updated to inform patients about new risks. Alternatively, the study may be terminated prematurely.

Fact Box 14.2
The PI assumes the responsibility for proper conduct of the study even if tasks are delegated.
Communicate not only with the study team, but inform all departments/doctors involved in the patient's care about the ongoing study.
Aspects to organize for an efficient study conduct:
–– Identification of eligible patients.
–– Contact with patients and information about the study (written and oral).
–– Consent signature and on-site documentation (screening log, study code allocated to the patient).
–– Study visits (according to visit windows, room and staff availabilities, and clinical routine).
–– Procedures to be performed at each visit.
–– Contact with patients missing a visit.

14.4 Managing Study Data
Data management includes all procedures for collecting, handling, manipulating, analyzing, and storing/archiving data used during the study conduct. All information should be recorded, handled, and stored in a way that allows its accurate reporting, interpretation, and verification.
14.4.1 Data Quality and Integrity
Quality and integrity are related. If data quality is poor, data integrity cannot be reached. Data quality refers to the essential characteristics of data. These should be attributable, legible, contemporaneous, original, and accurate (ALCOA) [6]. Data integrity refers to the validity and consistency of data. Mechanisms should be in place to prevent accidental modification or erasure of the data (e.g., data backup).

The design of the CRF (paper or electronic form) is a key quality step in ensuring that the data required by the protocol are collected [7]. To ensure data quality and integrity, the PI should ensure that the CRF is standardized (e.g., format of date, pick lists).

Care should also be taken to instruct the study staff on how to deal with:

–– Confidentiality: The CRF should not include any information that can be used to identify the study participant.
–– Missing data: No field should be left blank. Visits that the participants fail to make, tests not conducted, and examinations not performed or missing information should be reported by indicating "not done," "not applicable," or "unknown."
–– Completion, change, or correction on a CRF: In any format (paper or electronic), it is required to record who entered and generated the data. Any change or correction should be dated, initialed, and explained. Original data should be crossed out with a single line that leaves the original information clearly visible. The correct data should be inserted next to the erroneous data, and the form should be initialed and dated. Similar systems should be activated for electronic forms.

Identification of missing data and/or discrepancies should be performed on a regular basis during the study conduct, especially if CRF completion is delegated to staff members. Regularly, the PI should also check the database for inclusion/exclusion criteria, validity of data, and outliers. At the end of the patient follow-up, the CRF should be completed and signed. At the end of the study, records related to the study should be kept for the period of time required by national/local laws and regulations.

14.4.2 Data Access, Confidentiality, and Privacy
Since May 2018, the General Data Protection Regulation (GDPR) has replaced the Data Protection Directive 95/46/EC to protect the data privacy of European residents [3]. The implication of this regulation for research is developed in another chapter [13], and only critical aspects for study conduct will be recalled in this section.

Privacy implies that the participant's information will be protected and not disclosed without the knowledge/permission of the participant him/herself. Confidentiality implies that the PI and his/her team protect the information on the patient from deliberate or accidental disclosure and follow procedures to release the information only to authorized parties.

The PI must protect the confidentiality of information retrieved from medical records and visits. For example, he/she should consider encrypting data, restricting access to study records, keeping study records in secured areas, and maintaining subjects' names and study codes separately (subject identification code list).

The PI or the delegate should inform the study participant in written (ICF) and oral format about who will have access to his/her personal data (study team, IRB/IEC, regulatory authorities, sponsors if applicable) and about the measures taken to ensure the confidentiality and security of personal information. The participant should also be aware that his/her personal data will be kept confidential and will not be publicly available even if the results are foreseen to be published in a scientific manuscript.
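
The regular data checks described in this section (no blank fields, no identifying information on the CRF, dated and explained corrections) lend themselves to a simple automated pass over an electronic CRF export. The sketch below is a hedged illustration in Python, not a validated tool: the field names, the set of identifying fields, and the `Correction` record are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

# Hypothetical field names that would identify a participant directly.
IDENTIFYING_FIELDS = {"name", "date_of_birth", "address", "phone"}


def check_crf(record: Dict[str, str], required_fields: List[str]) -> List[str]:
    """Return a list of human-readable problems found in one CRF record."""
    problems = []
    for field_name in required_fields:
        value = record.get(field_name)
        # "not done", "not applicable" and "unknown" are non-empty, so they pass.
        if value is None or str(value).strip() == "":
            problems.append(f"{field_name}: blank (enter a value or a missing-data code)")
    for field_name in sorted(IDENTIFYING_FIELDS.intersection(record)):
        problems.append(f"{field_name}: identifying information must not be on the CRF")
    return problems


@dataclass
class Correction:
    """One dated, initialed, and explained change; the old value stays visible."""
    field_name: str
    old_value: str
    new_value: str
    changed_on: date
    initials: str
    reason: str


# Illustrative record only; codes and values are made up.
record = {
    "study_code": "ACL-017",
    "visit_date": "2019-03-04",
    "knee_laxity_mm": "",      # left blank: flagged
    "xray": "not done",        # explicit missing-data code: accepted
    "name": "Jane Doe",        # direct identifier: flagged
}
issues = check_crf(record, ["study_code", "visit_date", "knee_laxity_mm", "xray"])
for issue in issues:
    print(issue)

fix = Correction("knee_laxity_mm", "", "4", date(2019, 3, 6), "CM",
                 "value transcribed from source document")
```

Storing each correction as its own record, rather than overwriting the value, mirrors the single-line strike-through used on paper forms: the original entry remains recoverable.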
Fact Box 14.3
–– Respect the confidentiality of data on the CRF: Use the participant study code.
–– Control data: Regularly identify missing data, check inclusion/exclusion criteria, and check for abnormal values. All information should be accurate and verifiable (data entered in the CRF should be identical to the data in the patient medical record). No field should be left blank (report "not done," "not applicable," "unknown"). Any change or correction of the CRF should be dated, initialed, and explained.
–– Respect privacy: Only release information to authorized team members and parties (e.g., sponsor if applicable).

Take-Home Message
• Principal investigators are responsible for the entire study conduct, independent of whether study-related tasks are delegated or not.
• Compliance with legal requirements and GCPs should be ensured during the entire duration of the study.
• Regulatory binders should be up to date, the informed consent process should be performed according to the participants' rights, and data confidentiality and privacy should be maintained.
• The organization of the study conduct from patient screening to data quality will inherently follow the Deming circle: Plan (communication with the team, agreement on task delegation, planning of patient enrollment and study visits, and respect of confidentiality); Do (recruitment, data collection, and restriction of the access to study records); Check (regular discussion with the team to identify problems, e.g., adverse events or low rate of enrollment, and regular check of the database to correct for missing data and inconsistencies); and Act (adapt procedures if necessary).

Bibliography
1. ICH tripartite guideline for good clinical practices E6 (R1). 1996. http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E6/E6_R1_Guideline.pdf.
2. NCCIH clinical research toolbox: essential documents/regulatory binder. https://nccih.nih.gov/grants/toolbox#binder.
3. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Official Journal L 119, 4.5.2016, p. 1–88.
4. U.S. Department of Health and Human Services, Food and Drug Administration. Guidance for industry: electronic source data in clinical investigations. 2013. https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM328691.pdf.
5. U.S. Department of Health and Human Services, Food and Drug Administration. Guidance for industry: investigator responsibilities - protecting the rights, safety, and welfare of study subjects. 2009. https://www.fda.gov/downloads/Drugs/.../Guidances/UCM187772.pdf.
6. Bargaje C. Good documentation practice in clinical research. Perspect Clin Res. 2011;2(2):59–63.
7. Bellary S, Krishnankutty B, Latha MS. Basics of case report form designing in clinical research. Perspect Clin Res. 2014;5(4):159–66.
8. Bhatt A. Protocol deviation and violation. Perspect Clin Res. 2012;3(3):117.
9. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ. 2015;351:h5527.
10. Chan AW, Tetzlaff JM, Gotzsche PC, Altman DG, Mann H, Berlin JA, et al. SPIRIT 2013 explanation and elaboration: guidance for protocols of clinical trials. BMJ. 2013;346:e7586.
11. Clark GT, Mulligan R. Fifteen common mistakes encountered in clinical research. J Prosthodont Res. 2011;55(1):1–6.
12. Communication from the commission: detailed guidance on the request to the competent authorities for authorisation of a clinical trial on a medicinal product for human use tnos.
13. Theisen D, Moksnes H, Hardy C, Engebretsen L, Seil R. How to organise an international register in compliance with the European GDPR: walking in the footsteps of PAMI (Paediatric ACL Monitoring Initiative).
14. Decullier E, Lhéritier V, Chapuis F. The activity of French Research Ethics Committees and characteristics of biomedical research protocols involving humans: a retrospective cohort study. BMC Med Ethics. 2005;6:9.
15. Elm JJ, Palesch Y, Easton JD, Lindblad A, Barsan W, Silbergleit R, et al. Screen failure data in clinical trials: are screening logs worth it? Clin Trials. 2014;11(4):467–72.
16. FDA. Guidance for Institutional Review Boards and clinical investigators: recruiting study subjects (information sheet). 2018. https://www.fda.gov/RegulatoryInformation/Guidances/ucm126428.htm.
17. FDA. Guidance for Institutional Review Boards and clinical investigators: screening tests prior to study enrollment (information sheet). 2018. https://www.fda.gov/RegulatoryInformation/Guidances/ucm126430.htm.
18. Getz KA, Wenger J, Campo RA, Seguine ES, Kaitin KI. Assessing the impact of protocol design changes on clinical trial performance. Am J Ther. 2008;15(5):450–7.
19. Getz KA, Zuckerman R, Cropp AB, Hindle AL, Krauss R, Kaitin KI. Measuring the incidence, causes, and repercussions of protocol amendments. Drug Inf J. 2011;45(3):265–75.
20. Haidich AB, Ioannidis JP. Effect of early patient enrollment on the time to completion and publication of randomized controlled trials. Am J Epidemiol. 2001;154(9):873–80.
21. Hewison J, Haines A. Overcoming barriers to recruitment in health research. BMJ. 2006;333(7562):300–2.
22. Junghans C, Feder G, Hemingway H, Timmis A, Jones M. Recruiting patients to medical research: double blind randomised trial of "opt-in" versus "opt-out" strategies. BMJ. 2005;331(7522):940.
23. Kaur M, Sprague S, Ignacy T, Thoma A, Bhandari M, Farrokhyar F. How to optimize participant retention and complete follow-up in surgical research. Can J Surg. 2014;57(6):420–7.
24. Lösch C, Neuhäuser M. The statistical analysis of a clinical trial when a protocol amendment changed the inclusion criteria. BMC Med Res Methodol. 2008;8:16.
25. Moher D, Hopewell S, Schulz KF, Montori V, Gotzsche PC, Devereaux PJ, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c869.
26. Ross S, Grant A, Counsell C, Gillespie W, Russell I, Prescott R. Barriers to participation in randomised controlled trials: a systematic review. J Clin Epidemiol. 1999;52(12):1143–56.
27. Schroy PC 3rd, Glick JT, Robinson P, Lydotes MA, Heeren TC, Prout M, et al. A cost-effectiveness analysis of subject recruitment strategies in the HIPAA era: results from a colorectal cancer screening adherence trial. Clin Trials. 2009;6(6):597–609.
28. Sweetman EA, Doig GS. Failure to report protocol violations in clinical trials: a threat to internal validity? Trials. 2011;12:214.
29. Thoma A, Farrokhyar F, McKnight L, Bhandari M. Practical tips for surgical research: how to optimize patient recruitment. Can J Surg. 2010;53(3):205–10.

15 Framework for Selecting Clinical Outcomes for Clinical Trials

Adam J. Popchak, Andrew D. Lynch, and James J. Irrgang

A. J. Popchak (*) · A. D. Lynch · J. J. Irrgang
Departments of Physical Therapy and Orthopaedic Surgery, University of Pittsburgh, Pittsburgh, PA, USA
e-mail: ajp64@pitt.edu

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_15

15.1 Outcomes in Healthcare

Outcomes in healthcare exist in various domains, such as clinical outcomes, process-of-care outcomes, patient satisfaction outcomes, and cost outcomes. Therefore, when selecting outcomes for clinical trials, one must determine which outcome is meaningful in the context of the trial. Clinical outcomes may relate to impairments in body structure and function, activity limitations, or participation restrictions. Process-of-care outcomes are often related to utilization of resources, duration of care, and the procedures and interventions provided. Patient satisfaction may be related to satisfaction with the healthcare provider, the support staff, or the result of care. Finally, cost outcomes often focus on the direct costs of the medical care or the indirect costs of the illness or pathology. In regard to cost, payment reform in healthcare is a driving factor of what a meaningful outcome is, with value-based payments replacing volume-based payments. Value, in healthcare, is the outcomes that are achieved compared to the costs to achieve them [29]. Increased value is associated with the improvement of the health outcomes achieved and the consideration of costs. Therefore, to measure value, there is an inherent requirement to accurately measure health outcomes.

Patient-centred outcomes measure the result of medical care from the perspective of the patient. Patient-reported outcomes (PROs) commonly measure the patient's perception of their symptoms, activities, or participation levels. Therefore, when selecting outcomes for clinical trials, determination of what is important to the population of interest is essential. Relationships between impairments [26, 36] and resulting activity limitations and participation restrictions that affect patient-centred outcomes are not always direct and vary amongst individuals. Additionally, activity and participation are of utmost concern to the individual. Therefore, measures of activity and participation should be the primary outcome measure in clinical trials concerned with the outcome of patient care.

Practical considerations for choosing an outcome include the purpose of the measurement; the relevance to the patient population; the psychometric properties such as reliability, validity, and responsiveness; and the clinician or respondent burden. Additional considerations include if the purpose of the measurement is to discriminate between subjects or groups, predict current or future status, or evaluate change in a condition over time [18]. Ideally, the measurement matches the level of intervention in a given clinical trial. Interventions aimed at treating an impairment should have outcome measurements that can
evaluate outcomes at the level of an impairment. Likewise, a trial with a purpose of reducing a disability should have an outcome measure capable of assessing levels of disability. Ultimately, the outcome measure should serve the purpose of the research trial to provide information necessary to draw a conclusion (Fig. 15.1).

Fig. 15.1 Determining the primary outcome measure

                             Level of Outcome Measurement
Level of Intervention        Impairment   Activity Limitation   Participation Restriction
Impairment                   ✓            X                     X
Activity Limitation          X            ✓                     X
Participation Restriction    X            X                     ✓

15.2 Measures of Activity and Participation

There are two main approaches to assess activity and participation: performance-based testing and patient-reported measures [21]. Performance-based measures (PBM) rely on a rater's assessment of a patient's performance on specific physical tasks [21]. Performance may be assessed qualitatively with rating scales for the level of assistance needed or quality of movement or quantitatively based on the amount of time required to complete a task, the tolerance for completing a task, or the measurable results of a test such as jumping distance, balance, speed, or time. For example, when considering outcomes related to movement in the context of acute care or rehabilitation settings, level of assistance works particularly well. Advantages of PBM of physical function compared to patient-reported measures include superior reproducibility, greater sensitivity to change (responsiveness), and less vulnerability to external influences or biases [8, 21].

Patient-reported outcomes (PROs) are completed by the patient or a proxy (e.g. a parent for a child) and rely on self-perception of the symptoms, impairments, and abilities [21]. Patient-reported outcomes of health-related quality of life (HRQOL) can be either general or specific, with pros and cons being present with each type of measure.

Generic health status measures are applicable to diverse populations and usually measure multiple aspects of health such as physical, emotional, and social. Common examples of a general health measure are the Medical Outcomes Study 36-Item Short Form (SF-36) [25] and 12-Item Short Form (SF-12) [35]. A more contemporary measure is the PROMIS Global-10 [9], which utilizes item response theory (see chapter on adaptive testing). General health measures permit comparisons across populations with different health conditions as well as being more likely to detect unexpected effects of an intervention. However, they are less responsive than specific measures of health status; are susceptible to being unable to distinguish between groups secondary to their scores being either as good as possible or as bad as possible in both the treatment and control groups when used for high or low functioning individuals, otherwise known as ceiling and floor effects; generally have content that is less relevant to the patient and clinician; and tend to be longer and more difficult to score.

Specific health status measures focus on content specific to the primary condition or population of interest, potentially creating a more responsive instrument. To achieve this, specific health status measures include only the aspects of
HRQOL that are relevant to the condition or population being studied. Specific health status measures offer improved responsiveness and lower respondent burden, are easy to score and interpret, and are more likely to be accepted by patients and clinicians secondary to having greater relevance to the condition of interest. However, specific health measures do not measure all aspects of health that may influence the overall status, nor do they allow for comparison between different disease states and/or populations.

Disease-, region-, or patient-specific scales exist as specific health status measures. Disease-specific scales are designed for a particular disease process or pathology. The content reflects symptoms, activity limitations, and participation restrictions that are experienced by the individual with the disease. Examples of disease-specific scales include the Lysholm [33] and Cincinnati Knee Rating Scale [27] (knee ligament scales), the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) [4] (osteoarthritis-specific scales), and the Western Ontario Rotator Cuff Index (WORC) [17] (rotator cuff dysfunction).

Region-specific scales are designed for use on a wide variety of disorders or impairments that affect a particular region. The content reflects all possible symptoms, activity limitations, and participation restrictions that can arise from impairment of a specific region. Examples of region-specific scales include the Neck Disability Index (NDI) [34]; Penn Shoulder Score (PSS) [22]; Disabilities of the Arm, Shoulder, and Hand (DASH) [12]; Oswestry Disability Index (ODI) [7]; and the Knee Outcome Survey Activities of Daily Living Scale (KOS-ADLS) [15].

Patient-specific scales are defined by the patient, often with the patient providing a list of 3–5 relevant activities they are unable to do or have difficulty performing. Generally, the activities are then given a numerical value rating, often on an 11-point scale, where 0 is "unable to do" and 10 is "able to do at preinjury level". Specifically, the Patient-Specific Functional Scale (PSFS) is used primarily in patients with musculoskeletal disorders and can be used in patients with varying levels of independence [11]. The PSFS has been shown to be a valid, reliable, and responsive outcome measure for a variety of musculoskeletal problems [10]. Patient-specific scales are applicable to a large number of conditions, are efficient and easy to administer, and have adequate psychometric characteristics, in particular responsiveness to change over time. However, patient-specific scales limit comparisons between patients secondary to the lack of uniformity of content items that are determined by each individual.

As outcome measures are being utilized in clinical trials, the measures should be standardized to ensure the ability to compare results, thereby working with a "common currency" of effect. In the absence of such standardization, results and conclusions of studies are unable to be compared, and additional evidence to support or refute a conclusion remains missing. There has been some effort to document core outcome sets for a body region or condition. A core outcome set establishes a minimum data set that should be collected for a particular condition to allow for comparison amongst studies.

15.3 Psychometric Considerations

Psychometric considerations are vitally important when selecting outcome measures. Primarily, investigators must be concerned with the outcome measure's reliability, validity, and responsiveness. Additional considerations include the purpose of the measurement, the relevance to the patient population, and the clinician or respondent burden. All of these factors should be considered when selecting an appropriate outcome measure for clinical trials.

15.3.1 Reliability

Reliability is the consistency of measurement and how much error one can expect in the chosen outcome. Acceptable reliability levels are necessary to ensure that the error associated with the measurement is small enough to detect
actual changes in what is being measured [31]. Therefore, reliability can be conceptualized as the dependability or the predictability of a measure [30]. Reliability is fundamental to clinical research. In the absence of it, the researcher cannot be confident in the data that is collected, and no definitive conclusion can be made from it [30].

Reliability can be assessed on multiple levels. When assessing whether multiple items measure the same construct, such as with questionnaires and interviews, reliability is related to internal consistency [30]. Internal consistency is the degree to which all items on a scale consistently measure the underlying condition [30]. Internal consistency applies to measures that consist of multiple items and therefore is related to errors of measurement that are linked to content sampling. Two primary approaches exist to measure internal consistency: split-half reliability and test item reliability [30]. Split-half reliability measures the extent to which all parts of the test contribute equally to what is being measured and is based on dividing the items on an instrument into two halves and correlating the results [30]. The split-half method is a quick and relatively easy way to establish reliability. However, its use is limited to larger questionnaires which only measure one construct. Test item reliability assesses internal consistency through item analysis, where each item on the test is examined to determine how it relates to every other item on the test and to the instrument as a whole [30]. Because it examines how each item relates to the others, the test item method does not require nearly the length of test that the split-half method necessitates.

Test-retest reliability (intra-tester and inter-tester) is the degree to which the score remains stable when there is no change in the underlying condition being measured. Test-retest reliability is estimated by measuring individuals two or more times over an interval when the individual’s condition is expected to remain stable. The time between the repeat measurements is an important consideration. When the condition being measured is expected to change quickly, the time between repeat measurements should be short. When the condition is not expected to change rapidly, the time between repeat measurements should be longer [30]. As such, test-retest reliability is not a fixed property of an instrument but rather a degree of measurement consistency when applied to certain populations under particular measurement conditions. Therefore, the population in the study which evaluated test-retest reliability must be representative of the target population of interest. The reliability coefficients used for test-retest reliability depend on the type of data that is present. For interval or ratio data, the Pearson correlation coefficient and the intraclass correlation coefficient (ICC) are commonly used for normally distributed data. Test-retest reliability for ordinal or nominal data is measured with percent agreement or Cohen’s kappa. Reliability values are placed on a common scale of 0–1.0. If a measure is perfect, without error, the reliability is 1.0. If the measures are full of error, the reliability is 0.0. Generally, coefficient values of less than 0.50 indicate poor reliability, 0.50–0.75 represent moderate reliability, and those greater than 0.75 suggest good reliability [30]. In clinical trials, to ensure the valid interpretation of the findings, measures should generally exceed 0.90. However, acceptable reliability is often a judgement call that depends on the knowledge of how precise the measurements must be in order to be used in a meaningful way [30].

In clinical testing, measurements are rarely perfectly reliable [30]. In addition to the limitations of measurement instruments, human subject research adds a level of inconsistency that must be acknowledged [30]. The standard error of measurement (SEM) is a measure of precision that is used to determine such limitations. The SEM estimates how well repeated measures are distributed around a true score. This standard error is closely related to the reliability of a test, with a larger SEM being associated with lower reliability and less precision in the scores obtained.

Precision of the measurement can also be assessed via confidence intervals. A confidence interval gives an estimated range of values, which is likely to contain the true score for a number of variations in the population [30]. The specific
limits of the confidence interval are determined by the variability in the data as well as the level of confidence the researcher wants to assign to the point estimate [30]. Confidence intervals typically are presented with 95% confidence but can also be reported as 90 or 99%. The confidence interval is reflective of the SEM. The traditionally used 95% confidence interval is determined by multiplying the SEM by 1.96 and adding and subtracting the result from the observed score. Likewise, the 90% and 99% confidence intervals are calculated by multiplying the SEM by 1.64 and 2.58, respectively.

The minimal detectable change (MDC) is an absolute measure of reliability or error and is used to determine the threshold for true change in a measure, i.e. what amount of change must be seen to be sure that the difference is not related to measurement error of the instrument [3, 19]. When interpreting scores on an outcome measure, knowing the minimal detectable change is important to ensure the change is not the result of the measurement error alone. The MDC can be determined with knowledge of the SEM and can be applied to findings in clinical research trials.

15.3.2 Validity

Traditionally, validity describes the degree to which an instrument measures what it is intended to measure. Validity emphasizes the objectives of a test and the ability to make inferences from the associated measurements. In essence, validity determines what you are able to do with the test results [30]. Implied in validity is that the measurement has an acceptable level of error or is reliable. Inaccurate or unreliable measurements cannot provide meaningful measurements [30]. Establishing validity of an outcome measure is not as straightforward as establishing reliability [30]. Often there is no obvious manner to determine if an outcome is measuring exactly what it is intending to measure. Therefore, the researcher must determine, through various means, if the measure has enough validity to be utilized in research and practice.

Validation procedures are based on the type of evidence that can be provided to determine an outcome’s validity [30]. Content, criterion-related, and construct validity are essential elements that a researcher must determine to some degree before using an outcome measure.

Content validity is the degree to which items on the instrument adequately reflect the content domain that is being measured. Specifically, content validity addresses the question “is all important item content included on the instrument and all irrelevant item content excluded?” Content validity is useful with questionnaires and inventories [30].

Fact Box 15.1: Reliability

| Measure | Description | Statistic used |
|---|---|---|
| Internal consistency | The extent to which items measure the same characteristic | Alpha |
| Test-retest | The extent to which an instrument is able to measure a variable with consistency. Not a fixed property of an instrument. Degree of measurement consistency when applied to certain populations under particular measurement conditions | Pearson r, ICC; percent agreement, Cohen’s kappa |
| SEM | Interprets internal consistency and test-retest reliability. Measures response stability; estimates the standard error in repeated scores | $\mathrm{SEM} = \mathrm{SD}\sqrt{1 - r_{xx}}$; 95% CI = score ± (1.96 × SEM); 99% CI = score ± (2.58 × SEM) |
| MDC95 | Minimal amount of change that distinguishes a true change from a change due to variability | $\mathrm{MDC}_{95} = 1.96 \times \mathrm{SEM} \times \sqrt{2}$ |
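
The reliability statistics in Fact Box 15.1 can be illustrated with a short Python sketch. It is a minimal illustration rather than a validated implementation. It assumes that “Alpha” refers to Cronbach’s alpha and that r_xx is a reliability coefficient such as the ICC. The function names and the example values (SD of 10, reliability of 0.91, observed score of 70) are hypothetical and not taken from any study.

```python
import math
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r_xx), with r_xx a reliability coefficient."""
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(score, sem, z=1.96):
    """Interval around an observed score: score +/- z * SEM (z = 1.96 for 95%)."""
    half_width = z * sem
    return (score - half_width, score + half_width)


def mdc95(sem):
    """MDC95 = 1.96 * SEM * sqrt(2)."""
    return 1.96 * sem * math.sqrt(2.0)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    sem = standard_error_of_measurement(sd=10.0, reliability=0.91)
    low, high = confidence_interval(score=70.0, sem=sem)
    print(f"SEM = {sem:.2f}")
    print(f"95% CI = {low:.2f} to {high:.2f}")
    print(f"MDC95 = {mdc95(sem):.2f}")
```

With these hypothetical inputs, a change score would need to exceed the printed MDC95 before it could be attributed to something other than measurement error. Passing z=2.58 to confidence_interval gives the 99% interval from the fact box.
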

Criterion-related validity is the degree to which the score on the instrument reflects the current or future standing on a gold standard. Essentially, this is an issue that is related to prognosis and how well the score predicts the status of the individual. When appropriate criterion validity is present, the outcomes obtained from one test could be used as a substitute for an established gold standard [30]. There are two standard sub-forms of criterion validity. One sub-form of criterion validity is concurrent validity, which establishes validity when two measures are obtained at a similar time. Establishing concurrent validity is important when one test is considered more efficient, economical, or practical compared to the established gold standard [30]. The other form of criterion validity is predictive validity, which specifically establishes that the outcome on one test can be used to predict the score of the criterion test. Strong criterion validity allows the researcher or clinician to utilize the more efficient and generalizable outcome while acknowledging the similarities to the gold standard.

Finally, construct validity is the degree to which scores on the instrument reflect the underlying construct that one is intending to measure. Construct validity requires one to demonstrate hypothesized relationships with other measures of the construct.

Fact Box 15.2: Validity

| Type | Description | Determination |
|---|---|---|
| Content validity | Degree to which the outcomes adequately reflect the content domain that is being measured. Demands that the outcome is free from the influence of factors that are irrelevant to the purpose of the measurement | Subjective process with no statistical tests to assess. Determined by expert panel, requiring several revisions |
| Criterion-related | Reflects the ability of one test to predict the results on another (gold standard) test. The most practical approach to validity testing. Often separated into concurrent validity and predictive validity | Determined through correlation estimates between the measures |
| Construct validity | The degree to which scores reflect the underlying constructs that it is measuring. The degree to which a test measures what it claims | The ability to identify and test theory behind construct is essential. Difficult to ever fully achieve: estimates of sensitivity, estimates of specificity, ROC curve analyses, correlation analyses, regression models |

15.3.3 Responsiveness

Responsiveness possesses two major elements, internal and external responsiveness [13]. Internal responsiveness is the degree to which the score changes as the underlying condition that is being measured by the scale changes [13]. Essentially it is the instrument’s ability to detect change over time. External responsiveness reflects the extent to which change in a measure corresponds to actual changes in a reference measure of health status [13]. With external responsiveness, the measure is not of primary interest but rather the change in the external health status standard [13]. In contrast to internal responsiveness, external responsiveness will depend on the choice of the reference health standard and not on the treatments under investigation [13].

There is a lack of consensus on the appropriate statistic for assessing responsiveness; thus
more than one statistic is often reported [13]. As the patient’s condition improves or deteriorates, the score on the measure should change in a similar manner. However, there are a number of factors that can affect the magnitude of change for a given instrument. Factors that affect responsiveness include the patient group (e.g. an acute versus a chronic condition), the type of treatment, the timing of the data collection, and the construct of change [2]. Responsiveness statistics include the effect size, the standardized response mean (SRM), the minimal clinically important difference (MCID), and the patient acceptable symptom state (PASS) [2].

The effect size relates change to the standard deviation of the initial scores and is expressed in standard deviation units. It is a way of quantifying the extent of change without confounding it with sample size. An effect size of 0.5 implies the average change score is equal to one-half of the standard deviation of the initial scores. A general interpretation of effect sizes is that small effect sizes are on the order of 0.20, medium are 0.50, and large are around 0.80 [5].

The SRM is an alternative to the effect size and is used to gauge the responsiveness of instruments to actual clinical change. The SRM is determined by dividing the mean change score by the standard deviation of the change score. In clinical research and in determining which outcome measure to use, the measure with the larger SRM will be more able to detect clinical change [1, 23].

The MCID [16] was first described in order to better determine if statistically significant change in an outcome measure also had clinical significance or significance to the patient [6]. Given this context, the MCID represents the smallest difference in a score which the patient perceives as beneficial. Unfortunately, there are a number of methods to calculate the MCID, and no standard method has been identified. Therefore, values of MCIDs can have a large amount of variation leading to a number of problems with interpretation [6]. One constant in determining the MCID is the need for a patient-centred anchor, such as the global rating of change, in order to determine when patient-perceived benefit has taken place. In essence, the MCID functions as a measure of responsiveness of a given instrument. However, at times the MCID may be less reflective of the responsiveness of the instrument and more reflective of the treatment itself [6].

The PASS is defined as the highest level of the symptom beyond which the patient considers themselves well [20]. Like the MCID, the PASS also requires an anchoring question to identify the cut-off. The anchoring question “Taking into account all of the activities you have during your daily life, do you consider that your current state is satisfactory?” has simple response options of yes or no. PASS cut-points appear to be relatively stable over time and are not strongly influenced by age or gender [20]. PASS can be determined for both patient-reported outcomes and clinical measures such as pain and movement ability.

When the average scores for the group of interest represent different points on a measurement instrument and the instrument is not able to register equal intervals across the full range of measurements, ceiling and floor effects occur [30]. Ceiling effects technically occur when the outcome is equal across the range of treatment groups, and all groups cluster at or near the best possible outcome level. Conversely, floor effects occur when performance is nearly as bad as possible in all groups or at the low end of the instrument scale. Either ceiling or floor effects render an instrument unable to discriminate between patients at the affected extreme of the scale [32]. Generally, ceiling or floor effects are said to be present in an instrument when 15% or more of the respondents achieve the best or worst level of the score [24, 28].

Clinical Vignette

When designing a research study to examine the outcomes of three separate surgical procedures for reconstruction of the anterior cruciate ligament with comparable post-operative rehabilitations, the investigators needed to identify the most important outcome measures. Given that all three surgical techniques were established and the purpose of the study had to deal with the overall patient outcome and improvement in symptoms, function, and general quality of life, the outcome measures selected were required to reflect those domains. Performance-based measures such as range of motion, strength, and proprioception were selected to examine the effect on impairments and symptoms and were based on factors that are considered important fundamental aspects that are related to outcomes. Patient-reported outcomes included self-reported pain, condition-specific and region-specific questionnaires, and general health status indices to examine overall quality of life. When selecting the outcome measures, the investigators researched measures that were found to have sufficient reliability and high validity for their patient population and that were responsive to change. Therefore, the International Knee Documentation Committee Subjective Knee Form (IKDC-SKF) [14] was selected over the WOMAC [4] as it has superior validity for the population undergoing ACL reconstruction. At the termination of the study, the investigators were able to draw conclusions on multiple aspects of the effect of the intervention on impairments, activity and participation limitations, and general quality of life. Since the outcomes were methodologically sound, the results had good generalizability to the population they were studying and allowed comparison with similar research trials available in the literature.

Fact Box 15.3: Responsiveness

| Statistic | Description | Determination |
|---|---|---|
| Effect size | Relates change to the standard deviation of the initial scores. Expressed in standard deviation units. Manner in which to quantify the extent of change without confounding it with sample size. Places more emphasis on the size of the effect than statistical significance | $\text{Effect size} = (\bar{x}_1 - \bar{x}_2) / SD$ |
| SRM | Reflects variability of change scores. Provides an estimate of change in the measure, standardized relative to the between-patient variability in change scores. Removes dependence on sample size. Low level of variability in change scores in relation to mean change will have a large SRM value [13] | $\mathrm{SRM} = (\bar{x}_2 - \bar{x}_1) / SD(x_2 - x_1)$ |
| MCID | Smallest amount of change that is identified as clinically important. Would dictate a change in patient management | Numerous methods to determine. Requires a patient-centred anchor. May vary for different populations under different conditions |
| PASS | Highest level of symptom beyond which the patient considers themselves well. An absolute level of wellbeing. Relatively stable values | Requires anchoring question. Use ROC curve analyses to determine cut-point |
| Ceiling effects | All scores cluster at or near the maximum score or best possible outcome. Restricts variability as ceiling of test is too low | Frequency of highest possible score achieved by ≥15% of subjects |
| Floor effects | All scores cluster at or near the minimum score or worst possible outcome. Restricts variability as floor of test is too high | Frequency of lowest possible score achieved by ≥15% of subjects |
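
The effect size, the SRM, and the 15% criterion for ceiling and floor effects summarized in Fact Box 15.3 can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation. Change is computed as follow-up minus baseline, a sign convention chosen here for readability. The function names and the example scores are hypothetical. Only a score exactly equal to the instrument's minimum or maximum counts toward a floor or ceiling effect.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of the baseline scores."""
    return (mean(follow_up) - mean(baseline)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of the change scores."""
    # Scores are paired: position i in both lists belongs to the same patient.
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def ceiling_floor_flags(scores, min_score, max_score, threshold=0.15):
    """Flag an effect when at least `threshold` of respondents hit an extreme."""
    n = len(scores)
    floor_share = sum(1 for s in scores if s == min_score) / n
    ceiling_share = sum(1 for s in scores if s == max_score) / n
    return {"floor": floor_share >= threshold, "ceiling": ceiling_share >= threshold}


if __name__ == "__main__":
    # Hypothetical paired scores for three patients, for illustration only.
    baseline = [40, 50, 60]
    follow_up = [50, 65, 70]
    print(f"Effect size = {effect_size(baseline, follow_up):.2f}")
    print(f"SRM = {standardized_response_mean(baseline, follow_up):.2f}")
    print(ceiling_floor_flags([100, 100, 50, 50, 50, 50, 50, 50, 50, 50], 0, 100))
```

In this example the SRM is larger than the effect size because the change scores vary less than the baseline scores, which matches the fact box's note that low variability in change relative to mean change yields a large SRM.
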

Applying the concepts of reliability, validity, and responsiveness to evaluate a measure of health-related quality of life is essential when selecting outcomes for clinical trials. Whether an outcome measure is performance-based or patient-reported, general or specific, all measures must be assessed for acceptable levels of reliability in the measurement, validity in the construct that it measures, and its ability to detect change that may occur with intervention. More generally, outcome measures should match the level of intervention that is being provided in the clinical trial, with clinical trials addressing impairments, activity limitations, or participation restrictions having outcome measures that correspondingly assess issues at the impairment, functional, and societal levels.

Take-Home Message
• To comprehensively assess results of clinical trials, clinical outcome measures should include those that promote comparison to the population as a whole, as well as measures that are distinctive to the condition and population of interest via region- or disease-specific assessments.
• Both performance-based and patient-reported outcomes are useful, with patient-reported outcomes being a key indicator of patient-centred responses.
• Consideration of the psychometric properties of reliability, validity, and responsiveness of the measure as they relate to the condition of interest is essential when selecting clinical outcomes to ensure that there is methodological acceptability in the measures.

15.4 Resources/Websites

http://www.orthopaedicscore.com
www.rehabmeasures.org
https://www.aaos.org/Quality/Performance_Measures/Patient_Reported_Outcome_Measures/?ssopc=1
http://www.ptnow.org/tests-measures

References

1. Evidence-based rehabilitation: a guide to practice. 2nd ed. Thorofare: SLACK Incorporated; 2008.
2. Beaton DE. Understanding the relevance of measured change through studies of responsiveness. Spine. 2000;25(24):3192–9.
3. Beaton DE, Bombardier C, Katz JN, Wright JG, Wells G, Boers M, et al. Looking for important change/differences in studies of responsiveness. OMERACT MCID Working Group. Outcome measures in rheumatology. Minimal clinically important difference. J Rheumatol. 2001;28(2):400–5.
4. Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt LW. Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol. 1988;15(12):1833–40.
5. Cohen J. Statistical power analysis. Curr Dir Psychol Sci. 1992;1(3):98–101.
6. Cook CE. Clinimetrics corner: the minimal clinically important change score (MCID): a necessary pretense. J Man Manip Ther. 2008;16(4):E82–E3.
7. Fairbank JC, Couper J, Davies JB, O’Brien JP. The Oswestry low back pain disability questionnaire. Physiotherapy. 1980;66(8):271–3.
8. Guralnik JM, Branch LG, Cummings SR, Curb JD. Physical performance measures in aging research. J Gerontol. 1989;44(5):M141–M6.
9. Hays RD, Bjorner JB, Revicki DA, Spritzer KL, Cella D. Development of physical and mental health summary scores from the patient-reported outcomes measurement information system (PROMIS) global items. Qual Life Res. 2009;18(7):873–80.
10. Hefford C, Abbott JH, Arnold R, Baxter GD. The patient-specific functional scale: validity, reliability, and responsiveness in patients with upper extremity musculoskeletal problems. J Orthop Sports Phys Ther. 2012;42(2):56–65.
11. Horn KK, Jennings S, Richardson G, Van Vliet D, Hefford C, Abbott JH. The patient-specific functional scale: psychometrics, clinimetrics, and application as a clinical outcome measure. J Orthop Sports Phys Ther. 2012;42(1):30–42.
12. Hudak PL, Amadio PC, Bombardier C, Beaton D, Cole D, Davis A, et al. Development of an upper extremity outcome measure: the DASH (Disabilities of the Arm, Shoulder, and Hand). Am J Ind Med. 1996;29(6):602–8.
13. Husted JA, Cook RJ, Farewell VT, Gladman DD. Methods for assessing responsiveness: a critical review and recommendations. J Clin Epidemiol. 2000;53(5):459–68.
14. Irrgang JJ, Anderson AF, Boland AL, Harner CD, Kurosaka M, Neyret P, et al. Development and validation of the international knee documentation committee subjective knee form. Am J Sports Med. 2001;29(5):600–13.
15. Irrgang JJ, Snyder-Mackler L, Wainner RS, Fu FH, Harner CD. Development of a patient-reported measure of function of the knee. J Bone Joint Surg Am. 1998;80(8):1132–45.
16. Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10(4):407–15.
17. Kirkley A, Alvarez C, Griffin S. The development and evaluation of a disease-specific quality-of-life questionnaire for disorders of the rotator cuff: the Western Ontario Rotator Cuff Index. Clin J Sport Med. 2003;13(2):84–92.
18. Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Clin Epidemiol. 1985;38(1):27–36.
19. Kovacs FM, Abraira V, Royuela A, Corcoll J, Alegre L, Tomás M, et al. Minimum detectable and minimal clinically important changes for pain in patients with nonspecific neck pain. BMC Musculoskelet Disord. 2008;9(1):43.
20. Kvien TK, Heiberg T, Hagen KB. Minimal clinically important improvement/difference (MCII/MCID) and patient acceptable symptom state (PASS): what do these concepts mean? Ann Rheum Dis. 2007;66(Suppl 3):iii40–1.
21. Latham NK, Mehta V, Nguyen AM, Jette AM, Olarsch S, Papanicolaou D, et al. Performance-based or self-report measures of physical function: which should be used in clinical trials of hip fracture patients? Arch Phys Med Rehabil. 2008;89(11):2146–55.
22. Leggin B, Iannotti J. Shoulder outcome measurement. Disorders of the shoulder: diagnosis and management. Philadelphia: Lippincott Williams & Wilkins; 1999. p. 1024–40.
23. Liang MH, Lew RA, Stucki G, Fortin PR, Daltroy L. Measuring clinically important changes with patient-oriented questionnaires. Med Care. 2002;40(4):II-45–51.
24. McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res. 1995;4(4):293–307.
25. McHorney CA, Ware JE Jr, Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care. 1993;31:247–63.
26. Nagi SZ. A study in the evaluation of disability and rehabilitation potential: concepts, methods, and procedures. Am J Public Health Nations Health. 1964;54(9):1568–79.
27. Noyes FR, Barber SD, Mooar LA. A rationale for assessing sports activity levels and limitations in knee disorders. Clin Orthop. 1989;(246):238–49.
28. Paulsen A, Odgaard A, Overgaard S. Translation, cross-cultural adaptation and validation of the Danish version of the Oxford Hip Score: assessed against generic and disease-specific questionnaires. Bone Joint Res. 2012;1(9):225–33.
29. Porter ME. What is value in health care? N Engl J Med. 2010;363(26):2477–81.
30. Portney L, Watkins M. Foundation of clinical research: application to practice. Norwalk: Appleton & Lange; 1993.
31. Rankin G, Stokes M. Reliability of assessment tools in rehabilitation: an illustration of appropriate statistical analyses. Clin Rehabil. 1998;12(3):187–99.
32. Stucki G, Liang M, Stucki S, Katz J, Lew R. Application of statistical graphics to facilitate selection of health status measures for clinical practice and evaluative research. Clin Rheumatol. 1999;18(2):101–5.
33. Tegner Y, Lysholm J. Rating systems in the evaluation of knee ligament injuries. Clin Orthop. 1985;(198):42–9.
34. Vernon H, Mior S. The neck disability index: a study of reliability and validity. J Manipulative Physiol Ther. 1991;14(7):409–15.
35. Ware JE Jr, Kosinski M, Keller SD. A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care. 1996;34(3):220–33.
36. International Classification of Functioning, Disability and Health (ICF) [press release]. Geneva: World Health Organization; 2001.

16 Advances in Measuring Patient-Reported Outcomes: Use of Item Response Theory and Computer Adaptive Tests

Andrew D. Lynch, Adam J. Popchak, and James J. Irrgang

A. D. Lynch (*) · A. J. Popchak · J. J. Irrgang
Departments of Physical Therapy and Orthopaedic Surgery, University of Pittsburgh, Pittsburgh, PA, USA
e-mail: adl45@pitt.edu

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_16

16.1 Introduction to Item Response Theory

Most clinical outcomes do not fall into discrete categories (e.g., alive or dead; torn or intact), but rather are measured on a continuum (e.g., physical function, pain). This is especially true for patient-reported outcomes (PROs), which classically place an individual on the continuum of the underlying construct being measured via scores on fixed-length surveys. An instrument calibrated with item response theory uses individual items to estimate the location on the continuum or latent trait being measured by the set of items [4]. Using multiple items from a single bank of items which all measure the same construct (e.g., mobility or upper extremity function) improves the precision of that measurement.

Item response theory (IRT) is a modern measurement method that can help to eliminate redundant items and ensure that an item bank is unidimensional [8]. The underlying premise of IRT is that the performance of a person on an item can be modeled by the characteristics of the person and the item. Item response theory is a family of mathematical models that explains how individuals at different ability levels should respond to an item. As the ability of an individual increases, the probability of choosing a “correct” response to the item increases.

For example, individuals with higher functional ability, such as athletes, will have a higher probability than nonathletes of responding positively to an item “Can you run one mile?”. Therefore, this item will be more likely administered to someone with good functional ability. An in-depth description of the mathematics of the models underlying IRT is beyond the scope of this chapter. The interested reader is directed to Hays [13].

IRT calibration assumes that the item bank is unidimensional, that is, that it measures a single latent trait and can therefore be expressed as a single score [13, 17]. In contrast, consider a measure that asks questions about both physical pain and depression: an individual can have significant physical pain but not be depressed or can be depressed without having physical pain. A single score on this hypothetical measure is difficult to interpret because it measures more than one dimension of health-related quality of life.

Many traits are measured on an extremely broad spectrum. Mobility related to physical function is an excellent example of a broad latent trait. At the low end of the mobility spectrum, an individual may be bedridden, requiring assistance to roll over. At the high end, a world-class decathlete has very high levels of mobility. Between these two individuals lies a broad continuum which
must be considered. Any number of questions exist that can place an individual somewhere along the continuum from immobility to very high levels of mobility. To perfectly place the individual on the continuum, a PRO should contain a range of questions that measure the full spectrum of the trait, including its lowest and highest ends. In our mobility example, this would include asking questions about rolling over in bed and being able to run the 400-m hurdles, and everything in between. A measure with a large number of items would be incredibly long and burdensome to complete and just as burdensome to interpret. However, a scale with only ten items may be subject to floor and ceiling effects, as noted in Chap. 15.

IRT can be used to calibrate a set of items (i.e., an item bank) that will measure the extremes of function. Once the items are calibrated, the bank can be used to measure an individual on the continuum without having to administer a large number of items [19]. Each response to an item gives an estimate of an individual’s location on the continuum of the latent trait.

An item can be mathematically described by the item characteristic curve (ICC). The ICC represents the “difficulty” and slope of an item. The difficulty relates to the level of latent trait being measured by the item. For example, an item difficulty of 0.5 provides the most information about someone whose level of the latent trait is 0.5 logits above average. A greater slope indicates that an item is more discriminating around the level of difficulty for an item (i.e., a small change in the latent trait is likely to change the response) [13]. To select the best item for an individual, we should try to match the item difficulty as closely as possible to the individual’s level of the latent trait. Then we should select the item that has the greatest slope (i.e., the item with the greatest discrimination).

When items are calibrated using item response theory (IRT), a scaling factor and difficulty rating are assigned to each item, so that the response on any single item can be compared to other items in the bank [20]. Measures calibrated with IRT can locate an individual on the functional scale using a subset of items in a calibrated item bank. Therefore, direct comparisons of individuals are possible using two completely different item sets.

16.2 Computer Adaptive Test (CAT) Technology

Computer adaptive testing (CAT) algorithms choose items that will provide the most information about an individual (i.e., items that will give the best estimate of an individual’s position on the latent trait) [4, 13]. A great benefit of an item bank that is calibrated using IRT is the ability to accurately predict the response to many items based on the individual response to a few items. It is intuitive that an individual who responds that he cannot walk 1 mile will also not be able to run 5 miles. If a clinician is interviewing her patient, she would not ask follow-up questions that do not make sense. The clinician would ask questions that provide the most value and information about the function of the individual. Administering an item bank using CAT methods uses algorithms to accomplish the same end goal. The CAT algorithms improve the efficiency of item administration by presenting items that are relevant to the individual. Generally, the position on the scale can be determined after administration of four to eight items. This greatly reduces respondent burden compared to traditional general measures [e.g., Medical Outcomes Study Short Form 36 (SF-36), 36 items] or region-specific measures [e.g., Knee Injury and Osteoarthritis Outcome Score (KOOS), 42 items].

To illustrate the concept of a CAT, consider the following example. Based on the age and sex of an individual, the initial question administered to a 25-year-old female might be “how much difficulty do you have walking a mile?”. If the individual reports no difficulty walking a mile, the computer algorithm would bypass “easier” items, such as “can you walk a block”, and would administer a more “difficult” item, “how much difficulty do you have running a mile?”. After each item is administered, the individual’s level of physical function is re-estimated, and an item that would provide the most information given the individual’s current estimate of function is administered. The process continues iteratively until a predetermined number of items are administered or the level of function is estimated with a pre-specified level of precision.

Item banks are designed to measure the most common presentations of the latent trait. As an
16  Advances in Measuring Patient-Reported Outcomes: Use of Item Response Theory and Computer… 145

example, the Patient-Reported Outcome Measurement Information System (PROMIS) Physical Function Item Bank is designed to measure the physical function of individuals in the general population and to distinguish between someone who is functioning “normally” and someone who is limited in some fashion. It was not designed to accurately measure the function of either elite athletes or those who are nearly bedridden. One way to assess the measurement capability of a CAT is to plot location against precision. We graph the location of the score along the x-axis and the precision associated with that score on the y-axis. Generally, a U-shaped graph is seen with a range of scores that are associated with a precision (standard error) of less than 3.3 in the center of the graph and less precise scores at either end.

However, some measures do not follow this U-shape, as can be seen for the PROMIS Pain Interference CAT and Physical Function CAT in Fig. 16.1. Each measure is associated with good precision (i.e., a standard error of less than 3.3) for scores between 40 and 60 (thetas from −1 to 1). However, for Pain Interference t-scores less than 40 (scores indicating that pain does not interfere with daily function), there is poor precision (SE > 5). Similarly, for Physical Function scores greater than 55, precision is again poor. Considering these results, the Pain Interference CAT may not be suitable for individuals for whom pain is only a minor inconvenience, and the Physical Function CAT may not be suitable for individuals who function at the highest end of the spectrum. Each of these scales may benefit from replenishment of the item bank to expand the range of the latent trait being measured (see below).

16.3 CAT Administration Parameters

When setting up the CAT parameters, the user can create stopping rules for administration of items. The most intuitive of these stopping rules is the number of items to administer. The user can usually state the minimum and maximum number of items to administer; for the PROMIS instruments, these are generally set at a minimum of 4 items and a maximum of 12 items.

The second stopping rule involves the overall precision of the location estimate. When estimating the position of the individual on the scale of the latent trait, we want to know how precise that estimate is. Having an estimate that your patient is functioning at the 50th percentile of mobility is helpful if the 90% confidence interval (i.e., the

Fig. 16.1  Sample data (previously unpublished) for PROMIS CAT scores in individuals with orthopedic knee conditions. Plot title: Standard Error (Y) vs. Theta (X), Assessment Center Administered Follow-Up CATs. Axes: Theta (x-axis, −2 to 2) and Standard Error (y-axis). Series: Pain Interference and Physical Function. [Plot not reproduced]



true location) is between the 47th and 53rd percentiles. However, if the 90% confidence interval ranges from the 30th to the 70th percentiles, we are much less confident in our estimate. Therefore, the administration setup often allows the user to identify a level of precision with which they are comfortable for the location estimate. Depending on the desired use of the data, this can range from two to ten standard error points. There is a tradeoff between precision and the number of items: with greater required precision, generally more items must be administered.

It is important to understand that the number of items to administer generally overrides the precision requirement. If the precision requirement is met after three items, but the minimum was set to four, the algorithm will administer a fourth item. If the maximum number of items is administered, but the precision requirement has not been met, the algorithm will stop administering items.

Fact Box 16.1: Computer Adaptive Testing
• Computer adaptive testing (CAT) uses an algorithm to choose an item from an IRT-calibrated item bank that will provide the most information about an individual. Therefore, two individuals can complete the same CAT-administered outcome measure and respond to different items.
• The score on a CAT for an IRT-calibrated measure considers only the responses to individual items that are administered to the individual and not responses to a complete item bank.
• Therefore, a CAT does not need to administer all items to arrive at a score.
• Because the result is an estimate of a score, there is an associated standard error with the score. The lower the standard error, the more precise the estimate.
• Most often, a minimum of 4 and maximum of 12 items are administered; once four items have been administered and an acceptable level of precision for the score estimate has been achieved, testing stops. This makes test administration more efficient than administration of a full item bank or fixed-length outcome measure.

16.4 Interpreting Score Reports from Computer Adaptive Tests

Scores are typically expressed as a t-score. These standardized scores are used when a large, normative data set is available, as is the case for the PROMIS measures [6, 16], and encourage comparison of the individual to the population average. When expressed as a t-score, the expected mean is 50, and the expected standard deviation is 10. Therefore, we can expect that 68% of individuals will score between 40 and 60, 95% between 30 and 70, and 99.7% between 20 and 80. We also then know that a difference of 10 points on a t-score between individuals represents about one standard deviation.

As mentioned in Chap. 15, psychometric properties of an instrument are not fixed. Reliability, responsiveness, and patient acceptable symptom state may change depending on the population. Prior to choosing a CAT measure, the investigator should assess whether it has been used in a similar population. The PROMIS Physical Function measures have been used routinely in orthopedic populations [1, 2, 12, 15, 18, 21].

Fact Box 16.2: t-Score Interpretation
• A t-score normalizes an individual score relative to a large, representative population who have also completed the outcome measure.
• On a t-score, 50 represents the population average and 10 points is 1 standard deviation.
• Because t-scores are normally distributed, 68% of individuals will score between 40 and 60, 95% between 30 and 70, and 99.7% between 20 and 80.
• Both CAT administration and short form administration will provide a t-score.

16.5 Assessment with Short Forms

Computer adaptive tests obviously require computers and, depending on the mode of administration (e.g., Research Electronic Data Capture System (REDCap) or a measure-specific web site), an Internet connection. However, there are workarounds for scenarios in which an Internet-connected device capable of administering the measure is not available.

Using the item characteristic curves, it is possible to create static short forms that resemble classic fixed-length patient-reported outcome surveys. A short form has a fixed number of items, to which the patient responds. There are scores associated with each of the items; however, scoring a short form is not as simple as totaling the points on the form or expressing the achieved points as a percentage of the total points possible, as is typically done with legacy outcome measures.

After totaling the points from all items, the person scoring the form must refer to a conversion table to arrive at a t-score and standard error. Conversion tables are typically available in administration manuals.

Reviewing a conversion table prior to choosing a short form can help to identify if a particular short form will meet the needs of the clinician-researcher. Because each short form “total score” is associated with a t-score and a standard error, the clinician-researcher can review the range of t-scores that are associated with a particular short form and the associated standard errors. As an example, multiple PROMIS Physical Function Short Forms are available from HealthMeasures.net (e.g., Physical Function 4a, Physical Function 6b, and Physical Function 8b). The maximum t-score for each of these measures is 57, 59, and 60.1, respectively; however, each is associated with a standard error of 5.9 or greater. These short forms are all capable of measuring physical function that is above the population average of 50; however, the location estimate for someone who achieves the highest possible score on the form is not very precise. On the other hand, raw summed scores that are not at the maximum are typically associated with a standard error of 3 or less, which is usually regarded as acceptable. Therefore, these forms would have an obvious ceiling effect when measuring physical function at the highest level of function.

In the instance where a broad range of a latent trait must be measured, it is possible to select two versions of the short form: one for the low end of the spectrum and one for the high end. It is up to the clinician to judge which version would best serve the individual patient. In this case, there may be a total of 24 items that could be administered, but any given individual will only be asked to complete half of them to arrive at a t-score with an acceptable level of precision. Alternatively, to determine if the high or low version of the short form should be administered, a single screening item that is highly discriminative may be administered first. The response to the item would determine if the high or low version of the short form is administered.

16.6 Expanding and Improving Content Coverage

IRT-calibrated item banks are superior to classically created, fixed-length assessments for measuring at the extremes of function, but the item bank must cover the complete range of the intended trait for precise measurement [9, 11, 14]. The ability to improve the psychometric properties of a measure by adding and calibrating new items, referred to as item bank replenishment, is a significant advantage of IRT-based measures [10]. In the scenario where an item bank is deemed insufficient to measure a certain aspect of function, a replenishment study may be completed to augment the item bank. Replenishment studies typically seek to expand the upper or lower boundaries of the item bank to improve the range of measurement; however, it is also conceivable that an item bank may need better coverage in the middle of the spectrum of the latent trait.
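The effect of item coverage on precision at the extremes of a trait can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it uses a two-parameter logistic (2PL) model for yes/no items, which is a simplification because PROMIS items have several response options, and every slope and difficulty value is invented. The function names are hypothetical. Standard errors are on the theta (logit) metric; multiplying by 10 gives t-score points under the convention that thetas from −1 to 1 correspond to t-scores from 40 to 60.

```python
import math

def p_endorse(theta, a, b):
    """2PL item characteristic curve: probability of endorsing an item
    with slope a and difficulty b at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item; it peaks where theta equals b
    and grows with the square of the slope."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

def standard_error(theta, bank):
    """Standard error of the trait estimate at theta if every item in
    the bank were administered (1 / sqrt of the summed information)."""
    total = sum(item_information(theta, a, b) for a, b in bank)
    return 1.0 / math.sqrt(total)

# Hypothetical bank of (slope, difficulty) pairs centred on average function
bank = [(1.8, -1.5), (2.0, -0.5), (2.2, 0.0), (2.0, 0.5), (1.8, 1.0)]
# Hypothetical replenishment: two harder items added at the upper end
replenished = bank + [(2.0, 2.0), (2.0, 2.5)]

for theta in (0.0, 2.5):
    before = standard_error(theta, bank)
    after = standard_error(theta, replenished)
    print(f"theta={theta:+.1f}  SE before={before:.2f}  SE after={after:.2f}")
```

In this toy bank the standard error is small near the center and large at a theta of 2.5, and adding harder items lowers it mainly at the upper end, which is the pattern a replenishment study aims for.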

In a replenishment study, additional items are identified via comparison to other existing item banks and legacy outcome measures and through interviews with relevant stakeholder groups [3, 4, 7, 10]. For instance, an item bank may be excellent at measuring general mobility, but may not have items that appropriately measure mobility that is assisted by a cane, walker, or wheelchair. In this case, existing measures could be consulted for example items, and individuals who routinely get around with assistance may be asked to provide input on what types of items should be included or how the items should be worded. This provides a group of candidate items that may be used to augment the existing bank.

The item bank calibration method is a more direct method to calibrate new items. In this instance, an individual is asked to respond to a large portion of the existing item bank (possibly the entire item bank) and to each new candidate item. Because there is more information with which to calibrate the new items, fewer overall participants are needed. However, each participant assumes a greater burden by responding to a large number of items. Regardless, an item bank is not a completely static entity. It may be augmented and refined through multiple administrations.

16.7 Using CATs in Conjunction with Legacy Measures

Presently, the majority of IRT-calibrated measures that can be administered as CATs are general measures. As an example, the PROMIS has tools to measure fatigue, pain intensity, pain interference, physical function, sleep disturbance, anxiety, depression, and ability to participate in social roles and activities as their primary measures. These are not disease- or condition-specific measures, and therefore, they will not ask specific questions about how a recent rotator cuff tear affects sleep or physical function. If disease-specific or body region-specific information is needed, an additional legacy measure should be considered in addition to the general measure.

It can be argued that a region-specific measure related to the hip or a condition-specific measure about osteoarthritis gives more information about a particular patient’s situation. However, it is not possible to compare scores on a shoulder-specific measure and a knee-specific measure. It is, however, possible to determine how each of those individuals is impacted in general physical function when measured by a general IRT-based CAT.

16.8 Limitations Associated with Computer Adaptive Testing

Legacy patient-reported outcome measures, with a fixed number of items that are always administered, are easy to program into the electronic medical record (EMR) for administration in the clinic. However, CAT-based measures require specific, proprietary algorithms to be programmed into the EMR. This is not yet a standard practice for the EMR; therefore, adoption in clinical practice is difficult and has not yet become routine. Alternatively, research-based programs such as the Research Electronic Data Capture (REDCap) System have the PROMIS CATs built into their system. Additionally, the vendors of each program have websites and applications that can administer the CAT-based PROs, usually at a cost.

The CAT-based PRO may also lack specificity to the patient or clinician [5]. Because it is a general measure of a specific trait, it contains general items. This is unlike legacy PROs, which may be joint or region specific and therefore contain items about a specific injury, symptom, or function. Some individuals comment that the CAT-based PROs are not specific to their current situation, and therefore they do not see value in completing them. To address and overcome this concern, it is imperative that the clinician look at the PRO and discuss the results with the individual.
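To give a sense of what an adaptive administration algorithm involves, the following is a simplified sketch and not the PROMIS or any vendor implementation. It assumes yes/no items under a two-parameter logistic model with invented parameters, picks the unadministered item with the most information at the current estimate, re-estimates the trait by expected a posteriori (EAP) scoring on a grid with a standard normal prior, and applies the stopping rules described in Sect. 16.3 (minimum of 4 items, maximum of 12 items, and a precision target, here assumed to be 3.0 t-score points). All function and parameter names are hypothetical.

```python
import math

GRID = [g / 10.0 for g in range(-40, 41)]  # theta grid from -4 to 4

def prob(theta, a, b):
    """2PL probability of endorsing an item (slope a, difficulty b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info(theta, a, b):
    """Item information at theta."""
    p = prob(theta, a, b)
    return a * a * p * (1.0 - p)

def eap(responses):
    """Posterior mean and SD of theta given [(a, b, endorsed), ...]."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # standard normal prior, unnormalised
        for a, b, endorsed in responses:
            p = prob(t, a, b)
            w *= p if endorsed else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, min_items=4, max_items=12, se_target_t=3.0):
    """Administer items adaptively; answer(index) returns True or False.
    Returns (t-score, standard error in t-score points, items used)."""
    remaining = list(range(len(bank)))
    responses = []
    theta, se = 0.0, 1.0  # start at the population average
    while remaining and len(responses) < max_items:
        idx = max(remaining, key=lambda i: info(theta, *bank[i]))
        remaining.remove(idx)
        a, b = bank[idx]
        responses.append((a, b, answer(idx)))
        theta, se = eap(responses)
        # The item-count minimum takes priority over the precision target
        if len(responses) >= min_items and 10.0 * se <= se_target_t:
            break
    return 50.0 + 10.0 * theta, 10.0 * se, len(responses)

# Hypothetical bank: 21 items, common slope 2.0, difficulties -2.5 to 2.5
bank = [(2.0, -2.5 + 0.25 * k) for k in range(21)]
# Simulated respondent who endorses every item easier than theta = 1.0
t_score, se, n_items = run_cat(bank, lambda i: bank[i][1] < 1.0)
print(f"t-score {t_score:.1f}, SE {se:.1f}, items administered {n_items}")
```

The loop keeps going past a met precision target until the minimum item count is reached, and it stops at the maximum item count even if the target has not been met, mirroring the priority of the item-count rules described in Sect. 16.3.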

Clinical Vignette

In an orthopedic clinic, a group of partners provide care for a variety of orthopedic presentations. They are curious to understand how each of their patients is functioning on the continuum of physical function. Therefore, they decide to administer the PROMIS Physical Function CAT to each patient at each visit to track function and progress. Example t-scores and standard errors are shown below for Patient A, who presented with an ACL tear requiring reconstruction, and for Patient B, who presented with a degenerative meniscus tear requiring debridement.

Because the items in the PROMIS Physical Function Item Bank are all calibrated on a single scale, the scores can be directly compared. It does not matter that the individuals did not respond to the same items. We can clearly see that Patient A is doing better in his overall mobility compared to Patient B at all time points. Importantly, we were able to arrive at a score in only four items for most administrations. The only instance which required more than four items was when Patient A was functioning at the highest level of mobility, where we know measurement precision is poor.

While Patient B may not have achieved the same level of overall mobility, when we consider the Pain Interference scores for this individual, we see an initial trend of increased pain after surgery, but ultimately a reduction of five t-score points or one-half of a standard deviation. This is an excellent example of being able to measure two constructs (Physical Function and Pain Interference) in an individual in only eight items. Pain Interference scores were low for Patient A and did not vary over time.

However, because the Physical Function Item Bank does not ask the same questions every time the CAT is administered, we do not directly know what physical limitations are present in either Patient A or Patient B. This would require administration of a knee-joint-specific outcome measure. We may also estimate how an individual would respond to items that were not administered based on responses to other IRT-calibrated items.

              Patient A: ACL reconstruction   Patient B: arthroscopic debridement of degenerative meniscus
              Physical function               Physical function               Pain interference
Time point    t-score   SE    Item count      t-score   SE    Item count      t-score   SE    Item count
Pre-op        47.7      1.9   4               39.1      1.9   4               57.7      1.7   4
3 m           44.7      2.1   4               38.3      1.8   4               65.7      1.7   4
6 m           47.2      2.0   4               40.6      1.9   4               57.3      1.7   4
12 m          54.6      2.4   4               45.7      1.9   4               52.8      1.8   4
24 m          70.3      4.1   12              48.7      1.9   4               52.6      1.9   4
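The arithmetic behind these comparisons can be sketched as follows, using values copied from the table. The helper names are illustrative, and the interval calculation assumes that the reported standard error can be treated as normally distributed error around the t-score, which is an assumption made here rather than something stated in the vignette.

```python
import math

T_SD = 10.0  # population standard deviation on the t-score metric

def change_in_sd_units(t_before, t_after):
    """Change between two t-scores expressed in population SDs."""
    return (t_after - t_before) / T_SD

def confidence_interval(t_score, se, z=1.96):
    """Approximate interval t_score +/- z * SE (z = 1.96 for about 95%)."""
    return t_score - z * se, t_score + z * se

def share_within(k_sd):
    """Share of a normal population scoring within k SDs of the mean."""
    return math.erf(k_sd / math.sqrt(2.0))

# Patient B, Pain Interference: pre-op 57.7 versus 24 months 52.6
pain_change = change_in_sd_units(57.7, 52.6)
# Patient A, Physical Function at 24 months: t-score 70.3, SE 4.1
a_low, a_high = confidence_interval(70.3, 4.1)
# Patient B, Physical Function at 24 months: t-score 48.7, SE 1.9
b_low, b_high = confidence_interval(48.7, 1.9)

print(f"Pain Interference change: {pain_change:+.2f} SD")
print(f"Patient A, 24 m interval: {a_low:.1f} to {a_high:.1f}")
print(f"Patient B, 24 m interval: {b_low:.1f} to {b_high:.1f}")
print("Share within 1, 2, 3 SD:",
      ", ".join(f"{share_within(k):.3f}" for k in (1, 2, 3)))
```

Under these assumptions, Patient B's Pain Interference change is about half a standard deviation, and the two patients' 24-month Physical Function intervals do not overlap even though Patient A's estimate carries the larger standard error.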

Take-Home Message

• The use of IRT and CAT to administer patient-reported outcomes allows for more rapid assessment of various aspects of health-related quality of life.
• Widespread use of IRT-based measures will promote comparisons between studies and better meta-analysis; however, CAT administration has not become part of routine practice yet.
• It remains to be seen how IRT- and CAT-administered PROs function in specific patient populations.

16.9 Resources and Web Sites

• PROMIS Resources via Healthmeasures.net: http://www.healthmeasures.net
• Activity Measure for Post-Acute Care: http://am-pac.com/category/home/

References

1. Beleckas CM, Padovano A, Guattery J, Chamberlain AM, Keener JD, Calfee RP. Performance of patient-reported outcomes measurement information system (PROMIS) upper extremity (UE) versus physical function (PF) computer adaptive tests (CATs) in upper extremity clinics. J Hand Surg Am. 2017;42:867–74. https://doi.org/10.1016/j.jhsa.2017.06.012.
2. Bernholt D, Wright RW, Matava MJ, Brophy RH, Bogunovic L, Smith MV. Patient reported outcomes measurement information system scores are responsive to early changes in patient outcomes following arthroscopic partial meniscectomy. Arthroscopy. 2018;34:1113–7. https://doi.org/10.1016/j.arthro.2017.10.047.
3. Bruce B, Fries J, Lingala B, Hussain YN, Krishnan E. Development and assessment of floor and ceiling items for the PROMIS physical function item bank. Arthritis Res Ther. 2013;15:R144. https://doi.org/10.1186/ar4327.
4. Bruce B, Fries JF, Ambrosini D, Lingala B, Gandek B, Rose M, Ware JE Jr. Better assessment of physical function: item improvement is neglected but essential. Arthritis Res Ther. 2009;11:R191. https://doi.org/10.1186/ar2890.
5. Cella D, Chang CH. A discussion of item response theory and its applications in health status assessment. Med Care. 2000;38:II66–72.
6. Cella D, et al. The patient-reported outcomes measurement information system (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005-2008. J Clin Epidemiol. 2010;63:1179–94. https://doi.org/10.1016/j.jclinepi.2010.04.011.
7. DeWalt DA, Rothrock N, Yount S, Stone AA, PROMIS Cooperative Group. Evaluation of item candidates: the PROMIS qualitative item review. Med Care. 2007;45:S12–21. https://doi.org/10.1097/01.mlr.0000254567.79743.e2.
8. Fries JF, Bruce B, Cella D. The promise of PROMIS: using item response theory to improve assessment of patient-reported outcomes. Clin Exp Rheumatol. 2005;23:S53–7.
9. Haley SM, Ni P, Hambleton RK, Slavin MD, Jette AM. Computer adaptive testing improved accuracy and precision of scores over random item selection in a physical functioning item bank. J Clin Epidemiol. 2006;59:1174–82. https://doi.org/10.1016/j.jclinepi.2006.02.010.
10. Haley SM, Ni P, Jette AM, Tao W, Moed R, Meyers D, Ludlow LH. Replenishing a computerized adaptive test of patient-reported daily activity functioning. Qual Life Res. 2009;18:461–71. https://doi.org/10.1007/s11136-009-9463-5.
11. Haley SM, Ni P, Ludlow LH, Fragala-Pinkham MA. Measurement precision and efficiency of multidimensional computer adaptive testing of physical functioning using the pediatric evaluation of disability inventory. Arch Phys Med Rehabil. 2006;87:1223–9. https://doi.org/10.1016/j.apmr.2006.05.018.
12. Haskell A, Kim T. Implementation of patient-reported outcomes measurement information system data collection in a private orthopaedic surgery practice. Foot Ankle Int. 2018;39:517–21. https://doi.org/10.1177/1071100717753967.
13. Hays RD, Morales LS, Reise SP. Item response theory and health outcomes measurement in the 21st century. Med Care. 2000;38:II28–42.
14. Jette AM, Haley SM. Contemporary measurement techniques for rehabilitation outcomes assessment. J Rehabil Med. 2005;37:339–45. https://doi.org/10.1080/16501970500302793.
15. Lee AC, Driban JB, Price LL, Harvey WF, Rodday AM, Wang C. Responsiveness and minimally important differences for 4 patient-reported outcomes measurement information system short forms: physical function, pain interference, depression, and anxiety in knee osteoarthritis. J Pain. 2017;18:1096–110. https://doi.org/10.1016/j.jpain.2017.05.001.
16. Liu H, Cella D, Gershon R, Shen J, Morales LS, Riley W, Hays RD. Representativeness of the patient-reported outcomes measurement information system internet panel. J Clin Epidemiol. 2010;63:1169–78. https://doi.org/10.1016/j.jclinepi.2009.11.021.
17. McHorney CA, Cohen AS. Equating health status measures with item response theory: illustrations with functional status items. Med Care. 2000;38:II43–59.
18. Minoughan CE, Schumaier AP, Fritch JL, Grawe BM. Correlation of PROMIS physical function upper extremity computer adaptive test with American shoulder and elbow surgeons shoulder assessment form and simple shoulder test in patients with shoulder arthritis. J Shoulder Elb Surg. 2018;27:585–91. https://doi.org/10.1016/j.jse.2017.10.036.
19. Rose M, Bjorner JB, Gandek B, Bruce B, Fries JF, Ware JE Jr. The PROMIS physical function item bank was calibrated to a standardized metric and shown to improve measurement efficiency. J Clin Epidemiol. 2014;67:516–26. https://doi.org/10.1016/j.jclinepi.2013.10.024.
20. Tao W, Haley SM, Coster WJ, Ni P, Jette AM. An exploratory analysis of functional staging using an item response theory approach. Arch Phys Med Rehabil. 2008;89:1046–53. https://doi.org/10.1016/j.apmr.2007.11.036.
21. Zdziarski-Horodyski L, et al. An integrated-delivery-of-care approach to improve patient reported physical function and mental wellbeing after orthopaedic trauma: study protocol for a randomized controlled trial. Trials. 2018;19:32. https://doi.org/10.1186/s13063-017-2430-5.
Part III
Basics in Statistics: Statistics Made Simple!
17  Common Statistical Tests
Stephan Bodkin, Joe Hart, and Brian C. Werner

17.1 Introduction through patient questionnaires and medical chart


reviews. Prospective studies are those that make
17.1.1 Common Research Designs observations forward in time. A longitudinal
study, which is prospective in nature, makes
Research studies can be either prospective or ret- repeated observations of the same variables over
rospective in nature. Retrospective studies are time (Fig. 17.1).
performed by identifying a target population and Observational studies, often referred to as
making observations backwards in time. descriptive studies, are those in which no treat-
Common retrospective studies are carried out ment or intervention is administered to the sub-

Fig. 17.1  Timing of Past Present Future


common research
designs. Cross-sectional
studies involve
collection of similar
subjects at different time Cross-sectional study
points. Prospective
studies identify cases of
interest and follow them Prospective cohort
forward in time.
Retrospective studies Retrospective cohort
will identify cases and
observe characteristics
backwards in time; Case-control study
common examples of
retrospective studies are Randomized, controlled, trials
chart reviews

TIME

S. Bodkin · J. Hart · B. C. Werner (*)


Department of Orthopaedic Surgery, University of
Virginia, Charlottesville, VA, USA
e-mail: BCW4X@hscmail.mcc.virginia.edu

© ISAKOS 2019 153


V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_17
154 S. Bodkin et al.

jects. Due to this, no casual effect can be cohort of interest are compared to those of a con-
established. Observational studies can be either trol group.
retrospective or prospective in design. Cross-sectional studies involve data observa-
Observational studies are often inexpensive and tions of a particular study cohort at one time
take less time to complete compared to interven- point with no follow-up. These studies are
tion studies; however, as discussed later in this thought of being a “snapshot” of the cohort of
chapter, their results possess a lower level of evi- interest. Though not as strong as longitudinal
dence. These studies are commonly performed to studies, with collection of similar subjects at dif-
develop hypotheses on which larger scale studies ferent time points post-diagnosis, postsurgery,
can be performed. etc., the researcher may get an idea of how the
characteristic changes over time. A cross-­
sectional study is useful to describe the frequency
17.1.2 Observational Research of a characteristic of interest to make associations
Designs with other variables over time.

Case reports describe the presentation or outcomes of a single patient. The patient
usually presents with a unique injury or illness that would be difficult to collect and
study in larger numbers. No statistical data analysis can be attempted with a case
report. A case series reports the clinical outcomes of multiple patients with the same
injury, illness, or treatment management. The number of subjects in a case series can
range from a few to a couple dozen depending on the prevalence of the characteristic of
interest. Neither case reports nor case series have a control group or use an
experimental design. They are not designed to provide information about the frequency
or distribution of the pathology of interest.

Cohort studies, which can be either prospective or retrospective, aim to identify and
study a group of subjects with similar characteristics, whether that is a pathology or
a treatment. Common data collected within a cohort study include subjective
(patient-reported) and objective (clinician-measured) outcomes after injury, surgery,
or the characteristic of interest. Statistical tests can be used to find associations
and differences among groups and can be interpreted for clinical importance. The
difference between a prospective cohort study and a randomized controlled trial is the
lack of randomization of a treatment or intervention.

Case-control studies are a related observational design; however, comparisons are made
between groups of subjects based on an outcome rather than an exposure. Data received
from the study

17.1.3 Experimental Research Designs

In comparison to observational studies, experimental studies involve the allocation of
a treatment or intervention to a group of subjects in order to test the benefit of a
particular treatment. Data for experimental studies are measured prospectively to
determine causal relationships. Studies that are experimental, involve human subjects,
and incorporate randomization of group allocation are called clinical trials.

There are multiple designs for experimental studies. The “gold standard” with regard to
study evidence and establishing causal relationships is the randomized controlled trial
(RCT). An RCT recruits a sample of subjects and then assigns individuals to an
intervention group or a control group through unbiased or random allocation.

One type of experimental study design is the crossover design. In this design, subjects
receive both interventions (or the intervention and the control treatment) in a
particular sequence (Fig. 17.2a). Because every subject receives both interventions,
fewer subjects are needed for a crossover study. With this design, it is important to
implement a washout period so that the effects of one treatment are not carried into
the collection period of the other. In a parallel design, independent groups of
patients are assigned to receive only one treatment (Fig. 17.2b). Treatment allocation
can occur randomly or nonrandomly.
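The difference between parallel and crossover allocation can be illustrated with a short sketch. The Python code below is only an illustration under stated assumptions: the subject identifiers, the treatment labels "A" and "B", the fixed seed, and the function names are all hypothetical and are not taken from any study discussed in this chapter.

```python
import random


def allocate_parallel(subject_ids, seed=2019):
    """Parallel design: each subject is randomized to exactly one arm."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}


def allocate_crossover(subject_ids, seed=2019):
    """Crossover design: each subject gets both treatments in a random order."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    sequences = {}
    for index, subject in enumerate(shuffled):
        order = ("A", "B") if index < half else ("B", "A")
        # A washout period separates the two treatment periods.
        sequences[subject] = (order[0], "washout", order[1])
    return sequences


subjects = [f"S{number:02d}" for number in range(1, 13)]  # 12 hypothetical subjects
parallel_arms = allocate_parallel(subjects)
crossover_sequences = allocate_crossover(subjects)

print(len(parallel_arms["intervention"]), len(parallel_arms["control"]))
print(crossover_sequences["S01"])
```

In the parallel allocation each subject appears in one arm only, whereas in the crossover allocation every subject contributes data to both treatments, which is why fewer subjects are needed.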
17  Common Statistical Tests 155

Fig. 17.2  Designs for prospective, experimental research. (a) Parallel study design.
(b) Crossover study design
(Figure labels: Study Sample, Randomization, time points T1 to T4, Crossover)

Fact Box 17.1
Level of evidence is based on the quality of the research design and is used to inform
clinical practice. Higher levels of evidence carry fewer potential threats to validity,
which allows stronger cause-and-effect conclusions.

17.2 Hypothesis Testing

A study population includes all persons having a common characteristic of clinical or
scientific interest. Because it is usually impossible to observe or study the entire
population, a smaller sample of subjects is collected. An assumption is made that the
study sample accurately represents the entire population of interest. Random sampling
reduces selection bias and gives all members of the population an equal chance of being
included in the study sample.

All data collected from a study subject can be categorized as continuous or
categorical. Continuous data have an infinite number of possible values. Examples of
continuous variables are height, weight, age, and time. Conversely, categorical or
discrete data have a limited number of possible values. Categorical data can be
dichotomous or binary in nature (e.g., failure/success) or have several categories
(e.g., mild, moderate, severe). Data can be expressed in frequency distributions to
summarize or describe characteristics of the study sample.

Descriptive statistics such as the mean, median, and mode are measures of central
tendency used to describe distributions of continuous data. The dispersion of the
distribution can be characterized through the standard deviation, range, and
percentiles.

The mean is an average calculated as the sum of all scores in the sample divided by the
number of subjects in the sample. The median divides a distribution of scores into two
equal parts so that half of the scores fall below the median and half of the scores
fall above it. The mode is defined as the value in the distribution that occurs most
frequently.

The standard deviation describes the spread, or variability, of the data in a study
sample. A higher standard deviation means higher variability or a wider range of data
points. The range is defined as the difference between the largest and smallest
observations in the study sample. The spread of the distribution in terms of the rank
order of the values can also be characterized by percentiles, sometimes called
quantiles. For example, a value x at the 80th percentile of the distribution indicates
that 80% of the data points in the sample are less than or equal to x. The median of
the sample is the 50th percentile.

The measures of central tendency and variability can be used to characterize the
frequency distribution of the data. In a normal or Gaussian distribution, the data set
clusters evenly and symmetrically around a value that is the mean, median, and mode. In
a normal distribution, approximately 68% of the values are within one standard
deviation, 95% of the values are within two standard deviations, and 99.7% of the
values are within three standard deviations of the mean.

Skewness is a measure of the asymmetry of the data distribution. Data distributions can
be skewed to the left (negative) or to the right (positive). An extreme outlier in the
data can often cause skewness of the distribution (Fig. 17.3).

Kurtosis describes how peaked the data distribution is. A high kurtosis value indicates
a sharp peak (with heavy tails) in the data distribution. A low kurtosis value
indicates a flat peak (with light tails) in the data distribution.

17.2.1 Inferential Statistics

Inferential statistical tests aim to generalize data collected from a representative
sample to the entire population. These statistics can be used to test hypotheses in
terms of relationships or differences among groupings of data. Inferential statistical
tests are divided into parametric and nonparametric statistics. Optimally, researchers
want inferences from the study sample to represent a parameter (i.e., a characteristic
of the population being studied), so the default statistical tests should be
parametric. However, parametric statistics are most powerful when the dataset is
normally distributed. If the dataset is skewed or kurtotic, parametric tests are less
robust. There are equivalent nonparametric tests that are similar in concept to
parametric tests but more appropriate for datasets that do not meet the assumptions
necessary to justify parametric tests. The decision on which test to use is based on
the type of data in the data set, the distribution of the data, the research question,
and the study design.

Parametric statistical tests use the mean and standard deviation of the distribution to
compare groups or identify relationships between variables. Nonparametric statistical
tests use the medians and ranks of the data and are therefore less sensitive to
outliers and more robust. Nonparametric tests are also applied to categorical data and
to samples with a small sample size.
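The effect of a single outlier on the descriptive statistics, and hence on the choice between parametric and nonparametric tests, can be seen in a small sketch. The scores below are invented for illustration, and the skewness formula used (third central moment divided by the second central moment raised to the power 1.5) is one common population definition, assumed here rather than taken from this chapter.

```python
import statistics


def skewness(values):
    """Population skewness: third central moment / (second central moment) ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def describe(values):
    """Central tendency, dispersion, and skewness of a list of scores."""
    ordered = sorted(values)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "mode": statistics.mode(ordered),
        "sd": statistics.stdev(ordered),  # sample standard deviation
        "range": ordered[-1] - ordered[0],
        "skewness": skewness(ordered),
    }


symmetric_scores = [2, 3, 3, 4, 4, 4, 5, 5, 6]  # hypothetical scores
scores_with_outlier = symmetric_scores + [20]    # same scores plus one outlier

summary_symmetric = describe(symmetric_scores)
summary_outlier = describe(scores_with_outlier)

for label, summary in (("symmetric", summary_symmetric), ("with outlier", summary_outlier)):
    print(label, {key: round(value, 2) for key, value in summary.items()})
```

In this example the outlier pulls the mean from 4 to 5.6 and produces a positive skew, while the median and mode remain at 4, which illustrates why rank- and median-based (nonparametric) tests are less sensitive to outliers.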

Fig. 17.3  Skewness of a data distribution. (a) Negative (left) skew. (b) Normal
distribution. (c) Positive (right) skew

17.2.2 Tests to Compare Two Groups of Continuous Data

The t-test is used to compare continuous variables between two groups and can be used
for both paired and independent samples. The independent sample t-test (also known as
Student’s t-test) is used to compare continuous data collected from two different
groups. A between-subject design that aims to compare two groups of data would use the
independent sample t-test. The nonparametric equivalent of the independent sample
t-test is the Mann-Whitney U-test.

The paired sample t-test, or dependent sample t-test, is used to compare data from
within-factor research designs that collect two groups of data, typically from
repeated/serial measurements from the same source/subject. The nonparametric equivalent
of the paired sample t-test is the Wilcoxon signed-rank test.
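As a minimal sketch of the two t-tests described above, the following Python code computes the test statistic and degrees of freedom for an independent sample (pooled-variance) t-test and for a paired sample t-test. The data and function names are hypothetical, and the sketch stops at the t statistic: converting it to a p-value requires the t-distribution, which in practice is usually obtained from statistical software.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variances
    var_b = statistics.variance(group_b)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    difference = statistics.mean(group_a) - statistics.mean(group_b)
    return difference / standard_error, n_a + n_b - 2


def paired_t(before, after):
    """Paired t statistic computed on the within-subject differences."""
    differences = [post - pre for pre, post in zip(before, after)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1


# Hypothetical outcome scores from two independent groups.
graft_a = [10, 12, 14]
graft_b = [4, 6, 8]
t_independent, df_independent = independent_t(graft_a, graft_b)

# Hypothetical scores measured twice in the same four subjects.
pre_treatment = [5, 6, 7, 8]
post_treatment = [7, 7, 10, 10]
t_paired, df_paired = paired_t(pre_treatment, post_treatment)

print(round(t_independent, 3), df_independent)
print(round(t_paired, 3), df_paired)
```

The independent test compares two group means against the pooled variability of both groups, whereas the paired test reduces each subject to a single difference score, so between-subject variability does not enter the standard error.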

Clinical Vignette 1
The study Comparison of Patellar Tendon and Hamstring Tendon Anterior Cruciate Ligament
Reconstruction: A 15-Year Follow-Up of a Randomized Controlled Trial [1] utilized
Student’s t-test to compare measures of subjective function, effusion, range of motion,
and radiographs in individuals who received a hamstring graft or patellar tendon graft.
In this study, the independent variable was the group (hamstring graft, patellar tendon
graft), and the dependent variables were clinical outcomes (subjective function,
effusion, range of motion, and radiographic outcomes).

17.2.3 Tests to Compare Three or More Independent Groups

The analysis of variance (ANOVA) is the statistical test used when comparing continuous
variables between three or more groups. The independent variable is the nominal or
ordered variable used to categorize the data (example groups for a study evaluating
knee function: ACL-reconstructed, ACL-deficient, healthy control). The dependent
variable is the continuous measure that is the result of manipulating the independent
variable (e.g., steps per day). In comparison to a t-test, which uses the
t-distribution to compare groups, the ANOVA utilizes the F-distribution. The ANOVA will
provide an output to inform if the groups are significantly different (p < 0.05) or not
(p > 0.05). The test will not be able to inform where the specific differences among
the groups lie. Post hoc (Latin for “after the fact”) tests are then used after
obtaining a statistically significant (p < 0.05) ANOVA to determine exact group
differences. Post hoc testing is required because not every pairwise group comparison
will necessarily differ in a statistically significant ANOVA.

Repeated measures ANOVAs are used when the same dependent variables are collected at
serial time points. A repeated measures ANOVA assesses the outcome measures in the same
subjects in a longitudinal design, assessing the impact of time. These serial time
points could be used to assess the effect of an intervention pre- and posttreatment or
the progression of disease or illness over time. The Friedman test is the nonparametric
test equivalent to the repeated measures ANOVA.

When there is more than one dependent variable being compared between groups, a
multivariate analysis of variance (MANOVA) should be used. A MANOVA may be justified if
more than one dependent variable is needed to describe the outcome of interest. For
example, if the outcome of interest is knee degeneration in patients following
different surgical techniques, the researchers may want to quantify degeneration in
more than one variable (joint space narrowing and the number of osteophytes). For this
example, the different surgical techniques would be the independent variable, and the
joint space and number of osteophytes observed from radiographic imaging would be the
dependent variables.

An analysis of covariance (ANCOVA) is a statistical test used when a confounding
variable needs to be accounted for. The ANOVA can then be adjusted to establish group
differences using the covariate as a “statistical control.”

Similar to the ANOVA, post hoc testing is needed to determine group differences for
repeated measures, ANCOVA, and MANOVA testing. The Kruskal-Wallis test is the
nonparametric test equivalent to the ANOVA and should be used when the data are ordinal
or not normally distributed and the samples are independent. Similar to the Wilcoxon
rank-sum test, the Kruskal-Wallis test pools the observations from all comparison
groups and assigns ranks to each. The average ranks, rather than the group means, are
then compared between groups. Differences among the average ranks determine whether
there are differences among groups. Much like the ANOVA models, a nonparametric post
hoc test would be required to determine the exact differences that exist among multiple
(three or more) groups of data.

In addition to assessing the differences of more than two groups, an ANOVA can be used
to assess effects and interactions when multiple independent variables are of interest.
Common

grouping variables in research designs are sex (female, male) and treatment (treatment,
control). This would be analyzed through a 2 × 2 factorial design because both
independent variables have two levels. Factorial designs with normally distributed data
can be analyzed through ANOVAs, MANOVAs, ANCOVAs, or a repeated measures ANOVA (often
called a split-plot design). A factorial repeated measures ANOVA (or split-plot ANOVA)
compares multiple groups over serial time points. For example, a research study that
aimed to see the effect of an intra-articular injection (treatment group,
corticosteroid; control group, saline) on knee joint pain over time (1 week, 2 weeks,
and 3 weeks postinjection) would utilize this study design. The two factors in this
study would be group and time. This design would make it possible to see the difference
in pain between groups, between time points, and/or an interaction of group (i.e.,
treatment allocation) and time.

Clinical Vignette 2
The study Age Influences Biomechanical Changes after Participation in an Anterior
Cruciate Ligament Injury Prevention Program utilized a univariate ANOVA to assess
biomechanical differences between preadolescent and adolescent athletes who did and did
not participate in a training program [2].

17.2.4 Determination of Significance

Probability values (p-values) are associated with inferential statistics to determine
if a test statistic is statistically significant. Traditionally, the significance
threshold (alpha) is set at 0.05. The p-value is the probability of observing a result
at least as extreme as the one obtained if the null hypothesis were true; a threshold
of 0.05 means that such a result would arise by chance alone about once in 20 tests.
The null hypothesis for any statistical comparison is that there are no differences
between groups. If a p-value less than 0.05 (or other set alpha) is observed, the test
would be deemed statistically significant. Therefore, the conclusion would be to reject
the null hypothesis and conclude that there are statistical differences between groups.
If the p-value is above the established alpha, the conclusion is that any observed
differences could be due to chance and not due to any intervention or grouping, and the
null hypothesis is therefore not rejected.

Table 17.1  Interpretation of the p-value
Null hypothesis        H0 = H1    p-value > a priori set alpha    Accept the null hypothesis. Conclude no differences between groups
Alternate hypothesis   H0 ≠ H1    p-value < a priori set alpha    Reject the null hypothesis. Conclude differences between groups

A Bonferroni correction is used to adjust the p-value or level needed for statistical
significance when more than one comparison is being performed between the groups
(Table 17.1). As the number of comparisons made between the groups increases, the
likelihood of finding a statistically significant result by chance also increases. To
prevent this, commonly referred to as “protecting your alpha,” a Bonferroni correction
is applied to lower the threshold of significance as the number of comparisons grows.
This is done by dividing the alpha (typically 0.05) by the number of comparisons being
made. This new threshold for defining statistical significance is an adjustment for the
possibility of error from making multiple comparisons.

Statistical power is the ability to detect group differences or an association when one
actually exists. A type II error (beta) occurs when a statistical test infers that
there are no differences between groups when one actually exists. A typical established
beta is 0.2, or up to 20% of the time. Therefore, statistical power (1 − β) should be
greater than or equal to 0.80, that is, able to detect differences or relationships
between variables at least

80% of the time if they exist. Statistical power is increased when sample sizes
increase or when data variability decreases. Power analyses utilize these relationships
and can be performed from established surrogate data or preliminary pilot data to
determine how many subjects are required to provide the greatest chance of finding a
statistically significant result (Table 17.2). Properly performed power analyses limit
the possibility of type I error to 5% (alpha = 0.05) and set statistical power to at
least 80% (type II error of at most 20%).

Table 17.2  Factors affecting sample size
Condition                                                   Effect on sample size
Variability in outcome measure increases                    Increases
Significance level (alpha) decreases                        Increases
Required power (1 − β) increases                            Increases
Effect size or mean difference between groups increases     Decreases

Fact Box 17.2
“Protecting Alpha Error” If you are going to make three comparisons between your study
groups, a Bonferroni correction can be used as follows: 0.05 ÷ 3 = 0.017. The p-value
of a test would need to be ≤0.017 (compared to 0.05) to be considered statistically
significant.

A confidence interval quantifies the uncertainty (sampling error) around the point
estimate obtained from the study sample. Confidence intervals are calculated from the
sample mean and measures of variance of the distribution and give the range within
which a population parameter is expected to fall. Samples with greater variability will
have wider confidence intervals. A 95% confidence interval is often utilized in
clinical research and describes a range of values within which researchers are 95%
confident that the true population value lies.

Effect sizes measure the magnitude of the treatment effect and are often useful when
determining the clinical importance of research findings. A stronger effect size will
be present with greater mean differences and smaller data variability. Effect sizes
range from zero to infinity, with values closer to zero representing a weaker effect.
One example of effect size classification is Cohen’s d. By Cohen’s convention, d values
of about 0.2 are considered “weak,” about 0.5 “medium,” and 0.8 or greater “strong.”

17.2.5 Tests for Categorical Data

The chi-square (χ²) test is an appropriate test when there are two or more groups of
categorical data. Rather than the distribution used for continuous data, the frequency
of results is analyzed using the chi-square test. Fisher’s exact test is another test
used to analyze categorical data; however, it is used mainly for small samples or when
one or more categories contain few or no data points.

Fact Box 17.3
Inferential statistics are used to test specific hypotheses of study subjects or sample
data. The dependent variable is the outcome measure of interest. The independent
variables are the groupings of the data that are observed or manipulated by the
investigator.

Table 17.3  Interpretation of the intraclass correlation coefficient and Cohen’s kappa coefficient
Intraclass correlation (ρI)      Value          Cohen’s kappa (κ)
Poor reproducibility             <0.4           Marginal reproducibility
Fair to good reproducibility     ≥0.4, <0.75    Good reproducibility
Excellent reproducibility        ≥0.75          Excellent reproducibility

17.2.6 Measures of Association

Correlations are used to describe the strength of the relationship or association
between two variables. The Pearson product-moment correlation coefficient (r) is the
test utilized to assess the relationship between two normally distributed, continuous
variables. The r value can range from −1 through 0 to +1. Values closer to −1 or +1
represent stronger relationships, and values closer to 0 represent weaker
relationships. Positive values represent direct relationships; a high value in one
variable would often be seen with a high value in the other variable. Conversely, a
negative value would represent an inverse relationship. A negative r value would be
indicative of a high value in one variable being associated with a low value of the
other variable. The Pearson classification considers correlation coefficients with
absolute values less than 0.33 “weak,” those less than 0.66 “moderate,” and those
greater than 0.66 “strong.”

The nonparametric equivalent to the Pearson correlation coefficient is Spearman’s rho,
which should be used for non-normally distributed data or ordinal data.
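The two correlation coefficients from Sect. 17.2.6 can be sketched in a few lines of Python. The data are invented for illustration, and the function names are hypothetical. Spearman’s rho is computed here as the Pearson coefficient of the ranked data, with tied values given the average of their ranks. Treating a coefficient of exactly 0.66 as “strong” is an assumption, since the classification quoted above does not assign that boundary value.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient for paired observations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson coefficient applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def classify_strength(coefficient):
    """Classification quoted in the text, applied to the absolute value."""
    magnitude = abs(coefficient)
    if magnitude < 0.33:
        return "weak"
    if magnitude < 0.66:
        return "moderate"
    return "strong"  # assumption: exactly 0.66 is also treated as strong


weeks_after_surgery = [1, 2, 3, 4, 5]  # hypothetical predictor
function_score = [2, 4, 5, 4, 5]       # hypothetical outcome

r = pearson_r(weeks_after_surgery, function_score)
rho = spearman_rho(weeks_after_surgery, function_score)
print(round(r, 3), round(rho, 3), classify_strength(r))
```

Both coefficients are positive here, indicating a direct relationship: higher values of one variable tend to occur with higher values of the other.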
Regression describes the ability to predict a specific outcome variable and is
expressed in a coefficient of determination (R²). In comparison to correlation, a
regression analysis has an intended outcome variable that is explained by one or more
predictor variables. An outcome variable is the variable singled out to be predicted by
the other variables and is often called the dependent variable. Predictor variables, or
independent variables, are the variables used to make the prediction. The use of one
predictor is known as simple linear regression, and the use of multiple predictors is
known as multiple regression. The coefficient of determination ranges from 0 to 1,
where a higher value is indicative of a greater proportion (%) of variance explained
and a better predictive value. Both simple linear and multiple regression are used when
the outcome variable is continuous. When the outcome variable is categorical (often
dichotomous) or not normally distributed, logistic regression should be used.

17.2.7 Tests of Agreement Between Variables (ICC and Kappa)

The intraclass correlation (ICC) should be used to assess the consistency or
reproducibility of quantitative measures. These statistics are similar to other
correlation coefficients, only they assess agreement between arrays of data. These
tests are often used in reliability and validity research studies to evaluate
concordance between two outcomes or measurements of interest. An intraclass correlation
could also be used to assess the same outcome being examined by different individuals.
For example, a radiograph of knee osteoarthritis could be sent to multiple reading
evaluators to determine the consistency of the results. Interpretation of the
intraclass correlation coefficient (ICC or ρI) is defined in Table 17.3.

Cohen’s kappa coefficient (κ) is a measure of interrater agreement for categorical
data. Kappa is usually used as a measure of reproducibility of qualitative outcomes
between repeated assessments of the same variable. Guidelines for evaluating kappa are
in Table 17.3.

Clinical Vignette 3
In the study The Reliability of Assessing Radiographic Healing of Osteochondritis
Dissecans of the Knee, ICC values were used to determine interrater and intrarater
reliability of assessing OCD healing on plain radiographs [3]. In this study, multiple
surgeons evaluated the radiographic healing of the knee at two time points, minimally
1 month apart. Healing of OCD lesions was found to have excellent interrater
reliability (ICC = 0.94), indicating high agreement among the raters.

Fact Box 17.4
Statistical tests for comparisons should match the purpose of the pursued research
study. Analyses should be run to accomplish the goals of the researcher. If the
interest of the study is to find differences between groups, t-tests or ANOVAs should
be performed. If the goal of the study is to find associations or prediction of
variables, correlations or regression analyses should be performed.

Fact Box 17.5
• Level of evidence is based on the quality of the research design and is used to
  inform clinical practice. Higher levels of evidence carry fewer potential threats to
  validity, which allows stronger cause-and-effect conclusions (Fig. 17.2).
• Inferential statistics are used to test specific hypotheses of study subjects or
  sample data. The dependent variable is the outcome measure of interest. The
  independent variables are the groupings of the data that are observed or manipulated
  by the investigator.
• Statistical tests for comparisons should match the purpose of the pursued research
  study. Analyses should be run to accomplish the goals of the researcher. If the
  interest of the study is to find differences between groups, t-tests or ANOVAs should
  be performed. If the goal of the study is to find associations or prediction of
  variables, correlations or regression analyses should be performed.

Take Home Messages
• The first step in selecting a statistical analysis is to determine the type or level
  of the data that will be analyzed.
• Statistical significance tells us that the finding is unlikely to have occurred by
  chance and is instead a reflection of the population. Statistical significance does
  not always mean clinical meaningfulness.

17.3 Useful Resources

Hung M, Bounsanga J, Voss MW. Interpretation of correlations in clinical research.
Postgraduate Medicine. 2017;129(8):902–906 [4].
Barkan H. Statistics in clinical research: Important considerations. Ann Card Anaesth.
2015;18(1):74–82 [5].
Kocher MS, Zurakowski D. Clinical epidemiology and biostatistics: A primer for
orthopaedic surgeons. Journal of Bone and Joint Surgery-American Volume.
2004;86A(3):607–620 [6].
Petrie A. Statistics in orthopaedic papers. Journal of Bone and Joint Surgery-British
Volume. 2006;88B(9):1121–1136 [7].

References

1. Webster KE, Feller JA, Hartnett N, Leigh WB, Richmond AK. Comparison of patellar
   tendon and hamstring tendon anterior cruciate ligament reconstruction: a 15-year
   follow-up of a randomized controlled trial. Am J Sports Med. 2016;44(1):83–90.
2. Thompson-Kolesar JA, Gatewood CT, Tran AA, Silder A, Shultz R, Delp SL, et al. Age
   influences biomechanical changes after participation in an anterior cruciate
   ligament injury prevention program. Am J Sports Med. 2018;46(3):598–606.
   https://doi.org/10.1177/0363546517744313.
3. Wall EJ, Milewski MD, Carey JL, Shea KG, Ganley TJ, Polousky JD, et al. The
   reliability of assessing radiographic healing of osteochondritis dissecans of the
   knee. Am J Sports Med. 2017;45(6):1370–5.
4. Hung M, Bounsanga J, Voss MW. Interpretation of correlations in clinical research.
   Postgrad Med. 2017;129(8):902–6.
5. Barkan H. Statistics in clinical research: important considerations. Ann Card
   Anaesth. 2015;18(1):74–82.
6. Kocher MS, Zurakowski D. Clinical epidemiology and biostatistics: a primer for
   orthopaedic surgeons. J Bone Joint Surg Am. 2004;86A(3):607–20.
7. Petrie A. Statistics in orthopaedic papers. J Bone Joint Surg Br.
   2006;88B(9):1121–36.
18  The Nature of Data
Clair Smith

C. Smith (*)
Department of Orthopaedic Surgery, University of Pittsburgh, Pittsburgh, PA, USA
Department of Physical Therapy, University of Pittsburgh, Pittsburgh, PA, USA
Bridgeside Point 1, Pittsburgh, PA, USA
e-mail: cns45@pitt.edu

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_18

18.1 List of Definitions

Data: A collection of data points organized into one or more variables of interest.
Example: The set of all responses to a survey given to a group of people, all
measurements taken from the mice in an animal study, etc.
Variable: A measurable characteristic such as blood pressure, age, or gender.
Example: Treatment group, marital status, diabetes status, systolic blood pressure,
blood glucose level, etc.
Observation: A single datum. In clinical data this will often be a measurement taken
from a person or animal. Also called a data element or data point.
Example: The heart rate of a mouse in an animal study, the cancer status of a cell from
a person in a cancer study, the range of motion of a knee from a cadaver in a
meniscectomy study, and the BMI of one person in a study.
Statistic: A numerical summary of the data points that make up a variable. This can be
calculated from a sample.
Example: Mean, variance, median, minimum, and maximum.
Sample: Data that is collected/observed. A sample is a small subset of the population
of interest.
Example: A random sample of residents of a certain neighborhood and all people
hospitalized for a heart attack at one of three local hospitals during a certain
period of time.
Population: The group of all subjects researchers are interested in studying.
Example: The set of all women currently living in the USA, the set of all people
experiencing lower back pain in the USA, and the set of all mice of a certain species.

18.2 Types of Data

There are two major types of data that researchers typically deal with in health
science: continuous and discrete data. The type of data drives which statistics are
used in their analyses. Continuous data such as age, height, weight, and BMI have
infinitely many possible values. For example, age in years can be any positive real
number such as 42 or 37.25. Discrete (also known as categorical) data, such as race,
treatment group, and study site, can take on only a limited number of values. For
example, if possible values of race on a self-reported survey are black, white, and
other, then everyone taking the survey will have one of these three values for the
variable

race. Other less common types of data are count between the mild and moderate level of an
data and censored data. Count data is made up of injury severity variable may not be the same as
whole numbers that represent counts such as the the difference between the moderate and severe
number of falls in a year of follow-up or the num- level. For both types of discrete data, each
ber of heartbeats per minute. While there are observation must belong to exactly one level of
analyses created specifically for count data, it is the discrete variable, and the levels should
often treated as continuous data for simplicity. cover all possible values that exist in the data
Censored data, such as number of years till death set. For example, if the discrete variable race
after having a certain procedure, occurs when the has levels black, white, and other, then each
event researchers are interested in (such as death) observation in the data set must be categorized
may not occur during the study. Another type of as black, white, or other. If a discrete variable
data, longitudinal data, occurs when measure- has only two levels, then it is called a dichoto-
ments are taken repeatedly from subjects over a mous variable. Examples include gender and
period of time. disease status (the disease is either present or
A continuous variable is one whose values can not present).
be any real number. It is meaningful to measure
the distance between values, and arithmetic oper-
ations such as addition and multiplication make 18.3 Data Description
sense for continuous variables but not for discrete
variables. Technically, all continuous variables Summarizing discrete data is simpler than sum-
are measured discretely since we don’t have instruments that can measure continuously. One can think of continuous data as discrete with lots of levels or categories. For example, blood pressure is typically measured to the nearest 2 mmHg because measuring with higher precision would be difficult with the instruments that are used. However, we still treat blood pressure as continuous since there would be far too many levels to treat it as discrete. Sometimes discrete variables with many levels, such as the visual analog scale (VAS) for measuring pain or variables measured with the Likert scale, are treated as continuous for ease of analysis. When treating discrete variables as continuous, you are assuming that the levels of the variable are equidistant. Another example of a continuous variable is the Lysholm scale for assessing ACL injuries, which gives a score from 0 to 100 with higher scores indicating fewer symptoms.

There are two major types of discrete data: nominal and ordinal. Nominal data has no inherent ordering, such as gender, race, and marital status. Ordinal data can be ordered from low to high, such as injury severity, level of education, and household income. The differences between the levels of ordinal variables are not necessarily equal. For example, the difference

marizing continuous data. Discrete data is often described by reporting the frequency and proportion (or percent) of people belonging to each level of the discrete variable. For example, say you were reporting on disease severity in a study of 50 people. Your description of the variable “disease severity” could be 23 (46%) mild, 12 (24%) moderate, and 15 (30%) severe if 23 people in the study had mild disease, 12 had moderate, and 15 had severe. If the variable is dichotomous, then it is acceptable to report only the frequency and proportion in one level of that variable. For example, if researchers were summarizing the dichotomous variable “gender” and putting it in a table of demographic information for a study, then they could simply report the frequency and proportion of women in the study sample. Researchers would not need to include the number of men in the study since this can be deduced by subtracting the number of women in the study from the sample size. Some researchers depict the proportions of subjects in each level of a discrete variable in a bar graph. This may be appropriate if the publication has no other figures. However, when other figures are present, graphing proportions is superfluous as they are described adequately by frequencies and proportions alone.
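
The frequency-and-proportion summary of the disease severity example can be sketched in a few lines of Python. This is only an illustration: the raw list `severity` and the helper `summarize_discrete` are hypothetical names, and the data are constructed to match the counts quoted in the text.

```python
from collections import Counter

# Hypothetical raw data matching the example in the text:
# 23 mild, 12 moderate, and 15 severe cases in a study of 50 people.
severity = ["mild"] * 23 + ["moderate"] * 12 + ["severe"] * 15


def summarize_discrete(values, levels):
    """Return 'frequency (percent%)' strings for each level, in the given order."""
    counts = Counter(values)
    total = len(values)
    return {
        level: f"{counts[level]} ({100 * counts[level] / total:.0f}%)"
        for level in levels
    }


summary = summarize_discrete(severity, ["mild", "moderate", "severe"])
for level, text in summary.items():
    print(f"{level}: {text}")
```

For a dichotomous variable, the same helper could be called with a single level (for example, only the women), since the other level follows from the sample size.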
18  The Nature of Data 165

The relationship between two categorical variables is best captured by a 2 × 2 table (more generally, a contingency table). In such a table, the rows are levels of one categorical variable, and the columns are levels of the other categorical variable. The cells of the table contain the number of people in the study belonging to the corresponding levels of the row and column variables. The last row and last column are typically reserved for totals (also known as margins).

Continuous data contains more information, has more properties, and requires more statistics to describe it than discrete data. There are different statistics to measure the location (or center), spread (or dispersion), and shape of the distribution of values from a continuous variable.

Measures of location seek to describe the central tendency of the data with a representative value from it. Examples are the mean (or average), median, and mode. The mean of a continuous variable is the sum of all the values divided by the number of values present in that sum. Put into symbols, the mean is $\frac{\sum_{i=1}^{n} x_i}{n}$, where $x_i$ represents the ith value, n is the sample size, and $\sum_{i=1}^{n} x_i$ says to sum all the values from 1 to n. The letter i is called an index. The median is the middle value of the ordered data. If the values are ordered from smallest to largest (or largest to smallest), then the median is the value that has an equal number of observations on either side of it. If the sample size is even, then the median is found by averaging the two middle values of the ordered data. Instead of listing out the ordered values by hand and visually finding the middle value, there is a simple formula that can be used to find the position of the median. Say there are n people in the study and the age of each person is listed from smallest to largest. If n is odd, then the position of the median age is $\frac{n+1}{2}$. Note that this equation will produce the position of the median, not the value of the median itself. If n is even, then the median is the average of the numbers in the $\frac{n}{2}$ and $\frac{n}{2} + 1$ positions of the ordered list of values [1]. The median and mean can only be used to describe continuous data. The mode of a variable is the most frequently occurring value and can be used to describe the central tendency of continuous or discrete data.

Another name for the median is the second quartile. The quartiles split the list of ordered values into fourths. The first quartile is also called the 25th percentile and is often denoted Q1. If the data is listed in order, then 25% of the values will be below or equal to the first quartile. The median is the 50th percentile (or second quartile), since half of the values are less than or equal to it, and it is denoted Q2. The third quartile, Q3, is also called the 75th percentile, and 75% of the values are less than or equal to it.

Measures of spread describe how tightly clustered the values are around the mean of continuous data. Examples are the variance, standard deviation, range, and interquartile range. The variance is the average squared distance from the mean. It is calculated by adding up all of the squared differences between the mean and each data point, then dividing this sum by the number of data points minus one. If n is the sample size and $\bar{x}$ is the mean of the sample, then the following is an equation for finding the variance of the sample: $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$. Note that Σ is a symbol meaning “the sum of” and $x_i$ represents the ith value in the sample. When put together as in $\sum_{i=1}^{n} (x_i - \bar{x})^2$, this means take each value from the first (i = 1) to the last (i = n), subtract the mean from it, then square it, then add all these squares together. This equation is similar to the equation for the mean except that it is divided by n − 1 instead of n. Dividing by n − 1 leads to a less biased statistic than dividing by n. The standard deviation is the square root of the variance and
166 C. Smith

thus, loosely speaking, the average distance from the mean. The range is simply the maximum (largest) value minus the minimum (smallest) value. The interquartile range (IQR) is the third quartile minus the first quartile. The IQR describes the spread of the central 50% of the data. For all measures of spread, a higher value indicates that observations are more spread around the mean and smaller values indicate they are more tightly clustered about the mean.

Fact Box 18.1

Mean = $\frac{\sum_{i=1}^{n} x_i}{n}$

Position of median for odd n = $\frac{n+1}{2}$

Position of median for even n = $\frac{\frac{n}{2} + \left(\frac{n}{2} + 1\right)}{2}$

Measures of the shape of a distribution describe the overall trend of the data. As an example, a distribution could have mostly large values with a few extreme outliers, or it could have values evenly distributed across the range. Some examples of distribution shapes are symmetric, normal, bimodal, and left or right skewed. The mean, median, and mode of a continuous variable are equal if the distribution of its values is symmetric and has a single peak. In symmetric data, the relative position of observations is the same on either side of the median. Right skewed (or positively skewed) data occurs when observations above the median are farther in absolute value than observations below the median. Another way of saying this is that the distribution has a long tail to the right. In right skewed data, the majority of the values are relatively small and close together, and a minority of the values are extreme or much larger in value than the rest. Left skewed (or negatively skewed) data occurs when observations below the median are farther in absolute value than observations above the median. Left skewed data has a long tail to the left since most of the values are large and there are a few extreme observations that are much smaller than the rest. The mean is greater than the median in right skewed data and less than the median in left skewed data. There is a skewness index that measures the degree of skewness in the data. The index is zero if the data is symmetric, greater than zero if the data is right skewed, and less than zero if the data is left skewed [3]. A normal distribution is a symmetrical hill or bell shape with the majority of the values close to the central value (the mean) and a few extreme observations on either side of the mean (i.e., in the tails of the distribution). A bimodal distribution looks like the two humps of a camel; it has two central values. When the data is symmetric, the best numerical summaries are the mean and standard deviation. When the data is skewed, it is best to use the median and interquartile range or interquartile deviation (half of the interquartile range).

Fact Box 18.2

Variance = $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Standard deviation = $\sqrt{\text{variance}}$

IQR = Q3 − Q1

Kurtosis is another measure of shape that describes how flat or steep the distribution of values is compared to a bell-shaped (or normal) distribution. If there are a lot more observations in the tails of a distribution compared with a normal distribution, then the graph appears flatter than a bell shape. If there are many fewer observations in the tails of a distribution compared with a normal distribution, then the graph appears more peaked than a bell shape [2].
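
The measures of location, spread, and shape described above can be computed with Python's standard library. The sketch below uses a hypothetical sample of seven ages (illustrative values, not from the chapter). Two choices in it are assumptions rather than the chapter's method: quartiles come from `statistics.quantiles`, which implements one of several quartile conventions, and skewness is the simple moment-based index, one common definition.

```python
import math
import statistics

# Hypothetical sample: ages of n = 7 people (illustrative values only).
ages = sorted([21, 23, 24, 27, 30, 35, 52])
n = len(ages)

# Location
mean = sum(ages) / n
median_position = (n + 1) // 2      # 1-based position; valid because n is odd
median = ages[median_position - 1]  # convert to Python's 0-based index

# Spread
variance = sum((x - mean) ** 2 for x in ages) / (n - 1)  # divide by n - 1
sd = math.sqrt(variance)
value_range = max(ages) - min(ages)
q1, q2, q3 = statistics.quantiles(ages, n=4)  # one of several quartile conventions
iqr = q3 - q1

# Shape: moment-based skewness index (assumed definition); > 0 suggests right skew
m2 = sum((x - mean) ** 2 for x in ages) / n
m3 = sum((x - mean) ** 3 for x in ages) / n
skewness = m3 / m2 ** 1.5

print(f"mean={mean:.2f}, median={median}, variance={variance:.2f}, sd={sd:.2f}")
print(f"range={value_range}, Q1={q1}, Q3={q3}, IQR={iqr}, skewness={skewness:.2f}")
```

In this sample the single large value (52) pulls the mean above the median, which is the pattern the text describes for right skewed data; the median and IQR are then the more robust summaries.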

All of the descriptive measures discussed thus far are statistics. Statistics are calculated from a sample that is drawn from the population of interest. Suppose the goal of a study is to determine whether a new surgical technique for repairing a joint leads to a better clinical outcome than the standard procedure. The population of interest in this case would be the set of all people with the joint injury who would be eligible for this surgery. In order to determine whether the new technique is an improvement over the old technique, researchers must look at the outcomes from a sample of people with the joint injury. It is not possible to observe all people with this injury (the population of interest), so a sample must be taken. Typically, the sample is chosen in such a way that every member of the population has an equal opportunity of being picked for the sample. A sample that is created in this way is called a random sample because each member is chosen at random. This helps to ensure that the sample is representative of the population, e.g., if half of the population is women, then roughly half of the random sample drawn from the population should be women. The researchers would then use statistics such as means and standard deviations calculated from this sample to summarize outcomes of the two surgery techniques. Such an outcome may be the range of motion of the repaired joint after it has healed. Since range of motion is a continuous measure, researchers would use a mean or median and standard deviation or interquartile range to summarize it. This example illustrates using a sample to make inference about a population, the main goal of statistics.

Since a sample does not include all members of a population, there are multiple ways to draw a sample from a given population. The number of people in the population of interest is typically denoted by N, and the number of people in a given sample drawn from the population is denoted by n. Figure 18.1 depicts three different samples of size n drawn from a population of size N.

Fig. 18.1  Samples of size n drawn from a population of size N

Suppose the three samples in the figure were drawn in a sequence: n individuals were selected at random from the population to form the first sample, measurements were taken on them, and then they were returned to the population pool. This way of sampling is called sampling with replacement. This process was then repeated for the second and third samples of size n. If we calculated the mean of the measurements taken on each of the three samples, we would get three different means even though the samples are the same size and are taken from the same population. This occurs because the three samples consist of different individuals. It is possible that there is overlap between the three samples, that is, some individuals may occur in two or more of them. This is possible because the samples were drawn from the population with replacement: they were selected, their measurements taken, and then they were returned to the population pool. Thus, every time a sample is drawn from the population, a different sample mean is calculated, but each of these means will be a good estimate of the true mean of the entire population (given that the sample size, n, is sufficiently large). The mean calculated from the sample of size n is an example of a sample statistic, and the mean calculated from the population of size N is an example of a population parameter. Sample statistics are estimates of population parameters. Population parameters are usually unknown since we cannot measure an entire population, but we estimate these parameters by taking a random sample of the population and calculating sample statistics. The larger the sample size, the more confident researchers are that the sample statistics are good approximations of the population parameters.
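
The sampling scheme just described can be imitated with a small simulation. The following is a minimal sketch under stated assumptions: the population of N = 10,000 range-of-motion values is entirely hypothetical (drawn from a normal distribution with mean 120 and standard deviation 15 degrees), and the seed is fixed only so that the sketch is reproducible. In a real study the population mean would be unknown.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of N = 10,000 joint range-of-motion values (degrees).
N, n = 10_000, 200
population = [rng.gauss(120, 15) for _ in range(N)]
population_mean = statistics.fmean(population)  # the parameter (normally unknown)

# Draw three samples of size n. Individuals return to the pool between samples,
# so the same person may appear in more than one sample.
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(3)]

print(f"population mean: {population_mean:.2f}")
for k, m in enumerate(sample_means, start=1):
    print(f"sample {k} mean: {m:.2f} (difference {m - population_mean:+.2f})")
```

Each run of the loop yields a different sample statistic, yet all three land close to the population parameter; increasing `n` would tighten that agreement further.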

18.4 Visual Displays

It is good practice to plot the data before summarizing and performing statistical tests on it. This will give the researcher a sense of the type of data available. There are various methods for describing and analyzing data, and which method to use depends on the nature of the data.

A stem-and-leaf plot is a simple visual display of data points that shows the distribution or shape of the values of a continuous variable. An advantage of this plot is that it includes the value of each individual observation. This plot is appropriate when there are a small number of observations. As an example, suppose there is a small sample of 15 subjects’ BMI measurements that have been rounded to the nearest integer. BMI is a continuous variable and its units are kg/m². The first step of making a stem-and-leaf plot of these values would be to list them in order:

18, 19, 23, 24, 24, 24, 25, 25, 26, 26, 27, 28, 30, 32, 37

The stem of the plot is made up of the leading number, and the leaves are made up of the trailing number. Both the numbers in the stem and in the leaves are ordered smallest to largest.

1 | 89
2 | 3444556678
3 | 027

It can be seen from this plot that the BMI measurements are distributed in roughly a hill shape: most of the values are in the middle and there are a few in either tail. If there are more observations, more than one line can be added for each digit in the stem.

A histogram is a graph that shows the shape of the distribution of values of a continuous variable. The horizontal (or x) axis has the values of the variable, and the vertical (or y) axis has the frequency or proportion of observations. The height of each rectangle represents the proportion or frequency of observations whose values fall within the range specified by the width of the rectangle. If the distribution of values of a variable is symmetric, then cutting the histogram along the median will result in each half being a mirror image of the other. A common example of a symmetric distribution is a hill or bell-shaped distribution. Figure 18.2 shows a histogram for the 15 BMI measurements in the last example.

Fig. 18.2  Histogram of a sample of 15 BMI measurements (horizontal axis: BMI in the ranges [18, 23], [23, 28], [28, 33], [33, 38]; vertical axis: frequency, 0 to 10)

The width of the rectangles in this histogram is five units (5 kg/m²), and the vertical axis is the frequency of occurrences. The x-axis shows the range of values for each rectangle. The histogram shows the general hill-shaped trend of the data: most of the data (9 observations) fall within the range of 23–28 kg/m². This histogram shows a similar shape as the stem-and-leaf plot turned on its side. If the width of the rectangles is too large, important information about the shape of the distribution can be lost. The smaller the width of the rectangles, the more detail about the shape of the distribution will be shown. Most statistical programs will automatically choose a width that is appropriate for the data.

Another visual display for continuous data is the box plot. The box plot shows the interquartile range, the median, and any extreme observations (i.e., observations that have values that are much larger or much smaller than the rest of the data). If there is a lot of variability in the data, then the box and whiskers will be elongated. If there is not a lot of variability, then the box and whiskers will appear squatter. Figure 18.3 shows a box plot of the BMI example data.

The first quartile of the BMI data is the bottom line of the box, the median is the middle line in the box, and the third quartile is the top line of the box. When the third quartile is farther from the median than the first quartile, the data is right skewed, and

when the first quartile is farther from the median, the data is left skewed. The histogram and box plot of the BMI data in Figs. 18.2 and 18.3 show a slight right skew in the shape of the distribution. The whiskers of the box plot are drawn to the smallest and largest observations in the sample that are not outliers. Outliers are defined as values that are greater than Q3 + 1.5 × IQR or less than Q1 − 1.5 × IQR. The dots in the box plot are extreme outliers, which are defined to be larger than Q3 + 3 × IQR or smaller than Q1 − 3 × IQR [3].

Fig. 18.3  Box plot of 15 BMI measurements (vertical axis: BMI, 17 to 39)

The box plot is a good visual display to use when comparing a continuous variable between different groups of a categorical variable since they can be plotted side-by-side on the same set of axes. This allows direct comparison of the distributions of the continuous variable across various levels of the categorical variable. As an example, consider the BMI data again, but suppose there is information on whether the subjects were over 25 years old or 25 years old or younger. Figure 18.4 is an example of a way to visualize the relationship between a continuous variable (BMI) and a categorical variable (age category).

Fig. 18.4  Side-by-side box plots of 15 BMI measurements (horizontal axis: 25 or younger, Over 25; vertical axis: BMI)

From Fig. 18.4 it can be seen that subjects who are over 25 years old tend to have a higher BMI than people who are 25 years old or younger.

A scatter plot is useful for understanding the relationship between two continuous variables and revealing potential outliers. Values of one variable are plotted on the horizontal axis, and values of the other variable are on the vertical axis. A scatter plot is a quick way to discover potential trends in the data. For example, if higher values of one variable tend to occur with higher values of the other, then the scatter plot will show this positive relationship. If there is no relationship between the two variables, then the scatter plot will show a random scattering of points that don’t indicate any specific pattern. If most of the points are clustered tightly together, while one or two points are clearly outside of this cluster, then these points are potential outliers and should be checked for accuracy. Figure 18.5 is an example of a scatter plot using the BMI data. A second variable, age, has been added to the vertical axis.

Fig. 18.5  Scatter plot of 15 subjects’ age and BMI (horizontal axis: BMI; vertical axis: age)

Figure 18.5 shows that as age increases so does BMI. In other words, there is a positive relationship between age and BMI. An example of an outlier for this data is the point (32, 20), i.e., the point with BMI = 32 and age = 20. While

the point is medically feasible, if it were in the plot in Fig. 18.5, we would want to check on its accuracy, because it is so far away from the increasing trend of the rest of the points. Two continuous variables could also have a negative relationship if it were the case that as one increased the other decreased. If the scatter plot appears to show a positive relationship for some values and a negative relationship for others, we would say the relationship appears to change direction. A statistic called a correlation coefficient quantifies the strength of the association between two continuous variables.

18.5 Conclusion

The first steps of data analysis should be to determine what types of variables are present and to describe them with appropriate summary statistics and visual displays. Different types of data have different properties, and these properties determine which statistical tests are appropriate for answering the questions of interest. Statistical tests are based on probability theory and allow the researchers to draw conclusions about a population based on a sample from that particular population. This is called statistical inference and is the overarching goal of statistical analysis.

References

1. D’Agostino RB, Sullivan LM, Beiser AS. Introductory applied biostatistics. Belmont: Thomson Brooks/Cole; 2006.
2. Daniel WW, Cross CL. Biostatistics: a foundation for analysis in the health sciences. 10th ed. Hoboken: Wiley; 2013.
3. Rosner B. Fundamentals of biostatistics. 8th ed. Boston: Cengage Learning; 2016.
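
As a worked illustration of the displays in Sect. 18.4, the following minimal Python sketch rebuilds the stem-and-leaf plot, the histogram frequencies, and the box-plot outlier fences for the 15 BMI measurements listed there. Two details are assumptions rather than the chapter's stated method: the histogram bins are taken as [18, 23], (23, 28], (28, 33], (33, 38], and the quartiles come from `statistics.quantiles`, which is one of several quartile conventions (other conventions can give slightly different fences).

```python
import statistics
from collections import defaultdict

# The 15 BMI measurements (kg/m^2) listed in Sect. 18.4.
bmi = [18, 19, 23, 24, 24, 24, 25, 25, 26, 26, 27, 28, 30, 32, 37]

# Stem-and-leaf plot: the tens digit is the stem, the units digit is the leaf.
stems = defaultdict(list)
for value in sorted(bmi):
    stems[value // 10].append(value % 10)
stem_lines = [
    f"{stem} | {''.join(map(str, leaves))}" for stem, leaves in sorted(stems.items())
]

# Histogram frequencies for bins of width 5. The first bin includes both edges;
# later bins exclude their left edge (assumed convention).
edges = [18, 23, 28, 33, 38]
counts = [
    sum(1 for v in bmi if (lo <= v if i == 0 else lo < v) and v <= hi)
    for i, (lo, hi) in enumerate(zip(edges, edges[1:]))
]

# Box-plot quantities: quartiles, IQR, and the outlier rules from the text.
q1, median, q3 = statistics.quantiles(bmi, n=4)
iqr = q3 - q1
outliers = [v for v in bmi if v > q3 + 1.5 * iqr or v < q1 - 1.5 * iqr]
extreme = [v for v in bmi if v > q3 + 3 * iqr or v < q1 - 3 * iqr]

print("\n".join(stem_lines))
print("histogram counts:", counts)
print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print("outliers:", outliers, "extreme outliers:", extreme)
```

Under these assumptions the third quartile lies farther from the median than the first quartile does, which agrees with the slight right skew described in the text.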
19 Does No Difference Really Mean No Difference?

Carola F. van Eck, Marcio Bottene Villa Albers, Andrew J. Sheean, and Freddie H. Fu

C. F. van Eck · M. B. V. Albers · A. J. Sheean · F. H. Fu (*)
Department of Orthopaedic Surgery, University of Pittsburgh, Pittsburgh, PA, USA
e-mail: ffu@upmc.edu

The material in this manuscript has never been published in its current format. The authors did not receive any outside funding or grants directly related to the research presented in this manuscript. The Department of Orthopaedic Surgery from the University of Pittsburgh receives funding from Smith and Nephew, Depuy Mitek, ConMed Linvatec, Cook MyoSite, and Arthrex not directly related to the research presented in this manuscript. Dr. Freddie H. Fu is an editorial board member of AJSM and KSSTA.

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_19

19.1 Introduction

In conducting research, it is important to form a sound hypothesis that allows for determination of the proper primary outcome measure for a particular study. Additionally, the hypothesis dictates which statistical methods should be applied. A hypothesis is a proposed explanation for a phenomenon. The null hypothesis (H0) is the default position and states there is no difference between the test and control group, whereas the alternative hypothesis (H1) is its rival and states superiority of one procedure/intervention over another. The p-value is a measure of how much evidence there is against the null hypothesis. The null hypothesis is commonly rejected when the p-value is less than 0.05, corresponding to a 5% chance of rejecting H0 when in fact it is true. Type I and II errors are errors in statistical analysis. A type I (or “alpha”) error refers to the rejection of H0 when it is really true. However, the most common statistical error is type II (or “beta”) error: failing to reject H0 when in fact it should have been rejected or, more simply said, unjustly concluding there is “no difference” between the variables in question. The purposes of this chapter are to (1) outline several of the more common causes of type II errors and (2) describe strategies for reducing the likelihood of committing type II errors when conducting orthopedic research.

The most common statistical error is type II (or “beta”) error: failing to reject a H0 when in fact it should have been rejected or, more simply said, unjustly concluding there is “no difference.”

19.2 Lessons Learned from Cardiology

Cardiology is perhaps one of the most advanced fields in medicine with regard to research and innovation. The first electrocardiogram (EKG) was performed by Einthoven as early as 1903 [76]. His goal was to develop a simple and inexpensive test which could identify conduction abnormalities of the heart and demonstrate

the anatomic location of the problem [76]. This invention was soon followed by the development of angiography, which could not only visualize the location and severity of the problem but also allowed intervention to be performed [6]. More recently, innovation within the field of cardiology catalyzed the integration of high-resolution computed tomography (CT) scanning to more precisely understand the anatomy of the heart’s chambers, valves, and blood vessels [54]. This noninvasive test can be performed in 10 s and allows detection of coronary artery disease involving plaques as small as 1.5 mm [77]. This highly specialized application of CT technology has accelerated research efforts focusing on treatment of coronary artery disease and other cardiac conditions. However, within orthopedic surgery, this standard has not yet been paralleled.

19.3 Mission Accomplished in Orthopedic Surgery?

Anterior cruciate ligament (ACL) injury is one of the most widely studied topics within the field of orthopedic sports medicine. Some of the earliest described techniques for reconstructing the torn ligament involved a large arthrotomy [42]. However, as with all modern surgery, minimally invasive surgical techniques were introduced in knee surgery, leading to the development of arthroscopically assisted ACL reconstruction [53, 59, 64]. Arthroscopic ACL reconstruction was first performed using a two-incision technique, in which the femoral bone tunnel was drilled from the outside-in [53, 64]. Over time, a one-incision technique was adopted, where the femoral bone tunnel is drilled from inside-out, through the tibial tunnel (transtibial technique) [17]. Both techniques were fast and efficient; unfortunately, neither technique afforded focus on native ACL anatomy [33]. Outcome-based research focused on these minimally invasive techniques relied mostly on subjective patient-reported outcome measures (PRO), which suggested good results. However, the surgeons implementing these PRO began to observe a variety of postoperative problems, including loss of knee range of motion [16], impingement of the new ACL graft [13, 24, 25, 52], and failure of the reconstruction requiring revision surgery [29]. In mid- to long-term follow-up, a large number of patients were found to have developed osteoarthritis [15]. In this respect, the limitations of early PRO partially obscured an adequate appreciation of some of the complications of ACL reconstruction. Early outcomes thus demonstrated that the state of the art in ACL reconstruction left much to be desired.

19.4 Why Our Outcomes Are So Good

Despite surgeons observing the aforementioned loss of knee range of motion [16], impingement [13, 24, 25, 52], graft failure, and early osteoarthritis [29], orthopedic publications continued to report good results after transtibial ACL reconstruction. The discrepancy between the poor observed outcomes and the good reported outcomes is likely due to the widespread use of instruments that were not truly sensitive to actual outcomes and the flawed interpretations that followed. As recently as 2011, a poll of orthopedic surgeons demonstrated that 83% of respondents based their assessment of surgical outcomes solely on whether the patient was satisfied, rather than on KT arthrometer, pivot shift results, or long-term clinical follow-up [62]. In spite of technically imperfect surgeries, acceptable patient satisfaction scores were observed. However, it is entirely possible that this phenomenon is attributable to placebo effect, which has been shown to exist in approximately 35% of patients. Moreover, it has been shown that 50–70% of patients will heal or cope regardless of the nature and quality of the treatment rendered [73]. Thus, in the absence of accurate and precise measures of outcomes, a true understanding of the actual effect of treatment can easily be obscured.
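
The way an imprecise outcome measure can hide a real treatment effect, that is, produce the type II error defined in Sect. 19.1, can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' analysis: the outcome scores are hypothetical normally distributed values, the true difference between groups is fixed at 5 points, and the test is a two-sided comparison of means that uses the normal critical value 1.96 as an approximation (a t-test would be slightly more conservative at these sample sizes). The function names `rejects_null` and `type_ii_error_rate` are illustrative.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rejects_null(group_a, group_b, z_crit=1.96):
    """Two-sided comparison of means; normal approximation to the critical value."""
    se = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    z = (statistics.fmean(group_a) - statistics.fmean(group_b)) / se
    return abs(z) > z_crit


def type_ii_error_rate(true_diff, sd, n_per_group, n_trials=1000):
    """Fraction of simulated studies that fail to detect a real difference."""
    misses = 0
    for _ in range(n_trials):
        control = [rng.gauss(0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(true_diff, sd) for _ in range(n_per_group)]
        if not rejects_null(treated, control):
            misses += 1
    return misses / n_trials


# Hypothetical scenario: the treatment truly improves the score by 5 points.
beta_precise = type_ii_error_rate(true_diff=5, sd=5, n_per_group=30)
beta_noisy = type_ii_error_rate(true_diff=5, sd=20, n_per_group=30)
beta_noisy_large = type_ii_error_rate(true_diff=5, sd=20, n_per_group=200)

print(f"precise measure, n=30 per group:  beta = {beta_precise:.2f}")
print(f"noisy measure,   n=30 per group:  beta = {beta_noisy:.2f}")
print(f"noisy measure,   n=200 per group: beta = {beta_noisy_large:.2f}")
```

In every simulated study the difference is real, so each failure to reject H0 is a type II error. In this sketch the noisy measure misses the effect far more often than the precise one, and a larger sample only partly compensates, which is the sense in which "no difference" may simply reflect an insensitive instrument or an underpowered study.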

19.5 Is Evidence-Based Medicine Really Evidence-Based in ACL Reconstruction?

A variety of Level I studies published on the topic of ACL reconstruction have focused on various patient-specific and surgical factors. Lubowitz et al. performed a meta-analysis comparing single- and double-bundle ACL reconstruction and did not find any difference [28, 43]. Foster et al. and Carey et al. compared allograft to autograft and found no difference [5, 11]. Holm et al. looked at hamstring versus patellar tendon autograft and found no difference [22]. Samuelsson looked at graft type and surgical technique and found no difference [60]. When looking at these studies, several causes for type II error can be identified, each of which is reviewed below.

19.6 Study Design

Basic science studies are extremely valuable in the field of orthopedic surgery. However, the pitfalls of improperly designed studies must be recognized. When in vitro testing parameters do not accurately reflect the in vivo condition, results may not necessarily be readily applicable to clinical scenarios. For example, most biomechanical studies on ACL reconstruction techniques only use testing systems that apply a small fraction of the load that the reconstructed knee would be exposed to in a living human subject performing high-level sports activities [37, 38, 46].

19.7 Patient Selection and Allocation of Treatment

The process of selecting patients and allocating them to treatment groups is extremely important. Specifically, when comparing two groups of treatment, randomization should be applied when possible to avoid baseline differences between the groups, which can confound the outcomes.

19.8 Selecting the Appropriate Outcome Measure

It is important to select the outcome tool that is most suitable and sensitive in detecting the primary outcome of a study as indicated in the hypothesis. Failure to do so will result in drawing an incorrect conclusion. An example is the use of KT arthrometer [MedMetric, San Diego, CA, USA] testing for knee instability after ACL reconstruction [70, 72]. Although this is an excellent tool for evaluation of anterior-posterior laxity, it has little value in assessing rotatory instability [70, 72]. Relying solely on the KT arthrometer to compare ACL reconstruction techniques is therefore not sufficient. Indeed, it has been shown by more advanced biomechanical testing that nonanatomic techniques specifically fail to restore rotatory laxity, while they may be sufficient to restore anterior-posterior laxity [14].

It is important to select the outcome tool which is most suitable and sensitive in detecting the primary outcome of a study as indicated in the hypothesis. Failure to do so will result in drawing an incorrect conclusion.

A study on risk factors for the development of osteoarthritis after ACL reconstruction evaluated 50 patients at 6 years following surgery. The following ten potential factors involved in the development of osteoarthritis were assessed: meniscectomy, chondral damage, patellar tendon grafting, age at surgery, time delay between injury and surgery, type and intensity of post-surgery sport, quadriceps strength, hamstring strength, quadriceps-to-hamstring strength ratio, and residual joint laxity. However, despite the fact that the radiographs shown in the manuscript indicate nonanatomic tunnel placement [23], tunnel placement was not considered as a possible factor for osteoarthritis [30]. However, in accordance with

biomechanical studies, this should ideally have been evaluated (Fig. 19.1) [2].

Fig. 19.1  (a) Pressure map of the articular cartilage of a left knee after nonanatomic ACL reconstruction showing increased pressures (red and black color) in the medial compartment. (b) Plain flexion posterior-anterior radiograph of a left knee after ACL reconstruction showing medial compartment osteoarthritis with joint space narrowing. (c) Long cassette radiograph of the same patient showing resulting varus malalignment [2]

A number of outcome tools, both clinician-based and patient-reported, have been described to characterize the nature of patients’ knee function [9, 19, 31, 51, 55, 61]. While many of these instruments have been validated in their ability to detect knee-related disability, there is considerable variability in the types of patients that each of these measures was intended to assess (patients with knee osteoarthritis versus active, athletic patients with non-arthritic knee conditions) [39]. It should be noted that patient activity level is an important prognostic variable, which

does not always correlate to symptoms and function [39]. Consequently, studies should be designed in a way such that the outcome measures selected as dependent variables should be specific to the type of patients being studied. Thus far, this type of patient-specific outcome measure selection has been lacking in certain disciplines of orthopedic research [41].
The International Knee Documentation Committee (IKDC) has developed two rating scales, one “objective” and one “subjective” [20]. The first is clinician-based and grades patients as normal, nearly normal, abnormal, or severely abnormal on a variety of parameters that include effusion, motion, ligament laxity, crepitus, harvest-site pathology, radiographic findings, and single-leg hop test. The final patient grade is determined by the lowest grade in any given group. The subjective one is patient-reported and inquires about symptoms, sports activities, and ability to function, including stairs, squatting, running, and jumping. It has been demonstrated to be reliable, valid, and responsive when applied to a range of knee conditions, including ACL tears as well as meniscus and cartilage pathology [26, 27].
The Cincinnati Knee Rating System combines clinician-based evaluation with patient-reported symptoms and function [4, 63]. It is composed of 6 subscales that add up to 100 points: 20 for symptoms, 15 for daily and sports functional activities, 25 for physical examination, 20 for knee stability testing, 10 for radiographic findings, and 10 for functional testing [3]. It is most often employed to evaluate ACL injuries before and after reconstruction and is proven to be reliable, valid, and responsive [40, 55].
The modified Lysholm scale is a patient-related measure designed to evaluate outcomes after knee ligament surgery [36]. It is a questionnaire with eight items scaled to a maximum score of 100. Knee stability accounts for 25 points, pain for 25, locking for 15, and swelling and stair climbing for 10 each. In addition, a limp, use of a support, and squatting accounts for 5 points each [68]. It was originally developed in 1982 and later modified in 1985. This Lysholm score is one of the first outcome measures to rely on patient-related symptoms and functions. It is frequently used in clinical research [21, 35].
The Single Assessment Numeric Evaluation (SANE) was designed specifically for college-age patients after ACL reconstruction [75]. It is very simple and involves just one question, asking a patient on a 0–100% scale how much of a percentage of normal they would rate their knee. Though application is easy, it is only useful when looking at a homogeneous cohort of patients who would interpret this one question similarly [39, 75].
The Knee Injury and Osteoarthritis Outcome Score (KOOS) is another patient-related measure. It consists of 5 separate scores: 9 questions for pain, 7 questions for symptoms, 17 questions for activities of daily living, 5 questions for sport and recreational function, and 4 items for quality of life [57]. The KOOS has been employed to evaluate ACL reconstruction but also meniscectomy, tibial osteotomy, and post-traumatic arthritis. The other benefit is that it has been validated in multiple languages [56, 58, 74, 78].
The quality-of-life outcome measure for chronic ACL deficiency was developed with input from ACL-deficient patients, primary care sports medicine physicians, orthopedic surgeons, athletic therapists, and physical therapists [39]. It consists of 31 visual analog questions relating to 5 categories: symptoms and physical complaints, work-related concerns, recreational activities and sports participation, lifestyle, and social and emotional health status relating to the knee [39, 45].
There are several instruments used to characterize patients’ level of physical activity. The Tegner is probably the most popular in orthopedic publications. It aims to place a patient’s activity level somewhere on a 0–10 scale based on their specific sport [68, 78]. However, although commonly used, this instrument has not been officially validated [41]. The Marx activity level is based on function-specific, rather than sport-specific, questions, and it incorporates the frequency of activities as well [41]. The scale consists of four questions, evaluating running, cutting, decelerating, and pivoting. Patients are asked to score frequency on a 0–4 scale for

each element for a total of 16 points. In contrast to the Tegner scale, the Marx scale has been validated [41].

Clinical Vignette
A 35-year-old female presents to the office complaining of right knee pain. She underwent right knee ACL reconstruction using a transtibial single-bundle technique 15 years ago. She has had no new injury. She states the knee feels stable. Examination in the office appears to confirm this with a 1A Lachman, negative anterior drawer, and guarding pivot shift test. She undergoes imaging with plain radiographs demonstrating a vertical tunnel orientation as well as an MRI indicating the ACL graft is intact but vertically oriented (Fig. 19.2). In addition, it reveals a degenerative medial meniscal tear. A CT scan was also ordered which reveals nonanatomic tunnel location (Fig. 19.2). She elects to undergo surgery for the meniscus tear. Examination under anesthesia reveals an exam very different from that in the office with a 2A Lachman, 1+ anterior drawer, and 2+ pivot shift test. Arthroscopy reveals a vertical graft, degenerative medial meniscal tear, and advanced degenerative changes of the medial compartment (Fig. 19.2). This clinical case illustrates that when insufficient or inappropriate outcome tools are used, clinical outcome is not accurately and reliably measured.

19.9 Improving How We Measure Outcome

Although quantification is simple when using the aforementioned outcome and activity scales, the objective description of physical examination findings can be a challenging proposition. The pivot shift test, for example, is the most performed test to determine rotational instability in the knee, but it is dependent on the examiner skills and patient’s compliance. The final grade attributed to amount of instability also results from a subjective judgment of the examiner. During the pre-course of ISAKOS, Osaka, in 2009, five selected experts were invited to perform a pivot shift test on a cadaveric lower body specimen. This setup was repeated at the Panther Global Summit in Pittsburgh in 2011 with 12 expert surgeons on an actual ACL-injured patient under general anesthesia (Fig. 19.3). During the exam, the pivot shift was quantified with an accelerometer. The results of this experiment showed no objective agreement [34]. This is partly due to differences among examiners, but also because the pivot shift is influenced not only by the ACL but also by the iliotibial band, capsule, medial meniscus, lateral meniscus, and bony morphology [48–50]. To make the pivot shift more objective, its use in conjunction with an accelerometer has been advocated and validated (Fig. 19.4) [1, 47].
Further objective outcome measures are currently being investigated and implemented in order to deal with variability in some of the more widely reported physical exam measures. Dynamic stereo in vivo radiography can be used for detailed kinematic analysis (Fig. 19.5) [65–67]. Three-dimensional computed tomography scanning is very valuable for evaluation of ACL tunnel placement [10, 32, 33]. Magnetic resonance imaging can be used for a variety of different purposes following ACL reconstruction (Fig. 19.6) such as evaluations of graft integrity, graft healing, inclination angle, tunnel position, as well as the status of the menisci and articular cartilage [7, 44].

19.10 Interpretation of Results

As described above, the clinical outcome following ACL reconstruction is frequently assessed using the IKDC scoring system. Although the use of this outcome tool has been validated, this was done using four specific categories: excellent, good, fair, and poor. However, various authors have inappropriately grouped the excellent and good outcome categories together, as well as the fair and poor category [5, 43]. This likely started

Fig. 19.2 A 35-year-old female, 15 years after right ACL reconstruction, presents with knee pain. Examination in the office appeared to suggest a stable knee. (a) Plain radiographs of a right knee showing status post-ACL reconstruction with a vertical tunnel orientation suggestive of nonanatomic tunnel placement. (b) MRI of the right knee showing the ACL graft is intact but again vertical in orientation. (c) Three-dimensional CT scan of the same right knee confirming nonanatomic tunnel position. (d) Arthroscopy of that knee showing advanced degenerative changes in the medial compartment, frequently seen with residual rotatory instability after nonanatomic ACL reconstruction

Fig. 19.3 Panther Global Summit in Pittsburgh in 2011 with 12 expert surgeons performing the pivot shift test on a left knee of an actual ACL-injured patient under general anesthesia. During the exam, the pivot shift was quantified with an accelerometer, iPad app, and electromagnetic tracking system. The results of this experiment showed no objective agreement in the pivot shift grade among the experts and significantly improved agreement after instruction of a “standardized pivot shift test” [34]

Fig. 19.4 Examination of a left knee showing the application of an accelerometer and iPad application to quantify the pivot shift examination by the surgeon. Markers on the skin are used which are tracked with the iPad application [1, 47]

Fig. 19.5 In vivo dynamic stereo radiography of a left knee. (a) The patient runs on a treadmill, while real-time radiography is performed from orthogonal angles. (b) Videos are overlapped with a three-dimensional computed tomography scan of the knee such that it is possible to track the distance and motion pattern between the tibia and femur as the patient runs and loads the knee. (c) Subsequent map of the knee outlining the distance and motion between the tibia and femur in real time [65–67]

Fig. 19.6 High-resolution magnetic resonance imaging of a left knee. (a) Sagittal sequence outlining the medial meniscus. (b) Three-dimensional reconstruction of the magnetic resonance scan outlining the medial and lateral meniscus. (c) Surface mapping of the anterior horn, body, and posterior horn of the medial meniscus and articular cartilage [7, 44]

as an attempt to dichotomize the results as normal/nearly normal versus abnormal/severely abnormal to allow for easier statistical analysis. This fact may also be attributable to the concern that some of the components of the IKDC grading system, including the pivot shift, are subjective in nature. Or perhaps some may believe that “nearly normal” is good enough when assessing clinical outcome after ACL reconstruction. However, although traditional methods to reconstruct the ACL are not able to completely restore the knee to normal, this should be the goal of future reconstruction techniques [28, 43]. This point is illustrated by the case of a recent Level I

study that made headlines comparing the results of operative to nonoperative treatment of patients with ACL tears. Overall, there were 121 patients treated with either operative or nonoperative treatment [12]. Although randomization was performed, the crossover of patients was allowed per the intention-to-treat analysis. The authors concluded that no difference existed between the treatment groups, but when looking at the data in more detail, in the rehabilitation-only group, there were more meniscus injuries at final follow-up (13 vs. 1), more abnormal Lachman tests (75% vs. 35%), more abnormal pivot shift examinations (53% vs. 25%), and an increased total KT-arthrometer translation (8.3 vs. 6.6 mm). Thus, the authors’ conclusions were based on a series of dependent variables that may not be the most relevant factors in ultimately determining success of treatment.
The use of reliable and valid outcome measures specific to the aim of the study is highly recommended when conducting a study. For example, for anatomic ACL reconstruction, a scoring system was developed, which was validated and tested for reliability and responsiveness in two separate studies [8, 69].

19.11 Quality and Duration of Follow-Up

The quality of the follow-up is also extremely important. By the standard of publishing of major orthopedic journals with high impact factor, a follow-up percentage of at least 80% and ideally over 90% is warranted for a high-level prospective clinical trial. In addition, patients should be followed for at least 2 years following the tested procedure.
Two similar studies from these authors’ institution highlight this difference in how outcomes are evaluated. The first study on the outcomes of allograft ACL reconstruction was performed in 1996 and indicated a failure rate of 3% [18]. However, when outcomes of the same procedure were evaluated in 2011 with more advanced outcome measures, the failure rate was shown to approximate 15% [71].

Take-Home Message
• Finding no difference does not mean there is no difference.
• Orthopedists should be more like the cardiologists.
• Cardiology is leading in medicine with regard to research and innovation.
• In conducting research, it is important to form a sound hypothesis that allows for determination of the proper primary outcome measure for a particular study.
• The hypothesis of a research study dictates which statistical methods should be applied and avoids making statistical errors.
• It is important to select the outcome tool that is most suitable, reliable, valid, and sensitive in detecting the primary outcome of a study as indicated in the hypothesis.
• Failure to do so will result in drawing an incorrect conclusion.
• Study design and the quality of the follow-up are also extremely important.
• Randomized studies with large numbers of patients and at least 80% follow-up at 2 years or longer are necessary to critically evaluate new surgical techniques.

References

1. Ahlden M, Araujo P, Hoshino Y, et al. Clinical grading of the pivot shift test correlates best with tibial acceleration. Knee Surg Sports Traumatol Arthrosc. 2012;20(4):708–12.
2. Andriacchi TP, Briant PL, Bevill SL, Koo S. Rotational changes at the knee after ACL injury cause cartilage thinning. Clin Orthop Relat Res. 2006;442:39–44.
3. Barber-Westin SD, Noyes FR, McCloskey JW. Rigorous statistical reliability, validity, and responsiveness testing of the Cincinnati knee rating system in 350 subjects with uninjured, injured, or anterior cruciate ligament-reconstructed knees. Am J Sports Med. 1999;27(4):402–16.

4. Bollen S, Seedhom BB. A comparison of the Lysholm and Cincinnati knee scoring questionnaires. Am J Sports Med. 1991;19(2):189–90.
5. Carey JL, Dunn WR, Dahm DL, Zeger SL, Spindler KP. A systematic review of anterior cruciate ligament reconstruction with autograft compared with allograft. J Bone Joint Surg Am. 2009;91(9):2242–50.
6. Chavez I, Dorbecker N, Celis A. Direct intracardiac angiocardiography; its diagnostic value. Am Heart J. 1947;33(5):560–93.
7. Chu CR, Williams AA, West RV, et al. Quantitative magnetic resonance imaging UTE-T2* mapping of cartilage and meniscus healing after anatomic anterior cruciate ligament reconstruction. Am J Sports Med. 2014;42(8):1847–56. https://doi.org/10.1177/0363546514532227.
8. Desai N, Alentorn-Geli E, van Eck CF, et al. A systematic review of single- versus double-bundle ACL reconstruction using the anatomic anterior cruciate ligament reconstruction scoring checklist. Knee Surg Sports Traumatol Arthrosc. 2016;24(3):862–72.
9. Eastlack ME, Axe MJ, Snyder-Mackler L. Laxity, instability, and functional outcome after ACL injury: copers versus noncopers. Med Sci Sports Exerc. 1999;31(2):210–5.
10. Forsythe B, Kopf S, Wong AK, et al. The location of femoral and tibial tunnels in anatomic double-bundle anterior cruciate ligament reconstruction analyzed by three-dimensional computed tomography models. J Bone Joint Surg Am. 2010;92(6):1418–26.
11. Foster TE, Wolfe BL, Ryan S, Silvestri L, Kaye EK. Does the graft source really matter in the outcome of patients undergoing anterior cruciate ligament reconstruction? An evaluation of autograft versus allograft reconstruction results: a systematic review. Am J Sports Med. 2010;38(1):189–99.
12. Frobell RB, Roos EM, Roos HP, Ranstam J, Lohmander LS. A randomized trial of treatment for acute anterior cruciate ligament tears. N Engl J Med. 2010;363(4):331–42.
13. Fujimoto E, Sumen Y, Deie M, Yasumoto M, Kobayashi K, Ochi M. Anterior cruciate ligament graft impingement against the posterior cruciate ligament: diagnosis using MRI plus three-dimensional reconstruction software. Magn Reson Imaging. 2004;22(8):1125–9.
14. Gabriel MT, Wong EK, Woo SL, Yagi M, Debski RE. Distribution of in situ forces in the anterior cruciate ligament in response to rotatory loads. J Orthop Res. 2004;22(1):85–9.
15. Gillquist J, Messner K. Anterior cruciate ligament reconstruction and the long-term incidence of gonarthrosis. Sports Med. 1999;27(3):143–56.
16. Harner CD, Irrgang JJ, Paul J, Dearwater S, Fu FH. Loss of motion after anterior cruciate ligament reconstruction. Am J Sports Med. 1992;20(5):499–506.
17. Harner CD, Marks PH, Fu FH, Irrgang JJ, Silby MB, Mengato R. Anterior cruciate ligament reconstruction: endoscopic versus two-incision technique. Arthroscopy. 1994;10(5):502–12.
18. Harner CD, Olson E, Irrgang JJ, Silverstein S, Fu FH, Silbey M. Allograft versus autograft anterior cruciate ligament reconstruction: 3- to 5-year outcome. Clin Orthop Relat Res. 1996;324:134–44.
19. Heckman JD. Are validated questionnaires valid? J Bone Joint Surg Am. 2006;88(2):446.
20. Hefti F, Muller W. [Current state of evaluation of knee ligament lesions. The new IKDC knee evaluation form]. Orthopade. 1993;22(6):351–62.
21. Hoher J, Munster A, Klein J, Eypasch E, Tiling T. Validation and application of a subjective knee questionnaire. Knee Surg Sports Traumatol Arthrosc. 1995;3(1):26–33.
22. Holm I, Oiestad BE, Risberg MA, Aune AK. No difference in knee function or prevalence of osteoarthritis after reconstruction of the anterior cruciate ligament with 4-strand hamstring autograft versus patellar tendon-bone autograft: a randomized study with 10-year follow-up. Am J Sports Med. 2010;38(3):448–54.
23. Illingworth KD, Hensler D, Working ZM, Macalena JA, Tashman S, Fu FH. A simple evaluation of anterior cruciate ligament femoral tunnel position: the inclination angle and femoral tunnel angle. Am J Sports Med. 2011;39(12):2611–8.
24. Iriuchishima T, Tajima G, Ingham SJ, et al. Intercondylar roof impingement pressure after anterior cruciate ligament reconstruction in a porcine model. Knee Surg Sports Traumatol Arthrosc. 2009;17(6):590–4.
25. Iriuchishima T, Tajima G, Ingham SJ, Shen W, Smolinski P, Fu FH. Impingement pressure in the anatomical and nonanatomical anterior cruciate ligament reconstruction: a cadaver study. Am J Sports Med. 2010;38(8):1611–7.
26. Irrgang JJ, Anderson AF. Development and validation of health-related quality of life measures for the knee. Clin Orthop Relat Res. 2002;402:95–109.
27. Irrgang JJ, Anderson AF, Boland AL, et al. Responsiveness of the International Knee Documentation Committee Subjective Knee Form. Am J Sports Med. 2006;34(10):1567–73.
28. Irrgang JJ, Bost JE, Fu FH. Re: outcome of single-bundle versus double-bundle reconstruction of the anterior cruciate ligament: a meta-analysis. Am J Sports Med. 2009;37(2):421–2; author reply 422.
29. Johnson DL, Swenson TM, Irrgang JJ, Fu FH, Harner CD. Revision anterior cruciate ligament surgery: experience from Pittsburgh. Clin Orthop Relat Res. 1996;325:100–9.
30. Keays SL, Newcombe PA, Bullock-Saxton JE, Bullock MI, Keays AC. Factors involved in the development of osteoarthritis after anterior cruciate ligament surgery. Am J Sports Med. 2010;38(3):455–63.
31. Kocher MS, Steadman JR, Briggs K, Zurakowski D, Sterett WI, Hawkins RJ. Determinants of patient satisfaction with outcome after anterior cruciate ligament reconstruction. J Bone Joint Surg Am. 2002;84-A(9):1560–72.

32. Kopf S, Forsythe B, Wong AK, et al. Nonanatomic tunnel position in traditional transtibial single-bundle anterior cruciate ligament reconstruction evaluated by three-dimensional computed tomography. J Bone Joint Surg Am. 2010;92(6):1427–31.
33. Kopf S, Forsythe B, Wong AK, Tashman S, Irrgang JJ, Fu FH. Transtibial ACL reconstruction technique fails to position drill tunnels anatomically in vivo 3D CT study. Knee Surg Sports Traumatol Arthrosc. 2012;20(11):2200–7.
34. Kopf S, Musahl V, Perka C, Kauert R, Hoburg A, Becker R. The influence of applied internal and external rotation on the pivot shift phenomenon. Knee Surg Sports Traumatol Arthrosc. 2017;25(4):1106–10.
35. Lukianov AV, Gillquist J, Grana WA, DeHaven KE. An anterior cruciate ligament (ACL) evaluation format for assessment of artificial or autologous anterior cruciate reconstruction results. Clin Orthop Relat Res. 1987;218:167–80.
36. Lysholm J, Gillquist J. Evaluation of knee ligament surgery results with special emphasis on use of a scoring scale. Am J Sports Med. 1982;10(3):150–4.
37. Markolf KL, Feeley BT, Jackson SR, McAllister DR. Biomechanical studies of double-bundle posterior cruciate ligament reconstructions. J Bone Joint Surg Am. 2006;88(8):1788–94.
38. Markolf KL, Park S, Jackson SR, McAllister DR. Simulated pivot-shift testing with single and double-bundle anterior cruciate ligament reconstructions. J Bone Joint Surg Am. 2008;90(8):1681–9.
39. Marx RG. Knee rating scales. Arthroscopy. 2003;19(10):1103–8.
40. Marx RG, Jones EC, Allen AA, et al. Reliability, validity, and responsiveness of four knee outcome scales for athletic patients. J Bone Joint Surg Am. 2001;83-A(10):1459–69.
41. Marx RG, Stump TJ, Jones EC, Wickiewicz TL, Warren RF. Development and evaluation of an activity rating scale for disorders of the knee. Am J Sports Med. 2001;29(2):213–8.
42. Mayo Robson AW. Ruptured crucial ligaments and their repair by operation. Ann Surg. 1903;37(5):716–8.
43. Meredick RB, Vance KJ, Appleby D, Lubowitz JH. Outcome of single-bundle versus double-bundle reconstruction of the anterior cruciate ligament: a meta-analysis. Am J Sports Med. 2008;36(7):1414–21.
44. Miyawaki M, Hensler D, Illingworth KD, Irrgang JJ, Fu FH. Signal intensity on magnetic resonance imaging after allograft double-bundle anterior cruciate ligament reconstruction. Knee Surg Sports Traumatol Arthrosc. 2014;22(5):1002–8.
45. Mohtadi N. Development and validation of the quality of life outcome measure (questionnaire) for chronic anterior cruciate ligament deficiency. Am J Sports Med. 1998;26(3):350–9.
46. Morimoto Y, Ferretti M, Ekdahl M, Smolinski P, Fu FH. Tibiofemoral joint contact area and pressure after single- and double-bundle anterior cruciate ligament reconstruction. Arthroscopy. 2009;25(1):62–9.
47. Musahl V, Griffith C, Irrgang JJ, et al. Validation of quantitative measures of rotatory knee laxity. Am J Sports Med. 2016;44(9):2393–8.
48. Musahl V, Hoshino Y, Ahlden M, et al. The pivot shift: a global user guide. Knee Surg Sports Traumatol Arthrosc. 2012;20(4):724–31.
49. Musahl V, Hoshino Y, Becker R, Karlsson J. Rotatory knee laxity and the pivot shift. Knee Surg Sports Traumatol Arthrosc. 2012;20(4):601–2.
50. Musahl V, Rahnemai-Azar AA, Costello J, et al. The influence of meniscal and anterolateral capsular injury on knee laxity in patients with anterior cruciate ligament injuries. Am J Sports Med. 2016;44(12):3126–31.
51. Neeb TB, Aufdemkampe G, Wagener JH, Mastenbroek L. Assessing anterior cruciate ligament injuries: the association and differential value of questionnaires, clinical tests, and functional tests. J Orthop Sports Phys Ther. 1997;26(6):324–31.
52. Nishimori M, Sumen Y, Sakaridani K, Nakamura M. An evaluation of reconstructed ACL impingement on PCL using MRI. Magn Reson Imaging. 2007;25(5):722–6.
53. Passler HH. The history of the cruciate ligaments: some forgotten (or unknown) facts from Europe. Knee Surg Sports Traumatol Arthrosc. 1993;1(1):13–6.
54. Powell WJ Jr, Wittenberg J, Dinsmore RE, Miller SW, Maturi RA. Definition of cardiac structures using computerized tomography in isolated arrested and beating canine hearts. Am J Cardiol. 1977;39(5):690–6.
55. Risberg MA, Holm I, Steen H, Beynnon BD. Sensitivity to changes over time for the IKDC form, the Lysholm score, and the Cincinnati knee score. A prospective study of 120 ACL reconstructed patients with a 2-year follow-up. Knee Surg Sports Traumatol Arthrosc. 1999;7(3):152–9.
56. Roos EM, Ostenberg A, Roos H, Ekdahl C, Lohmander LS. Long-term outcome of meniscectomy: symptoms, function, and performance tests in patients with or without radiographic osteoarthritis compared to matched controls. Osteoarthr Cartil. 2001;9(4):316–24.
57. Roos EM, Roos HP, Lohmander LS, Ekdahl C, Beynnon BD. Knee Injury and Osteoarthritis Outcome Score (KOOS): development of a self-administered outcome measure. J Orthop Sports Phys Ther. 1998;28(2):88–96.
58. Roos EM, Roos HP, Ryd L, Lohmander LS. Substantial disability 3 months after arthroscopic partial meniscectomy: a prospective study of patient-relevant outcomes. Arthroscopy. 2000;16(6):619–26.
59. Rosenberg T. Techniques for ACL reconstruction with Multi-Trac drill guide. Mansfield: Accufex Microsurgical Inc.; 1994.
60. Samuelsson K, Andersson D, Karlsson J. Treatment of anterior cruciate ligament injuries with special reference to graft type and surgical technique: an assessment of randomized controlled trials. Arthroscopy. 2009;25(10):1139–74.
61. Sernert N, Kartus J, Kohler K, et al. Analysis of subjective, objective and functional examination tests after anterior cruciate ligament reconstruction. A follow-up of 527 patients. Knee Surg Sports Traumatol Arthrosc. 1999;7(3):160–5.
62. Sgaglione NA. Revision ACL reconstruction. Presented at Orthopedics Today Hawaii 2011. Jan 16–19 Koloa, Hawaii; 2011.
63. Sgaglione NA, Del Pizzo W, Fox JM, Friedman MJ. Critical analysis of knee ligament rating systems. Am J Sports Med. 1995;23(6):660–7.
64. Snook GA. A short history of the anterior cruciate ligament and the treatment of tears. Clin Orthop Relat Res. 1983;172:11–3.
65. Tashman S, Anderst W. In-vivo measurement of dynamic joint motion using high speed biplane radiography and CT: application to canine ACL deficiency. J Biomech Eng. 2003;125(2):238–45.
66. Tashman S, Collon D, Anderson K, Kolowich P, Anderst W. Abnormal rotational knee motion during running after anterior cruciate ligament reconstruction. Am J Sports Med. 2004;32(4):975–83.
67. Tashman S, Kolowich P, Collon D, Anderson K, Anderst W. Dynamic function of the ACL-reconstructed knee during running. Clin Orthop Relat Res. 2007;454:66–73.
68. Tegner Y, Lysholm J. Rating systems in the evaluation of knee ligament injuries. Clin Orthop Relat Res. 1985;198:43–9.
69. van Eck CF, Gravare-Silbernagel K, Samuelsson K, et al. Evidence to support the interpretation and use of the anatomic anterior cruciate ligament reconstruction checklist. J Bone Joint Surg Am. 2013;95(20):e1531–9.
70. van Eck CF, Loopik M, van den Bekerom MP, Fu FH, Kerkhoffs GM. Methods to diagnose acute anterior cruciate ligament rupture: a meta-analysis of instrumented knee laxity tests. Knee Surg Sports Traumatol Arthrosc. 2013;21(9):1989–97.
71. van Eck CF, Schkrohowsky JG, Working ZM, Irrgang JJ, Fu FH. Prospective analysis of failure rate and predictors of failure after anatomic anterior cruciate ligament reconstruction with allograft. Am J Sports Med. 2012;40(4):800–7.
72. van Eck CF, van den Bekerom MP, Fu FH, Poolman RW, Kerkhoffs GM. Methods to diagnose acute anterior cruciate ligament rupture: a meta-analysis of physical examinations with and without anaesthesia. Knee Surg Sports Traumatol Arthrosc. 2013;21(8):1895–903.
73. Vavken P. Rationale for and methods of superiority, noninferiority, or equivalence designs in orthopaedic, controlled trials. Clin Orthop Relat Res. 2011;469(9):2645–53.
74. W-Dahl A, Toksvig-Larsen S, Roos EM. A 2-year prospective study of patient-relevant outcomes in patients operated on for knee osteoarthritis with tibial osteotomy. BMC Musculoskelet Disord. 2005;6:18.
75. Williams GN, Taylor DC, Gangel TJ, Uhorchak JM, Arciero RA. Comparison of the single assessment numeric evaluation method and the Lysholm score. Clin Orthop Relat Res. 2000;373:184–92.
76. Wilson FN, Johnston FD, Hill IGW, Macleod AG, Barker PS. The significance of electrocardiograms characterized by an abnormally long QRS interval and by broad S deflections in Lead I. Am Heart J. 1934;9:459.
77. Wood EH, Ritman EL, Robb RA, Harris LD, Ruegsegger P. Noninvasive numerical vivisection of anatomic structure and function of the intact circulatory system using high temporal resolution cylindrical scanning computerized tomography. Med Instrum. 1977;11(3):153–9.
78. Wright RW. Knee injury outcomes measures. J Am Acad Orthop Surg. 2009;17(1):31–9.
20 Power and Sample Size

Stephen Lyman

S. Lyman (*)
Hospital for Special Surgery, New York, NY, USA
e-mail: lymans@hss.edu

20.1 Power

Power is a powerful word. It’s extremely flexible with no fewer than nine definitions according to the Merriam-Webster Dictionary [6]. The final definition is the one we’re referring to in this chapter: “the probability of rejecting the null hypothesis in a statistical test when a particular alternative hypothesis happens to be true.” In clinical research, power refers to the ability to detect a difference in diagnostic, prognostic, or treatment effectiveness if one exists. This is a primarily theoretical construct but has practical implications. It applies to any study design in which you test a hypothesis whether it be between two different groups of research subjects given a diagnostic test, prognostic evaluation, or treatment regimen or within the same subjects before and after an intervention [4].

20.2 Does Power Matter?

Statistical power provides investigators and readers with a sense of whether the study actually answers the research question of interest. Adequately powered studies can help improve our ability to diagnose and manage orthopedic

Clinical Vignette 1
Consider a research study comparing two alternative surgical approaches for patients suffering from a rare orthopedic condition with just a handful of subjects available for study (Table 20.1, Example 1). Assuming we have equipoise in not knowing which treatment is more effective and/or safe, we would be justified in randomizing patients presenting with this condition to either of the two treatment groups. In order to maximize power (and efficiency), we randomize them in a 1:1 allocation. Preoperatively we assess their pain levels on a 100 mm visual analogue scale. We randomize patients to receive Surgery A or Surgery B. This randomization works, and we find that the pre-treatment pain levels are equivalent between the two groups of patients (p = 0.87). Six weeks after treatment, we measure the patients’ pain levels again. At this time we find that the patients who underwent Surgery A have a pain score of 34, while patients who underwent Surgery B have a pain score of 52 (an 18-point difference between groups). Surgery A seems to be more effective in reducing pain in these patients, right? Not so fast. First we must perform a statistical test to determine if the difference between treatment groups is statistically significant. To our surprise the test result’s p-value comes back a nonsignificant 0.29.
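The power figures reported for the two examples in Table 20.1 can be approximated from the summary statistics alone. The following is a minimal sketch, not the author's own calculation: it assumes equal group sizes, a pooled standard deviation, a two-sided alpha of 0.05, and a normal approximation in place of the exact t-based calculation. The function names (pooled_sd, approx_power, required_n_per_group) are hypothetical and chosen only for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def pooled_sd(sd_a: float, sd_b: float) -> float:
    """Pooled standard deviation for two groups of equal size."""
    return sqrt((sd_a ** 2 + sd_b ** 2) / 2)


def approx_power(diff: float, sd: float, n_per_group: int,
                 alpha: float = 0.05) -> float:
    """Approximate power of a two-sided comparison of two group means.

    Normal approximation; the tiny chance of a significant result in
    the wrong direction is ignored.
    """
    standard_error = sd * sqrt(2 / n_per_group)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(diff) / standard_error - z_crit)


def required_n_per_group(diff: float, sd: float, power: float = 0.80,
                         alpha: float = 0.05) -> int:
    """Approximate patients per group needed to detect `diff`."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / diff) ** 2)


# Example 1 (Table 20.1): 5 patients per group, 34 +/- 25 vs. 52 +/- 26
sd1 = pooled_sd(25, 26)
power1 = approx_power(diff=52 - 34, sd=sd1, n_per_group=5)

# Example 2 (Table 20.1): 2000 patients per group, 38 +/- 13 vs. 40 +/- 14
sd2 = pooled_sd(13, 14)
power2 = approx_power(diff=40 - 38, sd=sd2, n_per_group=2000)

# Patients per group needed to detect the Example 1 difference
# with 80% power (an assumed, conventional target)
n_needed = required_n_per_group(diff=52 - 34, sd=sd1)

print(f"Example 1 approximate power: {power1:.2f}")
print(f"Example 2 approximate power: {power2:.3f}")
print(f"Patients per group for 80% power in Example 1: {n_needed}")
```

Under these assumptions the sketch gives a power of roughly 20% for Example 1 and above 99% for Example 2, in line with Table 20.1. It also suggests that about 32 patients per group, rather than 5, would be needed to detect an 18-point difference with 80% power. An exact calculation based on the noncentral t distribution would give somewhat different numbers for very small samples.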

© ISAKOS 2019 185


V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_20
conditions. But studies can be underpowered or overpowered, as the examples below show:

Clinical Vignette 2
Now imagine another study in which we compare two different surgical interventions for a common orthopedic condition where we’re able to recruit a large number of patients (Table 20.1, Example 2). Again we randomize the patients to treatment group, this time Surgery C versus Surgery D. The randomization again works, and we find that patients receiving each treatment had similar pre-treatment pain levels (p = 0.37). Six weeks after surgery, we find that both groups have substantially less pain than before surgery (p < 0.01). The group receiving Surgery C has less pain, but the difference is only 2 mm out of a 100 mm pain score. This time the statistical test results in a p-value of <0.0001, which is highly statistically significant using the usual critical p-value criterion of <0.05. Yet there is only a slight difference between the group means, suggesting little clinical significance.

Table 20.1  Examples of underpowered and overpowered studies

                               Example #1              Example #2
                               Surgery A   Surgery B   Surgery C   Surgery D
Sample size                    5           5           2000        2000
Mean presurgical pain ± SD     88 ± 17     86 ± 19     85 ± 38     86 ± 33
Mean postsurgical pain ± SD    34 ± 25     52 ± 26     38 ± 13     40 ± 14
Between-group p-value          0.29                    <0.0001
Statistical power              20%                     >99%

The findings for both of these examples are completely the result of statistical power. A study with a very small sample size may show a very large difference in results between two interventions, but a statistical test of that difference may not detect significance. Conversely, a study with a very large sample size may find a statistically significant difference between the effects of two interventions, but the difference may be clinically irrelevant. The first example is an underpowered study. The second is overpowered.

Underpowering a study may result in missing a true effect simply because not enough subjects were enrolled in the study. Underpowering usually results in a null finding despite the presence of a true effect. We miss an opportunity to improve our understanding of clinical care, and our patients are worse off as a result. If a study is underpowered, researchers must be cautious in interpreting their results, while readers must consider whether the findings reflect the truth or are the result of inadequate power. Because of these concerns, editors may decline to publish underpowered studies.

Overpowered studies may yield findings that are statistically significant but clinically irrelevant. Outcome measures with a known minimal clinically important change (MCIC) or other well-established benchmarks of clinical effectiveness can be used to avoid misinterpreting overpowered studies.

Overpowering a study may be a waste of resources but may also allow for subgroup analyses. In Example 2, a subgroup analysis focusing on a specific age range of patients may reveal a bigger, clinically significant difference between treatments in younger patients, despite a small difference in the overall study population. A study overpowered to test the main hypothesis may have enough power to detect meaningful differences in subgroups of patients.

Fact Box 20.1
Statistical power determines your ability to detect a difference if one truly exists. Overpowered studies may yield findings that are statistically significant but clinically irrelevant. Underpowering a study may result in missing a true effect simply because not enough subjects were enrolled in the study.
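The two examples in Table 20.1 can be checked with a short calculation. The sketch below is not the author's method: it assumes a two-sided comparison of two group means under a normal approximation, and the function name `approx_power` is hypothetical. Dedicated software would typically use the t distribution, which gives a slightly lower figure for very small samples.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(mean_a, sd_a, mean_b, sd_b, n_per_group, alpha=0.05):
    """Approximate two-sided power for comparing two group means.

    Assumption: normal approximation with equal group sizes; this is an
    illustration, not the calculation used to build Table 20.1.
    """
    z = NormalDist()
    # Standard error of the difference between the two group means.
    se = sqrt(sd_a ** 2 / n_per_group + sd_b ** 2 / n_per_group)
    z_effect = abs(mean_a - mean_b) / se
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Chance that the test statistic lands beyond either critical value.
    return z.cdf(z_effect - z_crit) + z.cdf(-z_effect - z_crit)


# Postsurgical pain scores from Table 20.1.
example_1 = approx_power(34, 25, 52, 26, n_per_group=5)     # Surgery A vs B
example_2 = approx_power(38, 13, 40, 14, n_per_group=2000)  # Surgery C vs D
print(f"Example 1 power: {example_1:.0%}")
print(f"Example 2 power: {example_2:.1%}")
```

Under these assumptions the first example comes out at roughly 20% power and the second above 99%, in line with the table: the 18-point difference is hard to detect with five patients per group, while the 2-point difference is almost certain to be detected with 2000 per group.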
20.3 Why Is Power Needed?

Statistical power is necessary anytime you want to test hypotheses: to determine if there is a statistically significant difference between groups or a statistically significant relationship between two variables [4]. Statistical power determines your ability to detect a difference if one truly exists.

This rationale is both scientific and philosophical. When conducting research, the project is only worth performing if it is possible to reject the null hypothesis. Underpowered research is less likely to be published or to contribute meaningfully to improving our understanding of the physical world. Therefore, it is unethical (philosophical argument) to perform underpowered studies, because humans are being subjected to unnecessary experimentation and risk of harm. Furthermore, a null finding from an underpowered study may incorrectly be interpreted as evidence that the medical intervention studied has no benefit.

Fact Box 20.2
It is unethical to perform underpowered studies, because humans are being subjected to unnecessary experimentation and risk of harm.

If you are not formally testing hypotheses, then statistical power is not strictly necessary. However, if you are interested in assessing the correlation between two variables, then you still need adequate sample size. For example, if you were evaluating whether ultrasound could diagnose a collateral ligament tear as well as a more expensive imaging modality, then you need a sample size large enough to calculate a reliable estimate of the ability of ultrasound to correctly diagnose the tear. A study of three subjects evaluated with both MRI and ultrasound would be inadequate to answer this question. There are only four possible findings: 0% accuracy (0 of 3 ultrasounds agree with MRI), 33% accuracy (1 of 3 agree), 67% accuracy, and 100% accuracy. Obviously, a much larger sample size would be necessary to establish a meaningful estimate of ultrasound diagnostic utility in this setting. An entire literature has been developed evaluating the sample size requirements of reliability studies [3, 7, 8, 10].

20.4 Properties of Power

Statistical power is customarily represented as a percentage between 0 and 100% or sometimes as a proportion between 0 and 1. It tells you your chance of detecting a true finding and, by subtraction, your chance of missing one. Power is calculated as 1 − β (beta), where β is the probability of a Type II error, that is, of failing to reject the null hypothesis when the alternative hypothesis is true. If β were 0.2, then 1 − β would be 0.80, or 80% power, to detect a difference if it truly exists. For most clinical studies, 80% power is considered the lowest acceptable power value since this means you have just a 1 in 5 chance of missing a true finding. When considering interventions or testing hypotheses for which there are serious consequences, a power of 90% (1 in 10 chance of missing a true finding) or even higher may be warranted.

Fact Box 20.3
Power tells you your chance of detecting (and therefore of missing) a true finding and is determined by sample size, variability, frequency, the critical p-value, and the minimum relevant effect size. The best approach is to set your p-value and effect size and estimate your variability and/or frequency and then calculate the sample size you need for 80% (or 90%) power.

Let’s imagine an established surgical procedure that is highly effective and relatively inexpensive but has a high risk of adverse events (e.g., perioperative fracture). A new alternative surgical technique is believed to be both effective and safer than the established procedure but costs substantially more. In comparing these two surgical interventions, we would not want to miss a
true treatment effect if it exists, so we might consider powering our study to 90% or higher. If we missed a true treatment effect in the new technique’s favor, it might not be adopted into clinical practice due to the high costs despite being safer and more effective.

Power is determined by sample size, variability, frequency, the critical p-value, and the minimum relevant effect size. Adjusting any of these characteristics changes the statistical power.

Sample size is what most scientists think of first when considering statistical power [1]. The higher the sample size, the higher the power, all else remaining equal. Conversely, a smaller sample size will always have less power, all else being equal. This is also the most easily modifiable factor in calculating expected power. We can usually recruit more patients, but other components of power are often more difficult or impossible to change.

Variability is a measure of how much spread exists in the variables being considered. A variable that is highly variable (pun intended) between study subjects will result in a larger standard deviation. The larger the variation, the more subjects will be needed, because a larger variation will make any difference between the groups harder to detect. Variability only applies to power calculations for continuous or scale parameters. Frequency (see below) is used for discrete variables.

A study’s power is optimized when the frequency of either a discrete outcome or explanatory variable is balanced. A study being powered on a binary (or ordinal) variable with very low frequency will require many more subjects in order to achieve adequate statistical power than a study in which the frequency is balanced across groups. For example, if 50% of the study subjects are female and 50% are male, then an analysis comparing sex differences would have optimum power. However, if intersex (those with biological reproductive anatomy that is not fully male or fully female) were of interest as a third category of sex (an estimated 1% of live births) [2], then this low frequency group would have profound implications for statistical power for a study considering sex differences.

The critical p-value is the chance you are willing to accept of declaring an effect that is not real. A critical p-value of 0.05 is usually considered acceptable for clinical research. This represents less than a 1 in 20 chance of falsely rejecting the null hypothesis if the null hypothesis is true. A smaller p-value may be desired if 1 in 20 seems too large an uncertainty. In that case a critical p-value of 0.01 (1 in 100) or even 0.001 (1 in 1000) may be warranted. These stricter critical p-values will decrease power, all else being equal. Conversely, if a larger p-value were considered acceptable (e.g., p of 0.1 or 1 in 10), power would be increased, all else being equal. Critical p-values are not usually modified unless there is a strong justification to be more or less refined in what is considered significant.

When many different comparisons are being conducted within the same study, p-values will often be adjusted for multiple comparisons. One of the most well-known adjustments is the Bonferroni correction, in which the critical p-value of 0.05 is divided by the number of comparisons being made. If there were 10 hypotheses being tested, the new critical p-value would become 0.005 (0.05 ÷ 10 comparisons). The power would then be calculated based on this new effective p-value. As you can imagine, the Bonferroni correction can be extremely conservative when a large number of comparisons are planned.

Effect size refers to the magnitude of effect (difference between groups) you expect to find. Best practice is to use the smallest effect size deemed clinically relevant. If you do not have an expected effect size based on previous information (e.g., pilot data or previously published studies), then using your clinical judgment may be required but could be difficult to justify in this era of data-driven information.

Special consideration should be given to effect sizes related to patient-reported outcome measures (PROMs), which are very common outcome tools in elective orthopedic research. We often use the minimal clinically important difference
(MCID), which is also sometimes called the minimal clinically important change (MCIC) or minimal clinically important improvement (MCII). All of these are slightly different concepts, but we will use them interchangeably here as the minimum effect size we’d like to be able to detect when comparing two groups of patients using their PROM scores [5]. Conceptually, the MCIC is the smallest change in PROM score for which a subject can actually discern a difference in their state of health. This concept is particularly useful when you do not have previous data on which to estimate your effect size. A distribution-based MCIC is usually taken to be 0.5 × the standard deviation of the PROM score, which gives you a rough estimate of the MCIC. Some recent work has suggested that this distribution-based MCIC calculation is actually closer to the concept of minimal detectable change (MDC), which is essentially the calibration variability of the PROM [9]. However, when previous anchor-based MCICs are not available, the distribution-based method is usually the only alternative.

As a rule of thumb, the MCIC would be the smallest change expected to make a difference. This difference may be in a subject’s health, quality of life, satisfaction, or a myriad of other measures considered clinically important. When using the distribution-based approach, this is often much smaller than we may expect from a treatment thought to be effective. If true, this will result in an overpowering of the study but may also allow for subgroup analyses to determine in which patients the treatment is most effective (or ineffective), a concept often called heterogeneity of treatment effect.

Adjusting any of these factors will change the power for a given study. Since most peer-reviewed journals require a p-value of 0.05 or less to be considered statistically significant, this is the power characteristic least easily modifiable for a power calculation. Although powering to 0.01 or 0.001 is not unheard of, powering to a p-value larger than 0.05 is usually not acceptable.

Variability and frequency can be adjusted through study design considerations. Variability can be reduced by having narrow inclusion criteria, which creates a more homogeneous study population. The trade-off here is that you may be limiting the generalizability of the results of the study. For example, if you restrict your study to teenage female soccer players, you cannot extend your results very easily to college-age male basketball players or other athletic populations, even if you’ve managed to reduce variability in your study population.

Investigators may be tempted to play fast and loose with effect sizes to tweak a power calculation, but this should be done with caution. If you use too large an effect size, you’ll have what appears to be an adequately powered study, but you may be hopelessly underpowered to detect the effect size you are likely to discover even if the treatment is reasonably effective. You’d still end up with a negative result even though the effect size revealed was clinically meaningful.

Therefore, adjusting sample size is usually the most practical approach to achieving sufficient statistical power, and this is why we often equate sample size with power. The best approach is to set your p-value and effect size, estimate your variability and/or frequency, and then calculate the sample size you need for 80% (or 90%) power. Another approach is to set all those parameters, choose a practical target sample size, and see if your power reaches 80% (or higher). If not, you can tweak your sample size upward until you reach your target power.

Not uncommonly, you have very few cases (e.g., infected total knee arthroplasty (TKA) cases) but a nearly unlimited number of potential control subjects (noninfected TKA cases). In this case, you can identify all of your infected TKA cases and then match multiple control subjects per case. Statistical power will be increased for each additional control per case you add. Many case-control studies are performed with a 1:1 control/case ratio to maximize efficiency, but when power is limited due to the small number of cases available, efficiency can be sacrificed by matching 2:1, 3:1, or even 4:1. Practically speaking, power gains are minimal after 6:1, so it’s really not worth going higher than that.
A final consideration in all of this is your analytic plan. The statistical tests proposed also determine your likely power, though this goes beyond the scope of this chapter and should be discussed with a statistician.

20.5 Is Statistical Power Ever Truly Known?

Statistical power as a “truth” is unknowable. Rather, it is an estimate of the likelihood of finding a true difference if one exists, and it is only as accurate as the inputs we enter into our power calculation. If our estimates of effect size are too low, we may be overpowered for the actual effect size found. That’s not problematic except that the study was less efficient than it could have been. A worse scenario would be if you were overly optimistic with your effect size and ended up with a null study. Likewise, if your variability is higher than expected, or one of your groups is less common than expected, you’ll lose power. Only your p-value and sample size are stable targets in a power calculation. Moreover, difficulty with patient recruitment or loss to follow-up in prospective studies can negatively affect sample size. To account for these possibilities, it is customary to calculate your power and sample size and then artificially inflate the sample size by the expected dropout rate (often a 10–20% inflation factor if not higher).

A well-managed institutional review board (IRB, ethics review panel) will require an a priori (before starting the study) power calculation for all clinical research. Many peer-reviewed medical journals also require these a priori calculations. In cases where a study team failed to calculate power a priori, a post hoc power calculation is often appropriate to assure themselves, journal reviewers, and eventual readers that the study was adequately powered. Even in the case of research where an a priori power calculation was performed, if the estimates used in the original calculation were inaccurate, a post hoc power calculation can be reassuring to the investigators, reviewers, and readers.

20.6 How to Calculate Statistical Power?

Until a few decades ago, power calculations were very time-consuming, hand-performed calculations that would keep statisticians busy until late into the night. Today most statistical software programs provide power calculation tools. Stand-alone power calculation programs are available, as are macros for common statistical packages such as SAS or R.

Early free web-based power calculators proved unreliable. Though some have been improved over time, they are still use-at-your-own-risk since the underlying calculation is a bit of a black box, and it’s impossible to know if it was coded properly. Professional statistical software packages have much more rigorous design and testing procedures in place to give you more confidence in your power calculation.

Ultimately, for the clinician undertaking clinical research, the safest way to be assured that your power calculation is performed correctly is by consulting with a statistician. Statisticians rely on you for your clinical expertise. You should rely on them for their statistical expertise. However, if you do not have access to a statistician, and financial resources are limited, consider downloading and learning R (https://www.r-project.org/). This is a free full-service statistical package that can be used to do just about anything you’d need to do statistically, and because it’s open source, new code and new macro programs are being developed and posted online all the time. This has led R to rapidly move beyond SAS, SPSS, and other programs as perhaps the most versatile statistical package available. For a stand-alone power calculator, the best on the market is probably PASS (Power Analysis and Sample Size from NCSS) with calculations available for more than 150 different study design types. If your study analyses tend to be pretty straightforward, some of the free programs may be sufficient for your power needs.
Take-Home Message

• Statistical power is a commonly misunderstood and sometimes abused theoretical concept.
• While being theoretical, it has practical benefit.
• Conducting a needlessly large clinical research project is inefficient and may waste resources that could be used to further our knowledge in other ways.
• Conversely, conducting an underpowered study is a waste of time for the participants and the researchers and may violate the responsibility to do no harm as the study intervention may not be risk-free.
• All studies, especially those in which invasive interventions are being used, should always, ethically, be adequately powered.
• Power is determined by a balance of the sample size, critical p-value, variability and/or frequency, and effect size.
• Adjusting any of these characteristics will modify the estimate of statistical power, though usually sample size is the most easily modifiable in calculating power.
• Power should always be calculated a priori, though a post hoc power estimate may also be useful if the study data differ substantially from what was estimated prior to conducting the research.
• With today’s powerful computing capabilities, calculating power is easier than ever, though appropriate inputs are vital.

References

1. Adcock CJ. Sample size determination: a review. Statistician. 1997;46(2):261–83.
2. Blackless M, Charuvastra A, Derryck A, Fausto-Sterling A, Lauzanne K, Lee E. How sexually dimorphic are we? Review and synthesis. Am J Hum Biol. 2000;12:151–66.
3. Carley S, Dosman S, Jones SR, Harrison M. Simple nomograms to calculate sample size in diagnostic studies. Emerg Med J. 2005;22:180–1.
4. Jones SR, Carley S, Harrison M. An introduction to power and sample size estimation. Emerg Med J. 2003;20:453–8.
5. McGlothlin AE, Lewis RJ. Minimal clinically important difference: defining what really matters to patients. JAMA. 2014;312:1342–3.
6. “Power.” Merriam-Webster.com. 2018. https://www.merriam-webster.com (1 July 2018).
7. Saito Y, Sozu T, Hamada H, Yoshimura I. Effective number of subjects and number of raters for inter-rater reliability studies. Stat Med. 2006;25:1547–60.
8. Sim J, Wright CC. The Kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther. 2005;85(3):257–68.
9. Theodore BR. Methodological problems associated with the present conceptualization of the minimum clinically important difference and substantial clinical benefit. Spine J. 2010;10:507–9. https://doi.org/10.1016/j.spinee.2010.04.003.
10. Walter SD, Eliasziw M, Donner A. Sample size and optimal designs for reliability studies. Stat Med. 1998;17:101–10.
21  Visualizing Data
Stephen Lyman, Naomi Roselaar, and Chisa Hidaka

21.1 Introduction

As you begin to analyze your data, think about how best to present it. Whether in a paper, a poster, or at a laboratory meeting, outstanding tables and figures are essential to ensure readers and viewers understand your research. In this chapter, we review how the hierarchy of human graphical perception ability elucidated by Cleveland and McGill in 1985 [1] can be used to achieve two major principles of excellent data visualization, put forth by information design pioneer Edward Tufte in his influential work, The Visual Display of Quantitative Information [2]: Graphical Excellence and Visual Integrity.

Tufte has stated, “Graphical elegance is often found in simplicity of design and complexity of data” [2]. In the concepts described below, and examples that follow, we’ll discuss how to achieve this goal.

Graphs and charts are able to convey complex information more efficiently than tables. However, tables are necessary when you want the reader to be able to look up individual data values. This chapter also offers guidelines for tables that are accurate and easy to understand.

Charles Minard’s map (Fig. 21.1) tells a rich, data-filled story in a single coherent visual representation, by combining geographical information with a visual representation of the location and extent of the loss of soldiers as they advanced to and retreated from Moscow. At a glance, the huge loss is apparent, as the lines representing the number of soldiers thin during the advance, represented in gray, and dwindle down even further on the retreat, represented in black. The patterns the lines make across the page make it obvious that the path of the soldiers crossed rivers and included a variety of elevations. On a closer look, many details are available, including the names of cities or towns and, crucially, the exact number of soldiers.

This is not the type of graphic often seen in medical research, but it is worth remembering as an example of the level of elegance and excellence to which all researchers should aspire.

S. Lyman (*) · N. Roselaar · C. Hidaka
Hospital for Special Surgery, New York, NY, USA
e-mail: LymanS@HSS.edu

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_21

21.2 Graphical Excellence

Fact Box 21.1
In his influential work, The Visual Display of Quantitative Information (2001), information design pioneer Edward Tufte summarized the following eight aspects of Graphical Excellence:
1. Show the data.
2. Avoid data distortion.
3. Draw attention to the substance of the graph.
4. Serve a clear purpose.
5. Make large datasets coherent.
6. Reveal the data at several levels of detail.
7. Present many numbers in a small space.
8. Integrate with statistical and verbal descriptions.

[Figure 21.1: Figurative map of the successive losses in men of the French Army in the Russian campaign of 1812–1813, drawn by Charles Minard (Paris, 20 November 1869). The width of each zone shows the number of men present, at a rate of one millimeter for ten thousand men; a graphic table below the map gives the temperature in degrees Réaumur during the retreat.]
Fig. 21.1  A modern adaptation of a map by Charles Minard portraying French losses in the Russian Invasion (1812–1813) is highlighted by Tufte as an example of graphic elegance and excellence. Commons usage: https://commons.wikimedia.org/wiki/File:Minard_map_of_napoleon.png

Graphical excellence is achieved when the greatest amount of complex information is visually represented in a manner that allows the viewer to understand it completely, accurately, and quickly. In his influential book, The Visual Display of Quantitative Information [2], Tufte describes eight characteristics that define Graphical Excellence in data visualization:

1. Show the data: Representing as much data as possible through an economy of visual elements, without losing what is important about their patterns, is at the top of the list for Graphical Excellence. Avoid chart styles that inadvertently hide data (Example 1).
2. Avoid data distortion: To create an accurate graphic that shows all of the data values and patterns without distortion, follow the rules that Tufte has recommended for Visual Integrity (Sect. 21.4).
3. Draw attention to the substance of the graph: Focus the viewer on the substance of the graph rather than its production. Eliminate what Tufte calls “chartjunk” such as unnecessary dimensions or decorations and redundant or irrelevant information (Examples 2, 3, and 4). To present many numbers in a small space, not only eliminate “chartjunk” but also maximize the “data-ink ratio.” Conveying the maximal amount of information with the greatest possible economy of visual elements is one of the cornerstones of Tufte’s prescription.
4. Serve a clear purpose: Excellent graphics must also serve a clear purpose. Before creating a graphic, write a short statement that summarizes what you expect viewers to understand from it. For example, “femoral tunnel malposition is the main cause for failure in anterior cruciate ligament reconstruction (ACLR)” (Example 3). Avoid descriptive statements like “reasons for failure of ACLR”. Starting with a pointed summary statement that supports your research hypothesis ensures that you create a graphic that serves a clear purpose within your paper or presentation.
5. Make large datasets coherent, 6. Reveal the data at several levels of detail, and 7. Present many numbers in a small space: While these are three separate characteristics in Tufte’s definition of graphical excellence, they are complementary and therefore described together. Once you know what you want to say with your data, use what is known about the hierarchy of human graphical perceptual capabilities (Sect. 21.3) to prioritize information visually, and create a coherent graphic that helps the viewer understand at a glance which of the multiple levels of details you reveal are the most important and/or how the details relate to each other and to your hypothesis (Example 3).
8. Integrate with statistical and verbal descriptions: Finally, write an appropriate legend so your figure can stand alone and the viewer can understand it without referring to the text of the manuscript. The legend should also fit into the flow of the presentation or paper without redundancy. Tufte maintains that visualizations “are paragraphs about data and should be treated as such” [2]. When your graphic is integrated with statistical and verbal descriptions and fits seamlessly into your paper or presentation like a written paragraph but can make an important, complex point in one quick glance, you have achieved graphical excellence.

21.3 Hierarchy of Human Graphical Perception Ability

Creating excellent graphics requires an understanding not only of your data but of the hierarchy of human graphical perception ability.
Cleveland and McGill [1] have shown that some visual elements are easier for people to see and understand than others.

Graphical elements in the order they are most accurately perceived:

1. Position along a common scale
2. Position along identical nonaligned scales
3. Length
4. Angle and slope
5. Area
6. Volume, density, and color saturation
7. Color hue

Use this hierarchy to choose the appropriate visual element to represent your data. For example, if your data can be represented as either length or area, choose length, as it is more accurately perceived (Example 3). When conveying complex information, use the hierarchy to prioritize information visually. For example, use angle or slope for the most important information while adding shading or pattern to provide a second level of information (Example 2). When using color, saturation is more accurately perceived and understood than the relationship between hues (Example 4).

21.4 Visual Integrity

When individual data points and their analyses are translated from the spreadsheet to a graph or figure, care must be taken to avoid unintentional “visual lies” or distortions. To maintain visual integrity, Tufte offers the following rules [2]:

1. The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.
2. Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity.
3. Write out explanations of the data on the graphic itself. Label important events in the data.
4. Show data variation, not design variation.
5. The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
6. Graphics must not quote data out of context.

In practical terms, visual integrity often comes down to proper scaling and formatting, as well as avoiding “chartjunk.” Examples follow below.

Fact Box 21.2
In practical terms, visual integrity can often be achieved through:
1. Accurate scaling
2. Proper formatting
3. Avoiding “chartjunk”

21.4.1 Scaling

Proper scaling ensures that the numbers on the graphic are directly proportional to the numerical quantities represented. It allows you to show all the data, without distortion. By considering the hierarchy of human graphical perceptual capabilities, you can choose an appropriate scale(s) that makes data patterns easy to see at a glance (Example 2).

21.4.2 Formatting

To avoid graphical distortion and ambiguity, pay attention to formatting. In practical terms: use simple symbols and clear, thorough labels and avoid remote legends (Examples 3 and 4).

21.4.3 Avoiding “Chartjunk”

To focus viewers on data rather than design, avoid unnecessary or confusing elements. Color is often used with the intention of making a graphic more appealing but can undermine the
21  Visualizing Data 197

effectiveness of the graphic, if it is not used in a manner that conveys relevant information (Examples 2, 3, and 4). Three-dimensional graphics are also used for an appealing look, but they should only be used for three-dimensional data (Examples 2 and 3). Using extra dimensions for two-dimensional data not only encumbers graphics with “chartjunk” but can create distortions or optical illusions (“visual lies”). Clearly, these should be avoided.

21.4.4 Context

Finally, an excellent graphic is well integrated into relevant descriptions and explanations in the text or presentation, avoiding the quotation of data out of context (Examples 2 and 3).

21.5 Tables

Whereas graphs and charts are superior for conveying data patterns, tables are useful when you want your reader to be able to look up specific data values.

Fact Box 21.3
When creating tables, make sure to:

1. Have a descriptive title.
2. Use appropriate headings for each column and/or row.
3. Group information in a manner that provides coherence.
4. Include appropriate decimal places for each value.
5. Provide enough information so the table can stand alone.

Footnotes are a good way to provide information such as methodology or explanation of abbreviations, so that the table can stand alone, and readers can understand it without referring to the text of the paper (Example 5).

If the tables, charts, or other graphics are being prepared for a manuscript, make sure to:

1. Follow the formatting instructions of the specific journal.
2. Make the values, abbreviations, and/or terminology in the table consistent with what appears in the text.
3. Avoid redundancy. Specific values should not appear in both the text and table.
4. Use tables for details, for instance, results instead of lengthy text.

21.6 Best Practice

In conclusion, Tufte recommends the following best practice for creating excellent graphics [2]:

1. Above all else show the data.
2. Maximize the data-ink ratio.
3. Erase non-data-ink.
4. Erase redundant data-ink.
5. Revise and edit.

In this spirit, the examples below show how graphics can be revised and edited to become more effective and efficient.

Example 1: Bar graphs
Figures 21.2 and 21.3 represent the same sample data.
The dynamite plot (Fig. 21.2) is frequently used in scientific studies, but may not reveal all of the data. In Fig. 21.2, the scores for treatment A, B, C, and D are represented as means (bars) with standard deviations (I beams).
Graphical Excellence: Show all the data
A modified box and whisker (Tukey’s) plot (Fig. 21.3) allows the viewer to see all data points, and this allows the viewer to appreciate the variation in the data much more precisely than when looking at the error bars representing standard deviation (Fig. 21.2). For each treatment group, the horizontal line represents the 50th percentile, while the rectangular box extends to the 25th and 75th percentiles. The vertical lines represent the 95% confidence interval. The dots are all of the

data points and show that, for treatment group B, one of the points is an outlier. Comparing Figs. 21.2 and 21.3 shows that the type of plot used in Fig. 21.2 hid the presence of a negative value (below 0) in treatment group B. The comparison also reveals that treatment group D included fewer patients (data points) than the other groups.

Fig. 21.2 Patient-reported outcome scores after treatment A, B, C, or D after N weeks of treatment (bar graph; x-axis: treatments A to D; y-axis: 0 to 60)

Fig. 21.3 No difference in patient outcome score after treatment for condition X with treatment A, B, C, or D. Scores were collected N weeks after treatment (box and whisker plot with all data points; x-axis: Treatment.Group, A to D; y-axis: Score)

Example 2: Line Graph
Line graphs are common in orthopedic research. This is an example showing sample data from a study where patient-reported outcome scores were assessed as a measure of treatment efficacy in a treated and (untreated) control group. Figures 21.4, 21.5, and 21.6 show the same data,

represented with various degrees of graphical excellence and visual integrity.
Figure 21.4 is a typical chart that can be generated using Excel software and is an example of a graphic that fails to follow the rules of visual integrity and graphic excellence. Figures 21.5 and 21.6 show how the chart can be revised (within Excel) to be more effective.

Fig. 21.4 Patient-reported outcome score after treatment for condition X (three-dimensional line chart; unlabeled axes with ticks from 0 to 30 and from 1 to 7; remote legend: Treatment, Control)

Fig. 21.5 Patient-reported outcome scores are improved with treatment. Treated (n = 104) and matched control (n = 104) patients completed the ABC survey for 7 weeks after receiving treatment X. Mean scores at each weekly time point are shown (line graph; x-axis: Weeks, 1 to 7; y-axis: Score, 0 to 30; lines labeled Treatment and Control)

Visual integrity: The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
This data is two-dimensional, so it should be represented in a two-dimensional graph (Figs. 21.5 and 21.6). In Fig. 21.4 the third dimension not only distracts but confuses the viewer, creating an optical illusion, where scores, which are actually different in treatment and control groups, appear to overlap at several time points.
Visual integrity: Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity.
Neither the X nor Y axes of this graphic are labeled (Fig. 21.4). The viewer would need to search the manuscript to understand what is represented in this graphic.
Visual integrity: Write out explanations of the data on the graphic itself. Label important events in the data.
The data lines for the treated and control groups are identified in a remote legend at the bottom of the graph (Fig. 21.4). Remote legends are not recommended, as they force the viewer to look back and forth between the legend and the data points to understand the graphic. Wherever possible, labels should appear next to the data points so viewers see the data point and what it represents in one glance (Fig. 21.5).

Fig. 21.6 No effect of treatment X on patient-reported outcomes in condition Y. Treated (n = 104) and matched control (n = 104) patients completed the ABC survey for 7 weeks after receiving treatment X. Mean scores at each weekly time point are shown (line graph; x-axis: Weeks, 1 to 7; y-axis: Score, with ticks at 1, 10, and 100; lines labeled Treatment and Control)

Graphical Excellence: Draw attention to the substance of the graph (not its production).
Extraneous visual elements or “chartjunk” distract the viewer. In addition to removing the extra dimension, removing gridlines, which do not improve the viewer’s ability to estimate the numerical value of each data point, improves this graph (Fig. 21.5). Placing the tick marks inside the axis labels (instead of outside) is sufficient for the eye to be able to see where the data points are located, relative to the axis scales (Fig. 21.5).
Color is also an unnecessary (and possibly distorting) element in Fig. 21.4.
Hierarchy of graphical perception: Line over color.
Color is far down the list in the hierarchy of human graphical perception, so representing information through the use of color should be avoided, if the information can be conveyed in black and white. Red and blue are particularly poor choices as people with the most common form of color blindness are often unable to distinguish red, purple, and blue.
Using solid and dashed lines not only avoids unnecessary color but also adds information visually. A dashed line is harder to see than a solid line, so viewers will immediately understand that it is the less important line, even before reading the label, confirming that it represents the (untreated) control group (Fig. 21.5).
Graphical Excellence: Integrate with statistical and verbal descriptions.
Visual integrity: Graphics must not quote data out of context.
Figure legend 21.4, while descriptive, does not provide sufficient information for the viewer to understand the graph without referring to the manuscript. Figure 21.5 legend interprets the data and provides enough information to understand the data without additional explanation.
Hierarchy of graphical perception: Angle and slope.
With the graph in two dimensions, it is easy to see that the line representing the scores of treated patients banks at around 45° (Fig. 21.5). Extreme angles are difficult to perceive, so, where possible, choose a scale that results in lines that bank around 45°.
Visual integrity: The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.
Figures 21.5 and 21.6 show the importance of the scale. While the data are the same, the interpretation is different, based on the scale. In Fig. 21.5, the score (represented on the y-axis) is based on a 30-point survey where the differences between treated and control groups were clinically meaningful. In Fig. 21.6, the score is based on a 100-point survey where the differences between treated and control groups were clinically not significant.
Hierarchy of graphical perception: Angle and slope.

Where possible, choose a scale resulting in lines that bank around 45°, avoiding extreme angles, which are difficult to perceive.

Example 3: Pie Charts
Pie charts are often used to show how different elements account for a specific portion of the whole. However, based on the hierarchy of human graphical perception, area is perceived rather inaccurately, so alternatives are preferred, when possible.
Figures 21.7, 21.8, and 21.9 are based on data, which appear in a table format in the original publication by Trojani et al. [3].

Fig. 21.7 Causes of failure after anterior cruciate ligament (ACL) reconstruction (three-dimensional pie chart; remote legend: Femoral tunnel malposition, New trauma, Unknown, Impingement, Tibial tunnel malposition, Untreated laxity, Failure of fixation)

Fig. 21.8 Causes of failure after anterior cruciate ligament reconstruction (horizontal bar graph; x-axis: 0 to 40; bars: Femoral tunnel malposition, New trauma, Unknown, Impingement, Tibial tunnel malposition, Untreated laxity, Failure of fixation, Hyperlaxity, Infection)

Visual integrity: The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
Making the pie chart in Fig. 21.7 three-dimensional adds an unnecessary dimension (“chartjunk”) as the thickness of each wedge does not convey any information. Furthermore, the use of a third dimension creates a “visual lie” (unintentional but deceptive optical illusion) in which the segments of the pie that are at the bottom of the pie appear bigger than those at the top, because the perspective used to represent the third dimension makes the thickness of the disk more apparent at the bottom of the graphic.
Visual integrity: Show data variation, not design variation.

The colors of the pie wedges are arbitrary, adding meaningless variation that distracts, rather than informs. The colors may even imply a “visual lie,” suggesting that wedges of related colors may represent groups that are similar for some reason.
Visual integrity: Write out explanations of the data on the graphic itself. Label important events in the data.
The remote legend (Fig. 21.7) forces the viewer to look back and forth between the legend and the data (pie wedges) to understand the pie chart. Wherever possible, labels should appear next to the data points so viewers see the data point and what it represents in one glance (Figs. 21.8 and 21.9).
Hierarchy of graphical perception: Lines are perceived more accurately than areas.
Converting the information into a bar graph (Fig. 21.8) conveys the information more efficiently because length is perceived more accurately than area. However, while this type of graph makes the number of cases of each type of failure much easier to understand visually, it makes the proportion that each type of failure makes up more difficult to discern.
Graphical Excellence: Reveal the data at several levels of detail.
The circular bar chart (Fig. 21.9) shows the proportion of each type of ACLR failure, like the pie chart, but uses curved lines, rather than wedges to represent each type of failure. Instead of using arbitrary colors, a gray scale is used to accentuate visually that each adjacent arc is longer than the next so that the types of failure are arranged in order of frequency. The labels are placed within or beside each arc, avoiding remote legends. The labels are also followed by the number of cases of that type of ACLR failure, revealing another level of detail of the data.
Graphical Excellence: Integrate with statistical and verbal descriptions. Serve a clear purpose.
Visual Integrity: Graphics must not quote data out of context.
Figure legends 21.7 and 21.8 fall short of telling the viewer the point of the graphic. Figure 21.9 legend includes the author’s interpretation of the data and also provides important contextual information.

Fig. 21.9 Femoral tunnel malposition is the main cause for failure in anterior cruciate ligament reconstruction (ACLR). The causes of failure of ACLR occurring between 1994 and 2005 in ten French orthopedic centers are shown. For each type of failure, the number of cases is indicated (circular bar chart; arc labels: Femoral tunnel malposition, 36; New trauma, 30; Unknown, 15; Impingement, 12; Tibial tunnel malposition, 11; Untreated laxity, 5; Failure of fixation, 5; Hyperlaxity, 4; Infection, 2)
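As a rough sketch of the trade-off described above, the counts labeled in Fig. 21.9 can be printed as a sorted text bar chart in which each label sits next to its bar and is followed by both the number of cases and its share of the total. The function name `text_bar_chart` and the bar width are illustrative assumptions, not part of the original figures.

```python
# Number of cases per cause of ACLR failure, as labeled in Fig. 21.9.
causes = {
    "Femoral tunnel malposition": 36,
    "New trauma": 30,
    "Unknown": 15,
    "Impingement": 12,
    "Tibial tunnel malposition": 11,
    "Untreated laxity": 5,
    "Failure of fixation": 5,
    "Hyperlaxity": 4,
    "Infection": 2,
}


def text_bar_chart(counts, width=36):
    """Return one line per category: label, bar, count, and share of the total."""
    total = sum(counts.values())
    longest = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    # Sort by frequency so that the order itself carries information.
    for label, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        bar = "#" * round(width * n / longest)  # bar length proportional to the count
        share = 100 * n / total                 # proportion of all failures
        lines.append(f"{label:<{label_width}} {bar} {n} ({share:.0f}%)")
    return lines


chart = text_bar_chart(causes)
print("\n".join(chart))
```

Length encodes the count, while the printed percentage restores the part-of-the-whole information that a plain bar graph makes harder to read.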

Example 4: Maps
Maps are very useful when comparing disease prevalence, treatment costs, or other phenomena that vary depending on geographical location. Figures 21.10, 21.11, 21.12, and 21.13 show data on obesity in the USA in 2016 [4].
Graphical Excellence: Make large datasets coherent.
The bar graph (Fig. 21.10) makes the proportion of each state’s population who is obese very clear, but it’s difficult to appreciate any information about relationships among states in an alphabetical list. A map is a visually efficient way to make geographic information easier to see. Figure 21.11 is a map where it is easy to see that obesity affects a greater proportion of adults in the southeast part of the USA and a lower proportion on the East and West coasts and that Colorado, Hawaii, Massachusetts, and the District of Columbia have the lowest proportion of obese adults.

Fig. 21.10 Proportion of adults who are obese (body mass index over 30) by state in 2016 (horizontal bar graph; states listed alphabetically from Alabama to Wyoming; x-axis: 0 to 40)

Fig. 21.11 Percentage of adults who are obese (body mass index over 30) by state in 2016 (color-coded map; legend: 20-24.9%, 25-29.9%, 30-34.9%, >35%)

Fig. 21.12 Percentage of adults who are obese (body mass index over 30) by state in 2016 (map in shades of a single color; legend: 20-24%, 25-29%, 30-35%, >35%)

Fig. 21.13 Percentage of adults who are obese (body mass index over 30) by state in 2016 (gray-scale map; legend: continuous scale from 20 to 40)
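The legends above differ in one respect that is easy to show in a few lines: Fig. 21.11 groups states into four bins, whereas Fig. 21.13 uses a continuous gray scale from 20 to 40. The following is a minimal sketch of both encodings, assuming a linear mapping in which lighter means lower. The function names and the three sample percentages are hypothetical and are not taken from the obesity dataset.

```python
def legend_bin(percent):
    """Return the legend category of Fig. 21.11 that a percentage falls into."""
    if percent < 25:
        return "20-24.9%"
    if percent < 30:
        return "25-29.9%"
    if percent < 35:
        return "30-34.9%"
    return ">35%"


def gray_level(percent, low=20.0, high=40.0):
    """Map a percentage to an 8-bit gray value: low -> 255 (white), high -> 0 (black)."""
    clamped = min(max(percent, low), high)      # keep values inside the legend range
    fraction = (clamped - low) / (high - low)   # 0.0 at the low end, 1.0 at the high end
    return round(255 * (1 - fraction))          # lighter = lower, darker = higher


# Hypothetical values: two nearly identical states and one clearly higher state.
for value in (24.9, 25.0, 29.9):
    print(value, legend_bin(value), gray_level(value))
```

With bins, 24.9 and 25.0 land in different categories while 25.0 and 29.9 share one. With the continuous scale, the shade differs in proportion to the numerical difference.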

Hierarchy of graphical perception: Color.
The use of color in Fig. 21.11 is confusing. Although the “color wheel” has been used so that relationships between colors have a direct relationship to the data, this relationship is difficult to discern. The fact that red indicates a proportion much higher than green does make sense based on the “color wheel” but is not obvious without having to look at the remote legend.
Use of color should be limited to graphics where the color itself conveys important information, as color is low in the hierarchy of graphical perception. Assessing relationships between colors accurately is difficult even for the normally sighted and impossible for the significant proportion of people who are color blind.
Hierarchy of graphical perception: Color.
In Fig. 21.12, the different hues have been replaced by shades of a single color. In the hierarchy of human graphical perception, saturation is perceived more accurately than hue, and the relationship between shade and number is easily understood. In Fig. 21.12, lighter represents lower and darker represents higher numerical values, a relationship that is easy to see.
Visual economy.
In Fig. 21.12, the two-letter abbreviation for each state has been removed as most people can easily identify the state in a map without the label, making the labels redundant and a distraction from the information.
Visual integrity: The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.
Using a gray scale allows the appropriate representation of the data as a continuous variable. Figures 21.11 and 21.12 represented the prevalence of obesity as if it occurred in categories (20–24%, 25–29%, 30–34%, or >35%), but these categories were arbitrary. Whether a state has a proportion of adults in a specific category (20–24%, 25–29%, 30–34%, or >35%) has no inherent meaning. Although a gradation from white to a color (e.g., teal, in Fig. 21.12) could also represent values in continuity, eliminating the unnecessary color removes a distraction, so that the viewer can focus on the data.

Example 5: Tables
These tables represent sample data comparing two surveys for the assessment of hand function.
Table 21.1 title is descriptive but redundant, including information that is repeated in the table itself. Information to help the reader understand the table should appear in a footnote (Table 21.2).

Table 21.1 Responsiveness of Survey A and Survey B, showing the means, and 95% confidence intervals for the scores at injection day, Day 30, and the change from injection day to Day 30

Survey  Time point     N    Mean            p-value  Effect size  Standard response mean
A       Injection day  109  42 (38, 46)     <0.001   1.5          1.2
        Day 30         88   72 (68, 76)
        Change         87   30 (25, 35)
B       Injection      109  20 (16, 23)     <0.001   −0.5         −0.7
        Day 30         89   12 (9, 14)
        Change         88   −9 (−11, −6)

The p-values indicate the significance of the difference between scores at Day 30 and injection day. Effect sizes and standard response means for each instrument are also displayed

Table 21.2 Responsiveness of Survey A is greater than Survey B in patients with Dupuytren’s disease

                                 Survey A (N = 87)  Survey B (N = 88)
Mean change in score (95% CI)a   30 (25, 35)        −9 (−11, −6)
Effect size                      1.5                −0.5
Standard response mean           1.2                −0.7

a Mean change in score between Day 0 (treatment with a collagenase injection) and 30 days after treatment for Survey A (87 patients) and Survey B (88 patients) are reported. The 30-day mean change in score was significant for each survey (both p < 0.001).
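The tables do not state how the standard response mean was calculated. A common definition is the mean change divided by the standard deviation of the change scores. As a hedged illustration, the sketch below recovers an approximate value from the mean change, its 95% confidence interval, and N, assuming the interval was computed with a normal approximation (mean ± z × SD/√N). The function name `srm_from_ci` is hypothetical, and the calculation is not the method reported for these sample data.

```python
from math import sqrt
from statistics import NormalDist


def srm_from_ci(mean_change, ci_low, ci_high, n, level=0.95):
    """Approximate the standard response mean from a mean change and its CI.

    Assumes a normal-approximation interval: mean +/- z * SD / sqrt(n).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)        # about 1.96 for a 95% interval
    standard_error = (ci_high - ci_low) / (2 * z)    # half-width of the CI divided by z
    sd_change = standard_error * sqrt(n)             # SD of the individual change scores
    return mean_change / sd_change


# Values taken from the "Change" rows of Table 21.1 (also shown in Table 21.2).
srm_a = srm_from_ci(30, 25, 35, 87)
srm_b = srm_from_ci(-9, -11, -6, 88)
print(f"Survey A: {srm_a:.2f}, Survey B: {srm_b:.2f}")
```

Under these assumptions the results come out close to the tabulated 1.2 and −0.7. The small differences are consistent with the rounding of the reported means and interval limits.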

Table 21.1 is also confusing, showing patient number (N) for different time points, when the only important information is the change in mean survey scores between the day of treatment and 30 days later (change). Also, even though the effect size and standard response mean are the most important information, they are far away from the name of the survey instrument whose effectiveness they report.
Table 21.2 title summarizes the analysis and provides an interpretation to help readers understand the data. A footnote provides methodological information required to understand the information, so that the table stands alone and readers do not need to refer to anything else (such as the rest of the manuscript or poster).
Tables should be used when it is important for the viewer to see the specific numbers that are shown. Some of the values in Table 21.1 have been removed so that Table 21.2 includes only the essential numbers to convey the information from the analysis. Streamlining the information also makes it possible to remove unnecessary boxes and lines.
For the information in Tables 21.1 and 21.2, a table is better suited than a graph. A graph would show positive data points for Survey A versus negative ones for Survey B, and this would create a “visual lie,” where the fact that numbers lie above or below zero appears important, whereas due to differences in the way the two surveys are scored, it is the absolute magnitude of the value that is relevant.

Take-Home Message
• The goal is not to make tables and figures look nice but to make sure that viewers understand your research accurately and efficiently. Just as understanding grammar and syntax makes for clear writing, understanding the hierarchy of human graphical perception ability is essential for creating graphs and charts that make complex information easy to understand.
• The appropriate use of figures and tables, created with Graphical Excellence and Visual Integrity, is as important as well-written text in the dissemination of research.

References

1. Cleveland WS, McGill R. Graphical perception: theory, experimentation, and application to the development of graphical methods. J Am Stat Assoc. 1984;79(387):531–54.
2. Tufte ER. The visual display of quantitative information. 2nd ed. Cheshire: Graphics Press; 2001.
3. Trojani C, Sbihi A, Djian P, Potel JF, Hulet C, Jouve F, Bussière C, Ehkirch FP, Burdin G, Dubrana F, Beaufils P. Causes for failure of ACL reconstruction and influence of meniscectomies after revision. Knee Surg Sports Traumatol Arthrosc. 2011;19(2):196–201.
4. The state of obesity. Trust for America’s Health and Robert Wood Johnson Foundation. https://stateofobesity.org/adult-obesity/.
Part IV
Basic Toolbox for the Young Clinical Researcher

22 How to Prepare an Abstract

Elmar Herbst, Brian Forsythe, Avinesh Agarwalla, and Sebastian Kopf

E. Herbst
Department of Orthopaedic Sports Medicine, TU Munich, Munich, Germany

B. Forsythe · A. Agarwalla
Midwest Orthopaedics at Rush, Rush University Medical Center, Chicago, IL, USA
e-mail: brian.forsythe@rushortho.com

S. Kopf (*)
Center for Orthopaedics and Traumatology, Hospital Brandenburg, Medical School Theodor Fontane, Brandenburg an der Havel, Germany
e-mail: s.kopf@klinikum-brandenburg.de

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_22

22.1 Introduction

Orthopaedic research encompasses technical notes, case reports, systematic reviews, meta-analyses, retrospective studies, and prospective studies. While there is a continuum of strength associated with each type of research, the commonality in the dissemination of all orthopaedic research is the abstract, which serves as a snapshot of the work that was completed [8, 9, 13, 14]. Abstracts are submitted to conferences for review and, if accepted, are indexed as part of the proceedings. When fellow researchers then read the published paper, the manuscript is rarely examined in its entirety on the first pass. Rather, researchers will carefully review the abstract to understand the study’s purpose, methodology, results, and conclusion. From this brief overview, readers determine if the study fits their needs. If the study fits, then they will generally examine the introduction and discussion of the paper, and only readers who have a particular interest in the topic will read the entirety of the paper. In many ways, the abstract functions as the most important part of the manuscript because it “sells” the publication to readers. If readers cannot understand the study during a brief overview, the manuscript is unlikely to undergo further critical review, regardless of whether the study has an impeccable design or impactful results.

22.2 Types of Abstracts

There are two general types of abstracts: descriptive and informative. A descriptive abstract is approximately 100 words detailing the purpose, goals, and methods of the article. In a descriptive abstract, a description of the study’s results is not provided; thus, it is necessary to read the article in its entirety to view the results. The purpose of this type of abstract is to introduce the subject to readers, who must then read the entire paper to learn about the results and conclusions of the investigation (Example 1). Informative abstracts, on the other hand, are approximately 350 words and serve as an overview of the entire paper detailing the purpose, background, methods, results, and conclusions. Informative abstracts provide data on the content of the work and highlight salient points of the study in its entirety so

that readers can understand the study and its implications in its entirety without having to read the complete text (Example 2). These types of abstracts generally follow a format set forth by the publishing journal.
It should be noted, however, that abstracts that are submitted to conferences are generally longer (approximately one page in its entirety) and a figure and/or table can be provided. Abstracts that are submitted to conferences follow the same format as an abstract that describes a manuscript, but generally more emphasis can be placed on presenting the results or discussing the findings of the investigation.

Example 1: Descriptive Abstract [6]
Posterior cruciate ligament (PCL) reconstruction generally uses an Achilles tendon allograft, although recently, quadriceps tendon has been used as an alternative option due to its size and high bone density. In this investigation, we compared the biomechanical strength of quadriceps tendon versus an Achilles tendon allograft during PCL reconstruction. Thirty fresh-frozen cadaveric knees were assigned to one of the three groups: (1) intact PCL, (2) PCL reconstruction with quadriceps tendon allograft, or (3) PCL reconstruction with Achilles tendon allograft. Posterior tibial translation was measured at neutral and 20° external rotation, and then each specimen underwent a preload, cyclic loading protocol of 250 cycles, and load to failure.

Example 2: Informative Abstract [6]
Background: Previous investigations of posterior cruciate ligament (PCL) reconstruction suggest that normal stability is not restored in many patients. The Achilles tendon allograft is frequently used, although recently, the quadriceps tendon has been utilized due to its size and high bone density.
Purpose: The purpose of this investigation was to compare the biomechanical strength of a quadriceps versus an Achilles allograft during PCL reconstruction. We hypothesize that quadriceps allografts have comparable mechanical properties to those of Achilles allografts.
Methods: Thirty fresh-frozen cadaveric knees were assigned to one of the three groups: (1) intact PCL, (2) reconstruction with quadriceps tendon allograft, or (3) reconstruction with Achilles tendon allograft. Posterior tibial translation was measured at neutral and 20° external rotation, and then each specimen underwent a preload, cyclic loading protocol of 250 cycles, and, lastly, load to failure.
Results: The intact specimens achieved the greatest failure load compared to both reconstructions (2048 ± 969 N, p = 0.001). Quadriceps tendon allografts had a higher maximum force during failure testing than the Achilles allograft (2017, 1837 N, respectively, p = 0.007). No significant differences were noted between quadriceps and Achilles allograft for differences in displacement of the graft, creep deformation, or stiffness. Construct stiffness during failure testing was greatest in the intact group (169 ± 9 N/mm, p = 0.0005) compared with the Achilles (56 ± 13 N/mm) and quadriceps (47 ± 3 N/mm) groups.
Conclusion: Quadriceps and Achilles tendon allografts had similar biomechanical properties when used for a PCL reconstruction, but both were inferior to the native PCL. However, quadriceps tendon allografts displayed a stronger construct with failure load and stiffness in comparison to Achilles tendon allografts.
Clinical relevance: The quadriceps tendon is a viable graft option in PCL reconstruction as it exhibits a greater maximum force but is otherwise comparable to the Achilles allograft. The findings of this investigation expand allograft availability in PCL reconstruction.

Apart from descriptive and informative abstracts, structured abstracts must be distinguished from unstructured abstracts. Structured abstracts follow a clear structure as mentioned below (e.g., background, methods, results, and conclusion; Example 2) [12]. In contrast, unstructured abstracts are not divided into distinct paragraphs but form a single running paragraph, as shown in Example 3.

Example 3: Unstructured Abstract of a Narrative Review [2]
Meniscectomy is one of the most popular orthopaedic procedures, but long-term results are not entirely satisfactory, and the concept of meniscal preservation has therefore progressed over the years. However, the meniscectomy rate remains too high even though robust scientific publications indicate the value of meniscal repair or non-removal in traumatic tears and nonoperative treatment rather than meniscectomy in degenerative meniscal lesions. In traumatic tears, the first-line choice is repair or non-removal. Longitudinal vertical tears are a proper indication for repair, especially in the red-white or red-red zones. Success rate is high and cartilage preservation has been proven. Non-removal can be discussed for stable asymptomatic lateral meniscal tears in conjunction with anterior cruciate ligament (ACL) reconstruction. Extended indications are now recommended for some specific conditions: horizontal cleavage tears in young athletes, hidden posterior capsulo-meniscal tears in ACL injuries, radial tears, and root tears. Degenerative meniscal lesions are very common findings which can be considered as an early stage of osteoarthritis in middle-aged patients. Recent randomized studies found that arthroscopic partial meniscectomy (APM) has no superiority over nonoperative treatment. Thus, nonoperative treatment should be the first-line choice, and APM should be considered in case of failure: 3 months has been accepted as a threshold in the ESSKA Meniscus Consensus Project presented in 2016. Earlier indications may be proposed in cases with considerable mechanical symptoms. The main message remains: save the meniscus!

Commonly, unstructured abstracts belong to narrative reviews of current literature where a

stem cells (MSCs). The objective of this study was to show that a coculture of anterior cruciate ligament (ACL) cells and MSCs has a beneficial effect on ligament regeneration that is not observed when utilizing either cell source independently. Autologous ACL cells (ACLcs) and MSCs were isolated from Yorkshire pigs, expanded in vitro, and cultured in multiwell plates in varying %ACLc/%MSC ratios (100/0, 75/25, 50/50, 25/75, and 0/100) for 2 and 4 weeks. Quantitative mRNA expression analysis and immunofluorescent staining for ligament markers collagen type I (collagen-I), collagen type III (collagen-III), and tenascin C were performed. We show that collagen-I and tenascin C expression is significantly enhanced over time in 50/50 cocultures of ACLcs and MSCs (p ≤ 0.03), but not in other groups. In addition, collagen-III expression was significantly greater in MSC-only cultures (p ≤ 0.03), but the collagen-I-to-collagen-III ratio in 50% coculture was closest to native ligament levels. Finally, tenascin C expression at 4 weeks was significantly higher (p ≤ 0.02) in ACLcs and 50% coculture groups compared to all others. Immunofluorescent staining results support our mRNA expression data. Overall, 50/50 cocultures had the highest collagen-I and tenascin C expression and the highest collagen-I-to-collagen-III ratio. Thus, we conclude that using a 50% coculture of ACLcs and MSCs, instead of either cell population alone, may better maintain or even enhance ligament marker
structured design would not be appropriate expression and improve healing.
because of the lacking Results or Methods sec-
tions. However, some basic research journals ask
also for unstructured abstracts (Example 4). 22.3 Components of an Abstract
Regardless of the type of abstract, each abstract
should provide a brief background or introduction Depending on the journal, an abstract can be written
to the topic, a summary of the methods, and the as a free-flowing paragraph or each component of
results followed by a concluding remark as the abstract must be separated and formatted into a
described more in detail in the following sections. formal structure. Some journals may require addi-
tional sections that discuss the clinical relevance,
Example 4: Unstructured Abstract of an limitations, or a description of the study design.
Experimental Study [3]
Ligament and tendon repair is an important topic
in orthopaedic tissue engineering; however, the 22.3.1 Background
cell source for tissue regeneration has been a con-
troversial issue. Until now, scientists have been The background of an abstract is generally one to
split between the use of primary ligament fibro- two sentences that present the problem that the
blasts and bone marrow-derived mesenchymal investigator aims to address. In this short space,
the questions that need to be answered are “Why is this study important?” and “What is the impact of this study?” A good background statement tells the reader what is already known about the topic and what needs to be investigated. A common mistake in this section is to discuss what has already been studied about the topic but fail to discuss the gap that remains in the literature on that topic. What needs to be further investigated does not have to be stated explicitly; rather, the reader should be able to infer what further research is warranted. Examples of a good background section and the most common pitfall are provided in Table 22.1. The background section sets the stage for the reader to understand the research gap that this study aims to fill without the reader having to conduct an extensive literature search. It is important to keep the background of the abstract short. While an extensive background allows the reader to be well-informed of previously established knowledge regarding the topic, it leaves less room for a detailed discussion of the present investigation [1, 4, 5].

Table 22.1 Examples of appropriate and insufficient background statements in an abstract, with comments on the differences between the statements (in parentheses)

Appropriate background statements
• Previous investigations of posterior cruciate ligament (PCL) reconstruction suggest that normal stability is not restored in many patients. (Why is the study needed?) The Achilles tendon allograft is frequently used, although recently, the quadriceps tendon has been utilized due to its size and high patellar bone density [6]. (What is known about the subject?)
• During arthroscopic ACL reconstruction, transtibial drilling techniques are widely used because they simplify femoral tunnel placement and reduce surgical time. (What is known about the subject?) However, there has been concern that this technique results in non-anatomically positioned bone tunnels, which may cause abnormal knee functionality [10]. (Why is the study needed?)

Insufficient background statements
• PCL reconstruction outcomes are variable and have not had the same success as ACL reconstructions (research question not evident and not specific enough). Numerous surgical techniques have been described, including open and arthroscopic tibial inlay and transtibial techniques. Currently, the most frequently utilized technique is an open transtibial technique with an Achilles tendon allograft [6] (insufficient background information related to the research question)
• During arthroscopic ACL reconstruction, transtibial drilling techniques are widely used because they simplify femoral tunnel placement and reduce trauma and surgical time by employing a single-incision approach. Thus, the transtibial approach was seen as a major innovation and quickly became a popular technique for arthroscopic ACL reconstruction [10] (research question not provided; the reviewer is not informed why the study is needed)

22.3.2 Purpose

The purpose is another one to two sentences illustrating the problem statement of the investigation. This portion of the abstract needs to address the question “What is this study investigating?” The statement needs to be a concise explanation of what the study specifically aims to investigate. Within this section, the scope should be identifiable, that is, whether the study is addressing a specific problem or an overarching generic issue. In other words, the scope should identify the applicability of the investigation. For example, “The purpose of this investigation is to compare the biomechanical strength of a quadriceps tendon versus an Achilles tendon allograft during a transtibial PCL reconstruction” [6]. In this example, the purpose of the investigation (biomechanical strength of different grafts) and the scope of the investigation (PCL reconstruction using a transtibial approach) are clearly stated such that the reader understands what the investigators plan to do. It is common to see abstracts combine the purpose with the background into a single section. Together, the background and purpose serve to tell the reader what has already been established in the literature and what this current study aims to investigate.

Fact Box 22.1: Background and Purpose
–– Highlight the rationale and importance of the study
–– Do not summarize what has already been studied
–– The purpose should be brief and follow the background sentence
–– The purpose should be a clear statement on what the study is investigating
22.3.3 Methods

The Methods portion is generally the second longest section of the abstract and needs to address the following questions: “How was the study conducted?” “Who/what was included in the study?” “What were the investigational arms?” “How many subjects were included?” “What was measured?” “When were measurements taken?” “How was the study designed?” After reading this section, the reader should have a clear understanding of how the study was carried out by the investigators. In general, statistical analyses that were employed in the investigation are not included in the methods section unless a special statistical test (e.g. Bonferroni-adjusted t-test) was performed that assisted in interpreting the results. The platform used to conduct the statistical analyses is not mentioned in the abstract; those details are discussed within the manuscript itself.

While this section is designed to help the reader understand how the study was conducted, it is easy to include extraneous information that takes up much-needed space in the abstract and may confuse the reader. It is important to describe the study design succinctly, in a manner that allows the reader to generally understand how the study was completed. Gaps in this section can confuse the reader and cause them to question the credibility of the study’s results. An example of an appropriate and improper discussion of methodology is provided in Table 22.2. With a proper discussion of methods, the reader is able to understand the salient details of how the study was carried out. On the other hand, an improper description of the methods includes immense detail that prevents the reader from quickly and effectively understanding the study. In the provided example, much of the detail provided is appropriate for inclusion within the manuscript itself as it provides the reader a detailed and reproducible description of the study. In the abstract, only a brief description of what was conducted should be included, while details allowing the reader to reproduce the study should be included within the manuscript itself [1, 5].

Table 22.2 Appropriate and improper description of a study’s methods in an abstract

Improper description of methodology
34 patients scheduled to undergo a primary ACL reconstruction were prospectively evaluated in this investigation. The dial test was performed under anaesthesia on both the affected and unaffected knees at 30° and 90° of flexion with a goniometer by two examiners. Intraoperatively, PLC gaps were evaluated prior to reconstruction and immediately following. Postoperatively, the dial test was again administered at 30° and 90° on both knees by the same two examiners [7] (From this description, it is not clear whether the study was prospective or retrospective, how the authors collected and quantified the data, and how the statistical analysis was performed)

Proper description of methodology
From April 2017 to May 2017, consecutive patients from a single institution scheduled to undergo a primary ACL reconstruction were prospectively evaluated. Following anaesthesia, the dial test was performed on both lower extremities at 30° and 90° of knee flexion by two examiners with the patient supine. Both examiners performed all evaluations throughout the study to promote consistency. Tibial external rotation was measured with a goniometer. Intraoperatively, with the knee at 30° of flexion and with a varus load applied to the knee, the PLC gap was measured with a calibrated nerve hook prior to reconstruction and following reconstruction. Knees with more than 14 mm of lateral tibiofemoral compartment opening are considered to have incompetent PLC. Postoperatively, the dial test was again performed on both the affected and unaffected knees at 30° and 90° of knee flexion by the same examiners using a goniometer. With an expected difference of 11 ± 5° of external rotation for the affected extremity on dial test before and after ACL reconstruction and with statistical power set at 0.8, the sample size needs to be at least 25 patients. The values from examiner #1 and examiner #2 were averaged to provide mean numeric values for tibial external rotation with dial test performance. Intraclass correlation (ICC) was assessed for both examiners. Two-sample paired t-test was used to generate 95% confidence intervals and p-values for each investigated condition [7] (This Methods section provides information on the study design, investigation, and data analysis as well as a proper statistics section)
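
The analysis steps named in the proper example of Table 22.2 (averaging the readings of the two examiners, then comparing the same knees before and after reconstruction) can be illustrated with a minimal Python sketch. All readings below are invented for illustration and are not data from [7]; the function names are hypothetical. The sketch stops at the mean paired difference, the t statistic, and the degrees of freedom. The 95% confidence interval and the p-value additionally require the t distribution, which a statistics package (for example SciPy's `scipy.stats.ttest_rel`) provides.

```python
from math import sqrt
from statistics import mean, stdev


def average_examiners(exam1, exam2):
    """Average the readings of two examiners, patient by patient."""
    return [(a + b) / 2 for a, b in zip(exam1, exam2)]


def paired_t(before, after):
    """Return (mean difference, t statistic, degrees of freedom) for paired data."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = mean(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # sample SD of the differences / sqrt(n)
    return mean_diff, mean_diff / standard_error, n - 1


# Hypothetical dial test readings in degrees (NOT data from the cited study)
pre_exam1 = [30.0, 28.0, 32.0, 29.0, 31.0]
pre_exam2 = [29.0, 29.0, 31.0, 30.0, 31.0]
post_exam1 = [19.0, 20.0, 21.0, 18.0, 20.0]
post_exam2 = [20.0, 19.0, 21.0, 19.0, 19.0]

pre = average_examiners(pre_exam1, pre_exam2)
post = average_examiners(post_exam1, post_exam2)
mean_diff, t_stat, dof = paired_t(pre, post)
print(f"mean difference {mean_diff:.1f} degrees, t = {t_stat:.2f}, df = {dof}")
```

In an abstract, only the design ("paired t-test") would be named in the Methods, and the resulting means, confidence interval, and p-value would be reported in the Results.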
Fact Box 22.2: Methods
–– Focus on the key elements of the study design to properly inform the reader
–– Measurement methods must be given accurately
–– Special statistical tests (e.g. Bonferroni correction) should be mentioned

22.3.4 Results

This is the point of the abstract that begins to differentiate between descriptive and informative abstracts, and it functions as the most important component of the abstract because readers are interested in learning about the findings of the investigation. The purpose of this section is to answer the question “What was found?” The results of the study should be presented clearly in as much detail as possible. The length or quality of this section should not be compromised. The data presented in the abstract should be consistent with the findings presented in the manuscript. A recent study demonstrated that the findings presented in the abstract did not accurately reflect the manuscript in nearly 78% of cases [11]. An example of appropriate and insufficient results sections is provided in Table 22.3.

Table 22.3 Proper and improper description of the results of an investigation in an abstract

Improper description of results
At 30°, there was a significantly larger dial test result in the affected knee prior to ACL reconstruction compared to after ACL reconstruction (29.6° vs. 19.0°, p < 0.0001) and compared with the unaffected knee (29.6° vs. 22.5°, p < 0.0001), but this difference was eliminated after reconstruction (14.0° vs. 13.5°, p = 0.69). At 90°, there was a significantly larger dial test result in the affected knee before ACL reconstruction compared with after ACL reconstruction (31.6° vs. 21.1°, p < 0.0003) and compared with the unaffected knee (31.6° vs. 21.9°, p < 0.0001); this difference was eliminated after reconstruction (12.1° vs. 11.9°, p = 0.3189) [7]. (Even though the results of the dial test are presented, no information on the study cohort is given. Data presentation lacks standard deviations and/or confidence intervals, making proper understanding of the results difficult. Furthermore, the authors did not present the data as described in the methods section (Table 22.2))

Proper description of results
Thirty-eight consecutive patients who underwent ACL reconstruction were prospectively evaluated in the 6-month study period between April 2017 and May 2017. The mean age of the included patients was 32.0 ± 12.6 years, with mean BMI of 26.3 ± 7.3 kg/m². Most patients were male (58.6%) with sports-related injuries (66.0%) and a median time between injury and surgical intervention of 31 days. The ICC between both examiners was 0.969, indicating a high reliability of the gathered measurements. At 30°, there was a significantly larger dial test result in the affected knee prior to ACL reconstruction compared to after ACL reconstruction (29.6° vs. 19.0°; 95% CI [−4.9, −6.6]; p < 0.0001) and compared with the unaffected knee (29.6° vs. 22.5°; 95% CI [5.8, 7.4]; p < 0.0001), but this difference was eliminated after reconstruction (14.0° vs. 13.5°; 95% CI [0.7, −1.2]; p = 0.69). At 90°, there was a significantly larger dial test result in the affected knee before ACL reconstruction compared with after ACL reconstruction (31.6° vs. 21.1°; 95% CI [−4.9, −6.9]; p < 0.0002) and compared with the unaffected knee (31.6° vs. 21.9°; 95% CI [−5.2, −9.4]; p < 0.0001); this difference was eliminated after reconstruction (12.1° vs. 11.9°; 95% CI [1.31, −1.7]; p = 0.41). The PLC gap was measured at less than the critical 12 mm both pre-ACL reconstruction (6.9 mm) and post-ACL reconstruction (6.3 mm). The PLC gap decreased significantly (95% CI [−0.1, −0.7]; p = 0.0009) [7]. (This results section follows a clear structure with demographics first, followed by inter-rater reliability data and the primary outcome variables. The authors provide confidence intervals as well as data on the aforementioned PLC gap in accordance with the methods section in Table 22.2)

Well-written and detailed abstracts will not only describe the results but will also include numerical values, such as mean, median, standard deviation, and statistical comparisons, to fortify their statements [1]. When discussing statistical significance, only the p-value is included as a parenthetical note after listing the result. Specific measures that describe the statistical test, such as t-value or degrees of freedom, are excluded from the abstract. In general, it is best not to describe the results vaguely by using words such as “small,” “massive,” “unlikely,” or “significant”; rather, clear and direct statements should be made to describe the results. While it is important to disseminate the results of the investigation, every single result does not need to be documented in
the abstract as this may obscure the overall message. If the result can be easily misinterpreted, it should be excluded from the abstract but described in greater detail within the manuscript where it can be more clearly explained.

In the provided example, the results of the study were not significant at 90° and therefore do not need to be included in the abstract. However, it is important to have these results within the manuscript itself. Additionally, in this study, the PLC gap was measured to evaluate for concomitant PLC injury. Patients with concomitant PLC injury were removed from the study. While these results are important within the manuscript, they are extraneous to the abstract.

Fact Box 22.3: Results
–– Data should be consistent with the results in the manuscript
–– Include numerical values (e.g. mean, standard deviation, confidence interval)
–– If the results section is too long, focus on the main findings
–– Whenever the term significant is used, it should be followed by a p-value

22.3.5 Conclusion

This section of the abstract contains the most concise and important take-away message from the study. This portion of the abstract is generally one to two sentences and answers the question “What are the implications of the investigation?” In addition to the primary discovery of the study, other important findings should be described as well. The goal of this section is to describe how the results of this investigation fit into the scope of previously conducted research and how the study impacts our current knowledge base. Since readers may skip directly to this section, it is the authors’ responsibility to make a concise and accurate assessment of the results of the investigation and its implications [1].

The conclusion should also incorporate a statement that discusses the clinical relevance of the study. While it is important to document the primary discovery of a study, it is of particular importance to illustrate how that finding impacts clinical practice to improve patient care and outcomes. If the study was of a basic science nature, the authors should discuss how the conclusions will stimulate further clinical investigations or alter clinical practice. Some journals may require that the clinical relevance be separated from the conclusion and placed under a separate heading within the abstract. Examples of concluding remarks are provided in Table 22.4. Appropriate concluding remarks discuss the main finding of the investigation and how it fits into clinical practice. On the other hand, insufficient concluding remarks discuss the main finding of the investigation; however, they fail to discuss the importance of those findings.

Table 22.4 Appropriate and poor examples of concluding remarks within an abstract

Appropriate concluding remarks
• Quadriceps and Achilles tendon allografts had similar biomechanical properties when used for a PCL reconstruction, but both were inferior to the native PCL. The quadriceps tendon is a viable graft option in PCL reconstruction as it exhibits a greater maximum force but is otherwise comparable to the Achilles allograft. The findings of this investigation expand allograft availability in PCL reconstruction [6] (after a summary of the key findings, the authors provide a sentence related to the clinical relevance of the study)
• Incompetence of the ACL accounts for nearly 10° of tibial external rotation as evidenced by the dial test. If the dial test is positive during examination of a traumatic knee injury, an isolated ACL injury should not be excluded. Thus, findings of the dial test should be interpreted with caution in the setting of ACL injury [7] (after a summary of the key findings, the authors provide a sentence related to the clinical relevance of the study)

Insufficient concluding remarks
• Quadriceps and Achilles tendon allografts had similar biomechanical properties when used for a PCL reconstruction, but both were inferior to the native PCL. However, quadriceps tendon allografts displayed a stronger construct with failure load and stiffness in comparison to Achilles tendon allografts [6] (in this example, the concluding sentences somewhat contradict each other. Furthermore, the clinical relevance is missing)
• Incompetence of the ACL accounts for nearly 10° of tibial external rotation as evidenced by the dial test [7] (here, the authors failed to highlight the importance and clinical relevance of their key findings)

Fact Box 22.4: Conclusion
–– The conclusion must be supported by the results.
–– Make concise and accurate statements.
–– Provide a sentence on the clinical or scientific merit of the study.

22.4 General Guidelines

An abstract is a 100–350-word snapshot of the manuscript, and it should not contain any information that is not further supported within the manuscript. While the abstract is the first, and in many cases the only, portion of a research project that is examined by readers, it should be the last thing that is completed. Some may believe that an abstract should be written first since it is a short overview of the paper and immediately precedes the manuscript, but it is much easier to summarize a manuscript that has already been completed. Additionally, it is more effective to write an abstract without concern for the word count and then pare it down to the specified word limit. Within a limited space, it is easy for writers to misconstrue or bias the message of a research publication; thus, the writer must ensure that readers cannot misinterpret the abstract. Each component of the abstract should be able to stand alone such that the reader can clearly understand the message of each aspect without having to refer to other sections of the abstract for clarification. The abstract, and manuscript for that matter, should be written in the past tense and in the third person. For example, the sentence “The surgeon fixed the anterior cruciate ligament at the midpoint of the anteromedial and posterolateral bundle” should read “The anterior cruciate ligament was fixed at the midpoint of the anteromedial and posterolateral bundle.” Instead of phrases that incorporate “I” or “we,” phrases such as “the investigation demonstrates,” “the results illustrate,” or “this study explains” should be included in the manuscript. While it is appropriate to include the aforementioned phrases in the abstract, they may be removed to save space for a sentence or phrase that helps convey the message of the abstract.

Abstracts do not include references, and it is also important to ensure that the abstract does not have any undefined abbreviations. Lastly, each abstract must contain a list of 4–6 “keywords” that will help guide the search indexes in finding this article.

Some authors recommend writing the abstract from scratch after finishing a manuscript, while others believe that it is best to copy phrases directly from the manuscript into the abstract. Those who prefer the former approach believe that copying information directly from the manuscript leads to an abstract that is disjointed or a summary that contains too much or too little information. With this approach, it is best to reread the manuscript in its entirety and then summarize the information in a new way that is distinct from the original manuscript. Copying portions of the manuscript, on the other hand, is an efficient method for creating the abstract since every piece of information lies within the manuscript itself. Ultimately, neither method is considered more effective than the other; the way in which the abstract is written depends on the writer’s preference and comfort level.

Fact Box 22.5: Abstracts for Scientific Articles
–– Follow the instructions for authors
–– Tell the same story as in the manuscript
–– Focus on the data related to the purpose and hypothesis of the study
–– The conclusion must be supported by the data and highlight the scientific merit

Take-Home Message
• An abstract is a snapshot of the manuscript and is a crucial part of each manuscript as it is freely available and frequently read.
• Abstracts should not contain any information that is not further supported within the manuscript.
• Regardless of the purpose of the abstract, it should contain the aim of the study, a brief overview of the methods, the most important results, and a conclusion addressing the clinical or scientific merit of the study.

References

1. Andrade C. How to write a good abstract for a scientific paper or conference presentation. Indian J Psychiatry. 2011;53:172–5.
2. Beaufils P, Becker R, Kopf S, Matthieu O, Pujol N. The knee meniscus: management of traumatic tears and degenerative lesions. EFORT Open Rev. 2017;2:195–203.
3. Canseco JA, Kojima K, Penvose AR, Ross JD, Obokata H, Gomoll AH, et al. Effect on ligament marker expression by direct-contact co-culture of mesenchymal stem cells and anterior cruciate ligament cells. Tissue Eng Part A. 2012;18:2549–58.
4. Cartwright R, Tikkinen KA, Vierhout ME, Koelbl H. How to write an ICS/IUGA conference abstract. Int Urogynecol J. 2010;21:509–13.
5. De Smet AA, Manaster BJ, Murphy WA Jr. How to write a successful abstract. Radiology. 1994;190:571–2.
6. Forsythe B, Haro MS, Bogunovic L, Collins MJ, Arns TA, Trella KJ, et al. Biomechanical evaluation of posterior cruciate ligament reconstruction with quadriceps versus Achilles tendon bone block allograft. Orthop J Sports Med. 2016;4:2325967116660068.
7. Forsythe B, Saltzman BM, Cvetanovich GL, Collins MJ, Arns TA, Verma NN, et al. Dial test: unrecognized predictor of anterior cruciate ligament deficiency. Arthroscopy. 2017;33:1375–81.
8. Frazer A. How to write an effective conference abstract. Emerg Nurse. 2012;20:30–1.
9. Ickes MJ, Gambescia SF. Abstract art: how to write competitive conference and journal abstracts. Health Promot Pract. 2011;12:493–6.
10. Kopf S, Forsythe B, Wong AK, Tashman S, Anderst W, Irrgang JJ, et al. Nonanatomic tunnel position in traditional transtibial single-bundle anterior cruciate ligament reconstruction evaluated by three-dimensional computed tomography. J Bone Joint Surg Am. 2010;92:1427–31.
11. Li G, Abbade LPF, Nwosu I, Jin Y, Leenus A, Maaz M, et al. A scoping review of comparisons between abstracts and full reports in primary biomedical research. BMC Med Res Methodol. 2017;17:181.
12. Pierce LL. How to make a beef stew...or write a structured abstract. Rehabil Nurs. 2017;42:243–4.
13. Turbek SP, Chock TM, Donahue K, Havrilla CA, Oliverio AM, Polutchko SK, et al. Scientific writing made easy: a step-by-step guide to undergraduate writing in the biological sciences. Bull Ecol Soc Am. 2016;97:417–26.
14. Weinstein R. How to write an abstract and present it at the annual meeting. J Clin Apher. 1999;14:195–9.
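
Several of the formal rules from Sect. 22.4 (a length of 100–350 words, no references, third-person wording, and 4–6 keywords) lend themselves to a quick automated check before submission. The following Python sketch is only an illustration under stated assumptions: the function name `check_abstract`, its parameters, and the regular expressions are hypothetical, the word limits default to the range given in this chapter and should be replaced by the target journal's own limits, and the checks are rough hints that do not replace a careful reading of the instructions for authors. Undefined abbreviations are not checked.

```python
import re

# Bracketed numeric citations such as [6] or [1, 4, 5]
CITATION = re.compile(r"\[\d+(?:\s*[,–-]\s*\d+)*\]")
# First-person wording; note that "I" can also match Roman numerals
# (e.g. "collagen type I"), so a hit is a hint, not proof
FIRST_PERSON = re.compile(r"\b(?:I|[Ww]e|[Oo]ur)\b")


def check_abstract(text, keywords, min_words=100, max_words=350):
    """Return a list of possible problems found in an abstract draft."""
    problems = []
    n_words = len(text.split())
    if not min_words <= n_words <= max_words:
        problems.append(f"word count {n_words} outside {min_words}-{max_words}")
    if CITATION.search(text):
        problems.append("contains a reference marker")
    if FIRST_PERSON.search(text):
        problems.append("uses first-person wording")
    if not 4 <= len(keywords) <= 6:
        problems.append(f"{len(keywords)} keywords given, expected 4-6")
    return problems


# Hypothetical draft that breaks all four rules
draft = "We compared two grafts [6]."
for problem in check_abstract(draft, ["PCL", "allograft"]):
    print(problem)
```

A draft that passes returns an empty list; anything flagged should be reviewed by hand, since the patterns can produce false positives.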
23 How to Make a Good Poster Presentation

Baris Kocaoglu, Paulo Henrique Araujo, and Carola Francisca van Eck

B. Kocaoglu
Department of Orthopedic Surgery, Acibadem University Faculty of Medicine, Istanbul, Turkey

P. H. Araujo
Santa Luzia Hospital, Clínica COB, Brasília, Brazil

C. F. van Eck (*)
Department of Orthopedic Surgery, University of Pittsburgh Medical Center, Rooney Sports Complex, Pittsburgh, PA, USA
e-mail: vaneckc@upmc.edu

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_23

23.1 Introduction

Poster presentations are an important part of every scientific meeting [1, 17, 20]. Often new ideas and concepts are presented here [5]. A poster can be an excellent way to present a research project to an audience of interested peers and can be used to obtain feedback on a study [8, 16]. Peers can include fellow researchers but also surgeons, physical therapists, nurses, engineers, and more [12, 19]. One major advantage of a poster presentation over a podium presentation is that a poster is available to be viewed during the entire duration of the meeting and can therefore gain more exposure [18]. Various types of poster presentations exist. Perhaps the most commonly known format is a printed poster displayed in an exhibit hall on a poster board (Fig. 23.1). However, more meetings are transitioning to electronic posters (e-posters). An e-poster is essentially a slide show presentation in which the slides advance automatically, available on computers distributed across the meeting (Fig. 23.2). Some meetings employ a combination of a physical and/or e-poster or an event to let the authors pitch the research presented on their poster with a short oral presentation. Regardless of the format, the poster should catch the attention of the audience while representing the study data in a clear and concise fashion [9].

Fact Box 23.1
A poster is an excellent way to present a research project and obtain feedback from peers.

This chapter aims to help orthopedic researchers in the preparation and presentation of a scientific poster. The learning objectives are to know the various types of poster presentation, be familiar with the technical aspects of how to make a scientific poster, and understand what to do at the scientific meeting to get the most out of presenting research in poster format.

23.2 Guidelines to Prepare a Poster Presentation

Because a poster is not designed as an oral presentation, it should be prepared differently than a lecture. A poster should attract and engage the
220 B. Kocaoglu et al.

Fig. 23.1  Example of a conventional, printed poster displayed in an exhibit hall on a poster board

viewer by generating visual interest [2–4, 7, 10, 11, 13, 15, 21, 22]. However, when it comes to
presentation of the data, this must be done on a stand-alone basis and be self-explanatory. This
means that the readers of the poster should be able to understand the study aim, methods,
results, conclusion, and significance, even when the presenting author is not there to explain
anything. In addition, the figures must have clear figure legends and labeling where appropriate
to facilitate this.

Generally, there should be a title portion, followed by objectives, methods, results, and a
discussion/conclusion section. Tables and/or figures can be used to allow for easier/more
interesting presentation of the data. Conflicts of interest should be disclosed, and contact
information for the corresponding authors should be provided [14]. Detailed instructions on how
to prepare these sections are discussed below.

Fact Box 23.2
A poster should be concise enough to attract the readers’ attention and also complete enough to
allow interpretation without verbal presentation.

23  How to Make a Good Poster Presentation 221

Fig. 23.2  Example of an e-poster. An e-poster is essentially a slide presentation in which the
slides advance automatically; it is available on computers distributed across the meeting

The title should be concise and attract the attention of people passing by. Oftentimes titles
are too long. Phrasing the title as a strong statement or a question is generally better to
spark the interest of the readers. All authors with their credentials and the affiliated
institution(s) should follow the title. If there are any conflicts of interest to disclose, this
should also be done. The person in charge of making the poster should check the guidelines of
the specific meeting/organization on how to disclose potential conflicts of interest.

The first text box should discuss the objective/hypothesis of the study. A short background may
be provided if relevant for understanding the goal of the study. However, care must be taken to
avoid making the poster too wordy and losing the attention of the reader. References are
considered optional but again can be used sparingly if this is felt to be fundamental in
understanding the rationale of the study. The methods section should be brief but present enough
detail to understand the study design, nature, population, data collection, and statistical
analysis. If tables or figures can be used instead of text, this should be strongly considered
(Fig. 23.3).

Similar to the methods, the results are often better presented in table or figure format to
catch and keep the reader’s attention. Unlike in a manuscript, only the most important findings
of the study should be described, keeping this section short and concise (Fig. 23.4). The
discussion/conclusion section should state the summary of the study. A brief discussion follows.
The focus of the discussion should be on clinical relevance of the work presented, limitations,
and future implications. Similar to the introduction section, the use of references is generally
considered optional, and references should be used sparingly to decrease unnecessary wordiness
of the poster and distraction from the message of the research.

23.3 Technical Aspects

Perhaps equally important to the content of the poster is how well this content is presented
[2–4, 7, 10, 11, 13, 15, 21, 22]. To start, the preparer of

Fig. 23.3  With regard to the methods section of the poster, use tables or figures instead of
text where possible. The example on the left shows how figures are used to gain the readers’
interest as well as to reduce the text. In the example on the right, the same information is
conveyed using text only. Text is less appealing and harder to interpret

Fig. 23.4  Unlike in a manuscript, only the most important findings of the study should be
discussed in the results section, keeping the result section short and concise. The example on
the top lists only the key findings and uses an easy-to-interpret graph. The example on the
bottom describes the same findings in text-only format. The latter is more difficult to
interpret for the reader and may not generate as much visual interest

the poster should check with the specific meeting to find out which poster dimensions are
allowed and recommended. Most posters are created using PowerPoint [Microsoft, Redmond, WA, USA]
or something comparable. If you are part of a larger academic institution, hospital system, or
research group, it may be worth checking whether poster templates already exist at your
institution, which you can then utilize.

Fact Box 23.3
Optimal visual poster presentation includes a calm background color and a neutral font which
stands out from the background and is large enough to read from a distance.

The optimal format to present and promote your organization is to have a uniform format that is
easily recognized by others as belonging to your institution. This can include the organization
logo, picture, or slogan (Fig. 23.5). This may benefit you as much as it benefits your
institution, as the reputation of your institution alone may attract viewers to your poster. If
no such template exists and you are the first person making it, try to pick a calm background
color, perhaps matching your institution’s logo colors, combined with a text color which stands
out from the background [4]. For example, avoid yellow text on a white background. Refrain from
using colors that may be unconsciously perceived to be offensive, such as red text. There is a
fine line between attracting attention and the poster being a visual overload. The size and font
of the text are also very important. It is best to use a neutral font that is easy to read from
a distance, such as Arial or another sans serif typeface. Generally, the size of the letter is
the largest for the title and section headings (larger than 62 points), medium sized for the
text (larger than 44 points), and the smallest for the references and corresponding authors’
contact information (larger than 36 points) (Fig. 23.5) [2–4, 7, 10, 11, 13, 15, 21, 22].
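
The advice to avoid yellow text on a white background can be checked numerically. The sketch below is not from this chapter: it borrows the contrast ratio defined in the Web Content Accessibility Guidelines (WCAG), which was designed for screens rather than printed posters, so treat the result as a rough guide only. The function names and the colour constants are chosen here for illustration.

```python
# Rough contrast check between a text colour and a background colour.
# Uses the WCAG relative-luminance and contrast-ratio formulas (an assumption
# borrowed from web accessibility guidance, not a rule stated in this chapter).

def _linear(channel):
    """Convert one 8-bit sRGB channel (0-255) to linear light."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def luminance(rgb):
    """Relative luminance of an (R, G, B) colour, from 0 (black) to 1 (white)."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(text_rgb, background_rgb):
    """Contrast ratio between two colours, from 1 (identical) to 21 (black on white)."""
    l1, l2 = luminance(text_rgb), luminance(background_rgb)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


WHITE = (255, 255, 255)
YELLOW = (255, 255, 0)
BLACK = (0, 0, 0)

if __name__ == "__main__":
    print(f"yellow on white: {contrast_ratio(YELLOW, WHITE):.2f}")
    print(f"black on white:  {contrast_ratio(BLACK, WHITE):.2f}")
```

Yellow on white scores close to 1, the lowest possible value, while black on white scores 21. WCAG suggests at least 4.5 for normal text on screens; a poster read from a distance probably benefits from a higher value.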

Fig. 23.5  The official template for the ESSKA (European Society of Sports Traumatology, Knee
Surgery and Arthroscopy) congress. The optimal way to present and promote yourself, your poster,
and your organization is to have a uniform format, which includes your logo. Use a calm
background color and a neutral font with a color that stands out from the background

Lastly, if your poster is a printed poster, it will need to be printed ahead of the conference.
Several companies are available online which perform scientific poster printing. You will need
to upload your presentation file to their website, and generally you can select if you would
like to see proofs before printing. This may be worthwhile when the poster contains pictures or
graphs to ensure the resolution is high enough to provide a good-quality poster at the size at
which it needs to be printed. Take into account that receiving the printed poster may take
several weeks [3]. Allow extra time to reprint the poster if upon its arrival misprints or
mistakes are identified and a new copy needs to be printed. Many large meetings now offer
poster-printing services, which will have your poster ready for you at the meeting site. The
major benefit is that this avoids the burden of traveling with several posters. However, the
major limitation is that if you arrive to find there is something wrong with your poster, there
will likely not be sufficient time for revisions. These pros and cons will have to be
considered.

23.4 At the Conference

When arriving at the meeting, you will need to hang up (or upload) your poster. Be sure to check
your poster numbers, location, and setup times on the specific conference website. Some meetings
provide pushpins to hang up the poster, but this is not always the case, so it is best to check
in advance.

Ensure that the presenter of the poster is available to stand with the poster at least during
the mandatory time slots but more frequently than this if possible. Presenters can be
disqualified from submission to future meetings if these rules are not obeyed. Popular times for
people to view posters are during the (lunch) breaks and of course the poster sessions. Showing
the poster to colleagues or presenting it at a lab or research meeting at one’s own institution
prior to the scientific meeting may help generate comments and help the poster presenter be
ready to answer any questions the audience might ask. Special judges may be assigned to score
posters for awards. These are usually based on the overall presentation, including the abstract,
the poster itself, and the attentiveness and presence of the presenting author. The latter
includes proper attire following the established dress code for the meeting. If unsure about the
dress code, ask someone who has been to the meeting before or contact the meeting directly. It
is always better to err on the side of being overdressed. Remember, you are representing
yourself, your research, and your institution.

Fact Box 23.4
Proper dress code is important when presenting a poster as it reflects on yourself, your
research, and your institution.

Meeting guidelines should be checked as poster presenters can be disqualified from submission to
future meetings if rules are not followed.

A consideration can be to add a handout to your poster, which the readers may take with them.
This could list the abstract, key points, and the contact information for the corresponding
authors. Some meetings will have the abstract for each study available for registered attendees
online, on a flash drive, or printed in the program book [6, 23].

Take-Home Message
• Poster presentations are a key component of any scientific conference.
• Often new ideas and concepts are presented here.
• Various types of poster presentations exist, including a printed format displayed in an
  exhibit hall, e-posters available on computers, and a combination of a poster with a short
  talk.
• A poster should attract and engage the viewer by generating visual interest.
• However, when it comes to presentation of the data, this must be done on a stand-alone basis
  and be self-explanatory.
be assigned to score posters for awards. These are and be self-explanatory.

• The person in charge of making the poster should check the guidelines for the meeting at
  which the poster is being presented.
• Although the content of the poster is important, so is the quality of the visual
  presentation.
• Choose previously used templates from your institution to ensure uniformity and easy
  identification of presentations related to the institution.
• Ensure that the presenter of the poster is available to stand with the poster during the
  mandatory time slots.
• Reviewing the poster with colleagues prior to the scientific meeting may help generate
  comments and improve the poster (presentation).
• The presenter should be dressed in proper attire.
• It is always better to err on the side of being overdressed.
• You are representing yourself, your research, and your institution [7].

References

1. Abicht BP, Donnenwerth MP, Borkosky SL, Plovanich EJ, Roukis TS. Publication rates of poster
   presentations at the American College of Foot and Ankle Surgeons annual scientific conference
   between 1999 and 2008. J Foot Ankle Surg. 2012;51:45–9.
2. Beal JA. Preparing for a poster session--some practical suggestions. Mass Nurse. 1986;56:5.
3. Boullata JI, Mancuso CE. A “how-to” guide in preparing abstracts and poster presentations.
   Nutr Clin Pract. 2007;22:641–6.
4. Briscoe MH. Preparing scientific illustrations: a guide to better posters, presentations, and
   publications. New York: Springer; 1996.
5. Daruwalla ZJ, Huq SS, Wong KL, Nee PY, Murphy DP. “Publish or perish”-presentations at annual
   national orthopaedic meetings and their correlation with subsequent publication. J Orthop
   Surg Res. 2015;10:58.
6. Donegan DJ, Kim TW, Lee GC. Publication rates of presentations at an annual meeting of the
   American Academy of Orthopaedic Surgeons. Clin Orthop Relat Res. 2010;468:1428–35.
7. Erren TC, Bourne PE. Ten simple rules for a good poster presentation. PLoS Comput Biol.
   2007;3:e102.
8. Frank RM, Cvetanovich GL, Collins MJ, Arns TA, Black A, Verma NN, Cole BJ, Forsythe B.
   Publication rates of podium versus poster presentations at the Arthroscopy Association of
   North America Meetings 2008-2012. Arthroscopy. 2017;33:6–11.
9. Gundogan B, Koshy K, Kurar L, Whitehurst K. How to make an academic poster. Ann Med Surg
   (Lond). 2016;11:69–71.
10. Hamilton CW. At a glance: a stepwise approach to successful poster presentations. Chest.
    2008;134:457–9.
11. Hand H. Reflections on preparing a poster for an RCN conference. Nurse Res. 2010;17:52–9.
12. Kleine-Konig MT, Schulte TL, Gosheger G, Rodl R, Schiedel FM. Publication rate of abstracts
    presented at European Paediatric Orthopaedic Society Annual Meetings, 2006 to 2008. J
    Pediatr Orthop. 2014;34:e33–8.
13. Lourie RJ. Preparing a poster presentation. Nurse Educ. 1989;14:10, 18, 23.
14. Matsen FA 3rd, Jette JL, Neradilek MB. Demographics of disclosure of conflicts of interest
    at the 2011 annual meeting of the American Academy of Orthopaedic Surgeons. J Bone Joint
    Surg Am. 2013;95:e29.
15. Miller JE. Preparing and presenting effective research posters. Health Serv Res.
    2007;42:311–28.
16. Naziri Q, Mixa PJ, Murray DP, Grieco PW, Illical EM, Maheshwari AV, Khanuja HS. Adult
    reconstruction studies presented at AAOS and AAHKS 2011–2015 Annual Meetings. Is there a
    difference in future publication? J Arthroplasty. 2018;33(5):1594–7.
17. Ohtori S, Orita S, Eguchi Y, Aoki Y, Suzuki M, Kubota G, Inage K, Shiga Y, Abe K, Kinoshita
    H, Inoue M, Kanamoto H, Norimoto M, Umimura T, Furuya T, Masao K, Maki S, Akazawa T,
    Takahashi K. Oral presentations have a significantly higher publication rate, but not impact
    factors, than poster presentations at the International Society for Study of Lumbar Spine
    meeting: review of 1126 abstracts from 2010 to 2012 meetings. Spine (Phila Pa 1976).
    2018;5:1347–54.
18. Preston CF, Bhandari M, Fulkerson E, Ginat D, Koval KJ, Egol KA. Podium versus poster
    publication rates at the Orthopaedic Trauma Association. Clin Orthop Relat Res.
    2005;(437):260–4.
19. Schulte TL, Trost M, Osada N, Huck K, Lange T, Gosheger G, Holl S, Bullmann V. Publication
    rate of abstracts presented at the Annual Congress of the German Society of Orthopaedics and
    Trauma Surgery. Arch Orthop Trauma Surg. 2012;132:271–80.
20. Voleti PB, Donegan DJ, Baldwin KD, Lee GC. Level of evidence of presentations at American
    Academy of Orthopaedic Surgeons annual meetings. J Bone Joint Surg Am. 2012;94:e50.
21. White A, White L. Preparing a poster. Acupunct Med. 2003;21:23–7.
22. Wipke-Tevis DD, Williams DA. Preparing and presenting a research poster. J Vasc Nurs.
    2002;20:138–42.
23. Zelle BA, Zlowodzki M, Bhandari M. Discrepancies between proceedings abstracts and posters
    at a scientific meeting. Clin Orthop Relat Res. 2005;(435):245–9.
24 How to Prepare a Paper Presentation?
Timothy Lording and Jacques Menetrey

24.1 Abstract Submission

The first step in presenting your research at a meeting is submitting an abstract. The call for
abstracts often occurs quite early, especially for larger meetings, and can close a considerable
time before the conference is to take place. For example, the call for abstracts for the 2019
ISAKOS congress closes in September 2018, a full 9 months before the meeting takes place.

Submission is usually online. Required information will include the authors’ details and
affiliations, the title of your project, the abstract itself, as well as any financial
disclosures.

The word limit for the abstract text can vary widely, from as few as 300 to as many as 800
words. Some meetings will dictate abstract subheadings, but most are variations of the standard
scientific structure: background, aims, methods, results, and conclusions. You may have already
prepared a manuscript for your research, and if so the abstract may be edited and repurposed for
your submission. Use the word limit as much as possible, as it can be difficult to distil
complex work into only a few hundred words. For each section of your abstract, determine the
crucial points you want to convey. From this core information, expand your text to the word
limit in a comprehensive manner. The submission process is competitive, and if English is not
your first language, it may be worth asking a colleague to review your abstract prior to
submission, to make sure your message is clear.

24.2 Presentation Structure and Preparation of Slides

Most free paper presentations are 6 min in length. Careful preparation is important, to ensure
that the premise, findings, and relevance of your work are successfully conveyed in this short
timeframe.

Preparing your talk and preparing your slides go hand in hand and for simplicity are considered
here together. A paper examining the importance of tibial bony and meniscal slopes in anterior
cruciate ligament injury is used as an example [1].

Similar to your abstract, most paper presentations will follow a standard structure, including
an introduction, methods, results, discussion, and conclusion. Due to the time constraints
inherent in a standard 6-min conference presentation, it is important to convey the most
important

T. Lording (*)
Melbourne Orthopaedic Group, Windsor, VIC, Australia
The Alfred Hospital, Melbourne, VIC, Australia
e-mail: tlording@iinet.net.au
J. Menetrey
Centre de Médecine du Sport et de l’Exercice, Hirslanden Clinique la Colline, Geneva,
Switzerland
Service de Chirurgie Orthopédique et Traumatologie de l’Appareil Moteur, University Hospital of
Geneva, Geneva, Switzerland

© ISAKOS 2019 227


V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_24
228 T. Lording and J. Menetrey

information clearly and concisely. Again, if you have already prepared a manuscript for your
work, this will simplify the process of preparing your talk. Usually, we would estimate three
slides per minute to be an appropriate pace for a scientific presentation.

Many institutions have slide templates available for presentations. If not, it is important to
choose a template, colour scheme, and font that is easy for your audience to read. As a general
rule, try to keep slides uncluttered, with a few main points per slide and clear, illustrative
diagrams. Many presenters use slides with huge amounts of text or statistical information, often
accompanied by an apology for using such a “busy slide”. The message is clearer if just the
relevant information is presented. Try not to have more than eight lines of text on any slide
and avoid small font sizes.

24.2.1 Title Slide

The title slide should include, at a minimum, the title of your work and the authors’ full
names. Most would include the details of the conference, as well as the institutions from which
the work arose (Fig. 24.1). Remember to acknowledge your co-authors when you introduce your
work, and it is courteous to thank the conference organisation for the opportunity to present.

Most meetings mandate the second slide to be a disclosure slide, outlining any potential
conflicts of interest in the work.

24.2.2 Introduction

The introduction is critical in setting the context of your research. In two or three slides, or
roughly 90 s, you need to convey to the audience the background to and purpose of your work, as
well as your aim and hypothesis. More than any other section of your presentation, this should
be tailored to your audience. For example, the audience at a general meeting may require more
context to understand a paper about tibial slope in anterior cruciate ligament (ACL)
reconstruction than would be required at an ACL-specific subspeciality meeting. Briefly, the
current state of the science and the context of your research should be presented.

The scientific method involves generating a hypothesis and working to prove or disprove this
hypothesis. It is important to clearly state the aim of your work as well as the hypothesis at
the end of your introduction.

24.2.3 Methods

In this section, you should outline the materials and methods of your project. Due to the
inherent

Fig. 24.1  Title slide


24  How to Prepare a Paper Presentation? 229

time constraints, this section must necessarily be a summary of rather than a fully detailed
description of the methods used. At the same time, important points, such as demographic
differences between treatment groups that might influence your results, should be highlighted
and not glossed over. Photographs of testing rigs for biomechanical studies, or illustrations of
radiological measurements, are particularly helpful in aiding the audience to understand your
work (Fig. 24.2). Finally, any illustrations may help you to present the methods used in a
succinct manner.

24.2.4 Results

The results section is often the shortest in a paper presentation. Frequently two or three
slides are all that is required to present the main findings. Graphical representation is often
helpful for your audience and more intuitive to understand than tables of numbers. Consider the
data carefully when choosing the most appropriate illustrative style. For the tibial slope
paper, a bar graph highlights the differences between groups well (Fig. 24.3). Don’t present
busy tables with dozens of numbers. Focus on the most important and pertinent findings of your
results and present them in a way that leads on to your discussion. To add clarity to your
slides and to avoid detailed explanations, do not hesitate to add landmarks if you present
anatomical or histological pictures. It is reasonable to offer to discuss your results further
with interested parties after the presentation.

24.2.5 Discussion

The discussion section gives the greatest licence to the presenter to get the message of their
work across. How does your work fit in with the findings of others? What do you think the main
implications of your work are? Does your work help to explain the questions raised by other
authors? Does your work raise new and interesting questions? These are the themes that may be
explored in the time available and will vary from paper to paper. Briefly, your work should be
compared to the current body of knowledge, using previous works and comparable investigations,
followed by its originality, novelty, and clinical relevance. Your findings should be critically
addressed point by point and hypothetical explanations explored. One trick is to write on
post-it notes the different findings you may have to discuss and stick them on your computer or
a board. It gives you a good overview and allows you to properly organise your discussion.

Fig. 24.2  Methods slide demonstrating measurement technique in the tibial slope study. Includes material from
Elmansori et al. [1] (Reproduced with permission from Knee Surgery, Sports Traumatology and Arthroscopy)

Fig. 24.3  Results slide comparing lateral and medial bony and meniscal slopes. Includes
material from Elmansori et al. [1] (Reproduced with permission from Knee Surgery, Sports
Traumatology and Arthroscopy)
[Slide “Results-Slope”: bar graph of ACL group versus control group for LTS, MTS, LMS, and MMS
(vertical axis 0 to 12), with three bullet points: lateral bony slope > medial bony slope;
medial meniscal slope > lateral meniscal slope; lateral meniscus has greater impact on
functional slope than medial meniscus]

Fig. 24.4  Discussion of relevant results from the literature, highlighting the importance of
the lateral tibial slope. Includes material from Stijak et al. [6] (Reproduced with permission
from Knee Surgery, Sports Traumatology and Arthroscopy)
[Slide “Discussion”: bullet list of lateral bony, medial bony, and lateral meniscal slope
findings in ACL-injured subjects from Stijak KSSTA 2007, Hashemi AJSM 2010, Dare AJSM 2015
(paediatric patients), and Hudek CORR 2011]

Fig. 24.5  Discussion of a postulated mechanism by which increased lateral tibial slope may
increase ACL strain. Includes material from Simon et al. [4] (Reproduced with permission from
the Journal of Biomechanics)
[Slide “Discussion”: title of the case-control study by Simon R et al, J Biomech, 2010, with a
two-panel diagram (A, B) labelled axial compression, femoral external rotation, femur, medial
plateau, and lateral plateau]

For example, in the tibial slope presentation, one slide was dedicated to the results of similar
studies in the literature and highlighted the apparent importance of the lateral tibial slope,
one slide discussed a postulated mechanism by which lateral compartment slope may cause ACL
injury, one slide examined the emerging evidence linking lateral meniscal injury to rotational
instability, and one slide discussed the potential role for and current evidence for osteotomy
to modify tibial slope in the ACL-injured knee (Figs. 24.4, 24.5, 24.6, and 24.7).

Fig. 24.6  Discussion of the role of the lateral meniscus in rotational instability. Includes
material from Shybut et al. [3] (Reproduced with permission from the American Journal of Sports
Medicine) and Lording et al. [2] (Reproduced with permission from Clinical Orthopaedics and
Related Research)
[Slide “Discussion”: bar graph of internal rotation (º), vertical axis 0 to 20, for the
conditions Intact, ACL-, M-/ALL+, M+/ALL-, and M-/ALL-; sources Shybut T et al, AJSM, 2015 and
Lording T et al, CORR, 2017]

Fig. 24.7  Discussion of the role of osteotomy. Includes material from Sonnery-Cottet et al. [5]
(Reproduced with permission from the American Journal of Sports Medicine)
[Slide “Discussion”: three image panels (A, B, C) with bullet points: slope modifiable by
osteotomy; threshold for intervention unclear; published data in 2nd revision surgery
(Sonnery-Cottet AJSM 2014, Dejour KSSTA 2015)]

Fig. 24.8  Limitations slide
[Slide “Limitations”: recumbent scanning (effect on meniscal slope); control group with PFJ pain
(may have different slope characteristics to true asymptomatic population)]

The discussion should conclude with a slide and brief mention of the limitations of your study
and how these might influence your results (Fig. 24.8).

24.2.6 Conclusions

Clearly state the conclusions of your work in at most two slides. No new information or ideas
should be introduced at this stage, and ensure your conclusions are supported by your results
(Fig. 24.9). Do not overstate findings you were expecting but have not found. You have to find
the right balance: advertise your results without overdoing it.

In a basic science presentation, a slide on which you report on future and ongoing works may
prevent questions with respect to further development of your research.

Fig. 24.9  Conclusion slide with three clear points
[Slide “Conclusion”:
• Tibial bony and meniscal slopes may be reliably measured using an MRI based method
• Increased bony and meniscal slopes are risk factors for ACL injury
• As the meniscus corrects slope towards the horizontal, meniscal loss may potentiate risk by
  increasing the functional slope]

Fact Box 24.1: Presentation Structure

Section                 Slides  Content
Title slide             1       Title, authors, and affiliations
Declarations            1       Possible conflicts of interest
                                Often prescribed format
Introduction            2–3     Background and purpose of study
                                Statement of aims and hypothesis
Materials and methods   3+      Summary of investigative method
                                Illustrations useful
Results                 2–4     Short presentation of relevant result data
                                Usually includes tables/graphs
Discussion              2–4     How does your work fit with previous work?
                                What are the implications of the results?
                                Does your work answer questions raised by others?
                                Does your work raise new questions?
Conclusions             1–2     Clear statement of conclusions
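
The slide counts in Fact Box 24.1 can be checked against the 6-min time limit and the suggested pace of three slides per minute. The following is a minimal sketch of that arithmetic, not a tool used by the authors. The names are chosen here for illustration, and because the Fact Box gives "3+" for materials and methods with no upper bound, the maximum of 3 used below is an assumption.

```python
# Slide budget for a 6-minute free paper, using the figures quoted in the text
# (6 min per talk, about three slides per minute) and in Fact Box 24.1.

TALK_MINUTES = 6
SLIDES_PER_MINUTE = 3

# (section, minimum slides, maximum slides)
# Assumption: "3+" for materials and methods is capped at 3 for illustration.
SECTIONS = [
    ("Title slide", 1, 1),
    ("Declarations", 1, 1),
    ("Introduction", 2, 3),
    ("Materials and methods", 3, 3),
    ("Results", 2, 4),
    ("Discussion", 2, 4),
    ("Conclusions", 1, 2),
]


def slide_budget(minutes=TALK_MINUTES, pace=SLIDES_PER_MINUTE):
    """Number of slides that fit in the talk at the suggested pace."""
    return minutes * pace


def slide_range(sections=SECTIONS):
    """Smallest and largest slide totals implied by the section ranges."""
    low = sum(lo for _, lo, _ in sections)
    high = sum(hi for _, _, hi in sections)
    return low, high


def seconds_per_slide(n_slides, minutes=TALK_MINUTES):
    """Average speaking time available for each slide."""
    return minutes * 60 / n_slides


if __name__ == "__main__":
    low, high = slide_range()
    print(f"Budget at {SLIDES_PER_MINUTE} slides/min: {slide_budget()} slides")
    print(f"Fact Box range: {low} to {high} slides")
    for n in (low, high):
        print(f"{n} slides: {seconds_per_slide(n):.0f} s per slide")
```

Under these assumptions the section ranges add up to between 12 and 18 slides, so the upper end only just fits the 18-slide budget and leaves about 20 s per slide, compared with 30 s per slide at the lower end.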

24.3 Preparation for Questions

Taking questions about your research during the allocated discussion time can be the most
difficult and stressful part of presenting your paper. As the questions may be unpredictable,
this cannot be entirely rehearsed, but it is certainly possible to anticipate some questions and
to be prepared.

Firstly, consider the work itself. As it is not possible to go into too much detail during the
presentation itself, particularly regarding your methods, this is one area where the audience
may ask for more details. Secondly, consider the work of others. Likely you will have discussed
your work in the context of the results of other researchers. Make sure you are up to date with
relevant work, especially if there is a long lead time between abstract submission and your
presentation. To this end, it is well worthwhile doing a literature search in the days leading
up to the presentation, to identify any relevant new publications. Thirdly, consider not only
the questions answered but also the questions raised by your work.

Lastly, take advantage of any available opportunities to practise your presentation. Many units
encourage a “dry run” presentation within the group prior to larger meetings. This will allow
you to practise not only the presentation but also fielding questions and to see what type of
questions your presentation stimulates. In two words: be prepared!

24.4 On the Day of the Presentation

24.4.1 Dress

It is important to dress appropriately for your presentation. In most circumstances, this would
mean suit and tie for males and equivalent for females. Some meetings, such as smaller,
subspeciality meetings or meetings held at summer or winter resort locations, may have a more
relaxed dress code. If in doubt, always err towards dressing conservatively.

24.4.2 Audio-Visual

You will need to find the speaker’s room or audio-visual personnel to provide and upload your
slides. In smaller meetings this may be as simple as going to the desk at the back of the room
at the beginning of the session. Larger meetings often have a central speaker’s room you need to
locate. This is not always easy, especially at large meetings with many concurrent sessions, so
make sure to arrive early to leave yourself ample time for this task.

Once your slides have been uploaded, review them to ensure that you have uploaded the correct
version and that no formatting or layout errors have occurred. Always check that your videos are
properly displayed on the organisers’ equipment and find out if they are automatically or
manually played. Be prepared to comment on the content of a video if there are playback issues
at the time of the presentation, which is not infrequent.

24.4.3 Delivery

Take a moment to stand at the podium before the start of your session to get your bearings. As a
rule, on the podium you will have a view of your slides, as well as a clock or countdown for the
time limit. Usually you will not be able to use a presenter’s view showing the next slide and
prompts, so if you feel written prompts will aid your presentation then prepare some small
cards. Familiarise yourself with how to advance slides. If you need a laser pointer, it is a
good idea to bring one as one is not always provided. Adjust the microphone so it is close
enough to pick up your voice but not so close that your speech is muffled. Inexperienced
presenters tend to rush and speak quietly, so be conscious of speaking up, slowly and clearly.
Similarly, when answering questions take a moment to consider before responding.

Take-Home Message
• Presenting a conference paper is an excellent way to share your research and ideas and to
  generate discussion with your colleagues.
• Presenting is intimidating, particularly for inexperienced authors and at large meetings, but
  experience is the best teacher and it gets easier with time.
• Preparation, rehearsal, and anticipation are essential to get your message across effectively
  and to portray your work in the best light.

References

1. Elmansori A, Lording T, Dumas R, Elmajri K, Neyret P, Lustig S. Proximal tibial bony and
   meniscal slopes are higher in ACL injured subjects than controls: a comparative MRI study.
   Knee Surg Sports Traumatol Arthrosc. 2017;25:1598–605.
2. Lording T, Corbo G, Bryant D, Burkhart TA, Getgood A. Rotational laxity control by the
   anterolateral ligament and the lateral meniscus is dependent on knee flexion angle: a
   cadaveric biomechanical study. Clin Orthop Relat Res. 2017;90:1922–8.
3. Shybut TB, Vega CE, Haddad J, Alexander JW, Gold JE, Noble PC, Lowe WR. Effect of lateral
   meniscal root tear on the stability of the anterior cruciate ligament-deficient knee. Am J
   Sports Med. 2015;43:905–11.
4. Simon RA, Everhart JS, Nagaraja HN, Chaudhari AM. A case-control study of anterior cruciate
   ligament volume, tibial plateau slopes and intercondylar notch dimensions in ACL-injured
   knees. J Biomech. 2010;43:1702–7.
5. Sonnery-Cottet B, Mogos S, Thaunat M, Archbold P, Fayard JM, Freychet B, Clechet J, Chambat
   P. Proximal tibial anterior closing wedge osteotomy in repeat revision of anterior cruciate
   ligament reconstruction. Am J Sports Med. 2014;42:1873–80.
6. Stijak L, Herzog RF, Schai P. Is there an influence of the tibial slope of the lateral
   condyle on the ACL lesion? Knee Surg Sports Traumatol Arthrosc. 2007;16:112–7.
25 How to Write a Clinical Paper

Brendan Coleman

B. Coleman (*)
Department of Orthopaedic Surgery, Middlemore Hospital, Auckland, New Zealand
e-mail: Brendan.Coleman@middlemore.co.nz

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_25

25.1 Why Publish?

There are many reasons why surgeons want to publish their research. For some, it is a requirement of their position, or they wish to enhance their career and gain promotion. For others, there is a moral obligation to the participants of the study to disseminate the knowledge gained and improve the knowledge of the orthopaedic community. There is an expectation for surgical trainees to undertake research. Publishing research can be helpful to learn the structure of scientific papers and enable surgeons to critique and identify the main points of other research. Although the prospect of writing a manuscript for publication can be daunting, the satisfaction of seeing your research published is worth the effort.

25.2 Before You Start

The best approach to research is to choose a topic that you are passionate about and that has relevance to your clinical practice. Writing a manuscript is hard, and it is the last 10% of effort that makes the difference in producing a publishable paper. If the topic does not interest you, it is easy to let the manuscript preparation slide when you reach the final difficult stages. Don’t be afraid of controversial topics as they can produce some of the most interesting papers [7]. Once you have chosen your topic, you need to formulate your research question and the hypothesis. The hypothesis is what you think the answer to your question will be. It does not matter if this is right or wrong, as the study is going to determine the answer.

Once a good research project has been identified, a thorough literature review should be undertaken. It is important to read the classic papers on the topic, but knowledge moves rapidly, and the relevant articles published over the last few years should also be reviewed. You need to determine if the research question has been investigated previously and what you are adding to the knowledge base on the topic.

It is helpful to seek the guidance of a colleague who has published scientific research in the early stages of planning your research. This will help in avoiding the common mistakes that are made in undertaking and publishing scientific research. Guidance could include identifying the right journal to submit your manuscript to. Once you have identified the journal, it is worthwhile reading the journal to understand the style of research and papers published [7].

25.3 Types of Articles

There are a number of types of scientific articles that can be produced. These are discussed in further detail in Part V of this book. These include:

25.3.1 Case Report

This is the easiest place to start in producing a scientific paper with a short description of a rare or interesting case. It is uncommon for high-impact journals to publish case reports as there are few cases that have never been written about before. Clinical photos are often important in case reports.

25.3.2 Case Series

This is a review of a series of patients with the same condition who have been treated in a novel way. The data is often prospectively collected but retrospectively analysed. The STROBE initiative is an international collaboration of researchers and journal editors with the aim of improving the quality of research in observational studies [12]. By adhering to the STROBE checklist, you ensure that the reader can critically evaluate the validity of your study. The STROBE statement can be found at www.strobe-statement.org.

25.3.3 Case-Control Study

This is a type of study enabling comparison of two cohorts of patients who are matched in every aspect except the condition or intervention of interest. This can be a difficult study to perform as it requires careful planning to ensure the groups are well matched and that confounding factors are not introduced.

25.3.4 Controlled Study

This remains the gold standard of evidence-based medicine comparing an intervention group to a control group. The control group may be the current gold standard treatment or no treatment. The participants are randomly assigned to either the intervention or control group. To ensure the study question is going to be answered, it is important that a power calculation is performed to ensure an adequate study size is undertaken. Controlled studies need ethical approval and registration and often need funding to perform.

25.3.5 Systematic Review or Meta-Analysis

These studies are an aggregation of high-quality randomized controlled trials published in the literature to investigate a specific question. There is a tendency in surgical systematic reviews or meta-analyses to conclude that the current evidence is not strong enough to make recommendations on treatment. The PRISMA guidelines are important when performing these types of studies (www.prisma-statement.org) [11].

25.3.6 Laboratory-Based Research

Basic science research is typically part of an ongoing research programme that aims to move into clinical research over time. These experimental studies can involve animals in research and usually need animal ethical approval.

Fact Box 25.1: Types of Articles

• Case report
• Case series
• Case-control study
• Controlled study
• Systematic review or meta-analysis
• Laboratory-based research

25.4 How to Write a Paper

Having completed the practicalities of the research study, it can feel like you have completed the research, but it is now that the critical element of the research takes place. Writing and completing the manuscript can be the most
challenging element of the study, and it can be easy not to complete the manuscript and publish the results.

The goal of the manuscript is to present the importance of your research and your findings. Writing needs to be precise and compact, utilizing short, simple sentences whilst avoiding complex scientific jargon [3]. You should be explicit and clear in describing the benefit of your paper. The title of the paper should convey the impact of the results in a succinct manner. It should avoid reporting the results within the title. For randomized controlled trials, meta-analyses and systematic reviews, the study type is required in the title. “Five-year follow-up of unicompartmental knee replacement” is not very informative compared to “Increased risk of early revision in unicompartmental knee replacement”.

Writing a scientific manuscript is not like writing a novel. It requires a standardized manner of construction: introduction, methods, results and discussion (IMRAD) [3]. Although the majority of scientific manuscripts are written in English, many readers will not have English as their primary language [3]. It is imperative that the manuscript is easy to read and well organized and that the language is simple and clear.

Fact Box 25.2
IMRAD method for writing manuscripts: introduction, methods, results and discussion

Writing the manuscript is hard and requires a rigid writing schedule to be put in place to ensure progress and completion. Follow the advice of George M. Whitesides: start with a blank piece of paper, and write down, in any order, all the important ideas that occur to you concerning the paper [10]. Important elements may include the research topic and why it is important, the hypothesis, the major finding and other important findings.

When you create your first draft, let your ideas flow, leaving any revision and editing for future drafts. As Paul Silvia advises, “your first drafts should sound like they were hastily translated from Icelandic by a non-native speaker” [8]. Revising is a difficult skill, but the success of your paper is dependent on your ability to revise and edit. Experienced researchers make three times as many changes to the paper as novice writers [5]. One revision strategy is to learn your common errors and do a targeted search for them [2]. All writers have idiosyncrasies, and using the search function in your writing application may eliminate issues and improve the writing style [2]. Performing numerous drafts is typical when preparing a manuscript. Feedback from experienced researchers both within and outside your specialty field is crucial to improving your paper. Once feedback has been received, you can decide to adjust your paper as necessary.

25.5 Abstract

The abstract needs to be a stand-alone record of your research to entice potential readers into reading the full paper. With the prevalence of online publication databases, the abstract is more important than ever as a means of selling your paper. For many readers, the paper does not exist beyond the abstract [1]. Most journals require abstracts to conform to a formal structure with a defined word count, typically 250 words. A structured abstract usually has sections on background, methods, results and conclusions. The background section gives a very brief outline of why the subject is important and why the reader should care about the problems and the results. This section is typically two to three sentences.

The methods section should provide the reader with information to understand what was done. This should include study design, study population, treatment groups, a brief description of the treatment if needed and the primary outcome measure. The methods section does not have to be in great detail.

The results section is the key element of the abstract, because readers of the abstract want to learn the findings of the study. The results section should contain as much detail as the word count
permits. It should include results of statistical analysis and secondary measures.

The conclusion section should describe the most important take-home message of the study. Other findings of importance can also be described. It is important that the conclusions match the findings of the paper and that the findings are not overstated.

Further information about writing of an abstract is contained in Chap. 22 (Table 25.1).

Table 25.1 Author checklist
Introduction
  Are the objectives clear?
  Is the importance of the study adequately emphasized?
  Is the subject matter of the study new?
  Is previous work on the subject adequately cited?
Materials and methods
  Is the study population detailed adequately?
  Are the methods described well enough to reproduce the experiment?
  Is the study design clear?
  Are statistical methods included?
  Are ethical considerations provided?
Results
  Can the reader assess the results based on the data provided?
  Is the information straightforward and not confusing?
  Are there adequate controls?
  Are statistical methods appropriate?
Discussion
  Have you commented adequately on all your results?
  Have you explained why and how your study differs from others already published?
  Have you discussed the potential problems and limitations with your study?
  Are the conclusions supported by the results?

25.6 Introduction

The aim of the introduction is to present your research question and the importance of your research to the current literature. It should begin with a brief thought-provoking review of previous research on the subject, demonstrating why the topic is important, interesting or problematic in some way. It should create a niche for your study by indicating a gap in the current knowledge or how you are extending the knowledge on the subject. Once you have created the niche, you should outline the research question or hypothesis [9]. This needs to be explicit and clear in describing the hypothesis and the potential benefit of the paper to the reader. Although a relatively brief section, the introduction is vital to set the scene for the paper. Stay focused and concise: you do not want to answer questions of controversy in the introduction; leave that for the discussion.

25.7 Methods

The methods section of the paper should be the easiest part of the manuscript to write as it records what you have done during the study. The details of the study should have all been known prior to the commencement of the study. This section should include method of analysis and statistics but should not include the results. The methods section should be like a cookbook providing step-by-step instructions that can be repeated by others [7]. It requires enough accuracy and clarity to enable another researcher to repeat the study. Previously published research methods should be referenced if relevant [6].

Unless you have expertise in statistical analysis, advice should be sought from a statistician prior to commencing the study [3]. The most common error in research is to recruit too few patients, leading to a beta (type II) error in which a true difference is not demonstrated [7]. It is critical to perform a power calculation prior to starting the study, ensuring you recruit adequate participants to demonstrate a difference in the treatment groups. Once the number of participants has been determined, an additional number of patients should be recruited to allow for loss to follow-up (transfer bias). The arbitrary maximum acceptable transfer bias determined by high-powered journals is 20% at 2 years [7].

Statistical methods are a necessary subheading within the methods section. You need to explain the statistical analysis used and provide information regarding the power calculation.

The study population needs to be clearly defined for the reader. It should be defined by providing clear inclusion and exclusion criteria.
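
The power calculation and the allowance for loss to follow-up described in this section can be illustrated with a short calculation. The sketch below is an illustration under stated assumptions, not the method of any particular study: it uses the standard normal-approximation formula for comparing two group means, n = 2((z_(1-alpha/2) + z_(power)) x SD / difference)^2 per group, and then inflates the result for an expected 20% loss to follow-up. The function names, the 10-point difference and the 15-point standard deviation are hypothetical values chosen for the example; a statistician should confirm the appropriate test and inputs for a real study.

```python
import math
from statistics import NormalDist


def per_group_sample_size(delta, sd, alpha=0.05, power=0.80):
    """Participants needed per group to detect a difference in means of
    `delta`, given a common standard deviation `sd`, using a two-sided
    test and the normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # guards against type I (alpha) error
    z_beta = NormalDist().inv_cdf(power)           # guards against type II (beta) error
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


def recruit_with_attrition(n_required, expected_loss_pct=20):
    """Number to recruit per group so that about `n_required` remain
    after the expected percentage is lost to follow-up."""
    return math.ceil(n_required * 100 / (100 - expected_loss_pct))


# Hypothetical inputs: detect a 10-point difference in an outcome score
# whose standard deviation is assumed to be 15 points.
n = per_group_sample_size(delta=10, sd=15)
print(n, recruit_with_attrition(n))
```

With these illustrative inputs the calculation gives 36 participants per group, rising to 45 per group once the 20% allowance for loss to follow-up is added. Halving the difference to be detected roughly quadruples the required sample size, which is why the expected effect should be agreed before recruitment starts.
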
The definition should include the time frame during which the study was undertaken. This should begin with the total number of patients eligible for the study followed by the number who met the inclusion/exclusion criteria. The number of patients followed up and the period of follow-up need to be described. Ideally, a maximum of 20% of patients should be lost to follow-up to make the data robust [7].

If there are diagnostic criteria for inclusion in the study, these should be defined, e.g. patients were identified with rheumatoid arthritis, meeting the diagnostic criteria established by the American College of Rheumatology. Four of these seven criteria must be met for a patient to be classified as having rheumatoid arthritis: (1) morning stiffness, (2) arthritis of three or more joint areas, (3) arthritis of hand joints, (4) symmetric arthritis, (5) rheumatoid nodules, (6) serum rheumatoid factor and (7) radiographic changes.

Common problems with the methods section include being too long, too vague or not descriptive enough. Fluency of the text can be aided through consistency in the point of view in which it is written: first person “we” or passive voice. If writing in the first person, it can be easy to become repetitive in beginning sentences with “we did…” [13]. When revising the draft, ensure that there is variation in the beginnings of the sentences [13]. In writing the paper, the methods are often presented directly from the research protocol. This results in the methods being presented in the future tense, yet the paper is being written on completion of the study and needs to be written in the past tense.

Approval from the appropriate institutional review board or ethical committee is an important check on the validity of the research design. In the methods section, ethical board approval should be documented.

25.8 Results

The results should be presented in a logical and orderly sequence. Initially, information about the study population and preoperative data should be presented. Operative findings and post-operative outcomes follow, presenting early then late outcomes and complications.

In writing the results, the presentation of data should be mixed with text, tables and figures. “A picture is worth a thousand words” also applies to scientific papers, and a well-constructed figure or table is an excellent method to present multiple data points in a simple and effective manner. A table or figure should be a stand-alone representation of data. A succinct summary of the meaningful data presented in figures or tables should be included in the text, but only to highlight the important points rather than repeating the data already presented.

During research, often excessive amounts of data are collected. It is vital that the author is selective in the data that is presented. It must show the major findings of the study but should discard any excessive data that does not add to the quality of the paper. Excessive data can be confusing and distracting for the reader, taking away from the important findings. Do not mistake this for falsification or manipulation of data, which is unethical [6]. Rather, the aim is to leave out results that are inconsequential or that neither add to the current knowledge nor guide clinical practice. Avoid repetition in reporting results.

The perception of complications is different amongst surgeons, in part due to the impression that the treatment has failed but also due to the legal consequences in some jurisdictions. Good clinical practice means that complications or adverse events during a study should be reported. This provides information upon which shared decision-making can occur between patient and surgeon. Reporting of complications can be a vital element in altering techniques or implant choices. It is imperative that complications are reported in an unbiased manner.

In accordance with the good clinical practice guidelines of the International Conference on Harmonization, a serious adverse event is clearly defined as any untoward medical occurrence that:

• Results in death
• Is life-threatening
• Requires inpatient hospitalization or prolongation of existing hospitalization
• Results in persistent or significant disability/incapacity
• Necessitates medical or surgical intervention to prevent permanent impairment to a body structure or a body function
• Leads to foetal distress, foetal death or congenital abnormality or birth defect

Complications can be classified into treatment related or patient related. Treatment-related complications can be due to surgical technique or a device or treatment. Patient-related complications can be due to a local tissue condition or to the overall patient condition.

A statistically significant difference between treatment groups shows your study was powered sufficiently to determine a difference. However, a statistical difference does not necessarily correlate to a clinical difference. For example, a statistically significant difference of three points on the Constant Shoulder score does not equate to a clinically relevant difference to the patients. Overlapping confidence intervals between groups indicate there is no clinically significant difference.

25.9 Discussion

The purpose of the discussion is to place the findings of your research into context with the wider knowledge of the subject. Due to the differing nature of studies, the discussion section will vary in its length and composition although the general structure remains similar. The structure of the discussion should start by recording the major findings of the study. The major findings should include the results of the hypothesis outlined in the introduction. The findings should be the message that you want the readers to take away from the paper; do not repeat every finding of the results section.

The second part presents a summary of the current literature. This mirrors the introduction but should not be repetitive. In the discussion section, the current literature expands on the major findings of the current study and helps place it within the wider body of knowledge on the topic. It is not a complete review of the literature but should compare and contrast your findings with those of other published studies, explaining any differences from previous research [4]. An explanation of the meaning and importance of the findings of your study should be presented. Consideration should also be given to alternative explanations for the results, particularly in relation to any unexpected or different findings. Although the importance of the major findings of the study may be clear to you, it may not be obvious to the reader [6]. Providing context of your findings in the wider knowledge of the subject gives direction to future research.

The third part of the discussion is to address the strengths and weaknesses of your study. It is especially important to address the weaknesses of your study as the reviewer and readers will be aware of the limitations. You should provide an honest critique of your paper’s limitations, addressing the potential impact this has on your study’s results and the relevance of the findings. Addressing potential issues with your paper pre-empts comments from reviewers and avoids leaving the reviewer or editor with a negative impression of your paper. It presents you as a thoughtful scientist. Where weaknesses exist, provide solutions or alternative explanations if possible. Discussing the limitations can create interest and pose further questions pointing to opportunities for future research.

In discussing the limitations of your study, it is important to recognize any bias within the study. Bias occurs in all scientific papers to some degree, and it is difficult to eliminate all bias. Selection bias occurs when the treatment groups are dissimilar. Prospective randomization minimizes selection bias as do strict inclusion and exclusion criteria. Performance bias occurs due to the person performing the study or treatment. For example, a single-surgeon study introduces performance bias through the subtleties of the surgeon’s technique and experience, which may not be reproducible by other surgeons. It is impossible to eliminate performance bias, as someone has to perform the research. Recording bias occurs if the data collector does not exhibit objectivity. This can be
minimized with patient-reported outcomes or having the collector independent and blinded to the treatment received. Reporting bias considers how the outcomes of the study are reported. Using internationally recognized and validated outcome scores allowing comparison with other studies can minimize this [7].

The final section of the discussion involves a conclusion or the major take-home messages from your study. This should reiterate the answer to your research question and add the implications, practical application or recommendations from the study. The conclusion should only feature statistically significant findings although trends seen in the study can be reported in the discussion section (Table 25.2).

Table 25.2 Common mistakes in writing manuscripts
Introduction and discussion too long
Lack of coherence and fluency in text
Overly long review of literature
Incorrect tense of methods
Lack of approval from institutional review board
Incomplete data
Not including detailed description of statistical methods
Poor quality figures, graphs and photos
Not discussing limitations
Concluding results beyond the study design

25.10 References

Journals can differ in the style in which references are recorded, so read the author guidelines of the journal you plan to submit to prior to writing the reference section. The standard is 20–30 references for a scientific paper. The references should include the high-quality papers relevant to the topic and avoid low-quality papers if possible. Cite your own paper if required, but unnecessarily quoting your own papers in an attempt to improve citations detracts from the quality of your paper. There are a number of electronic referencing tools available, and it is recommended that you utilize one of these tools during your manuscript writing. Examples of free bibliography managers online include Mendeley, Zotero and Citation Machine.

25.11 Manuscript Submission

Once you have completed your manuscript, put it aside for a week then reread it, thoroughly checking tables and figures for accuracy and running a spellcheck. Then share it with your co-authors for comment and amend the paper as needed. Once you have chosen the journal, check the journal submission instructions then submit and wait. Very few papers are accepted immediately, and journals expect resubmission after corrections. Reviewers will comment on how the paper could be improved to allow publication. Revision of the paper taking into account the reviewers’ comments and performed in a timely manner will enhance the chances of publication. If the paper is not suitable for publication, it will be made clear by the editor and reviewers. If the paper is rejected, then it should be revised taking advantage of the reviewers’ feedback and then submitted to another journal. Achieving a publication is hard work, but it is rewarding to see your manuscript in print (Table 25.3).

Table 25.3 Take-home points
Choose an interesting research topic
Structure your manuscript in the IMRAD method
Write in a clear, concise manner conveying the important information
Revise to eradicate errors and repetition
The abstract should stimulate the reader to read the paper
Explain what your research adds to the current knowledge
Discuss the limitations of your research to encourage further research
Persevere until your manuscript is accepted for publication

25.12 Summary

Writing a manuscript for publication is a demanding task but, when successful, is a rewarding experience. The topic should be thought-provoking, sparking the interest of both the writer and reader. A winning paper starts with excellent preparation in determining the method and statistical analysis of the study. Writing should be
clear, focused and organized to allow the reader to gather the key points of your research that are relevant to the reader. The IMRAD style provides a format for the writer to follow. Ensure that discussion around the topic is relevant to the hypothesis and findings of your study. Be sure that the conclusions of your paper match the results of your study. Stay disciplined, work hard and you will be able to write a winning paper.

References

1. Andrade C. How to write a good abstract for a scientific paper or conference presentation. Indian J Psychiatry. 2011;53(2):172–5.
2. Belcher WL. Writing your journal article in 12 weeks: a guide to academic publishing success. Thousand Oaks: SAGE Publications; 2009.
3. Earnshaw JJ. How to write a clinical paper for publication. Surgery. 2012;30(9):437–41.
4. Faber J. Writing scientific manuscripts: most common mistakes. Dental Press J Orthod. 2017;22(5):113–7.
5. Faigley L, Witte SP. Analyzing revision. Coll Compos Commun. 1981;32(4):400–14.
6. Kallestinova ED. How to write your first research paper. Yale J Biol Med. 2011;84(3):181–90.
7. Karlsson J, Marx RG, Nakamura N, Bhandari M, editors. A practical guide to research: design, execution and publication. Art Ther. 2011;27(Suppl 2):1–112.
8. Silvia PJ. How to write a lot. Washington, DC: American Psychological Association; 2007.
9. Swales JM, Feak CB. Academic writing for graduate students. 2nd ed. Ann Arbor: University of Michigan Press; 2004.
10. Whitesides GM. Writing a paper. Adv Mater. 2004;16(15):1375–7.
11. www.prisma-statement.org.
12. www.strobe-statement.org.
13. Zeiger M. Essentials of writing biomedical research papers. 2nd ed. San Francisco: McGraw-Hill Companies, Inc; 2000.

26 How to Write a Book Chapter

Thomas R. Pfeiffer and Daniel Guenther

T. R. Pfeiffer (*)
Department of Orthopaedic Surgery, Trauma Surgery and Sports Medicine, Cologne Merheim Medical Center, Witten/Herdecke University, Cologne, Germany
e-mail: t.pfeiffer@email.de

D. Guenther
Trauma Department, Hannover Medical School (MHH), Hannover, Germany
e-mail: guenther.daniel@mh-hannover.de

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_26

26.1 Introduction

The invitation to contribute a chapter to a book can be perceived as an honor and appreciation of expertise. Most often, the invitation is based on prior scientific and clinical commitment. Writing a book chapter offers the opportunity to collaborate with colleagues, develop new or intensify existing connections, and spread expertise and ideas. A good book chapter fulfills the expectations and demands of the reader, is detailed but not lengthy, and fits into the context of the entire book. Especially at the beginning of a scientist’s career, guidelines on how to fulfill the editor’s expectations and how to catch the reader’s interest are extremely helpful. This chapter aims to guide clinicians and researchers through the process of composing a book chapter successfully.

26.2 Prior to Writing or “Let’s Get Started”

Organize yourself! Screen the letter of invitation for:

• Title and topic of the book and your assigned chapter
• Co-authors
• Timetable and deadlines
• Rough estimation of expected workload

Before you start writing, ask yourself: Do you feel comfortable with the given topic and is the expected workload manageable under given circumstances? Failing to meet a deadline and repeated requests for extensions beyond the deadline may damage your reputation, making future invitations and collaborations extremely unlikely. When the decision is made to accept the invitation, further information is shared by the editor and a more detailed screening of the author’s guidelines should be executed [1]. This screening should address the following points (Table 26.1):

• Target readership
• Style of the book
• Outline of the book including all chapters

Books vary in style based on the target readership. Writing style should be adapted to their level of knowledge. Examples for different book
styles in the field of orthopedic surgery and sports medicine include, but are not limited to:

1. Primer/student teaching textbook: A compendium of the most important facts and related clinical knowledge on the teaching topic is requested. The book should focus on the most important essentials. A short repetition of relevant anatomy and physiology is helpful.
2. Specialist book: Specialist books provide in-depth information on a certain topic. Based on most important current literature, state-of-the-art diagnostic tools or treatment methods are presented. A trend of increasing specialization can be observed over the last decade. While years ago, books with titles like Sports Medicine were published, nowadays more and more books with titles like The Pediatric Anterior Cruciate Ligament are published.
3. Surgical atlas: In the field of clinical orthopedics, surgical textbooks with step-by-step instructions are a popular format. High-quality images with virtual explanations, pitfalls, tips, tricks, and solutions from daily practice are requested.

Table 26.1 Important information to consider prior to writing of a book chapter
Target readership
• Students
• Residents
• Fellows
• Consultants
• Researchers
• Patients
Book style
• Primer/student teaching textbook
• Specialist book
• Surgical atlas
Book outline
• Chapter with subheadings to represent a group of themes (e.g., The Anterior Cruciate Ligament (ACL))
• Chapter about a special aspect of a topic (e.g., the double bundle technique of ACL reconstruction)

Identify the leading author of the chapter and the role of the co-authors. The leading author should guide the group through the writing process. Summarize the expectations of your co-authors with appropriate time lines. All authors should be on the same page and should be updated regularly through the writing process. At this point two approaches are feasible:

1. “The chain letter approach”: One author writes a first draft of the chapter. This draft is forwarded to the next author, who adds content, revises the draft, and forwards the draft to another author. The order of authors can be adapted to experience and seniority. This reduces the number of turnarounds and saves valuable time.
2. “The melting pot approach”: The work of writing is equally divided among the authors. The leading author should incorporate the different parts into the chapter and adapt the style to ensure the reading flow.

No matter which approach you choose, the final version should be double-checked and approved by all authors.

Talk about authorship early on in the process. Define who should be the first, second, or senior author. Sometimes, selecting the order can be uncomfortable. However, it will be even more uncomfortable if discussion about authorship starts after all the work is done.

Define deadlines for yourself and your co-authors. Missing deadlines will result in a rush prior to submission, which can ultimately lead to decreased quality of the chapter.

The author has to keep in mind that the book will compete on the market against comparable titles. Knowledge about what has already been published and what is lacking in current literature is indispensable. This knowledge can be achieved by
reviewing published articles in journals and text-
It is helpful to know the book’s outline and the books, scientific exhibits at conferences, and
other chapters of the book to prevent providing online versions. A good screening tool is PubMed
the same information twice. (www.pubmed.gov) provided by the US National
As soon as you informed yourself about the Library of Medicine and the National Institutes
frame conditions, contact your co-authors. of Health.
26  How to Write a Book Chapter 245

Fact Box 26.1

1. Contact your co-authors! Identify the leading author of the chapter and the role of the co-authors.
2. Talk about authorship! Define who should be the first, second, or senior author.
3. Define deadlines for yourself and your co-authors.
4. Know what has already been published and what is lacking in current literature.

26.3 How to Fill the First Page(s) or “The First Step Is Always the Hardest”

The beginning of the writing process often poses a challenge and contains sticking points that young writers should be aware of. Frequently, one is tempted to write the first lines of a book chapter intuitively or based on personal strengths and interests. However, this approach involves the risk of losing the common thread. It is essential to come up with an outline of the book chapter before starting the writing process. Structuring should not be limited to subchapters and paragraphs, but the structure of each paragraph and the line of arguments should be drafted in the beginning. Based on personal preferences, the outline can be elaborated as bullet points, catch phrases, or flowcharts. It is helpful to have all authors involved in the process of tailoring the line of arguments in order to maximize the clinical experience and to include a broader spectrum of opinions. All authors will benefit from brainstorming, and everybody will be on the same page. This way, the structure of the book chapter will be maintained even if sections are written by different authors. If you do not approach your book chapter as a team, it is useful to involve other clinicians to make sure you do not get off track.

Once the outline of the book chapter is set, each aspect of the outline can be addressed by adding the corresponding information in the desired number of sentences.

Fact Box 26.2

1. Come up with an outline of the book chapter before starting the writing process.
2. Outline can be elaborated as bullet points, catch phrases, or flowcharts.
3. Involve all authors to make sure that everybody is on the same page.
4. Use the framework of the composed outline to add text, piece by piece.

26.4 The Writing Process or “How to Get the Job Done”

A good balance between theory and practice is key to an interesting and successful book chapter. Chapters consisting of mostly theoretical topics should contain implications for daily practice, tips, tricks, or annotations. In contrast to a descriptive review, a book chapter does not necessarily follow the regular outline of a journal article. Usually, there is no material and methods or results section [2]. Further, extensive statistical analysis to support the presented message is not needed. In general, at the end of a chapter, main findings, conclusions, and consequences for clinical practice should be presented.

The production of a book may take 2 or more years, and the reader wants a long-lasting “product”; therefore, only a selection of the most significant research performed in the past decade should be included. In contrast to a scientific journal article, expert opinions and clinically based practices, as opposed to scientific or evidence-based practices, should be discussed. A book chapter does not require a full and complete representation of the existing literature [3].

Keep the terminology consistent to increase readability. In general, present tense and third person are used, but this can be altered where needed. There is no need for alteration or use of synonyms for medical terms. A book chapter is still a scientific manuscript. Authors should avoid wordy sentences and relative clauses.
Usually, write the conclusion at the very end. It should include the most important take-home message of the chapter. Then, write the abstract. Afterward, double-check the title of the chapter. Does it still match the content of the chapter? Sometimes during the writing process, the central theme of the chapter changes stepwise, and at the very end, rewording of some initial parts is necessary.

Keep track of your references during writing. Sample reference programs are EndNote (Clarivate Analytics, Philadelphia, USA), Papers (ReadCube, Labtiva Inc., Cambridge, USA), or RefWorks (ProQuest LLC, Ann Arbor, USA). Travel libraries can enable the different authors to write on the same reference list.

Do not underestimate the impact of well-chosen figures. A good text is accentuated by thematically suitable figures. High-quality figures attract the reader’s attention. In addition, they are a valuable tool to emphasize the take-home message. Especially for the purpose of presenting diagnostic and therapeutic algorithms, flowcharts and diagrams are useful to transfer theoretical knowledge to clinical work. Make sure that you provide figures with sufficient resolution and appropriate text reference. The related legend must explain the figure sufficiently. The reader should not have to return to the main text to find explanations. All abbreviations used in the figure must be defined in the legend.

Clear and consistent naming is needed to keep track of drafts. The naming should include year, month, day, running title, name, and version (e.g., 2017-11-11-ISAKOSResearchBook-PfeifferV1.docx). Each new draft should be saved. This enables authors to backtrack the development of the manuscript. In addition, make sure to save your work regularly during the writing process to prevent losing data.

Dropbox (Dropbox Inc., San Francisco, USA) can be used to share your drafts. Another option is to work in a cloud. Google Cloud (Google LLC, Mountain View, USA) or iCloud (Apple Inc., Cupertino, USA) provide options to work online on one version with multiple authors. However, keep in mind that you as the corresponding author are responsible for keeping track of all changes and staying organized.

Read the final version of the book chapter multiple times before finalizing it. If the book chapter is not in your first language, a revision by a native speaker is highly recommended to ensure professional, scientific writing.

Fact Box 26.3

1. Chapters consisting of mostly theoretical topics should contain implications for daily practice, tips, tricks, or annotations.
2. A selection of the most significant research performed in the past decade should be included.
3. Keep the terminology consistent.
4. At the very end, rewording of some initial parts is necessary.
5. Keep track of your references during writing.
6. High-quality figures attract the reader’s attention.

26.5 Technical Considerations

Important technical points to consider are:

• Word count
• Number and requirements of tables, figures, and references
• If applicable, online version/videos
• Author instructions

Most book chapters contain approximately 10–15 manuscript pages. A manuscript page contains about 4500 characters. Editor and publisher plan the total number of pages and assign different numbers of pages to the authors. This needs to be strictly adhered to. Double spaced, font size 12, and Times New Roman or Arial are regularly requested styles. Do not spend time on excessive formatting as this will be performed by the publisher [4].
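The draft-naming convention recommended in Sect. 26.4 (year, month, day, running title, name, and version) can be generated rather than typed by hand, which keeps names consistent across co-authors. The following is only a minimal sketch in Python; the function name `draft_filename` and its parameters are illustrative assumptions, not part of any publisher's requirements.

```python
from datetime import date


def draft_filename(day: date, running_title: str, author: str,
                   version: int, ext: str = "docx") -> str:
    """Build a draft name of the form YYYY-MM-DD-RunningTitle-AuthorV<n>.<ext>.

    Hypothetical helper: the pattern follows the example given in the text,
    the function itself is an illustration only.
    """
    if version < 1:
        raise ValueError("version numbers start at 1")
    # Remove all whitespace so each component stays a single token
    title = "".join(running_title.split())
    name = "".join(author.split())
    # date.isoformat() yields year-month-day, which also sorts chronologically
    return f"{day.isoformat()}-{title}-{name}V{version}.{ext}"


if __name__ == "__main__":
    print(draft_filename(date(2017, 11, 11), "ISAKOS Research Book", "Pfeiffer", 1))
```

Putting the date first in year-month-day order means that an alphabetical file listing is also a chronological one, which makes it easier to backtrack the development of the manuscript.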
Tables and figures are usually saved as separate files. The technical requirements of figures differ considerably between publishers. Generally, a resolution greater than 300 dpi for figures is requested. As mentioned above, high-quality figures enhance the quality of a manuscript.

Before final submission, all technical requirements should be double-checked. When the editor requests revisions and makes suggestions to improve the draft, all changes should be highlighted. Proofreading is the final step of the writing process. The manuscript should be checked for misspellings and proper grammar. Additionally, particular attention should be paid to authors’ names and institutions.

Take-Home Message

• Writing a chapter is a team effort.
• However, the lead author, who must be defined prior to writing, needs to organize the group, keep everybody on the same page, and ensure the flow of the chapter.
• Knowledge about what has already been published and what is lacking in current literature is imperative.
• It is important to come up with an outline of the book chapter before starting the writing process.
• Then, the framework of the composed outline can be used to add text, piece by piece.
• Keep in mind that a book should be a long-lasting product and competes on the market against comparable products.
• In general, a reader is attracted by scientifically written text with consistent terminology that is supported by well-chosen, meaningful figures.
• A coherent train of thought is key to a successful book chapter.
• Chapters consisting of mostly theoretical topics should contain implications for daily practice, tips, tricks, or annotations.
• Finally, a printed version of the book will reward authors for the sometimes intense, but always interesting, work of chapter preparation.

References

1. Lewerich B, Götze D. What to do when you are asked to write a chapter. In: Troidl H, McKneally MF, Mulder DS, Wechsler AS, McPeek B, Spitzer WO, editors. Surgical research. New York: Springer; 1998.
2. Kendirci M. How to write a medical book chapter? Turk J Urol. 2013;39(Suppl 1):37–40. https://doi.org/10.5152/tud.2013.052.
3. Woodrow L. Publishing research: book chapters and books. In: Writing about quantitative research in applied linguistics. London: Palgrave Macmillan; 2014.
4. Skipper T. Writing an effective book chapter. A guide for authors working with the National Resource Center for The First-Year Experience & Students in Transition; 2011.
27  How to Write a Winning Clinical Research Proposal?

Christian Lattermann and Janey D. Whalen

C. Lattermann (*)
Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
e-mail: clattermann@bwh.harvard.edu

J. D. Whalen
Icartilage.com, Boston, MA, USA

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_27

27.1 Introduction

In the modern clinical research environment, it has become a necessity to communicate efficiently, clearly and with a certain amount of enthusiasm in order not to lose the audience, be that patients, fellow researchers or a captive student audience. It is imperative that a research proposal, grant application or scientific paper tell a captivating story, containing the traditional elements of research rationale, hypothesis, study design, material, methods, etc. Hence, the ability to package a cut-and-dried research proposal into a captivating proposal is arguably the single most important, and equally underemphasized, skill during scientific training in the medical profession.

This chapter will attempt to explain and illustrate how this can be achieved by dividing the chapter into a more narrative general section and a rather dry, to the point second section.

While this is a somewhat “different than usual” approach, it serves a distinct purpose. Both parts are necessary to create a scientifically relevant, valid and interesting proposal.

27.2 Section 1

27.2.1 Funding Mechanism

• The introduction to any scientific research proposal should be targeted to the actual funding mechanism.

The funding mechanism generally provides a clear description of what the funding entity is looking for. This information is often contained in the request for application (RFA) but sometimes must be sought out specifically on specific websites, locations or mission statements. While the language used in these funding announcements can often be used more efficiently to put your infant to sleep, it is important to assure that your research proposal carries a laser focus on the description of the funding mechanism. Elements such as target population, mechanism, disease, burden to society or sometimes even materials and methods may be very specifically defined inside the request for proposals (RFP). Often the request for application will use specific language, phrases or terms that offer themselves to be repeated in your proposal. Identifying these phrases and using them frequently will assure that there is no doubt about the relevance of your research proposal to the identified funding mechanism. (e.g. the grant mechanism is interested in funding proposals that aid the “translation” of scientific data into the
clinical realm; the key word is “translation”. This word should be identified and clearly emphasized in the research description.)

• The science is meritorious; the story, however, needs to be flexible.

The scientific question that you are asking may not be completely aligned with the question that is being asked by the RFP. However, most research proposals ask more than one scientific question, i.e. there is more than one specific aim. If an RFP asks for “injury prevention techniques in female football players” and your research proposal is targeting all football players, it is easy to include a specific aim that specifically identifies factors for female football players rather than the entire population. Data collection may still be carried out on the entire population, but the proposal will seek funding for just a subgroup of your overall scientific aim and is focused on the RFP. This flexibility may help you to broaden your relevance and impact.

• Read the RFP carefully and with attention to detail.

Every RFP is written differently and asks for different elements, page lengths, addenda and prerequisites. Some require ethics committee proposals to be submitted prior to submission; some require investigators to be tied to a university; some require “citizenship” (in the USA); some require certain membership status (foundation grants); and some do not. It is important to read these descriptions carefully prior to embarking on a proposal. Writing a successful grant proposal is hard work, and the last thing you want is to find out in the 11th hour that you are not eligible.

• Look at the deadlines and be realistic, and create a master timetable.

Generally, smaller grant funding mechanisms will ask for short, brief proposals, 1–2 pages, sometimes less, and sometimes they can be a quad chart only. While the actual writing process for those kinds of proposals may look easy, you may find out quickly that a shorter proposal is not necessarily easier to write. In fact, the old saying by Blaise Pascal, “if I had more time, I would have written a shorter letter”, holds true also for grant proposals [1].

It is hard to suggest an ideal amount of time for a proposal, as this always depends on how much previous work you have done and how much pre-existing language and sections you already may have in your portfolio. If you are writing your first proposal in your specialty, give yourself a minimum of 2–3 months for a small grant funding mechanism. Larger foundation grants and NIH-style grant mechanisms or cooperative grant mechanisms asking for six- or seven-figure grant support over multiple years sometimes require several years for a team of writers to prepare a competitive proposal. A good rule of thumb is to identify the shortest amount of time you think it will take you to write a complete first draft and double that number. It should also be kept in mind that a good and winning proposal requires review by yourself as well as by peers. This process may take 4–6 weeks. One strategy may be to push as hard as possible towards a deadline, intentionally miss the deadline at the last minute and then spend the next funding period fine-tuning and polishing the proposal for a submission to the next RFP cycle. There is no shame in not submitting a proposal to a first deadline. Resubmission into the next cycle is usually possible, and a well-written proposal that has been reviewed by peers and polished is a win-win for everybody even if it gets rejected. It can serve as a starting point for a new proposal or may be adapted into a paper or presentation. An unfinished proposal is a waste of time, as it will not likely be funded or serve future proposals.

To track your timelines, it is advisable to create a master timetable with soft deadlines for certain sections.
Fact Box 27.1

A good rule of thumb is to identify the shortest amount of time you think it will take you to write a complete first draft and double that number. It should also be kept in mind that a good and winning proposal requires review by yourself as well as by peers. An unfinished proposal is a waste of time, as it will not likely be funded or serve future proposals.

27.2.1.1 How to Approach the Actual Preparation of the Proposal

• Learn to work in sections. No proposal gets written in a day.

Grant sections can generally be divided into the key elements containing the rationale, specific aims, statements about innovation and relevance and study design. Another essential and often more voluminous part is made up of the “technical” pages.

While it is important to have a clear research question, hypothesis and specific aims, it is often the technical pages that will cause the biggest problems when time is short. These technical pages (i.e. standard operating procedures, inclusion of women and children statements, etc.) are short, usually one or two paragraph statements, for which you or your institution may already have pre-existing language. This language may just need to be adapted to a specific proposal. These “technical” pages do not require ingenuity but nevertheless need to be prepared. This is work that can also be delegated, or, if not, then this is the type of work that may be done in between surgical cases, before or after a clinic or lectures or on an airplane or in an airport lounge.

• Have a clear view of the science, relevance, innovation and study design before you start writing the specific aims page.

It is advisable to work first on the hypothesis and the specific aims and develop the scientific rationale from there. Then expand into the innovation and relevance section. Once those three sections are written, bits and pieces of those sections can be combined and rearranged to write the introductory section and an abbreviated rationale of the specific aims page. Hence, the specific aims page is usually finished after the other sections have been written and requires the largest time commitment.

• Do your prep work.

Several aspects of a proposal are addressed under what one can consider “prep work”. After you have gathered your thoughts, you should try to put those together in a brief chart or text. This can then serve as the famous “elevator pitch”. The scientific equivalent to this is the “napkin sketch”, a brilliant way to scrutinize your idea and discuss it with peers and team members over a lab meeting. During my basic science training in the Ferguson lab in Pittsburgh under Chris Evans, we had an unwritten rule. If you could sketch out your project on a napkin during the Friday afternoon lab meeting and convince your peers of the value of the project, you would get the go-ahead to pursue this project. While this is a light-hearted version of the basic idea, it illustrates a fundamental concept. A brief and to the point illustration with few words requires careful thought and a clear understanding of the subject matter. This exercise helps to focus any research idea. Some grant proposals require this type of sketch in the form of a “quad chart”.

Based upon the “napkin sketch” or rough outline of the project, an in-depth actual research protocol should be developed. While it is important not to forget about the other parts of the grant proposal, this protocol is critical, as it may influence several other sections such as the biostatistics section and feasibility of the project and may identify numerous hurdles that need to be addressed and that may subsequently alter the actual proposal itself.

Another helpful concept is to create a separate file for conceptual lines of reasoning, explanations, supportive literature, etc. for the main sections.
These can address significance, relevance, economics, innovation, future directions, etc. Every investigator constantly reads literature about the specific topic of interest, and even though access is almost unlimited, many times great ideas to justify a hypothesis suddenly show up and disappear the next day. Creating a small folder on the phone or the computer allows for a quick repository of these ideas that can be recalled when needed.

Involvement of a trained biostatistician from the beginning is a critical step for all clinical proposals. Even for animal-based basic science research, it has become mandatory in many institutions. The estimated sample size, power calculations and potential study design considerations should be made early on to design the proposal correctly. It is easier to do this upfront rather than at the end, as the study design affects the actual specific aims, time lines as well as budgeting.

Research your internal submission deadlines and include those on your master timetable. Most institutions require any grant submission to be run through a research office. Each office has different time requirements to approve a grant project. Sometimes there may even be internal institutional decisions about eligibility for submission that may exclude an investigator from a grant submission upfront. Hence, it is important to inform your institution about a grant submission early on and start the internal process early.

27.3 Section 2

The second section of this guideline will address the subsections of a proposal and how to package scientific information in a captivating and interesting fashion.

27.3.1 Significance/Relevance of Your Research

The significance and relevance statements come with substantial time requirements. It is not advisable to use well-known and worn-out lines of argumentation. Amongst reviewers, there is a “significant and relevant” (!) amount of fatigue that sets in when proposals use the same few phrases repeatedly to justify their relevance (i.e. “Over 200,000 ACL reconstructions are being performed in the United States annually”).

It can become quite comical from the perspective of a reviewer when most proposals have the same sentence in the relevance statement and then even cite the same paper. The one proposal that introduces a different approach becomes instantly attractive. Hence, the significance/relevance statement often benefits from the author spending some time researching lesser known but relevant information obtained from epidemiology papers on the topic. This information can often be found in lesser known, non-orthopaedic sources such as the Centers for Disease Control and Prevention (CDC), governmental white papers or scientific articles published by authors from a different profession, who may have a different perspective of the problem. Try to address the actual “knowledge gap” and previous failures to solve the problem and/or new aspects that highlight the importance of your topic or line of research.

27.3.2 Innovation

This may be one of the most important sections of the grant proposal, as many grants are seeking to fund “highly innovative” proposals. How does one portray a proposal as “highly innovative”?

While there is no gold standard of innovation, it is worth pointing out several commonly seen arguments towards innovation of a project that are usually not appreciated by a critical reviewer. I will list those, briefly and without judgement:

• Incremental changes on an existing measurement technique or technology to measure an already described and published event or pathology
• Being in possession of a tool that is looking for an application
• Material testing of a new device or commercially available proprietary technique
• Retrospective data analysis
An innovative proposal instead should answer the following questions:

• Why are you or your team the group to do this research?
• What barriers can you break down that have not been broken down yet?
• Are you communicating and reaching across “silos”, or do you have a unique team?
• Are there aspects of your proposal that can only be done in your environment?
• Which scientific connections can you make in this proposal that are truly unique?
• Be careful with statements such as “we show for the first time”; rather, point out the scarcity of data and approaches to the problem that you are tackling.

27.3.3 Material/Methods/Study Design

This is the most technical part of your proposal and should be written as such. Hence, there is no need to “tell a story”; stay short, brief and to the point. The following is a short bullet point list of elements that should be included in your material/method section:

• Pre-studies to determine sample size, enrolment numbers and feasibility.
• Brief mention of the labs’ or clinics’ capacity (the main part of this information will be under resources, but it is helpful to put it into perspective; e.g. in an ACL study, it is helpful to mention that patients are being recruited from a clinic that performs over 300 ACL surgeries a year).
• Subject flow and experience (i.e. how are patients being recruited? Is this part of an ongoing registry or cohort recruitment effort?).
• Strategies to prevent drop-outs and patients that are lost to follow-up.
• Strategies to prevent bias.
• For some grant mechanisms, it is important to create prospective enrolment charts to assure diversity. This can be included in the M/M section.
• Outcomes tools/instruments should be reported with their descriptions (i.e. literature referencing their purpose, validation, MCID, MDC).
• Study time points and group determinations.
• Study endpoints (a study endpoint is the most important outcome criterion as success and failure to answer your hypothesis are determined by the study endpoint).
• Anticipated difficulties, failure mitigation strategies and limitations of the study.

27.3.4 Biostatistical Support Is Important

As mentioned above, it is important to obtain input from a biostatistician early in your study planning as it can affect technical aspects of your study. Specifically, it is important to address the following issues in your statistical plan and data management plan:

• Power analysis
• Randomization
• Bias
• Calculation/estimation of potential study “drop-outs” and estimation of patients “lost to follow-up”
• Estimate or verify enrolment numbers (may require a pre-study to determine those numbers)
• Missing data management plan
• How to deal with outliers
• Specific interim analysis time points
• Statistical analysis techniques that may influence the study design format for clinical trials (i.e. adaptive designs, umbrella trial platforms, etc.)

27.3.5 References

Nowadays reference managing software is mandatory for a scientific paper or proposal.

It is important to comply with the rules about references. Our group once submitted an NIH proposal with 16 references instead of 15 (as was called for in the request for proposals), and the proposal was rejected.
A few bullet points about this topic are as follows:

1. Use reference managing software.
2. Be current (i.e. last 3–5 years unless citing a landmark paper that is relevant).
3. Do not inundate the reviewer with unnecessary references.

27.3.6 Not Funded and What Now?

Finally, a word about how to deal with the decision on your proposal. There are two forms of rejection:

1. Technical rejection for failure to comply with submission guidelines, etc.
2. Failure to convince the reviewers of the value of your work

Make sure you keep things in perspective. Remind yourself that the odds of getting funded are relatively slim. Funding lines may be as low as single digit percentiles, and you are competing against the best researchers in your field.

The technical rejection is usually a consequence of not having read the instructions properly. You will get a response quickly and typically prior to the actual review process. This hurts but is precisely the reason why we point out in the chapter how important it is to read the instructions. Depending on how competitive the mechanism is, these technical rejections may happen easily.

Failure to get funded should rather be the norm and is not to be considered a failure of the project or the author personally. Given the fact that most of these funding cycles deal with hundreds of proposals written by smart and well-trained individuals, it is unrealistic to think that you will have a proposal accepted the first time around. While this may happen, more commonly, a failure to get funded results in another set of excellent critiques.

Having gone through this process many times myself, I would recommend accepting the rejection letter, reading it and putting it aside. These letters always hurt and create frustration. It is common to think that the reviewer just did not understand the topic or your proposal, that they did not like you or your team and that the critique is not fair. A seasoned researcher will step away for a few days and then come back to the letter and dissect the critique carefully. It is important to understand that this critique has been written for two reasons. First, it is to point out scientific weaknesses in the proposal. These are usually fixable problems and give you an opportunity to substantially improve your proposal. Second, the reviewers are trying to guide you towards what they expected and would like to see return to the study section or grant review committee. This insight is extremely valuable and allows the researcher to adjust a future proposal according to those recommendations. In certain cases, the proposal was identified as simply misguided for the mechanism, in which case this is good advice, as a resubmission will not likely be successful.

Fact Box 27.2

There are two forms of rejection: technical and failure of relevance. Make sure you keep things in perspective. Remind yourself that the odds of getting funded are relatively slim. Accept the rejection letter, read it, and put it aside. A seasoned researcher will step away for a few days and then come back to the letter and dissect the critique carefully.

Take-Home Message

• Write to a lay audience trained in the scientific method but not in your specific scientific discipline.
• Be prepared and take your time.
• A successful proposal often has to be revised prior to success.
• Use all resources available and discuss your proposal with colleagues.

Reference

1. Shapiro FR. The Yale book of quotations, section: Blaise Pascal. New Haven: Yale University Press; 2006. p. 583.
28 How to Review a Clinical Research Paper?

Neel K. Patel, Marco Yeung, Kanto Nagai, and Volker Musahl

N. K. Patel · M. Yeung · K. Nagai · V. Musahl (*)
Department of Orthopaedic Surgery, University of Pittsburgh, Pittsburgh, PA, USA
e-mail: musahlv@upmc.edu

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_28

28.1 Introduction

The ability to critically review a paper is a valuable skill to have, not only for the peer-review process of a journal but also to be able to interpret findings of a paper in the context of clinical practice. However, typically there is no formal training on how to review a paper throughout medical school or residency. Thus, this chapter will serve to provide general guidelines to use when reviewing a paper.

Each section of a paper provides important information regarding the purpose, study design, findings, or interpretation of the results. As a reviewer, having a set of guidelines or questions in mind while reviewing each section can be useful to ensure that nothing is missed in the process of the review. The questions to ask and the items to evaluate can vary depending on the study design or article type. This chapter will review what should be present in each section of a paper and what questions should be asked within each section. The main sections of a paper to be discussed are the introduction, which provides the aims of the study, the methods, the results, and the discussion, which gives an interpretation of the results. The guidelines outlined in this chapter will serve as a good foundation that reviewers can adapt to specific papers and individualize based on their particular preferences.

28.2 Assessment of Research Aims and Purpose

Generally, the introduction states the issues related to the topic of the paper and formulates the rationale for the questions and hypotheses of the study. The organization of the introduction may differ depending on whether the paper is a clinical report, a study of new scientific data, or a description of a new method. Most studies are published in order to (1) report entirely novel findings, (2) confirm previously reported work (e.g., case reports, small preliminary series) when such confirmation remains questionable, or (3) introduce or address controversies in the literature when data and/or conclusions conflict [1]. One of these three purposes generally should be apparent in this section. Usually, the first paragraph introduces the general topic and/or problem and suggests its importance. The second (and third) paragraph provides the rationale for each question or hypothesis, and a final paragraph states the questions and hypotheses. The rationale should be established by providing representative literature that places the work in the context of the current body of
literature. The reviewer should critically check the following points:

• Does the introduction provide the necessary background; in other words, why is it necessary to do this study?
• What is known/unknown about the topic?
• Is the research question of interest? Is it original or merely a repetition of existing knowledge? Does the manuscript itself have a new message?
• Is the purpose clear? Is the hypothesis given? Is there any clinical relevance in a basic science study?
• Can the introduction be shortened? More often than not the introduction can be shortened without losing the main message.

If you are not familiar with the topic of the paper, you should search the previous literature. This will provide you with information about what is known/unknown and what is controversial about the topic of the paper. As a reviewer it is important to ensure that pertinent and up-to-date literature is cited while at the same time respecting some of the classic literature. Additionally, it is important here to ensure that the current study is truly novel and not a duplication of a previously published study.

28.3 Reviewing the Methods of a Clinical Research Paper

The methods section is perhaps one of the most important sections to assess when reviewing a clinical research paper. Poor methodology can lead to suspect or flawed results and conclusions. The fundamental principles of critically appraising this section are ensuring transparency, clarity, and reproducibility of the methods and maintaining internal validity by minimizing any sources of bias in the study.

Authors should describe in detail and in a clear and logical sequence how the study was designed, executed, and analyzed. A reader should be able to accurately reproduce the study based on the author's description within the study's methods. The design of the study should be clearly described and justified: Was the trial carried out prospectively or retrospectively? Was the design a randomized trial, a case-control design, or a case series? Furthermore, any obtained institutional review board or ethics committee approvals should be declared if applicable. Similarly, any study registries such as clinical trial registrations or systematic review registration should be stated.

The eligibility criteria of the patients or subjects in the study should be clearly delineated. All inclusion criteria such as patient demographic factors, diagnoses, etc. should be specified. Inclusion criteria based on diagnoses should be defined by specific symptoms, objective clinical findings, or radiographic parameters. In a study including patients with femoroacetabular impingement, the author's definition for a positive diagnosis should be stated; for example, patients were diagnosed with cam-type impingement in the presence of hip/groin pain, positive flexion, adduction, internal rotation impingement testing, and positive radiographic parameters such as alpha angle greater than 50°. Similarly, all relevant exclusion criteria such as age, comorbidities, and concurrent diagnoses should be determined a priori and clearly stated.

A study's interventions must also be well described to allow reproducibility. In the case of a surgical study, a detailed, step-by-step description of the surgical technique similar to a comprehensive dictated operative report should be provided. Furthermore, prior conservative management, surgical indications and decision-making, postoperative rehabilitation, and restrictions should be reported. Similarly, studies with medical interventions must include reporting of all dosages, frequency of therapy, and duration of therapy.

The outcome measures used in the study should be described by the authors, with clear delineation of the primary outcomes that the study seeks to assess versus secondary outcomes. Authors must not only explain what the outcomes are but how they are assessed, as well as the timing at which they are assessed. If clinical outcome measures or scores are used, these measures should
be validated for the corresponding diagnoses and patient population. For example, in early hip arthroscopy research, some patient-reported outcome (PRO) scores initially designed and validated for hip arthroplasty were used to assess surgical outcomes in younger hip arthroscopy patients. With development of novel patient-reported outcome scores such as HAGOS (Hip and Groin Outcome Score) or iHOT (International Hip Outcome Tool), these newer measures validated for use in young to middle-aged adults with hip and groin pain may be more appropriate for use in the femoroacetabular impingement and hip arthroscopy demographic [4, 7, 8].

28.3.1 Statistical Analyses

The assessment of appropriate statistical analyses can often be overwhelming. The reporting of descriptive statistics is important to consider when reviewing a paper. Typically in studies that have a large sample size without outliers, the mean and standard deviation should be reported. On the other hand, if the study sample size is small and there are outliers within the sample, a report of the median and range of results is preferred to provide the reader with a better understanding of the sample. Calculations to determine sample size should be included in relevant clinical trials to ensure the study is adequately powered. The authors should disclose the statistical software that was used for analyses. The type of tests used for statistical analyses for each comparison should be discussed, and the reviewer should determine if these are appropriate for the type of variables analyzed in the study. Analyses of continuous variables should be performed using the appropriate tests such as Student's t-test, whereas analyses of categorical variables should be performed using tests such as the χ² test or Fisher's exact test. Studies comparing the means of more than two groups should use analysis of variance (ANOVA) testing if comparing one independent variable and two-way or multifactor ANOVA testing for multiple independent variables. If multiple secondary comparisons are made, appropriate Bonferroni corrections should be performed to adjust for the multiple comparisons.

28.3.2 Study Design Specific Methodology

Core principles in assessing study methodology have been discussed above; however, there are different salient points to assess depending on the type and design of study being critically appraised. For example, when analyzing a randomized controlled trial, it is important to scrutinize allocation concealment and blinding, whereas in a retrospective cohort study, this may not be relevant. The following section will discuss important methodological considerations specific to different study designs.

28.3.3 Randomized Controlled Trials

The key feature of randomized controlled trials is that their purpose is to control bias through randomization and blinding. The difference between an effective randomized controlled trial and a poorly executed one can be attributed to its methodology in implementing these features. The execution of the randomization process should be clearly defined. The specific method used to generate the randomization sequence should be discussed, as well as the type of randomization and relevant methodological details such as blocking. The method of allocation concealment is an important factor to consider when appraising the methods of a randomized controlled trial. Many strategies of allocation concealment can be fraught with potential for selection bias. Sealed envelopes containing the randomization data can be subverted by investigators who hold them to the light to look through them or even discreetly open them. Using seemingly unbiased determinants such as chart numbers for randomization can cause bias if investigators, knowing the resulting treatment allocation, choose to include or exclude patients based on their perceived outcomes. The literature cites many ways investigators can decipher the randomization results if
inadequate allocation concealment strategies are used [5]. Centralized randomization is the gold standard method of allocation concealment. In this method, the central office is called to establish study eligibility, at which time the randomization allocation for the included patient is given to the investigator.

Blinding is an important process to protect randomization after the assignments to interventions have been made and is another consideration in assessing a randomized controlled trial. Blinding of subjects to their assigned intervention can remove any bias of preconceived notions or subject behavior that could affect the results of the study. Blinding can be extended to the care providers, outcome assessors, and other study personnel to further reduce bias. The paper should explain which parties were blinded following assignment to intervention and how this was managed and maintained. However, one must consider that sometimes it is nearly impossible to blind certain parties, such as blinding a surgeon to the intervention they are performing, a characteristic that is unique to surgical studies.

The CONSORT (Consolidated Standards of Reporting Trials) guidelines and checklist are an invaluable resource to direct critical appraisal of randomized controlled trials (http://www.consort-statement.org/consort-2010) [6].

28.3.4 Systematic Reviews

The search strategy is an important key to reproducibility of the systematic review. The strategy can be quite complex in order for the authors to thoroughly, yet efficiently, identify all relevant studies to include in the systematic review. The authors' search strategy, including all search terms and search language, should be outlined in a way that it can be repeated. Furthermore, all information sources (databases, etc.) used should be described, including the date that the search was last performed. With regard to study selection, clear inclusion and exclusion criteria should be delineated. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines also recommend a flow diagram to summarize the inclusion and exclusion process, including the number of studies included and excluded at each step of the review and the reasons for such. The review process to screen studies for inclusion can include a title and abstract review, followed by a mandatory full-text screening. This process should be done in duplicate (by two independent reviewers) in order to reduce errors of excluding studies that in fact meet the study criteria. Inter-reviewer agreement statistics should be divulged to the reader to disclose how often consensus or disagreements occurred, and subsequently, methods of resolving any conflict surrounding inclusion/exclusion should be reported. Any algorithms or attempts to identify potentially overlapping study data should be discussed.

Data extraction preferably should be performed in duplicate to minimize error. Any attempts to obtain data from original authors not published in the included studies should be reported. All data points or variables that authors sought to extract from the included studies should be stated, even if they were not available or present in all studies.

Risk of bias and study quality of the included studies should be assessed by the authors. This can be performed using various tools, scales, and checklists. A commonly used tool is the Cochrane Risk of Bias tool, which assesses various domains of possible bias in the included studies, including selection bias, performance bias, detection bias, attrition bias, reporting bias, and other biases [2].

If a meta-analysis is performed, the choice of the summary measure (e.g., relative risk) should be stated and justified. The statistical method should be discussed, including the decision to use a random effects model versus a fixed effects model, with authors providing rationale for their choices. Heterogeneity or consistency of the results between studies should be analyzed.

The PRISMA statement and checklist are an invaluable resource to direct critical appraisal of systematic reviews (http://www.prisma-statement.org) [3].
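
For the duplicate screening step of a systematic review, the agreement between the two reviewers is often summarized with a chance-corrected statistic. The sketch below assumes Cohen's kappa as that statistic, which is a common choice but is not prescribed by the text. The screening decisions are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both reviewers decided alike.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected overlap given each reviewer's own decision rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used a single identical category throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical title/abstract screening decisions for ten records.
reviewer_1 = ["in", "in", "in", "out", "out", "out", "out", "out", "in", "out"]
reviewer_2 = ["in", "in", "out", "out", "out", "out", "out", "out", "in", "in"]
print(f"kappa = {cohens_kappa(reviewer_1, reviewer_2):.2f}")
```

Here the reviewers agree on 8 of 10 records, yet kappa is only about 0.58 once chance agreement is removed. This is why a raw percentage of agreement alone tells the reader less than an agreement statistic.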

28.3.5 Biomechanical Studies

The biomechanical testing method must be described in detail, in a fashion that allows reproducibility of the study. Any jigs or testing apparatus should be described, with diagrams where appropriate to clearly communicate the experimental setup. Furthermore, any parameters of biomechanical testing should be outlined clearly, including any external loading conditions applied, as well as the quantifiable amount and direction of force applied. The use of cadaveric models requires reporting of specific age, type of preparation, and medical or surgical history of the included specimen. Similarly, the use of any robotic tools or apparatus used for evaluation of kinematic forces should be clearly described. All outcome parameters of biomechanical testing should be delineated, whether it be bending, torsional, or tensile strength of an orthopedic implant or kinematic forces measured across a joint.

Fact Box 28.1
Assessment tools that can help while evaluating the methods of a paper:

• Randomized Controlled Trials
  – CONSORT guidelines
• Systematic Reviews
  – PRISMA guidelines/checklist
  – AMSTAR checklist

28.4 Scrutinizing the Results of a Paper

The results should mirror the methods as described, including the statistical methods. The points that should be checked are as below:

• Is the results section organized in the same order as the methods section?
• The numbers including variables should be reported.
• All outcome measures mentioned in the methods should be reported.
• The text should be consistent with tables and figures. Text can usually be shortened or deleted in favor of tables and figures.
• Depending on the type of study, the results section may contain the demographic data instead of the methods section. In that case, check whether this data is clearly provided or not.
• Are the results adequately reported, with numbers (not only percentages) and distribution values, like standard deviation (SD), standard error (SE), or confidence interval (CI)? Is the reproducibility of the measurements (i.e., intra- and/or inter-observer intraclass correlation coefficients (ICC)) reported?
• Are the appropriate number of significant figures used based on the repeatability of the measures?

28.4.1 Statistical Significance

Statistical significance is only a guide and it cannot address whether findings are important for clinical outcomes. In clinical medicine, confidence intervals are often more helpful than statistical significance. For example, a study investigating the effect of the treatments A and B on the length of hospital stay may conclude that there is a statistically significant difference (P = 0.001) in the length of stay between treatments A and B. However, if the length of stay is 4.1 days with treatment A and 4.7 days with treatment B, it is not clinically meaningful even though it is statistically significant. This example indicates that statistical significance is not necessarily equal to clinical significance.

On the other hand, when the result shows "no significant difference" between the groups, the analysis should also be assessed because there is a possibility of type II error (β). It could be due to insufficient power of the test (1 − β), and the question of whether a power analysis (sample size calculation) was performed (usually written in the section of statistical analysis) should be asked.
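
To judge whether a reported power analysis is plausible, a reviewer can reproduce it approximately. The sketch below is a simplified illustration that assumes a two-sided comparison of two means with equal group sizes and the usual normal approximation, n = 2((z₁₋α/₂ + z₁₋β)·σ/δ)² per group. The 0.6-day difference echoes the length-of-stay example, while the 2-day standard deviation is a hypothetical value chosen only for illustration.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect a mean difference `delta` (two-sided)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided type I error
    z_beta = z.inv_cdf(power)           # quantile corresponding to power = 1 - beta
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)

# Assumed inputs: 0.6-day difference in length of stay, standard deviation of 2 days.
print(n_per_group(delta=0.6, sd=2.0))              # 80% power
print(n_per_group(delta=0.6, sd=2.0, power=0.90))  # 90% power
```

Under these assumptions roughly 175 patients per arm are needed for 80% power. A study that reports "no significant difference" with far fewer patients than such a calculation suggests may simply be underpowered.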

Ensure that the authors do not report "trends" in cases where statistical significance could not be reached. This usually leads to confusion and is better addressed by increasing the sample size or by not reporting the trend. In summary, we should scrutinize the results closely and interpret them carefully by checking the abovementioned points.

28.5 Important Points to Evaluate in the Discussion

In general, the discussion of a paper serves as a section to interpret the results and present the meaning of the findings. This section typically includes the following:

• Summary of the main findings of the paper.
• Focus on the main findings of the paper, and avoid generalization to a point where the data presented in the current study cannot support the discussion.
• Correlation of the main findings with previous literature.
• Strengths and weaknesses of the study design.
• Future directions.
• Conclusion/take-home message.

Each one of these components of the discussion should be evaluated to determine the potential clinical impact of the findings presented in the paper.

First, the summary of the main findings of the study should be comprehensive and set the stage for the discussion about these findings. There should be a balance between simply restating all of the results and omitting findings that might be important for interpretation of the results in a clinical context. It is important for the reviewer to ask whether this balance was achieved and to make sure that findings from the results that could possibly support or detract from the author's interpretation are not omitted.

The main findings that are presented in the beginning of the discussion are typically correlated with previous findings in the literature. The findings of the current study can agree or disagree with those in the literature, but the reviewer must assess if this agreement/disagreement favors the author's interpretation of the results. As a reviewer, it is important to have some baseline knowledge about other studies done on the same topic in order to determine if the author's line of thought makes sense. It is difficult to have a depth of knowledge of certain topics, especially as a young investigator. Thus, in those cases it can be useful to read the papers that are being referenced by the author in the introduction and discussion to gain a better understanding of the literature. With better knowledge of the topic, it will become easier to follow the author's logic in the discussion, and the reviewer will be able to properly assess if that logic is appropriate.

The strengths and limitations of a study design play an important role when determining the clinical relevance of the paper and also the potential impact the paper can have on clinical practice. Many of the points that will be mentioned in the strengths and limitations section of the discussion are already mentioned earlier in the "Reviewing the Methods of a Clinical Research Paper" section of the chapter. However, it is important to highlight the factors that will most directly influence the potential impact of the findings of the study. The number of patients in a study is an important factor since it directly affects the power of the conclusions made in the study. For example, if treatment A was shown to have a decrease in length of hospital stay by 3 days compared to treatment B, but this was based on a trial of only ten patients, it does not influence clinical practice until a larger study is able to show the same effect. Additionally, mean follow-up time and loss to follow-up are also important factors to consider when evaluating the impact of the findings of a clinical research paper.

Future direction of a study provides information regarding the potential for the findings of the study to influence additional research on the topic. This may be used further in the assessment of the impact of the findings of a paper. If the findings will serve as the basis for many future studies, the influence of the paper may be greater than if it was just an isolated finding with no clear path of additional research that leads to clinical relevance.
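
The ten-patient example in this section can be made concrete with a confidence interval. The sketch below is illustrative only: the lengths of stay are invented so that the mean difference is 3 days, the interval assumes a pooled-variance Student's t approach, and the critical value 2.306 (two-sided 95%, 8 degrees of freedom) is hard-coded from standard t tables.

```python
import math
from statistics import mean, variance

# Two-sided 95% critical value of Student's t for 8 degrees of freedom (5 + 5 - 2).
T_CRIT_DF8 = 2.306

def mean_difference_ci(group_a, group_b, t_crit):
    """Pooled-variance confidence interval for mean(group_b) - mean(group_a)."""
    n_a, n_b = len(group_a), len(group_b)
    diff = mean(group_b) - mean(group_a)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return diff, diff - t_crit * se, diff + t_crit * se

# Hypothetical lengths of stay in days for a ten-patient trial.
stay_a = [3.0, 5.0, 9.0, 4.0, 4.0]
stay_b = [8.0, 6.0, 12.0, 5.0, 9.0]
diff, low, high = mean_difference_ci(stay_a, stay_b, T_CRIT_DF8)
print(f"difference = {diff:.1f} days, 95% CI {low:.1f} to {high:.1f}")
```

With these made-up data the 3-day difference comes with an interval of roughly −0.7 to 6.7 days, which includes no difference at all. This illustrates why such a small trial should not change practice on its own.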

28.6 Format for a Review to a Journal

Key components for a good peer review are (1) agreeing to review in a timely fashion, (2) completing the review in a timely fashion, and (3) providing a high-quality peer review. There are several components to a review for a journal, and the main sections are the following:

• Summary of the paper
• General Comments
• Specific Comments
• Comments to the Editor

The review should start with a summary of the paper that shows the author that you have a clear understanding of the purpose and results of the paper. This should be followed by general comments that should outline the major points that need to be further addressed in the paper prior to being considered for publication. For example, if there is further clarification needed about what statistical method was used for certain comparisons in order to properly interpret the results, this should be included in the general comments. The next section is the specific comments, which include line-by-line comments and overall comments about each section of the paper. The principles for reviewing each section outlined in this chapter play a big role when compiling the specific comments. The comments in this section of the review include correction of grammatical errors, correction of terminology, and clarification of any points that are confusing (this can be in the form of a question to the author). Finally, a paragraph with comments to the editor should be included in the review. This will serve to provide the editor with the strengths and limitations of the paper and the potential fit of the paper in the journal. This section should also include a recommendation on whether the paper should be accepted for publication, reconsidered after revision, or rejected. Different journals may have different guidelines for the reviewer, but in general these are the sections that should be clearly distinguished in a review submitted to a journal.

Fact Box 28.2
Resources with guidelines and further education about reviewing a paper:

• https://journals.lww.com/jbjsjournal/Pages/Consultant-Reviewer-Guidelines.aspx
• https://publicationethics.org/resources/guidelines-new/cope-ethical-guidelines-peer-reviewers
• http://senseaboutscience.org/activities/peer-review-the-nuts-and-bolts/
• https://publons.com/community/academy/

Take-Home Message
• Reviewing a paper requires a systematic approach in order to ensure that nothing is overlooked.
• Each section of the paper must be evaluated for certain elements.
• The introduction must provide clear reasoning for the motivation of the study, the aims of the study, and the hypotheses of the study.
• The methods must be described in detail such that they can be easily repeated and must be of the highest quality according to different metrics set based on the study design.
• Interpretation of the results should be in the context of clinical relevance, not only statistical significance.
• Finally, the discussion should bring everything together and allow the reviewer to determine the impact that the study would have if published.
• Overall, the guidelines presented in this chapter for evaluation of each section of a paper will serve as a good foundation that reviewers can adapt while reviewing specific papers.

References

1. Brand RA. Writing for clinical orthopaedics and related research. Clin Orthop Relat Res. 2008;466(1):239–47.
2. Higgins J, Green S. Cochrane handbook for systematic reviews of interventions. Chichester: John Wiley & Sons; 2008. https://search.library.wisc.edu/catalog/9910060197402121.
3. Moher D, Shamseer L, Clarke M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015;4:1. https://doi.org/10.1186/2046-4053-4-1.
4. Ramisetty N, Kwon Y, Mohtadi N. Patient-reported outcome measures for hip preservation surgery: a systematic review of the literature. J Hip Preserv Surg. 2015;2(1):15–27. https://doi.org/10.1093/jhps/hnv002.
5. Schulz KF. Subverting randomization in controlled trials. JAMA. 1995;274(18):1456–8. https://doi.org/10.1001/jama.1995.03530180050029.
6. Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMC Med. 2010;8(1):18. https://doi.org/10.1186/1741-7015-8-18.
7. Stone AV, Jacobs CA, Luo TD, et al. High degree of variability in reporting of clinical and patient-reported outcomes after hip arthroscopy. Am J Sports Med. 2017;46(12):3040–6. https://doi.org/10.1177/0363546517724743.
8. Thorborg K, Tijssen M, Habets B, et al. Patient-Reported Outcome (PRO) questionnaires for young to middle-aged adults with hip and groin disability: a systematic review of the clinimetric evidence. Br J Sports Med. 2015;49(12):812. https://doi.org/10.1136/bjsports-2014-094224.

Part V
How to Perform a Clinical Study: A Case
Based Approach
29 Level 1 Evidence: A Prospective Randomized Controlled Study

Seper Ekhtiari, Raman Mundi, Vickas Khanna, and Mohit Bhandari

S. Ekhtiari · R. Mundi · V. Khanna · M. Bhandari (*)
Division of Orthopaedic Surgery, McMaster University, Hamilton, ON, Canada
e-mail: bhandam@mcmaster.ca

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_29

29.1 Introduction

The documentation and reporting of clinical scenarios have been an important part of medical practice for millennia. The earliest identified example of what could be considered a case series dates to at least 1600 BC. The ancient Egyptian papyrus (Edwin Smith Papyrus) describes the presentation and management of 48 different conditions, primarily traumatic injuries [45]. Despite this historical precedent, much of medical literature remained quite rudimentary for most of the intervening time. The application of true scientific rigour to medicine was rarely seen before the 1700s, when Enlightenment ideologies began to permeate into medical practice. It is difficult to establish the "first" controlled clinical trial with certainty, but one of the most likely candidates was a trial conducted by Dr. James Lind in 1747. Lind was a Scottish surgeon who demonstrated through a controlled trial that citrus fruits were effective against scurvy in seafaring sailors [9]. Throughout the eighteenth and nineteenth centuries, the use of quasi-random allocation methods, most notably alternate allocation (i.e. "every other patient"), became increasingly common and important [12].

Perhaps the first example of a true randomized controlled trial (RCT) is Amberson's 1931 study, in which a group of relatively similar patients were allocated by coin flip to sanocrysin or no treatment for tuberculosis [5]. Another tuberculosis trial, conducted nearly two decades later, likely represents the most important turning point in the history of RCTs in medicine. Sir Austin Bradford Hill was the statistician overseeing a streptomycin trial for the treatment of tuberculosis and was the first to utilize and formalize some core components of the modern RCT, including truly randomized allocation, allocation concealment, and blinding [9].

The use of RCTs in orthopaedic surgery, however, remains limited, partly due to the unique and inherent challenges involved in conducting a surgical RCT [15]. Despite these challenges (discussed below in further detail), RCTs remain the gold standard of evidence-based medicine and should continue to be the ultimate goal when conducting clinical studies. Randomly assigning patients to treatment groups that are concealed from both the patient and the investigator and comparing the different groups minimizes the risk of selection bias and mitigates the effect of some psychological factors that can impact the outcome of a study. Finally, such a trial can help to decide if a new treatment is worse, similar, or better than the existing standard of care. Thus, well-executed RCTs do represent a true pinnacle on the pyramid of evidence-based medicine [4].
266 S. Ekhtiari et al.

29.1.1 Definitions

Many terms describing various aspects of clinical trials are often used interchangeably and at times incorrectly. A detailed understanding of the nomenclature is essential in designing, labelling, evaluating, and interpreting clinical trials. These terms are briefly defined in this section. Akobeng provides an excellent and detailed review of each component [4].

• Prospective refers to the a priori planning of various stages of the trial. True RCTs are, by definition, prospectively designed as randomization and allocation can only occur prospectively. “Prospective” can apply to the study design, the data collection, and the data analysis. In a study that is fully prospective by design, data is collected for a predefined study, and the data analysis is performed based on a predetermined plan. However, not all publications stemming from an RCT are necessarily fully prospective; prospectively collected data from an RCT can also be analysed retrospectively. Prospective data collection tends to capture more precise and accurate data as the data collection instruments have been selected and/or designed specifically to address the study objectives [19].
• Randomization refers to the method by which patients are allocated to the various treatment and/or control groups. Ideally, this should be a truly random method that has no potential for participant or researcher interference. This does not include quasi-random methods of allocation such as chart numbers, birth dates, or alternate allocation, as often unrecognized bias or trends can influence the allocation. Specific methods of randomization are discussed below.
• Blinding (or “masking”) refers to the process by which the treatment that each patient is receiving remains unknown to one or more of five groups of individuals: the patient, the care provider(s), the outcome adjudicator(s), the data collector(s), and the data assessor(s) [42].
• Controlled refers to the fact that the treatment(s) of interest is being compared directly to something else; this may include any or all of the following: a different treatment, a placebo, or no treatment. The term “control group” is used with various meanings both colloquially and in the scientific literature. For the purposes of this chapter, the terms “control” and “control group” will refer to all of the above possibilities.

Fact Box 29.1: “Prospective Blinded Randomized Controlled Trial”: Words with a Meaning
1. Prospective: All planning, including data collection instruments and data analysis planning, is completed before the study is initiated.
2. Blinded: One or more of the groups of people involved are kept unaware of the group allocations.
3. Randomized: A true randomization technique is used to allocate patients to groups.
4. Controlled: There is a comparison group (may be placebo, no treatment, or other treatment).
5. Trial: A clinical study.
Recommended resource: Akobeng AK. Understanding randomised controlled trials. Arch. Dis. Child. 2005;90:840–4.

The remainder of this chapter will focus on basic methods for conducting RCTs in orthopaedic surgery. The chapter will be separated into three main sections: (1) planning for an RCT, (2) conducting the RCT, and (3) knowledge dissemination and translation. Each section will contain a detailed discussion of the steps required for conducting a successful RCT. Throughout the chapter, the Fluid Lavage of Open Wounds (FLOW) trial [61] will be used as a case study to demonstrate the practical and real-life application of the topics discussed. The FLOW study was a randomized, multicentre, controlled trial looking at the management of open fractures. Specifically, the study compared irrigation pressures (very low vs. low vs. high pressure) and irrigation
29  Level 1 Evidence: A Prospective Randomized Controlled Study 267

solution (soap vs. normal saline) and their effects on reoperation rate in open fractures. The study randomized a total of 2551 patients, of whom 2447 were included in the final analysis. The investigators found that reoperation rate was similar across irrigation pressures but was significantly lower for normal saline compared to soap [61].

29.2 Planning for a Randomized Controlled Trial

The adage “if you fail to plan, you are planning to fail” certainly applies when it comes to conducting RCTs. There are several crucial steps that need to be taken before embarking on a full-scale RCT. These preliminary steps serve many functions, including setting the foundation for the project, revealing technical and logistical challenges with conducting the study, and verifying that the research question is indeed feasible, novel, and interesting. These steps include defining the research question(s), performing literature reviews, conducting surveys, executing pilot studies, calculating required sample sizes, and assembling the necessary support structures.

29.2.1 Defining the Research Question

An appropriate, clearly defined research question is an indispensable part of conducting research at all levels of evidence. As the foundation of the entire project, this is arguably the most important step of the research process. As Kumar (2005) puts it, “it is like the identification of a destination before undertaking a journey…in the absence of a destination, it is impossible to identify the shortest – or indeed any – route”. There are certain requirements that should be considered in the development of a good research question. Hulley et al. have suggested the use of the acronym FINER, which has become widely used and accepted as important criteria for a good research question [29]. The acronym stands for: feasible, interesting, novel, ethical, and relevant.

Feasibility refers to the practical and logistical aspects of conducting a study. There are various considerations within this component. Sample size is an important consideration: a sample size calculation (discussed later in this chapter) can help to estimate the number of subjects required for adequate statistical power. A review of hospital data can provide a rough estimate of the potential number of subjects that can be expected to be available for recruitment. These numbers should be treated cautiously, and factors such as declining participation, loss to follow-up, ineligibility for inclusion, and random fluctuations in patient volumes should be considered. The number of subjects required/available will also inform the timeframe of the study. It is important to ensure that adequate technical expertise is available for the completion of the project. Multicentre collaboration may be necessary if the subject matter or outcome is highly subspecialized and relatively rare. Funding is an important issue, and a preliminary budget outline can be helpful in projecting the expected costs of all aspects of the study. If external funding (e.g. government research grants) is required, potential funding bodies should be identified early to ensure the project is designed in a way that meets their requirements and mandates. Finally, the scope of the project should be broad enough to answer the research question, but not so broad as to cause confusion and risk losing focus on the underlying question [29].

The research question should be interesting. This may sound self-explanatory but can be difficult to accomplish in actual practice. What may seem interesting to a group of surgeons at one institution may be of limited or no relevance to most other surgeons. This does not preclude the execution of the study, but does impact its applicability and generalizability. Discussions with experts in the field and potential funding bodies are good starting points to assess interest in the idea [29]. Ultimately, a large survey of surgeons in the relevant field is an effective way to objectively gauge interest in the research question and assess its potential to change practice [46].

Novelty is another important component of a good research question. This is perhaps the most

nuanced and least universal of the FINER criteria. Hulley et al. (2007) state that “good clinical research contributes new information” [29]. While this is certainly true, the concept of “new information” can be considered in various ways. The research question does not need to be entirely new. A previous research question can be applied to a new population, a new tool or measurement can be used to assess a previous research question, and so on. As well, the importance of replication as part of the scientific method cannot be overstated. A recent analysis of education research found that only 0.13% of publications in the top 100 education journals were dedicated to replicating previous research [37]. Thus, while novelty is important, it can be interpreted in many different ways and should not be taken to mean that every research project has to have an entirely original research question.

Case-Based Example 1: FLOW Survey: Overview
Before embarking on the FLOW study, the investigators conducted an international survey of nearly 1000 orthopaedic surgeons to identify their practice patterns and the likelihood that they would change those practice patterns based on evidence from a large RCT. The vast majority of respondents (94.2%) stated that they would change their practice based on such data [46]. Some important factors in this high level of interest likely include the potential to improve outcomes in a highly vulnerable population, the presence of alternatives that are widely available and relatively inexpensive (high vs. low-pressure irrigation, normal saline vs. soap), and a paucity of pre-existing clinical evidence.

While it may seem rather obvious that a research question should be ethical, the history of clinical research contains many cautionary tales of unethical research, including some relatively recent examples. The turning point for ethics as a cornerstone of human research occurred following World War II, with the establishment of the Nuremberg Code. Despite this, the Tuskegee Syphilis Study continued to enrol low-income African-American subjects until 1972. These patients were not told about their syphilis status, and were not offered penicillin, a proven therapy for the disease [39]. Institutional ethics board requirements should be consulted for each specific study, but the Helsinki Declaration provides useful general guiding principles [67]:

– Research with humans should be based on basic science where possible (i.e. lab and/or animal models).
– Research protocols should be reviewed by an independent research ethics board (REB) prior to initiation.
– Informed consent must be obtained.
– The individuals conducting the study should be adequately qualified and trained.
– Risks should not exceed benefits.

Whereas the FINER criteria provide useful general requirements for a good research question, the PICOT format presents a practical way for defining and communicating the research question. This format was first outlined by Richardson et al. as “PICO”: population, intervention, comparator, and outcome [48]. It was later expanded to PICOT to include the timeframe over which the study would be conducted [24]. Like the FINER criteria, the PICOT format is applicable to all levels of evidence. The individual components of the PICOT format are outlined below.

The population of interest should be well defined. This is mostly accomplished through clear and well-defined inclusion and exclusion criteria [60]. Important components include basic demographic information (age range, sex, etc.), the nature of the injury/disease (e.g. acute vs. subacute/chronic), and special populations (e.g. paediatric or pregnant patients, etc.). Some components of the target population may need to be considered even though they are not formally part of the inclusion and exclusion criteria. For example, two hypothetical studies conducted in Brazil and China may have identical inclusion/

exclusion criteria, but will inevitably have some inherent differences in the populations they recruit.

The intervention of interest is often a new technique, implant, or adjunct that is the primary focus of the study. The opportunity should be taken at this early stage to define the intervention clearly and in detail. All pertinent and applicable details should be recorded. If a new surgical technique is being studied, a detailed description of the operation, often published separately as a technique article, is important. The comparator is essentially the control group(s). Again, as much detail as possible should be included. The type of comparator (e.g. no treatment vs. placebo vs. alternate treatment) should be explicitly stated and clearly defined [62].

The outcome is at the heart of why the study is being conducted. What is expected to improve (or not) with this new or different intervention? Generally, in orthopaedics, there are three broad categories of outcomes that are important to evaluate: generic outcomes, condition-specific outcomes, and utility outcomes.

Finally, the planned timeframe is important to identify. Again, as with costs, this will not be an exact projection, so some buffer room should be accounted for in case some steps take longer than expected. The timeframe also has significant logistical implications: budgeting for research and administrative staff can vary significantly depending on the timeframe, and the graduation and potential relocation of learners may need to be accounted for.

Case-Based Example 2: Excerpt from FLOW Protocol
The FLOW investigators described all intervention arms in great detail in their original study protocol, which was published [62]. For example, “…will use sterile technique to inject 80 mL of the clear liquid soap (Castile Soap, Triad Medical Inc. Franklin, Wisconsin – 17% concentration in de-ionized water preserved in 90 mL bottles)”. Irrigation pressures were also defined objectively with PSI cut-offs. This level of detail is important for multiple reasons: it allowed the study to be conducted consistently across 41 sites on four different continents, it simplified any future repetition of the study, it made for clear and consistent communication in subsequent publications, and it minimized the effects of varying protocols and practice patterns.

The FINER and PICOT criteria can be applied to the development of primary and secondary research questions. The primary question should be a single, clearly stated question driven by a hypothesis that forms the basis for the design of the study and the choice of data collection instruments [21]. Secondary research questions address other outcomes related to the intervention. Any results or answers to secondary questions should be considered preliminary rather than definitive [64]. As well, secondary questions should be limited in number to avoid the trap of many comparisons. When many comparisons (i.e. statistical tests) are made, the chances of finding a spurious result increase, unless the appropriate post hoc statistical corrections are performed to account for the multiple comparisons [21].

29.2.2 Literature Review

A detailed description of how to conduct literature reviews, including systematic reviews, can be found later in this book. Briefly, a thorough review of the literature is important to ensure that the research question is novel and that clinical equipoise exists. Clinical equipoise refers to the fact that for a given comparison of treatment arms (e.g. operative vs. nonoperative, drug vs. placebo, operation X vs. operation Y, etc.), a consensus of experts would not consider one arm clearly superior to the other [25]. Clinical equipoise is an important requirement of an RCT, both from an ethical standpoint and in terms of potential clinical applicability. In addition to published literature, it is important to consider sources of unpublished and in-progress literature,

such as trial registries (e.g. www.clinicaltrials.gov), conference abstracts, and online research communities where researchers can post their current works in progress (e.g. www.researchgate.net).

29.2.3 Surveys

A survey of experts in the field is not always a component of preparing for an RCT, but it serves as a valuable tool and should be considered in most cases. A well-conducted survey can provide an objective basis upon which to develop the RCT. It can also be an asset in requesting funding, as it can demonstrate clearly that the question is interesting, relevant, and important. It is important to design and conduct the survey in a fashion that maximizes the survey’s validity and response rate. Sprague et al. provide a list of 12 principles for conducting a survey in orthopaedic surgery, along with some important tips and strategies [57].

Case-Based Example 3: FLOW Survey: A Closer Look
Prior to conducting the FLOW study, the investigators surveyed an international group of surgeons. The survey was developed using focus groups, previous literature, and key experts. The questionnaire was then pre-tested with an independent group of four orthopaedic surgeons to assess face and content validity. A sample size analysis was performed for the survey to estimate the number of participants required to achieve sufficient statistical power. This analysis estimated that 650 completed surveys were needed and that, based on a 70% response rate, at least 930 questionnaires needed to be distributed. The investigators distributed 1764 surveys by mail and in person at a trauma course. Ultimately, there was a 56% response rate, with 984 completed surveys. Had the investigators distributed only 930 questionnaires with the same response rate, they would have had an insufficient number of completed responses (521) [46]. This demonstrates the importance of overshooting sample size targets where possible.

Case-Based Example 4: FLOW Pilot Study
The FLOW pilot study randomized 111 patients and followed them for 1 year. Two main issues were identified and addressed before the full-scale trial was conducted. First, three of nine sites were found to have below-target compliance rates. These sites received additional training prior to participation in the FLOW RCT. In addition, the loss to follow-up rate of nearly 20% was identified as an area for improvement. The investigators outlined five strategies that they used in the definitive trial in order to improve retention [7].

Fact Box 29.2: Defining the Research Question
PICOT and FINER are useful acronyms to keep in mind when defining the research question. FINER contains the overarching elements that should be included in a research question, whereas PICOT provides a practical method for communicating the question.
FINER: Feasible, Interesting, Novel, Ethical, Relevant
PICOT: Population, Intervention, Controls, Outcomes, Timelines
Recommended resource: Farrugia P, Petrisor BA, Farrokhyar F, Bhandari M. Practical tips for surgical research: Research questions, hypotheses and objectives. Can. J. Surg. 2009;53:278–81.
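The arithmetic in Case-Based Example 3 can be checked with a short sketch. The function names below are illustrative and do not come from the FLOW survey; the only inputs are the figures quoted in the example (650 completed surveys required, a planned response rate of 70%, an observed response rate of 56%, and 1764 questionnaires distributed).

```python
import math

def questionnaires_to_distribute(completed_needed: int, response_rate: float) -> int:
    """Smallest number of questionnaires expected to yield the required completed surveys."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(completed_needed / response_rate)

def expected_completed(distributed: int, response_rate: float) -> int:
    """Expected number of completed surveys, rounded to the nearest whole survey."""
    return round(distributed * response_rate)

# Planning stage: 650 completed surveys wanted, 70% response assumed.
planned = questionnaires_to_distribute(650, 0.70)

# What 930 questionnaires would have yielded at the observed 56% response rate.
shortfall_case = expected_completed(930, 0.56)

# What the 1764 questionnaires actually distributed are expected to yield at 56%.
actual_case = expected_completed(1764, 0.56)

print(planned, shortfall_case, actual_case)
```

Under these inputs the sketch gives 929 questionnaires at the planning stage (in line with the figure of about 930 quoted above), 521 completed surveys if only 930 had been distributed, and roughly 988 for the 1764 actually sent. The last figure differs slightly from the 984 reported because the 56% response rate is itself a rounded value.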

29.2.4 Pilot Studies

Pilot studies are smaller, shorter versions of the trial that may eventually be conducted. The primary purpose of a pilot study is to evaluate the feasibility and logistical practicality of the proposed study. The purpose of a pilot study is not to test the hypothesis. In fact, results from pilot studies should not be interpreted in the same way as results from full-scale clinical studies. The main focus during the pilot process is to identify and resolve challenges in recruitment, randomization, blinding, and retention [7]. In most cases, data from a pilot study should not be included in the eventual analysis of RCT data. This is because methodological changes are often made based on the pilot study, which adds a source of uncontrollable variability to the data [34].

29.2.5 Sample Size Analysis

A thorough review of sample size analysis can be found in Chap. 20 of this book and thus will not be discussed in detail here. Briefly, it is important to perform sample size calculations at each stage of the process, including survey administration, pilot study, and full-scale RCT. Care should be taken not to be overly optimistic about recruitment, eligibility, and retention rates [49]. In addition, the treatment effect tends to be overestimated, particularly if the expected effect size is based on previous small studies [1]. Thus, where possible, the final sample size target should be greater than the calculated sample size by 10–40% [63].

29.2.6 Support Structures

It is important to identify, recruit, and train the necessary personnel required to conduct the RCT before the trial has begun. Pilot studies can be very helpful in identifying strengths and limitations of the research team and areas where additional support may be necessary. It is important to consider the entire spectrum of support staff that may be necessary: volunteers, research assistants, financial experts, grants administrators, and statisticians. The need for these various roles may fluctuate over the different stages of the trial, and commitments and contracts should be tailored as such with some flexibility to allow for unexpected circumstances.

29.3 Conducting a Randomized Controlled Trial

Conducting an RCT is a long process with many stages, some of which occur simultaneously. An awareness of the trial’s stage of progress and active preparation for upcoming stages are important factors in the smooth operation of an RCT. The main steps of conducting an RCT are as follows: REB approval, trial registration, patient recruitment, randomization, allocation concealment, blinding, control and intervention implementation, follow-up, and statistical analysis. These steps are discussed in the sections below.

Case-Based Example 5: FLOW Study Support Team
The FLOW study employed one full-time research coordinator and one full-time undergraduate research assistant for the entirety of the trial. In addition, a project manager worked about half time throughout the project to oversee the project and manage finances. A data manager and a statistician were available for the entirety of the trial, but the early phases required mostly data management whereas towards the end of the trial, the statistician was required nearly full time. Finally, a grants administrator helped to prepare and submit grants for funding opportunities.
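The 10–40% margin recommended in Sect. 29.2.5 can be expressed as a simple calculation. The sketch below is illustrative only: the function names and the example figure of 200 calculated participants are hypothetical and are not taken from the chapter or from FLOW. The second helper shows a related convention, dividing by the expected retention, which is an assumption added here for comparison and not a method described in the chapter.

```python
def inflated_target(calculated_n: int, margin_percent: int) -> int:
    """Recruitment target after adding a safety margin given in whole percent (e.g. 10 to 40)."""
    if margin_percent < 0:
        raise ValueError("margin_percent must be non-negative")
    # Integer ceiling division avoids floating-point artefacts (200 * 1.1 is not exactly 220).
    return -(-calculated_n * (100 + margin_percent) // 100)

def target_allowing_for_dropout(calculated_n: int, dropout_percent: int) -> int:
    """Assumed convention: enrol n / (1 - dropout) participants so that about
    calculated_n remain at final follow-up. Dropout is given in whole percent."""
    if not 0 <= dropout_percent < 100:
        raise ValueError("dropout_percent must be in [0, 100)")
    return -(-calculated_n * 100 // (100 - dropout_percent))

calculated_n = 200  # hypothetical result of a formal sample size calculation

low = inflated_target(calculated_n, 10)
high = inflated_target(calculated_n, 40)
print(f"Recruitment target between {low} and {high} participants")

# With a loss to follow-up of about 20%, as seen in the FLOW pilot study:
print(target_allowing_for_dropout(calculated_n, 20))
```

For a calculated sample size of 200, the margin corresponds to a recruitment target of 220 to 280 participants, while allowing for 20% loss to follow-up under the assumed convention gives 250. Which adjustment is appropriate depends on the formal sample size analysis for the trial in question.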

29.3.1 REB Approval and Trial Registration

Prior to conducting an RCT, institutional REB approval should be obtained. In addition, any necessary funding should be applied for and secured. Both processes will generally require a detailed protocol of the study, including the literature review, objectives, methods, hypotheses, timelines, and detailed budget. Chapter 8 in this book provides a detailed overview of how to prepare a study protocol. Once the protocol is complete, it should be registered with the appropriate trial registry. In North America, this is most commonly www.clinicaltrials.gov and in the European Union www.clinicaltrialsregister.eu. Prospective registration of all clinical trials is important because it ensures transparency, prevents duplication, reduces publication bias, and lessens the likelihood of selective reporting [6].

29.3.2 Patient Recruitment

A useful way to think about patient recruitment is to think about the “who, what, where, when, why, and how”. A detailed recruitment plan is important to have at the protocol stage, as adequate sample size is integral to RCT success. An unsuccessful recruitment strategy can serve as a bottleneck that slows down or altogether halts an otherwise well-planned trial.

Who will recruit the subjects? It is important to identify and train the individuals who will be recruiting patients. Care should be taken to ensure that these individuals are not those directly responsible for the care of the patient. Involving the patient’s care team (physicians, nurses, etc.) can be an important starting point as there will already be an established rapport. However, it is preferable that the role of the patient’s care team be limited to informing the patient about the research project and evaluating whether the patient would be willing to meet with a member of the research team [55]. An independent member of the research team should then obtain informed consent, and it should be made clear that the decision to not participate in the trial will not unduly affect the patient’s care. Resident physicians are often the first members of the orthopaedic surgery team to meet a patient and thus can play an important role in maximizing recruitment by identifying eligible patients. If a trial is time-sensitive (e.g. time from presentation to operating room is a primary outcome or an independent variable), it may be necessary to train triage and other emergency department staff to identify and flag eligible patients for recruitment.

What will the recruitment process entail? A detailed informed consent form, which includes a patient copy, is necessary but not sufficient in patient recruitment. It is important that the individuals obtaining informed consent are well informed about the purpose of the study, the various treatment arms, and the potential risks and benefits of each. Copies of the informed consent form should be readily accessible and available to members of the research team at key patient encounter locations (e.g. emergency department, clinic, etc.). As well, if any incentives are being provided for participation (e.g. gift cards, etc.), these should also be available in advance in a secure location.

Where and when will the recruitment take place? Depending on the details of the RCT, recruitment may take place in the emergency department, on inpatient wards, or in outpatient clinics. In the case of multicentre studies, the ideal recruitment location may vary between the different sites. Thus, consultation with staff at each site is important in the planning and pilot stages to ensure that the protocol is consistent but adaptable to each site. It is also important to define a priori the “hours” of the trial. This is particularly applicable if the trial is focused on emergent and/or traumatic conditions, as they may present at any hour. Realistic goals should be set depending on the availability of research support staff. For example, the HIP ATTACK trial is an ongoing, international RCT evaluating accelerated (within 6 h of diagnosis) vs. standard care for hip fracture patients. For the enrolment of each patient, a member of the research staff is required to enrol and randomize the patient in a timely manner to achieve the target time for patients randomized to accelerated treatment. Thus, in this trial, subjects are only recruited during daytime

working hours, although each recruitment site may have differing definitions of working hours depending on staff availability [28].

Though patient recruitment in surgical trials most often occurs in person, there are multiple other methods for patient recruitment. These include media outlets, physician referrals, cold calls/mailings, and online recruitment [63]. Previous literature has identified physician referrals and online recruitment as the most cost-effective strategies [63]. Local and institutional regulations should be consulted before public recruitment strategies are considered.

How will the recruitment be performed and why? There are multiple different recruitment strategies, and the most appropriate method should be chosen to fit the study design. These strategies include [63]:

1. All patients recruited simultaneously and begin trial at the same time.
2. Patients enter trial in a “batched” fashion.
3. Continuous recruitment until target sample size is reached.
4. Continuous recruitment until a target end date is reached.

Each strategy is better suited for certain study designs than others. For example, recruiting all patients and beginning a trial simultaneously works best for nonoperative trials, where many patients can be instructed to begin a therapy on or about the same date. Batch enrolment can work well for common, elective procedures, such as total joint arthroplasty. For example, all eligible patients undergoing a total joint arthroplasty in each week can be recruited and enrolled in the trial, and this process can be repeated as necessary. Continuous recruitment is the most common recruitment strategy in surgical trials, particularly when it comes to trauma. For most RCTs, recruitment is continued until the target sample size is reached. If there is a clear rationale to stop recruitment on a specific date (e.g. seasonal conditions/injuries), this strategy may be considered.

Patient recruitment difficulties may be encountered with various aspects of the RCT. These include the protocol itself, staff- or site-specific issues, surgeon-related issues, and patient-related recruitment issues. Thoma et al. provide a detailed guide to troubleshooting each type of recruitment challenge [63].

29.3.3 Randomization

Randomization, as defined above, is essentially the process by which subjects are allocated to the various groups of an RCT. It should be noted that methods such as the use of chart numbers, birth dates, or alternating allocation are not truly random and thus considered “quasi-randomization”; they should be avoided if possible [17]. Even truly random methods that are manually performed (e.g. coin flips, dice throwing, etc.) are vulnerable to user error and/or interference. The use of secure, computer-generated randomization software is usually considered the best method for random allocation [17].

There are at least four different types of random allocation, all of which may be acceptable for an RCT depending on the specific scenario:

• Simple randomization is most commonly performed using a random number table or computer-generated randomization software. This is the easiest randomization technique to employ, but should generally be reserved for large clinical trials. In smaller clinical trials, this method could result in groups that are unbalanced in terms of total numbers and/or baseline characteristics [59].
• Block randomization is useful in smaller trials to ensure similar allocation between groups. The “block size” refers to the number of patients that are randomized at a time and should be a multiple of the number of treatment groups (e.g. if two treatment groups exist, block size can be 4, 6, 8, etc.). Subjects in these small blocks are then randomized in a balanced fashion to the treatment groups. For example, in a study with two treatment groups (A and B) and a block size of four, the first four subjects can be randomized in any of the following combinations: AABB, ABAB, BBAA, ABBA, BAAB, and BABA. One of these six combinations is then selected at random and applied to the first four patients. The

process is then repeated for each subsequent group of four subjects [4, 59]. The block sizes can be constant or variable for each allocation.
• Stratified randomization is a strategy that attempts to minimize the likelihood of having significantly different covariates between patient groups. In this type of randomization, patients are grouped based on select, predetermined prognostic factors before being randomized. For example, in a study on the wound complications of total joint arthroplasty, the presence of diabetes mellitus (DM) is an important prognostic factor. Thus, patients could be stratified based on DM status before being randomized to a treatment arm. Though this type of randomization is appealing due to its potential to control for covariates, it can become complicated if many covariates need to be controlled. As well, this type of randomization requires that all participants be identified before randomization occurs [59]. While this may be well suited for some elective procedures (e.g. by identifying patients on a waiting list for the same operation), it would not be applicable for any emergent or trauma-related RCT.
• Covariate-adaptive randomization is a strategy that attempts to address some of the limitations of stratified randomization. Essentially, subjects are enrolled sequentially in real time, and the covariates of interest (e.g. age, comorbidities, etc.) are entered for each patient. Covariate-adaptive randomization attempts to randomize new subjects into groups by considering the previously randomized subjects and correcting for any imbalances between groups [59].

29.3.4 Allocation Concealment

Allocation concealment refers to a situation in which the individual enrolling the patients and performing the randomization does not know which group the next subject will be randomized to. Previous literature has shown that effect sizes are overestimated by 41% in trials that do not perform allocation concealment [52]. Traditionally, a common technique for allocation concealment was the use of opaque, sealed envelopes, which a member of the research team would open for each patient at the time of randomization. The Cochrane Handbook for Systematic Reviews of Interventions designates the use of sealed opaque envelopes as carrying a “low risk of bias” [27]. Empirical data, however, suggest that trials using sealed opaque envelopes are prone to tampering [33] and are more likely to demonstrate statistically significant results than those using distance randomization [26]. Distance randomization occurs when the randomization process is completely removed from the control of the individual(s) enrolling the subjects. Usually, this occurs either via telephone or the Internet, whereby the research team member calls or logs into a secure, centralized service that then randomizes the subject and records their allocation. Distance randomization, particularly through third-party, secure websites, is the preferred method for randomizing subjects in today’s RCTs.

29.3.5 Blinding

Blinding is a distinct concept from allocation concealment; it refers to the process by which one or more of the following groups are kept unaware of the subject’s assigned treatment group: (1) the subjects, (2) the caregivers, (3) the outcome adjudicators, (4) the data collectors, and (5) the data analysts. Depending on the study design, any or all of these five groups may be blinded [31]. According to the Consolidated Standards of Reporting Trials (CONSORT) statement, the commonly used terms “single blinded”, “double blinded”, and “triple blinded” should be avoided, as they are ambiguous and uninformative. Rather, investigators should specifically state which, if any, of the five groups outlined above were blinded [43]. As well, it is important to clarify if the same person or group fulfilled more than one of the roles. For example, in a surgical trial, the primary surgeon can potentially be
the caregiver, the outcome adjudicator, and the data collector.
As a rule, as many groups as possible should be blinded to the group allocation. In practice, particularly in the context of surgical trials, this is not always possible. For example, it is impossible to blind the surgeon to an operative intervention [44]. Inability to blind one or more groups, however, should not automatically result in a complete lack of blinding. A systematic review of orthopaedic trauma RCTs found that less than 10% of RCTs reported blinding outcome adjudicators. Interestingly, up to 96% of outcome adjudicators could have been blinded with simple methods not requiring major changes to the RCT protocol [30]. A more recent review of all surgical literature found that 52% of RCTs blinded the outcome adjudicators, compared to the 67% of RCTs that could have done so [56]. This review, however, was not systematically performed and only examined the ten highest-impact journals in medicine and surgery and thus may paint an overly optimistic picture.
Blinding is particularly important in orthopaedic RCTs, as many outcome measures are focused on patient-reported levels of function, pain, and quality of life. Thus, creative strategies may be required to blind each group of individuals. Blinding subjects is a simple task in some medical and procedural trials. For example, a trial of intra-articular knee injections comparing corticosteroids to placebo was performed with the subjects blinded [47]. By having the syringes prepared by an independent healthcare provider, the caregivers could have also been blinded in this trial. Similar strategies can be used for other nonoperative trials that do not require a visible construct such as a specific sling or brace type. Blinding of subjects can also be relatively straightforward for trials comparing two different types of operations, particularly if both operations use similar surgical approaches. In trials comparing operative to nonoperative management, however, blinding becomes more challenging. Blinding of subjects can be performed through sham surgery, which also provides an excellent control for the placebo effect of having undergone an operation [44]. Sham surgery is discussed in greater detail below, but this is essentially the only way to truly blind subjects in an operative vs. nonoperative trial.
Blinding surgeons in an operative trial is almost always impossible or highly impractical unless the intervention is an adjunct that can be easily masked (e.g. intraoperative local anaesthetic injection). Blinding of the remaining three groups, however, can and should be done in many cases. Blinding of outcome adjudicators and data collectors can be performed in several different ways. Surgical scars can be concealed with patient clothing or large dressings [38]. Even radiographs showing different implants can sometimes be masked using creative digital alteration, as demonstrated by Karanicolas et al. [31]. Radiographs and other imaging modalities can be de-identified prior to being provided to the outcome adjudicator. In the case of patient-reported outcomes, an extra, independent individual may be required to collect data from the patient, de-identify it, and then provide it to the data assessment team. Blinding of the data assessment team is perhaps the simplest form of blinding to achieve, only requiring that data are coded in a way that allows comparison between groups but does not allow the data analyst to decipher each subject’s allocation. Despite this, no orthopaedic RCTs published between 1988 and 2000 specified their use of this technique [8].

Case-Based Example 6: Randomization, Allocation Concealment, and Blinding in the FLOW Study
The FLOW study randomized subjects through a custom-built, web-based randomization system that used variable block randomization. Given that this was a ‘distance randomization’ technique, allocation concealment was ensured. Patients, outcome adjudicators, and data analysts were all blinded to subject allocation. In addition, a central adjudication committee, also blinded to group allocation, assessed all subjects for eligibility before or shortly after randomization.
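
As a minimal sketch of the block and stratified schemes described in Sect. 29.3.3, and of the variable block sizes mentioned for the FLOW study, the following Python code builds an allocation list from permuted blocks. The function names, the block sizes (4, 6, 8), the arm labels, and the diabetes strata are illustrative assumptions; they are not taken from FLOW or from any particular randomization service. In practice, such a list would be generated and held by a central system rather than by the enrolling team, so that allocation remains concealed.

```python
import random


def permuted_block(arms, block_size, rng):
    """Return one block holding each arm equally often, in random order."""
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    block = list(arms) * (block_size // len(arms))
    rng.shuffle(block)
    return block


def allocation_sequence(n_subjects, arms=("A", "B"), block_sizes=(4, 6, 8), seed=None):
    """Concatenate permuted blocks of randomly chosen size until n_subjects is reached."""
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_subjects:
        sequence.extend(permuted_block(arms, rng.choice(block_sizes), rng))
    # A truncated final block may leave the groups slightly unbalanced.
    return sequence[:n_subjects]


def stratified_sequences(strata, n_per_stratum, arms=("A", "B"), block_sizes=(4,), seed=None):
    """Build one independent block-randomized list per stratum (hypothetical strata)."""
    master = random.Random(seed)
    return {
        stratum: allocation_sequence(
            n_per_stratum, arms, block_sizes, seed=master.getrandbits(32)
        )
        for stratum in strata
    }


if __name__ == "__main__":
    # Variable block sizes for a two-arm trial.
    print(allocation_sequence(12, seed=2019))
    # Separate lists for patients with and without diabetes mellitus (DM).
    print(stratified_sequences(["DM", "no DM"], 8, seed=2019))
```

With a fixed block size of four and two arms, each shuffled block is one of the six combinations listed in the text (AABB, ABAB, BBAA, ABBA, BAAB, BABA). Varying the block size makes the next allocation harder to guess, which is one reason variable blocks are often paired with distance randomization.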

29.3.6 Control Groups

An appropriate choice of control group(s) is of paramount importance in the success and impact of an RCT. There are many different types of control groups that can be used in surgical trials. In general terms, these include no treatment controls, placebo controls, and active treatment controls. From a strictly scientific standpoint, the best control group is a placebo group, i.e. a group that receives no active intervention, but is blinded to this fact. This type of control can be implemented with relative ease in trials of oral or injected medications. As discussed earlier, placebo groups in the context of operative trials essentially amount to sham surgery. A recent systematic review found six orthopaedic RCTs that utilized sham surgery. Interestingly, all six studies found that sham surgery was as effective as therapeutic surgery. The most recent of these six trials was published in 2012 [36].
Thus, sham surgery is clearly possible in orthopaedic RCTs and has the potential to provide very useful information. There are, however, significant ethical concerns with randomizing patients to receive sham surgery. Patients are exposed to risk, with no therapeutic intervention, which may violate the ethical principles of non-maleficence and beneficence [44]. On the other hand, the argument could be made that by revealing some procedures to be no better than placebo, future patients can avoid undertaking unnecessary risks. This does not, however, account for the fact that performing sham surgery also threatens the trust in the doctor-patient relationship because sham surgery status must be kept blinded even at follow-up. Overall, sham surgery may be appropriate in select cases, particularly when there is true clinical equipoise about the utility of a surgical intervention and when that intervention is minimally invasive, low risk, and common enough for the results to have significant potential impact [44]. Extra attention should be paid to the informed consent process to ensure that patients truly understand the rationale, logistics, and implications of sham surgery.
For most orthopaedic trials, active treatment and/or no treatment are the most commonly used and most appropriate choices of control group. The decision between the inclusion of one or both groups depends on the pre-existing evidence on the current standard of care. If there exists true clinical equipoise about the effectiveness of a given operation, for example, it would be reasonable to randomize patients to either operative or conservative treatment and compare their outcomes. However, if there is a well-established treatment that is known to be effective, it would be unethical to withhold that treatment altogether. Thus, the new intervention should be compared to the standard of care [29]. It can sometimes be difficult to determine whether it would be ethical to withhold the current standard of care. A treatment may have become established as the “standard of care” based on little or no high-quality evidence. Legally, a determination of “standard of care” is often made based on the practice patterns of similar physicians. Thus, even if scientific evidence is lacking, it may be legally perilous to withhold a commonly practiced standard of care [40].

29.3.7 Follow-Up

The intended length of follow-up should be decided a priori (see the discussion on PICOT format, Sect. 29.2). Of course, follow-up can then be extended beyond that timeframe, but the target follow-up is a decision made based on the known or expected natural history of the disease or injury. Arrangements should be made in advance to minimize loss to follow-up, such as asking patients if they plan to relocate within the study time period and obtaining multiple alternate contacts for each subject. All loss to follow-up should be carefully recorded, including the reason for loss (e.g. loss of contact, death, withdrawal of consent, etc.). As well, all adverse events, even those fully unrelated to the intervention, should be carefully tracked.

29.3.8 Statistical Analysis

Part III of this book contains a basic but thorough approach to statistics. Therefore, only concepts
specifically related to analysis of RCT data will be discussed here. It is recommended that a statistician, or at least an individual with formal statistics training, be consulted throughout all steps of data analysis for an RCT. The details of these analyses are beyond the scope of this chapter, but are often complex and require the appropriate expertise. However, two important concepts that should be understood by anyone involved in conducting RCTs are the intention-to-treat principle and non-inferiority analysis.
The intention-to-treat principle is the technique of analysing all patients randomized to a given group together, regardless of whether they went on to receive the appropriate treatment or complete the study [4]. This simulates the real-life limitations of clinical practice: patients stop attending appointments, relocate, change their minds, are misdiagnosed, or deviate from prescribed treatments. If these patients were excluded from the primary analysis, the results would demonstrate an optimistic “best-case scenario” rather than results that can be realistically expected. Thus, the primary statistical analysis, where possible, should be based on the intention-to-treat principle [43]. This concept can often be applied to operative trials [14] but, if applied incorrectly, can also produce confusing and nonsensical results. Malavolta et al. demonstrate this with a clear example (see Fact Box 29.3).

Fact Box 29.3: The Challenge of the Intention-to-Treat Principle in Surgical RCTs
For example, a patient allocated to non-surgical treatment who, for any reason, then undergoes surgical treatment should, according to the intention-to-treat principle, still be analyzed as a non-surgical case. If this patient happens to present infection at the surgical site, the results from such a study would show “occurrence of infection of the surgical site” as a “complication from non-surgical treatment” [38].
Recommended resource: Malavolta EA, Demange MK, Gobbi RG, Imamura M, Fregni F. Randomized Controlled Clinical Trials in Orthopedics: Difficulties and Limitations. Rev Bras Ortop. 2011;46:452–9.

Clearly, care should be taken to apply the intention-to-treat principle appropriately and to data that are amenable to such analysis. Where necessary, it is permissible to restrict some of the secondary analyses to those who completed the study protocol (i.e. per-protocol analysis).
Non-inferiority refers to a study design that simply seeks to show that the new intervention is “no worse” than the current standard of treatment. “No worse” can be defined in various ways, including a statistically significant difference, a minimal clinically important difference (MCID), and more. This contrasts with a “superiority trial”, which may be thought of as the classic RCT. In a superiority trial, the goal is to demonstrate that the new intervention is superior to the current standard. Non-inferiority analysis may be desirable as a first step if the sample sizes required for a superiority trial are unrealistic. For example, in comparing a new intervention with the current standard of care, the incremental effect size difference is likely to be relatively small, and thus the sample size required for the RCT would be very large. Alternatively, if the goal is to demonstrate that the new intervention is “no worse” in a secondary outcome, but not in the primary outcome, a non-inferiority trial can be performed [65].
A final note on statistical analysis in RCTs: according to the CONSORT statement, baseline differences among treatment groups should not be tested for statistical significance. The reasoning behind this assertion is as follows: assuming that the trial has been conducted with rigorous and appropriate methodology and that the randomization technique selected is appropriate for the trial, any baseline differences between the groups are necessarily due to chance. Thus, a test of statistical significance, which assesses the likelihood that these differences were due to chance, is redundant and unnecessary; in fact, its results may be misleading [43].
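
To make the difference between intention-to-treat and per-protocol analysis in Sect. 29.3.8 concrete, the sketch below groups the same subjects in both ways and compares a simple event rate. Everything in it is an assumption made for illustration: the `Subject` fields, the arm labels, and the records returned by `toy_cohort()`, which are invented and do not come from any trial. The crossover case mirrors the scenario described in Fact Box 29.3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    assigned: str     # arm allocated at randomization
    received: str     # treatment actually delivered
    completed: bool   # finished follow-up as specified in the protocol
    infection: bool   # outcome of interest: surgical-site infection


def event_rate(subjects):
    """Proportion of subjects with the outcome; NaN for an empty group."""
    if not subjects:
        return float("nan")
    return sum(s.infection for s in subjects) / len(subjects)


def intention_to_treat(subjects, arm):
    """Everyone randomized to `arm`, whatever happened afterwards."""
    return [s for s in subjects if s.assigned == arm]


def per_protocol(subjects, arm):
    """Only those randomized to `arm` who received it and completed follow-up."""
    return [
        s for s in subjects
        if s.assigned == arm and s.received == arm and s.completed
    ]


def toy_cohort():
    """Invented records for illustration only; not data from any study."""
    non_surgical = (
        [Subject("non-surgical", "non-surgical", True, False)] * 8
        + [Subject("non-surgical", "surgical", True, True)]    # crossover, infected
        + [Subject("non-surgical", "surgical", True, False)]   # crossover, no infection
    )
    surgical = (
        [Subject("surgical", "surgical", True, True)]
        + [Subject("surgical", "surgical", True, False)] * 9
    )
    return non_surgical + surgical


if __name__ == "__main__":
    cohort = toy_cohort()
    for arm in ("non-surgical", "surgical"):
        itt = event_rate(intention_to_treat(cohort, arm))
        pp = event_rate(per_protocol(cohort, arm))
        print(f"{arm}: ITT infection rate {itt:.2f}, per-protocol {pp:.2f}")
```

In this toy cohort the infection in a crossover patient is counted against the non-surgical arm under intention-to-treat and disappears from that arm under per-protocol analysis, which is the interpretive difficulty the Fact Box points out.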

29.4 Limitations of Randomized Controlled Trials

Randomized controlled trials do present certain challenges when it comes to surgical disciplines and particularly when an operative intervention is included as one of the treatment arms. Many of these concepts are discussed above; below is a summary of the major limitations of RCTs for assessing operative interventions. Certain specific study designs, such as crossover designs, are simply not an option with surgical intervention [38]. As well, while “placebo” (i.e. sham surgery) trials have been conducted and have provided some very important evidence (e.g. in the case of knee arthroscopy), they do present ethical and public relations challenges [41]. The intention-to-treat principle can be difficult and confusing to apply if the trial is comparing an operative treatment group to other treatment groups [38]. As well, surgical technique and expertise vary between surgeons and institutions. An interesting example of how this may skew the results of an otherwise well-conducted trial is in the case of a large, multicentre RCT of displaced intracapsular hip fracture management: consultant surgeons were nearly twice as likely to be present when a patient was allocated to total hip arthroplasty as opposed to fixation or hemiarthroplasty [32].
Another limitation of RCTs may be that their reputation precedes them; in other words, the results of RCTs are sometimes interpreted with a high degree of confidence based purely on the study design. In reality, however, between one-third and one-half of all orthopaedic RCTs are in fact underpowered to detect an adequate effect size [2, 22]. Undoubtedly contributing to this concerning trend is the fact that less than 10% of RCTs published in the highest-impact orthopaedic journals prospectively calculate the adequate sample size required to achieve sufficient power [22, 35]. Thus, the risk of Type II error (i.e. failing to reject the null hypothesis when a true difference exists) is substantial. Perhaps not surprisingly, Lochner et al. found that the rate of Type II error for primary outcomes in orthopaedic RCTs was over 90% [35].
A challenge unique to surgical RCTs is the issue of surgeon skill and experience. Difficult to quantify, variation in surgical skills can certainly have an impact on the outcomes of an RCT. This is particularly true in cases where the surgeon involved is much more comfortable with one or the other of the treatment arms [50]. The issue of a learning curve for a surgeon performing a new procedure is also a well-established phenomenon, with a certain number of cases required before a steady state is reached [23]. Thus, the use of an expertise-based design for orthopaedic RCTs has increased in recent years, though such designs remain relatively uncommon overall [16]. Finally, in a field where technology is always rapidly evolving and innovation is constant, the pace of clinical research can at times be too slow to keep up with and evaluate these changes [11].

Fact Box 29.4: Limitations of RCTs in Orthopaedic Surgery
Conducting an RCT in orthopaedic surgery presents a number of unique challenges. Some of these are immutable, while others can be managed to some extent with careful planning. Common examples of these challenges include the use of:
• Certain study designs (e.g. crossover designs)
• True “placebo” arms
• The intention-to-treat analysis
• Blinding (especially surgeons and patients)
Recommended resource: Malavolta EA, Demange MK, Gobbi RG, Imamura M, Fregni F. Randomized Controlled Clinical Trials in Orthopedics: Difficulties and Limitations. Rev Bras Ortop. 2011;46:452–9.
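
The link between sample size, power, and Type II error discussed in Sect. 29.4 can be illustrated with a rough calculation. The sketch below uses the standard normal-approximation formula for comparing two group means with equal allocation, n per group = 2 (z_(1-alpha/2) + z_(power))^2 (sd / delta)^2. The function names and the example values (a 5-point difference on an outcome score with a standard deviation of 10) are assumptions chosen for illustration; the calculation ignores loss to follow-up and is not a substitute for the formal, prospective calculation a statistician would perform.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects per arm to detect a mean difference `delta`.

    Assumes two equal-sized groups, a common standard deviation `sd`,
    and a two-sided test at significance level `alpha`.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


def approximate_power(n, delta, sd, alpha=0.05):
    """Approximate power of the same comparison with `n` subjects per arm."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    standard_error = sd * sqrt(2 / n)
    return _STD_NORMAL.cdf(delta / standard_error - z_alpha)


if __name__ == "__main__":
    # Hypothetical planning values: 5-point difference, SD of 10.
    n = n_per_group(delta=5, sd=10)
    print(f"subjects per group: {n}")
    print(f"power with {n} per group: {approximate_power(n, 5, 10):.3f}")
    # Type II error risk is 1 - power; it rises quickly in small trials.
    print(f"power with 20 per group: {approximate_power(20, 5, 10):.3f}")
```

Because the required sample size scales with the inverse square of the difference to be detected, halving the expected difference roughly quadruples the number of subjects needed, which is why small incremental improvements over the standard of care demand very large trials.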

29.5 Knowledge Dissemination

Knowledge dissemination is an important part of the scientific process and comes in many forms. Publication in scientific journals is often seen as the logical and desirable end goal of conducting research. This view, however, is increasingly recognized as being too narrow [13]. While scientific journals remain an important pillar of communication within the research community, the results of important scientific studies need to be communicated beyond this silo. The general public and local, regional, and national governing bodies are important target audiences that should be informed about the scientific progress that is being made.
This is important for many reasons. First, much of medical research is funded by government agencies and, thus, public taxpayer money. Furthermore, while publication in scientific journals has the potential to change physician practice patterns, it lacks the platform required to address systemic and funding barriers. For example, if an RCT demonstrates that a certain operation is better and more cost-effective than the current standard of care, but the government continues to incentivize hospitals to perform the standard of care operation, it would be very difficult for any individual surgeon to make the transition to the new technique. Finally, a scientifically literate public has the knowledge and tools to make informed decisions about personal and societal level issues [18].
Of course, publication within peer-reviewed scientific journals remains one important component of communicating the results of an RCT. Various sections within this book address the process of preparing a scientific manuscript. With specific regard to RCTs, the CONSORT statement is an important document that should be reviewed regularly before, during, and after the RCT process. This statement contains a detailed road map to accurately and consistently communicate the results of an RCT [43]. These guidelines should be adhered to when submitting an RCT manuscript to a scientific, peer-reviewed journal.
Traditional media outlets (e.g. print, radio, television) represent a direct route for communicating the results of scientific studies to the public. An examination of scientific research covered in the news media, by Selvaraj et al., found that the news was more likely to cover observational studies rather than RCTs and that the studies covered were of relatively low quality (even among observational studies) [54]. Thus, it is important for scientists to play an active role in interacting with media outlets, to help accurately and effectively convey the take-away messages of important studies. Press releases play an important role in the flow of knowledge from the scientific community to the journalistic community. About half of all scientific stories appearing in news media originate from a press release [66], and 45% of news stories use the press release as their sole source of information when reporting on the study [53]. Thus, it is important to ensure that press releases are accurate, objective, easy to understand, and enthusiastic but not sensationalistic.
More recently, social media outlets have become an important vehicle of communication in almost every sphere of life, including business [51], marketing [3], and politics [58]. The monthly audience for Facebook, Twitter, and online blogging websites has far surpassed the readership of even the most popular forms of print media [10]. Sharing the progress and results of scientific studies through social media can help to engage the public and to raise the profile of a given study in the scientific community. Highly tweeted journal articles are up to 11 times more likely to become highly cited than those without social media coverage [20]. Researchers looking to engage with social media should do so with knowledge and awareness of its intricacies. It is very difficult, if not impossible, to boil down a 3000-word RCT manuscript into 140 characters with the intended message and necessary level of nuance. Social media experts or training may be sought out to ensure that these new outlets are used to their best effects without compromising or misrepresenting the scientific process.

29.6 Conclusion

Randomized controlled trials are generally regarded as the gold standard of scientific evidence, because they are designed to control for as many sources of bias as possible. In planning an RCT, the first step is to come up with a good research question based on the FINER and PICOT criteria. Following this, surveys and pilot studies can help to identify potential issues and challenges with the RCT. Once a protocol has been finalized and the appropriate support teams assembled, patients can be recruited. Recruitment may occur sequentially, in “batches” or all at once. Subjects are then randomly allocated to one of two or more groups using simple randomization, block randomization, stratified randomization, or covariate-adaptive randomization. It is important that the allocation is concealed from the individual enrolling subjects. Blinding is desirable, as it limits the effect of suggested or expected outcomes. Blinding can be performed for any or all of five groups of individuals: (1) the subjects, (2) the care providers, (3) the outcome adjudicator(s), (4) the data collector(s), and (5) the data analyst(s). Choice of control group depends on ethical considerations and the presence of a current standard of care and can include no treatment controls, placebo controls, and/or active treatment controls. Try to anticipate and pre-empt issues with loss to follow-up. Involve expert help for the statistical analysis process. Finally, get the word out to your colleagues, government agencies, the public, and any other important stakeholders.
All attrition and adverse events during follow-up should be documented, even if unrelated to the intervention in question. During the statistical analysis process, it is important to understand the concepts of intention-to-treat and superiority vs. non-inferiority analyses. Baseline differences between groups should not be subjected to tests of statistical significance. Orthopaedic RCTs face some unique challenges, such as the difficulty with blinding surgeons and sometimes patients, the application of the intention-to-treat analysis, variability in expertise and technique between surgeons, and difficulty keeping up with the pace of technological advances. After an RCT, knowledge can be disseminated in many ways, including through publications in peer-reviewed scientific journals, press releases, traditional media outlets, and social media.

Take-Home Message
• Randomized controlled trials are the gold standard of scientific evidence.
• To conduct an RCT, careful planning is a must.
• Review the CONSORT guidelines as well as definitions of key terms, including blinded, randomized, controlled, and allocation concealment.
• If there is blinding, simply state who was blinded; do not use terms like “double-blind” or “triple-blind” without further elaboration.
• Start with a clear, concise, and well-defined research question; do not rush past this step, and remember the acronyms FINER and PICOT.
• Surveying experts in the field and conducting a pilot RCT can help to identify unexpected barriers, test potential solutions, and aid sample size calculations.
• Plan and budget for every step of the process, from clipboards to management staff.
• Apply for REB approval early, and register the trial on the appropriate publicly available registry before starting.
• Ensure that patient recruitment is performed in a way that is effective but not coercive or disruptive to patient care.
• Use a truly random allocation method, preferably secure, computer-based randomization software.
• Blind as many groups as possible; this may require some creativity, but is often more feasible than would appear at first glance.

Appendix: Useful Inexpensive Resources


Title: CONSORT Statement and Website
Link: http://www.consort-statement.org
Description: Set of evidence-based requirements for accurate and consistent reporting of RCTs

Title: GraphPad
Link: http://www.graphpad.com
Description: A user-friendly website with many free resources including statistical guides and calculators that can be used for simple statistical operations

Title: SurveyMonkey
Link: http://www.surveymonkey.com
Description: An intuitive survey platform that allows design and distribution of visually attractive, user-friendly surveys. Most accounts are free or relatively inexpensive. Note that not all accounts are compliant with health information privacy legislation

Title: DSS Research Knowledge Center
Link: http://www.dssresearch.com/KnowledgeCenter.aspx
Description: A clinical research website with many free resources, including webinars and a free and intuitive sample size calculator

Title: ClinicalTrials and EU Clinical Trials Register
Link: http://www.clinicaltrials.gov and http://www.clinicaltrialsregister.eu
Description: Large international trial registries where new trials can be registered, and ongoing trials can be searched

Title: OxMaR
Link: http://www.ccmp.ox.ac.uk/oxmar
Description: A free, open-source, randomization software developed by the Nuffield Department of Clinical Medicine

References

1. Abdulatif M, Mukhtar A, Obayah G, Hardman JG. Pitfalls in reporting sample size calculation in randomized controlled trials published in leading anaesthesia journals: a systematic review. Br J Anaesth. 2015;115(5):699–707. https://doi.org/10.1093/bja/aev166.
2. Abdullah L, Davis DE, Fabricant PD, Baldwin K, Namdari S. Is there truly “no significant difference”? Underpowered randomized controlled trials in the orthopaedic literature. J Bone Joint Surg Am. 2014;97(24):2068–73. https://doi.org/10.2106/JBJS.O.00012.
3. Akar E, Topçu B. An examination of the factors influencing consumers’ attitudes toward social media marketing. J Internet Commer. 2011;10(1):35–67. https://doi.org/10.1080/15332861.2011.558456.
4. Akobeng AK. Understanding randomised controlled trials. Arch Dis Child. 2005;90(8):840–4. https://doi.org/10.1136/adc.2004.058222.
5. Amberson J, McMahon B, Pinner M. A clinical trial of sanocrysin in pulmonary tuberculosis. Am Rev Tuberc. 1931;24:401–35.
6. Aslam A, Imanullah S, Asim M, El-Menyar A. Registration of clinical trials: is it really needed? N Am J Med Sci. 2013;5(12):713–5. https://doi.org/10.4103/1947-2714.123266.
7. Bhandari M, Guyatt G, Jeray K, et al. Fluid Lavage of Open Wounds (FLOW): a multicenter, blinded, factorial pilot trial comparing alternative irrigating solutions and pressures in patients with open fractures. J Trauma. 2011;71(3):596–606. https://doi.org/10.1097/Ta.0b013e3181f6f2e8.
8. Bhandari M, Richards RR, Sprague S, Schemitsch EH. The quality of reporting of randomized trials in the Journal of Bone and Joint Surgery from 1988 through 2000. J Bone Joint Surg Am. 2002;84-A(3):388–96.
9. Bhatt A. Evolution of clinical research: a history before and beyond James Lind. Perspect Clin Res. 2010;1(1):6–10. https://doi.org/10.4103/2229-3485.103599.
10. Bik HM, Goldstein MC. An introduction to social media for scientists. PLoS Biol. 2013;11(4):e1001535. https://doi.org/10.1371/journal.pbio.1001535.
11. Bothwell LE, Greene JA, Podolsky SH, Jones DS. Assessing the gold standard – lessons from the history of RCTs. N Engl J Med. 2016;374(22):2175–81. https://doi.org/10.1056/NEJMms1604593.
12. Bothwell LE, Podolsky SH. The emergence of the randomized, controlled trial. N Engl J Med. 2016;375(6):501–4. https://doi.org/10.1056/NEJMp1604635.
13. Brownell SE, Price JV, Steinman L. Science communication to the general public: why we need to teach undergraduate and graduate students this skill as part of their formal scientific training. J Undergrad Neurosci Educ. 2013;12(1):E6–E10. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3852879&tool=pmcentrez&rendertype=abstract.
14. Bubbar V, Kreder H. The intention-to-treat principle: a primer for the orthopaedic surgeon. J Bone Joint Surg Am. 2006;88(9):2097–9.
15. Campbell AJ, Bagley A, Van Heest A, James MA. Challenges of randomized controlled surgical trials. Orthop Clin North Am. 2010;41(2):145–55. https://doi.org/10.1016/j.ocl.2009.11.001.
16. Cook JA, Elders A, Boachie C, et al. A systematic review of the use of an expertise-based randomised
282 S. Ekhtiari et al.

controlled trial design. Trials. 2015;16(1):241. https:// Joint Surg Am. 2008;90(5):1026–33. https://doi.
doi.org/10.1186/s13063-015-0739-5. org/10.2106/JBJS.G.00963.
17. Dettori J.  The random allocation process: two
31.
Karanicolas PJ, Farrokhyar F, Bhandari
things you need to know. Evid Based Spine M.  Blinding: who, what, when, why, how? Can J
Care J. 2010;1(3):7–9. https://doi.org/10.105 Surg. 2010;53(5):345–8. https://doi.org/10.1007/
5/s-0030-1267062. s10269-005-0145-9.
18.
Eagleman DM.  Why public dissemination 32. Keating JF, Grant A, Masson M, Scott NW, Forbes
of science matters: a manifesto. J Neurosci. JF.  Displaced intracapsular hip fractures in fit, older
2013;33(30):12147–9. https://doi.org/10.1523/ people: a randomised comparison of reduction and fixa-
JNEUROSCI.2556-13.2013. tion, bipolar hemiarthroplasty and total hip arthroplasty.
19. Euser AM, Zoccali C, Jager KJ, Dekker FW. Cohort Health Technol Assess. 2005;9(41):iii–iv, ix–x, 1–65.
studies: prospective versus retrospective. Nephron 33. Kennedy A, Grant A.  Subversion of allocation in

Clin Pract. 2009;113(3):c214–7. https://doi. a randomised controlled trial. Control Clin Trials.
org/10.1159/000235241. 1997;18(Suppl 3):S77–88.
20. Eysenbach G.  Can tweets predict citations? Metrics 34. Leon AC, Davis LL, Kraemer HC.  The role and

of social impact based on twitter and correlation with interpretation of pilot studies in clinical research.
traditional metrics of scientific impact. J Med Internet J Psychiatr Res. 2011;45(5):626–9. https://doi.
Res. 2011;13(4):e123. https://doi.org/10.2196/ org/10.1016/j.jpsychires.2010.10.008.
jmir.2041. 35. Lochner HV, Bhandari M, Tornetta P.  Type-II error
21. Farrugia P, Petrisor BA, Farrokhyar F, Bhandari
rates (beta errors) of randomized trials in orthopaedic
M.  Practical tips for surgical research: research trauma. J Bone Joint Surg Am. 2001;83(11):1650–5.
questions, hypotheses and objectives. Can J Surg. https://doi.org/10.2106/00004623-200111000-00005.
2009;53(4):278–81. https://doi.org/10.1503/cjs.036311. 36.
Louw A, Diener I, Fernández-de-las-Peñas C,
22.
Freedman KB, Back S, Bernstein J.  Sample Puentedura EJ.  Sham surgery in orthopedics: a
size and statistical power of randomised, con- systematic review of the literature. Pain Med.
trolled trials in orthopaedics. J Bone Joint Surg 2017;18(4):736–50. https://doi.org/10.1093/pm/
Br. 2001;83(3):397–402. papers3://publication/ pnw164.
uuid/611B66F4-34E7-4087-878C-527960117276. 37. Makel MC, Plucker JA. Facts are more important than
23. Gofton WT, Solomon M, Gofton T, et  al. What do novelty: replication in the education sciences. Educ
reported learning curves mean for orthopaedic sur- Res. 2014;43(6):304–16. https://doi.org/10.3102/001
geons? Instr Course Lect. 2016;65:633–43. 3189x14545513.
24. Haynes R.  Forming research questions. In: Haynes 38. Malavolta EA, Demange MK, Gobbi RG, Imamura
R, Sacket D, Guyatt G, Tugwell P, editors. Clinical M, Fregni F.  Randomized controlled clinical trials
epidemiology: how to do clinical practice research. in orthopedics: difficulties and limitations. Rev Bras
Philadelphia: Lippincott Williams & Watkins; 2006. Ortop. 2011;46(4):452–9. https://doi.org/10.1016/
p. 3–14. s2255-4971(15)30261-5.
25. Helmy A, Timofeev I, Santarius T, Hutchinson
39. Mandal J, Acharaya S, Parija SC.  Ethics in human
P.  What constitutes clinical equipoise. Br J research. Trop Parasitol. 2011;1(1):2–3.
Neurosurg. 2009;23(5):564–5. https://doi. 40.
Marchant GE, Scheckel K, Campos-outcalt
org/10.1080/02688690903029760. D.  Contrasting medical and legal standards of evi-
26. Hewitt C. Adequacy and reporting of allocation con- dence: a precision medicine case study. J Law
cealment: review of recent trials published in four gen- Med Ethics. 2016;44(1):194–204. https://doi.
eral medical journals. BMJ. 2005;330(7499):1057–8. org/10.1177/1073110516644210.
https://doi.org/10.1136/bmj.38413.576713.AE. 41. Mehta S, Myers T, Lonner J, Huffma R, Sennett
27. Higgins J, Green S. Cochrane handbook for systematic BJ.  The ethics of sham surgery in clinical
reviews of interventions version 5.1.0 [updated March orthopaedic research. J Bone Joint Surg Am.
2011]. Cochrane Collab. 2011;0(March):10–11. http:// 2007;89(7):1650–3.
www.cochrane.org/training/cochrane-handbook. 42. Misra S.  Randomized double blind placebo control
28. Hip Fracture Accelerated Surgical Treatment and Care studies, the “gold standard” in intervention based stud-
Track (HIP ATTACK) Investigators. Accelerated care ies. Indian J Sex Transm Dis AIDS. 2012;33(2):131–
versus standard care among patients with hip fracture: 4. https://doi.org/10.4103/0253-7184.102130.
the HIP ATTACK pilot trial. CMAJ. 2014;186(1):E52– 43. Moher D, Hopewell S, Schulz KF, et al. CONSORT
60. https://doi.org/10.1503/cmaj.130901. 2010 explanation and elaboration: updated guidelines
29. Hulley SB, Cummings SR, Browner WS, Grady
for reporting parallel group randomised trials. Int J
DG, Newman TB.  Designing clinical research, vol. Surg. 2012;10(1):28–55. https://doi.org/10.1016/j.
78. Philadelphia: Lippincott Williams & Wilkins; ijsu.2011.10.001.
2007. 44. Mundi R, Chaudhry H, Mundi S, Godin K, Bhandari
30. Karanicolas PJ, Bhandari M, Taromi B, et al. Blinding M. Design and execution of clinical trials in orthopae-
of outcomes in trials of orthopaedic trauma: an oppor- dic surgery. Bone Joint Res. 2014;3(5):161–8. https://
tunity to enhance the validity of clinical trials. J Bone doi.org/10.1302/2046-3758.35.2000280.
29  Level 1 Evidence: A Prospective Randomized Controlled Study 283

45. Nissen T, Wynn R. The history of the case report: a Intensive Crit Care Nurs. 2013;29(6):300–9. https://
selective review. JRSM Open. 2014;5(4):5. https:// doi.org/10.1016/j.iccn.2013.04.006.
doi.org/10.1177/2054270414523410. 56. Speich B. Blinding in surgical randomized clinical tri-
46. Petrisor B, Jeray K, Schemitsch E, et al. Fluid lavage als in 2015. Ann Surg. 2017;266(1):21–2. https://doi.
in patients with open fracture wounds (FLOW): org/10.1097/SLA.0000000000002242.
an international survey of 984 surgeons. BMC 57. Sprague S, Quigley L, Bhandari M.  Survey design
Musculoskelet Disord. 2008;9(1):7. https://doi. in orthopaedic surgery: getting surgeons to respond.
org/10.1186/1471-2474-9-7. J Bone Joint Surg Am. 2009;91(Suppl 3):27–34.
47. Raynauld J-P, Buckland-Wright C, Ward R, et  al.
https://doi.org/10.2106/jbjs.h.01574.
Safety and efficacy of long-term intraarticular steroid 58. Stieglitz S, Dang-Xuan L. Social media and political
injections in osteoarthritis of the knee: a random- communication: a social media analytics framework.
ized, double-blind, placebo-controlled trial. Arthritis Soc Netw Anal Min. 2013;3(4):1277–91. https://doi.
Rheum. 2003;48(2):370–7. https://doi.org/10.1002/ org/10.1007/s13278-012-0079-3.
art.10777. 59. Suresh K.  An overview of randomization tech-

48. Richardson WS, Wilson MC, Nishikawa J, Hayward niques: an unbiased assessment of outcome in clinical
RS. The well-built clinical question: a key to evidence-­ research. J Hum Reprod Sci. 2011;4(1):8. https://doi.
based decisions. ACP J Club. 1995;123(3):A12–3. org/10.4103/0974-1208.82352.
https://doi.org/10.7326/ACPJC-1995-123-3-A12. 60. Thabane L, Thomas T, Ye C, Paul J. Posing the research
49. Röhrig B, du Prel J-B, Wachtlin D, Kwiecien R,
question: not so simple. Can J Anesth. 2009;56(1):
Blettner M. Sample size calculation in clinical trials. 71–9. https://doi.org/10.1007/s12630-008-9007-4.
Dtsch Arztebl Int. 2010;107(31–32):552–6. https:// 61. The FLOW Investigators. A trial of wound irrigation
doi.org/10.3238/arztebl.2010.0552. in the initial management of open fracture wounds. N
50. Roman H, Marpeau L, Hulsey TC.  Surgeons’ expe- Engl J Med. 2015;373:2629–41.
rience and interaction effect in randomized con- 62. The FLOW Investigators. Fluid lavage of open

trolled trials regarding new surgical procedures. Am wounds (FLOW): design and rationale for a large,
J Obstet Gynecol. 2008;199(2):108.e1–6. https://doi. multicenter collaborative 2 × 3 factorial trial of irri-
org/10.1016/j.ajog.2008.03.002. gating pressures and solutions in patients with open
51.
Rothschild PC.  Social media use in sports fractures. BMC Musculoskelet Disord. 2011;11:85.
and entertainment venues. Int J Event Festival 63. Thoma A, Farrokhyar F, Mcknight L, Bhandari

Manag. 2011;2(2):139–50. https://doi. M.  How to optimize patient recruitment. Can J
org/10.1108/17582951111136568. Surg. 2010;53(3):205–10. https://doi.org/10.1503/
52.
Schulz KF, Chalmers I, Hayes RJ, Altman cjs.018012.
DG.  Empirical evidence of bias. Dimensions of 64. Thoma A, McKnight L, McKay P, Haines T. Forming
methodological quality associated with estimates the research question. Clin Plast Surg. 2008;35(2):189–
of treatment effects in controlled trials. JAMA. 93. https://doi.org/10.1016/j.cps.2007.10.009.
1995;273(5):408–12. https://doi.org/10.1001/ 65. Vavken P.  Rationale for and methods of superior-
jama.273.5.408. ity, noninferiority, or equivalence designs in ortho-
53. Schwitzer G.  How do US journalists cover treat-
paedic, controlled trials. Clin Orthop Relat Res.
ments, tests, products, and procedures? An evaluation 2011;469(9):2645–53. https://doi.org/10.1007/
of 500 stories. PLoS Med. 2008;5(5):e95. https://doi. s11999-011-1773-6.
org/10.1371/journal.pmed.0050095. 66. Viswanath K, Blake KD, Meissner HI, et  al.

54. Selvaraj S, Borkar DS, Prasad V.  Media coverage Occupational practices and the making of health news:
of medical journals: do the best articles make the a national survey of U.S. health and medical science
news? PLoS One. 2014;9(1):e85355. https://doi. journalists. J Health Commun. 2008;13(8):759–77.
org/10.1371/journal.pone.0085355. https://doi.org/10.1080/10810730802487430.
55. Smith OM, McDonald E, Zytaruk N, et al. Enhancing 67. WMA.  Declaration of Helsinki. Vol. 353. 1974.

the informed consent process for critical care https://doi.org/10.2471/BLT.08.057737.
research: strategies from a thromboprophylaxis trial.
30 Level 1 Evidence: Long-Term Clinical Results

Daisuke Araki and Ryosuke Kuroda

D. Araki (*) · R. Kuroda
Department of Orthopedic Surgery, Kobe University Graduate School of Medicine, Kobe, Japan
e-mail: isuke@pop21.odn.ne.jp

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_30

30.1 Manuscript

Evidence-based medicine (EBM) is the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients [8]. Since the Canadian Task Force on the Periodic Health Examination first described levels of evidence in 1979 [12], the level of evidence system has served as an EBM tool that applies a hierarchical rating to a study’s strength of evidence based on its study design. This system is now familiar, as its use has become widespread in medicine.

Regarding the evolution of EBM in orthopedics, the Journal of Bone and Joint Surgery, in recognition of the need to integrate clinical expertise with the best available systematic research, introduced “Evidence-Based Orthopaedics” in 2000 [10]. According to the introduction to this new section, randomized clinical trials (RCTs) would form the main contribution, because they are believed to provide the highest-quality evidence and therefore, when available, should influence clinical decision-making. Among the papers to appear in this section was a series on the levels of evidence for four types of primary research question, covering prognosis, surgical therapies, diagnostic tests, and economic analyses [2–4]. These articles were meant to teach orthopedic surgeons how evidence can be evaluated and applied in their practice. Studies categorized as level 1 include RCTs and systematic reviews of level 1 RCTs in therapeutic studies; prospective studies and systematic reviews of level 1 studies in prognostic studies; testing of previously developed diagnostic criteria in series of consecutive patients and systematic reviews of level 1 studies in diagnostic studies; and clinically sensible costs and values obtained from many studies and systematic reviews of level 1 studies in economic and decision analyses [11].

Data derived from RCTs are considered the highest level of evidence, mainly because randomization is the best way to balance known prognostic factors, and the only way to balance unknown ones, between the treatment and control groups of a therapeutic study. On the other hand, it is also important to recognize that not all clinical questions can be answered with an RCT. While randomization in RCTs can be stratified based on prognostic factors, in some cases, it would be unethical to actively randomize patients to certain types of prognostic or risk factors. Prognostic factors of a disease or intervention can be assessed with a long-term follow-up study design, which then provides the highest level of evidence without being an RCT. In addition, there are other situations where an RCT may not be feasible, for example, when the sample size required is too large or the follow-up requires many years [7]. In
this context, we will focus on level 1 studies of long-term clinical results.

Clinical Vignette 1
Study title (example): 20 years’ clinical follow-up and osteoarthritic change after anterior cruciate ligament (ACL) reconstruction.
The title should convey that the topic is important, relevant, and innovative. It is a summary of the abstract and should be short, descriptive, and interesting.

First, it is unethical to design a study on this topic that uses placebo and randomization. Patients assigned to the placebo arm of a clinical trial must be made to believe they are receiving a working treatment, even though they are not, for the placebo effect to play a role at all. However, a more serious issue with the use of placebo remains the possibility that participants are harmed by receiving a placebo instead of an active treatment [6]. If the patients are divided into two groups, (1) arthroscopic inspection and ACL reconstruction and (2) arthroscopic inspection only, the placebo may expose the patients to higher levels of pain and aggravate their condition.

Second, study methodology is crucial to consider when placing a study within the levels of evidence. Some advocate dividing the hierarchy levels into sub-levels based, in part, on study methodology. Others suggest that poor methodology should take a study down a level [1]. Therefore, when starting clinical research, researchers first create a “clinical research protocol.”

Clinical Vignette 2
Is the patient cohort adequately reported (e.g., age, sex distribution, concomitant injuries, and surgery), and does the study have sufficient power?

30.1.1 Study Design

The research protocol includes (1) the principal investigator who has responsibility for the research, (2) the clinical hypothesis, (3) the research design, (4) assessment of progress status, (5) evaluation of adverse event/effectiveness monitoring, and (6) statistical analysis. In particular, the research design should fully consider the ethical aspects of patient care, as well as the scientific aspects needed to accomplish the research and to properly verify the clinical hypothesis. Investigators should formulate the study question before the first patient is enrolled. Next, they should identify the optimal number of cases. If the optimal number of cases is not determined, the study may fail to obtain significant results (i.e., it may be underpowered).

Clinical Vignette 3
Has informed consent been obtained? Is the study approved by the institutional review board or ethical committee?

30.1.2 Informed Consent

One of the most important constructs of clinical ethics, informed consent, is an essential condition both for therapy and research. Information that makes consent valid is generally thought to include an understanding of the risks and benefits of the treatment(s) that patients may receive; an understanding of the procedures that the participant may undergo, including, in the case of RCTs, blinding and randomization; an understanding that participation in research is voluntary; and, finally, an understanding of the purpose of the research [6].

Clinical Vignette 4
Who performed the treatment? Who evaluated the patient outcome? Who performed the statistical analysis?

30.1.3 Blinding

Blinding is most often performed in prospective studies. Personnel who evaluate the outcomes of interest may have a belief or suspicion about which treatment offers the best outcome. They may interpret marginal results in a way that favors their presupposition if they are privy to the treatment administered. It is important to understand who is performing the data collection. Therefore,
those who are evaluating the results should be blinded to the treatment. Another way that bias can enter an unblinded assessment process is through differential encouragement during a performance test [5].

Clinical Vignette 5
How long should the follow-up period be?

30.1.4 Follow-Up Rate and Period

The number of patients lost to follow-up is very important to know, as it can clearly affect the estimate of treatment effect. In general, the validity of a study may be threatened if more than 20% of patients are lost to follow-up. Therefore, follow-up of at least 80% of enrolled patients is necessary. Calculations of results should include a worst-case scenario; that is, patients lost to follow-up in the treatment group are assumed to have the worst outcome, and those lost to follow-up in the control group are assumed to have the best outcome. If a treatment effect is still seen between the groups, this makes a more compelling argument that the observed treatment effect is a valid estimate of the truth. When conducting a clinical trial, researchers attempt to minimize data loss. Careful planning of research protocols, including comprehensive initial data collection, identification of locators, flexible scheduling, systematic subject tracking, monitoring subject loss, and systematically approaching problem cases, can ensure high follow-up rates [9].

Regarding the follow-up period, a precise definition of “long-term” has not yet been established. However, according to the instructions of high-quality orthopedic journals, the observation period in clinical research should be a minimum of 2 years. Investigators should design the follow-up period in line with these instructions.

Fact Box
• Careful planning of research protocols is the key to achieving high-quality clinical follow-up results.
• Comprehensive initial data collection, flexible scheduling, identification of locators, data blinding, systematic subject tracking, monitoring subject loss (follow-up rate ≥80%), follow-up period (≥2 years), and systematically approaching problem cases are mandatory for a level 1 study.
• It is also essential to analyze the data with adequate statistics and sample size.

References

1. Atkins D, Best D, Briss PA, et al. Grading quality of evidence and strength of recommendations. BMJ. 2004;328(7454):1490.
2. Bhandari M, Guyatt GH, Swiontkowski MF. User’s guide to the orthopaedic literature: how to use an article about a surgical therapy. J Bone Joint Surg Am. 2001;83-A(6):916–26.
3. Bhandari M, Guyatt GH, Swiontkowski MF. User’s guide to the orthopaedic literature: how to use an article about prognosis. J Bone Joint Surg Am. 2001;83-A(10):1555–64.
4. Bhandari M, Morrow F, Kulkarni AV, Tornetta P 3rd. Meta-analyses in orthopaedic surgery. A systematic review of their methodologies. J Bone Joint Surg Am. 2001;83-A(1):15–24.
5. Dettori J. Class or level of evidence: epidemiologic basis. Evid Based Spine Care J. 2012;3(3):9–12.
6. Nardini C. The ethics of clinical trials. Ecancermedicalscience. 2014;8:387.
7. Poolman RW, Petrisor BA, Marti RK, Kerkhoffs GM, Zlowodzki M, Bhandari M. Misconceptions about practicing evidence-based orthopedic surgery. Acta Orthop. 2007;78(1):2–11.
8. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn’t. BMJ. 1996;312(7023):71–2.
9. Woolard RH, Carty K, Wirtz P, et al. Research fundamentals: follow-up of subjects in clinical trials: addressing subject attrition. Acad Emerg Med. 2004;11(8):859–66.
10. Wright JG, Swiontkowski MF. Introducing a new journal section: evidence-based orthopaedics. J Bone Joint Surg Am. 2000;82(6):759–60.
11. Wright JG, Swiontkowski MF, Heckman JD. Introducing levels of evidence to the journal. J Bone Joint Surg Am. 2003;85-A(1):1–3.
12. The periodic health examination. Canadian Task Force on the Periodic Health Examination. Can Med Assoc J. 1979;121(9):1193–254.
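
Worked example for Sect. 30.1.4. The 80% follow-up threshold and the worst-case scenario described in that section can be illustrated with a short calculation. The Python sketch below is only an illustration under stated assumptions: the outcome is treated as binary (success or failure), and the `Arm` class, the function names, and all patient counts are hypothetical and are not taken from any study cited in this chapter. It checks the follow-up rate in each arm and then recomputes the difference in success rates after counting every patient lost from the treatment arm as a failure and every patient lost from the control arm as a success.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """Binary outcome counts for one study arm (hypothetical structure)."""
    enrolled: int    # patients enrolled in the arm
    successes: int   # good outcomes among patients with complete follow-up
    failures: int    # poor outcomes among patients with complete follow-up

    @property
    def lost(self) -> int:
        # Patients enrolled but without an outcome at final follow-up
        return self.enrolled - self.successes - self.failures

    @property
    def follow_up_rate(self) -> float:
        return (self.successes + self.failures) / self.enrolled


def observed_difference(treatment: Arm, control: Arm) -> float:
    """Success-rate difference using only patients who completed follow-up."""
    t = treatment.successes / (treatment.successes + treatment.failures)
    c = control.successes / (control.successes + control.failures)
    return t - c


def worst_case_difference(treatment: Arm, control: Arm) -> float:
    """Success-rate difference under the worst-case assumption:
    lost treatment patients count as failures, lost control patients as successes."""
    t = treatment.successes / treatment.enrolled
    c = (control.successes + control.lost) / control.enrolled
    return t - c


# Illustrative numbers only (assumed, not from any cited study)
treatment = Arm(enrolled=100, successes=70, failures=20)  # 10 lost
control = Arm(enrolled=100, successes=45, failures=43)    # 12 lost

for name, arm in (("treatment", treatment), ("control", control)):
    adequate = arm.follow_up_rate >= 0.80  # threshold discussed in Sect. 30.1.4
    print(f"{name}: follow-up {arm.follow_up_rate:.0%}, adequate={adequate}")

print(f"observed difference:   {observed_difference(treatment, control):.3f}")
print(f"worst-case difference: {worst_case_difference(treatment, control):.3f}")
```

If the worst-case difference still favors the treatment, the observed effect is less likely to be explained by attrition alone. A full analysis would also report confidence intervals or a significance test, which this sketch omits.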
31 Level 2 Evidence: Prospective Cohort Study

Naomi Roselaar, Niv Marom, and Robert G. Marx

N. Roselaar · N. Marom · R. G. Marx (*)
Hospital for Special Surgery, New York, NY, USA
e-mail: roselaarn@hss.edu; maromn@hss.edu; marxr@hss.edu

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_31

31.1 Introduction

31.1.1 Levels of Evidence

Levels of evidence are used in evidence-based medicine to categorize clinical research [4]. Level 1 evidence is considered the most rigorous and is assigned to randomized controlled trials with proper blinding, appropriate randomization, and high follow-up rates. Level 2 evidence is considered slightly less rigorous and is assigned to prospective cohort studies. This is considered more rigorous than level 3 evidence (retrospective cohort studies) or level 4 evidence (case series and case studies). The hierarchy of levels of evidence is based on the risk of bias and systematic error in each type of study [4]. However, studies comparing treatment effects among randomized and observational studies showed neither consistent nor systematic overestimation of treatment effects in nonrandomized studies [1, 5].

Levels of evidence are a relatively new standard of classification in orthopedic research. The Journal of Bone and Joint Surgery began assigning all articles a level of evidence in 2003 [17]. The Journal of Orthopaedic Trauma introduced the assignment of levels of evidence to all clinical articles in 2012 [15].

In addition to aiding classification, levels of evidence help peer-reviewed orthopedic journals maintain high standards of research quality. Improving quality was the primary reason for assigning levels of evidence in the Journal of Hand Surgery beginning in November 2005 [8]. However, the quality of studies published in the Journal of Bone and Joint Surgery (American) increased over time even before the introduction of levels of evidence [6]. In 2009, a retrospective assessment found that between 1975 and 2005, the percentage of level 1 studies published in the Journal of Bone and Joint Surgery (American) increased from 4 to 21% [6].

31.1.2 Prospective Cohort Study

Level 2 prospective cohort studies are useful for assessing outcomes in patients with different treatments or characteristics [7]. As such, level 2 prospective cohort studies have either a control group or a second, distinct treatment group for comparison against the primary treatment group. Prospective cohort studies can be therapeutic or prognostic [9]. Prospective cohort studies require the research hypothesis and question to be determined prior to recruitment and enrollment of participants [16]. With the predetermined research question, investigators obtain data about the
patient and intervention from the beginning of the study. Outcomes that occur over the course of the study period are assessed [3].

Fact Box 31.1
In level 2 prospective cohort studies, the research question and hypothesis are established prior to recruitment and enrollment of subjects.

31.2 Benefits and Limitations

31.2.1 Benefits

Prospective cohort studies are observational, which makes them less difficult to execute than randomized controlled trials. They also tend to be less disruptive to patient flow within a clinic. Logistically, coordinating an observational study requires less interference with standard clinical practices than does coordinating a study with proper blinding and randomization [3].

When designing prospective cohort studies, investigators determine data collection methods specific to their research question prior to enrolling patients. This high level of specification in the method of data collection gives investigators greater control over the data collected compared to retrospective studies. It also ensures specificity in the data collected [16]. The likelihood of reporting relevant risk factors and outcomes is higher in prospective cohort studies [12]. With retrospectively collected data, the research question is limited by existing data [16].

Prospective cohort studies minimize recall bias because data are collected longitudinally, in real time. In prospective studies, the data collection occurs as the outcome of interest progresses [12], whereas in retrospective case-control studies, case (those with a disease) and control (healthy) participants are susceptible to providing information with varying degrees of accuracy on their exposure to the disease or outcome of interest [13].

Prospective cohort studies also allow investigators to examine multiple outcomes simultaneously [16].

31.2.2 Limitations

Prospective studies have the potential for decreased internal validity because of susceptibility to selection bias. Unlike randomized controlled trials, prospective cohort studies may use patient or physician preference when determining treatment allocation. When assessing the differences in outcomes among treatment and control groups, factors that went into deciding which intervention each patient received must also be considered to mitigate selection bias [3].

Longitudinal studies, such as prospective cohort studies, may have higher proportions of subjects lost to follow-up. This contributes to bias if the reasons for loss to follow-up are related to the outcome [12]. A low follow-up rate that impacts study quality may decrease the level of evidence of the study [9]. For studies concerning rare diseases or diseases with long latency, designing a prospective cohort study may be impractical and/or unfeasible [16].

31.3 Controlled Prospective Studies

31.3.1 Controlled Prospective Cohort Study

In a controlled prospective cohort study, the treatment group is compared to a control group or nontreatment group.

Clinical Vignette 1
“Telephone-Based Intervention to Improve Rehabilitation Engagement After Spinal Stenosis Surgery” (JBJS, 2018).

Investigators from the Departments of Orthopaedic Surgery, Physical Medicine, and Rehabilitation at the Johns Hopkins University
School of Medicine designed a controlled prospective cohort study to assess rehabilitation engagement following spinal stenosis surgery [14]. Their goal was to compare the effectiveness of “usual care” with that of a new telephone-based counseling system on rehabilitation engagement. In this study, “usual care” included physical therapy following surgical treatment and a follow-up appointment with a physical exam and assessment of radiographic imaging. The 60 patients who were prospectively enrolled into “usual care” made up the control group. All subjects in both the control and intervention groups were enrolled before undergoing surgery. The intervention group comprised 65 patients who received three health behavior change counseling phone calls (one preoperatively and two postoperatively) in addition to physical therapy and a follow-up assessment. According to the study, the health behavior change counseling phone calls included “motivational interviewing strategies that elicit and strengthen motivation for change.” For patients receiving this telephone-based intervention, the investigators hypothesized better health outcomes related to rehabilitation. As expected, the intervention group showed better pain, disability, and physical health outcomes at 12 months, likely due to physical therapy engagement and attendance. During the second and third years that all subjects were followed, differences in outcomes between the control and intervention groups decreased [14].

This level 2 prospective cohort study was an appropriate method for assessing patient engagement. Prior to beginning the study, investigators determined that subjects would be assigned to either the control or intervention group. Once enrolled, all patients completed outcome questionnaires, which allowed investigators to follow the patient outcomes over time. This was not a randomized cohort study, because the group assignments were decided by enrollment date. Importantly, in this controlled study, the investigators determined that the control group was provided sufficient care without receiving the intervention. The care received by the control group was standard of care.

Clinical Vignette 2
Treatment of Biceps Tendon Lesions in the Setting of Rotator Cuff Tears – Prospective Cohort Study of Tenotomy Versus Tenodesis (AJSM 2010).

Orthopedic surgeons from the Sungkyunkwan University School of Medicine in Seoul, Korea, compared postsurgical functional outcomes in patients presenting with rotator cuff tears and biceps tendon lesions [11]. Of the 90 subjects included in the study, 45 underwent rotator cuff repair with biceps tenotomy, and 45 were treated with rotator cuff repair and biceps tenodesis. Subjects in both groups received the same postoperative immobilization and rehabilitation. To assess functional outcomes and satisfaction, investigators administered physical exams and questionnaires to both intervention groups at follow-up appointments. A statistically significant difference in the number of patients with Popeye deformity was found between those in the tenodesis (9.3%) and tenotomy (26.8%) groups. Functional outcomes, clinical scores, and total surgical times were similar for the two groups.

In this prospective cohort study, the outcomes of two different surgical interventions (tenotomy and tenodesis) for the same diagnosis (biceps tendon lesions) were compared. Like the study of telephone-based rehabilitation engagement, patient groups were determined according to each subject’s enrollment date [11].

Fact Box 31.2
Controlled prospective cohort studies involve either a comparison between a treatment group and a control group or between two different treatment groups.

31.4 Prognostic Prospective Cohort Study

Prognostic studies assess the effect of a patient characteristic on the outcome of a disease [9]. Patient characteristics can be behavioral, such as level of athletic training or smoking status; physi-
cal, as in classes of obesity; or based on a genetic trait. Prognostic studies answer the question “What is the effect of a patient characteristic on the natural history of the condition?” [9].

Clinical Vignette 3

No Effect of Generalized Joint Hypermobility on Injury Risk in Elite Female Soccer Players – A Prospective Cohort Study (AJSM, 2017).

In a 2016 study from the Netherlands, the effect of generalized joint hypermobility (GJH) on injury rate was investigated in elite female soccer players [2]. In this prognostic study, hypermobility due to GJH is the patient characteristic, while injury rate, measured in player injuries per hours of soccer, is the outcome of interest.

Athletes were classified as hypermobile or non-hypermobile after screening using the Beighton score. Using a Beighton score ≥ 4 as the threshold, 20 of the 114 athletes enrolled in the study were classified as hypermobile. Injuries were recorded on standardized injury registration forms and counted; non-musculoskeletal injuries such as concussions, as well as non-soccer-related injuries, were excluded. At the end of one soccer season, investigators concluded that GJH was not a risk factor for injuries in elite female soccer players. These results were maintained whether the Beighton score threshold for hypermobility was set at ≥3, ≥4, or ≥5 [2].

In this case the comparison groups were determined by dividing a spectrum (Beighton score) to produce two groups for comparison. In addition to GJH, classes of obesity are another example of spectrum-based comparison groups. Comparison groups for prognostic studies also include dichotomous patient characteristics, such as being hemophiliac vs not being hemophiliac [10].

Prognostic prospective cohort studies can be classified as level one if they are inception cohort studies [9]. Inception studies require all subjects to be enrolled at the same point in their disease. This was not the case in the above example. In the study from the Netherlands, injuries were the outcome of interest. Pre-existing injuries sustained by participating athletes were disregarded in the injury rate calculation. The presence of varying levels of exposure to injury at the beginning of the study indicates level two evidence [9].

Take-Home Message

• Uncontrolled or controlled, therapeutic or not, prospective cohort studies evaluate patient outcomes in real time over the course of the study.
• Classified as level 2 evidence, a prospective therapeutic cohort study is the most rigorous research method aside from a randomized clinical trial.

References

1. Benson K, Hartz A. A comparison of observational studies and randomized, controlled trials. N Engl J Med. 2000;342(25):1878–86.
2. Blokland D, Thijs KM, Backx FJG, Goedhart EA, Huisstede BMA. No effect of generalized joint hypermobility on injury risk in elite female soccer players: a prospective cohort study. Am J Sports Med. 2017;45(2):286–93. https://doi.org/10.1177/0363546516676051.
3. Bryant DM, Willits K, Hanson BP. Principles of designing a cohort study in orthopaedics. J Bone Joint Surg Ser A. 2009;91:10–4. https://doi.org/10.2106/JBJS.H.01597.
4. Burns PB, Rohrich RJ, Chung KC. The levels of evidence and their role in evidence-based medicine. Plast Reconstr Surg. 2011;128(1):305–10. https://doi.org/10.1097/PRS.0b013e318219c171.
5. Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med. 2000;342(25):1887–92. https://doi.org/10.1056/NEJM200006223422507.
6. Hanzlik S, Mahabir RC, Baynosa RC, Khiabani KT. Levels of evidence in research published in The Journal of Bone and Joint Surgery (American Volume) over the last thirty years. J Bone Joint Surg. 2009;91(2):425–8. https://doi.org/10.2106/JBJS.H.00108.
7. Hennekens C, Buring J. Epidemiology in medicine. 1st ed. Boston: Little, Brown; 1987.
8. Hentz R, Meals R, Stern P, Manske P. Levels of evidence and the journal of hand surgery. J Hand Surg Am. 2015;30(5):891–2.
9. JBJS Inc. Journals level of evidence. The Journal of Bone and Joint Surgery. https://journals.lww.com/jbjsjournal/Pages/Journals-Level-of-Evidence.aspx. Published 2015.
10. Kapadia BH, Boylan MR, Elmallah RK, Krebs VE, Paulino CB, Mont MA. Does hemophilia increase the risk of postoperative blood transfusion after lower extremity total joint arthroplasty? J Arthroplast. 2016;31:1578–82. https://doi.org/10.1016/j.arth.2016.01.012.
11. Koh KH, Ahn JH, Kim SM, Yoo JC. Treatment of biceps tendon lesions in the setting of rotator cuff tears: prospective cohort study of tenotomy versus tenodesis. Am J Sports Med. 2010;38(8):1584–90. https://doi.org/10.1177/0363546510364053.
12. Sedgwick P. Prospective cohort studies: advantages and disadvantages. BMJ. 2013;347:f6726. https://doi.org/10.1136/bmj.f6726.
13. Sedgwick P. What is recall bias? BMJ. 2012;344:e3519.
14. Skolasky RL, Maggard AM, Wegener ST, Riley LH III. Telephone-based intervention to improve rehabilitation engagement after spinal stenosis surgery: a prospective lagged controlled trial. J Bone Joint Surg. 2018;100:21–30. https://doi.org/10.2106/JBJS.17.00418.
15. Slobogean G, Bhandari M. Introducing levels of evidence to the journal of orthopaedic trauma: implementation and future directions. J Orthop Traumatol. 2012;26(3):127–8.
16. Song JW, Chung KC. Observational studies: cohort and case-control studies. Plast Reconstr Surg. 2010;126(6):2234–42. https://doi.org/10.1097/PRS.0b013e3181f44abc.
17. Wright JG, Swiontkowski MF, Heckman JD. Introducing levels of evidence to the journal. J Bone Joint Surg. 2003;85(1):1–3.

32 Level III Evidence: A Case-Control Study

Andrew D. Lynch, Adam J. Popchak, and James J. Irrgang

A. D. Lynch (*) · A. J. Popchak · J. J. Irrgang
Departments of Physical Therapy and Orthopaedic Surgery, University of Pittsburgh, Pittsburgh, PA, USA
e-mail: adl45@pitt.edu

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_32

32.1 Introduction

Observational studies, including case-control studies, are important study designs that contribute to both general knowledge and hypothesis generation [1]. A case-control study departs from the standard process of enrollment and prospective observation in randomized controlled trials and observational cohorts. In a case-control study, participants who are known to have the condition or outcome of interest are chosen as cases. Potential contributors to the etiology of the disease are sought by looking backward in the history of the participant to find out if something in their history may have caused the disease or condition of interest [2, 6, 9, 12]. This has led some to consider a case-control study as “research in reverse” [11]. The precipitating factor that may have contributed to the development of the outcome is called an exposure (also independent variable, determinant, or predictor) [12]. The exposure may be a biologic exposure, some characteristic of the individual, some behavioral event, or an intervention [11].

The distinct advantage of a case-control study over a simple case series is the inclusion of a group of control participants who are largely similar to the case participants. Control participants are chosen because they do not have the condition of interest but are chosen from the same population as the cases. Because they come from the same population, it is assumed that differences in the exposure between the cases and controls are associated with the development of the condition or outcome of interest [5]. Case-control studies are especially important when the researcher cannot randomize a patient to a particular exposure based on their clinical judgment (e.g., randomizing a patient to non-operative treatment if the researcher does not believe this is best for the patient) [8].

It is important to differentiate a case-control study from a prospective cohort study because both make the argument that an exposure causes an outcome [5]. In a case-control study, participants are chosen based on whether or not they have the outcome of interest, and their history is reviewed retrospectively to determine their history of exposure. In a cohort study, participants are chosen based on their exposure status and followed prospectively to determine whether they develop the outcome of interest. A review of manuscripts purporting to report the results of “case-control studies” found that about one-third of medical/surgical “case-control” studies were incorrectly labeled and 97% of rehabilitation reports were mislabeled [6]. The most frequent mislabeling was attributed to a cross-sectional study design, with intervention studies,
measurement studies, prognostic studies, and cohort studies all being incorrectly labeled as case-control studies.

Fact Box 32.1: Key Definitions
Case: a research participant who is known to have the condition of interest.
Control: a research participant who does not have the condition of interest but is otherwise similar to the case participants (e.g., selected from the same population as the cases).
Exposure: some prior experience for both cases and controls that is hypothesized to lead to the development of the condition of interest. The primary hypothesis in a case-control study is that cases and controls will differ on their exposure history and that difference contributes to the condition of interest.

32.2 Advantages of Case-Control Studies

A case-control study is a preferable study design when considering outcomes that are either rare or that take a long time to develop, which would make a prospective study not feasible [8]. By selecting cases who have already developed the outcome of interest, the researcher knows that there will be a sample who has the outcome of interest. To prospectively study rare conditions, the researcher would have to recruit a large number of participants and spend resources to follow them to identify those who develop the condition of interest. To prospectively study conditions which take a long time to develop, the researcher would have to follow a cohort for a very long time. By choosing a study population who has already developed the outcome of interest, the researcher saves two precious resources: time and money [11].

Fact Box 32.2: Advantages and Disadvantages of Case-Control Studies
Advantages:
• Retrospective study design decreases the amount of time and money needed to complete a study to determine the role of an exposure in the etiology of a condition.
• Specific cohorts of individuals with a rare disease or outcome can be recruited to ensure a large enough sample size for analysis.

Disadvantages:
• Significant potential for bias to be introduced (see Fact Box 32.3).
• Does not benefit from thorough prospective data collection, so data are subject to recall bias from participants and depend on the quality of the medical record.
• Because the outcome is rare, it is usually not possible to measure the true incidence and prevalence of the outcome.
• There is significant risk of misclassification of the condition when expensive or invasive tests are not routinely used.

32.3 Disadvantages of Case-Control Studies

The retrospective nature of case-control studies also presents disadvantages, namely, through several sources of bias [7, 9, 10]. Bias occurs when something about the research design or conduct introduces error into the data set [12]. A case-control study does not benefit from standardized data collection methods early in the process, but instead relies on medical chart review and patient recall to provide information about the development of the condition. Medical records are summaries of the interaction between provider and patient and capture the details deemed relevant by
the provider. Medical records are not exhaustive records that are monitored for completeness in the same fashion that prospective study case-report forms are monitored. Therefore, there may be information bias associated with the reliance on the medical record as the source of data for a study; for example, more information may be available for a patient who is having negative repercussions associated with treatment [11].

Additionally, anything that happens to the patient outside of the medical encounter will not be documented in the medical record. For this reason, participants are frequently asked to complete surveys about their experiences and exposures. The recall of the cases and controls may be biased differently: a case participant may have thought intensely about the scenarios that contributed to the development of the condition, while the control participant may not, leading the case subject to provide more detailed and accurate information or information that has been overanalyzed by the case participant [2, 9, 11, 12]. Additionally, survey data relies on the participants being willing to accurately report their experiences. In both cases and controls, recall bias could introduce potential error into the data set [7, 11].

There is also the potential for selection bias on the part of the researcher. The simplest design to execute involves cases and controls who have received care from a single surgeon or group practice. However, in selecting cases from a single surgeon or group practice, the researcher only has access to potential cases who chose to follow up with that surgeon or group. The researcher may not have access to a case who chose to go elsewhere for further care. In the case of a practice that specializes in the rare outcome of interest, it is unlikely that the practice will have performed the original care and may not have complete access to the medical record, leading to an incomplete data set. Cases and controls must be selected from a similar population and not in a way that introduces unwanted bias into the data set.

The case-control design clearly lacks the rigorous attention to detail in data collection of a prospective cohort or randomized controlled trial, but this is balanced by the inexpensive and potentially rapid completion of the study. These limitations should not be considered “fatal flaws” but should be factored into the analysis and interpretation of the data and results, as well as the overall discussion [12, 13]. These can be accounted for with good research practice, including the creation of detailed medical chart abstraction forms, structured design of retrospective surveys, adhering to strict inclusion and exclusion criteria for selecting cases and controls, and ensuring that data collectors are blinded to case vs. control status whenever possible [2, 11, 12].

Fact Box 32.3: Sources and Types of Bias
Information/observation bias occurs when information is gathered differently for the case participants and control participants. This may be due to differences in interview techniques, chart review methods, or survey design that introduces bias in support of the research hypothesis.
Selection bias occurs when case participants and control participants are chosen via different selection criteria. Specifically, case participants and control participants must be selected without consideration of their exposure history or any variable associated with the exposure.
Recall bias occurs when something particular to the cases or controls influences the recall such that more detailed information is gathered from one group.

32.4 Sample Selection

When selecting cases and controls, the researcher must select participants who are comparable in terms of both baseline risk of developing the outcome and the potential to have a complete data set related to that individual. It is important that they come from the same general population and have the same general characteristics.
32.4.1 Selecting Cases

When selecting cases, the researcher should attempt to identify a homogeneous group. The condition of interest for those selected as cases should be diagnosed in a consistent fashion via an appropriate combination of diagnostic tests and clinical findings, including the gold standard as often as possible [11]. The diagnostic criteria for identifying a case should be established when planning the study, and all potential avenues for diagnosing the condition of interest should be considered. Misclassification of controls as cases (and vice versa) must be avoided when possible, accounted for in the design phase and sample size assessment [4], or handled with statistical sensitivity analyses (see Sect. 32.5) [3].

In orthopedics, cases are often selected from the practice of the researcher, but to identify a large enough sample to make some meaningful inferences about the condition, the researcher may need to seek out others to contribute cases to the sample. Clearly defined diagnostic criteria make this process more objective and the results more generalizable.

For chronic conditions, the researcher must decide whether to include only incident cases (newly or recently diagnosed) or whether prevalent cases (those who have been living with a condition for some time) may be included [2, 11]. For a condition such as osteoarthritis, there is a chronic, degenerative process that occurs. Including both an individual with significant osteoarthritis of the knee joint that has been worsening for years and an individual who has radiologic degeneration without symptoms can lead to substantially different reports of their exposure history.

32.4.2 Selecting Controls

As previously mentioned, controls should be selected to be similar to the cases in all respects except having the outcome of interest [11]. Ideally, the control subjects should also be representative of all individuals without the outcome of interest in the population from which the cases are selected. For example, when selecting cases who develop a particular postoperative complication, the controls should be selected from all participants who had the same surgery but who did not develop the complication [7, 11]. This helps ensure that the controls have the same baseline risk as the cases. Any exclusionary criteria applied to cases should be applied equally to controls [7, 11, 12].

In a matched case-control study, the investigator matches the cases and controls for factors about which there is potential concern for confounding [2, 7, 12]. In group or frequency matching, the proportion of controls with a given characteristic or trait is identical to the proportion of cases with that trait. In a matched pairs design, for each case selected, a control is selected who is similar in terms of variables of concern (e.g., age, sex, smoking status). These designs are more tightly controlled but also present issues with analysis and interpretation, especially when cases and controls are overmatched. Overmatching (matching on too many variables) eliminates potential exposures that may contribute to the outcome of interest [7]. As an example, when the cases have early-stage osteoarthritis, matching cases and controls on injury history related to the meniscus would eliminate the potential to identify meniscus injury as a contributor to the development of osteoarthritis.

Because the case-control design is often used to study rare conditions, the sample of cases is often small. This is often not the case for the control subjects, so a matching of two or three control participants to each case participant may be used [7, 12]. This can increase the power of the study by including a more generalizable cohort of controls. This also improves the chances that the control group will have exposure similar to the case group. For studies in orthopedics specifically dealing with reinjury after primary surgery, it should be quite easy to identify a large group of patients with an index surgery who have not been reinjured. This will provide more variability in the control dataset and allow for more robust comparisons.
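
The idea of matching two or three controls to each case on variables of concern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Participant` record, the `match_controls` helper, the 5-year age tolerance, and the greedy closest-age rule are all hypothetical choices made for this sketch, not a method prescribed by the chapter or its references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    """Hypothetical participant record used only for this illustration."""
    pid: str
    age: int
    sex: str
    smoker: bool


def match_controls(cases, controls, k=2, age_tolerance=5):
    """Greedily match up to k unused controls to each case.

    Assumed matching rule (illustrative): identical sex and smoking
    status, and age within `age_tolerance` years of the case.
    """
    available = list(controls)
    matches = {}
    for case in cases:
        eligible = [
            c for c in available
            if c.sex == case.sex
            and c.smoker == case.smoker
            and abs(c.age - case.age) <= age_tolerance
        ]
        # Prefer the closest ages; break ties by id so the result is reproducible.
        eligible.sort(key=lambda c: (abs(c.age - case.age), c.pid))
        chosen = eligible[:k]
        for c in chosen:
            available.remove(c)  # each control is used at most once
        matches[case.pid] = [c.pid for c in chosen]
    return matches


# Made-up example data: two cases and six candidate controls.
cases = [Participant("case1", 30, "F", False), Participant("case2", 45, "M", True)]
controls = [
    Participant("ctl1", 31, "F", False),
    Participant("ctl2", 28, "F", False),
    Participant("ctl3", 52, "F", False),  # too old to match case1
    Participant("ctl4", 44, "M", True),
    Participant("ctl5", 47, "M", False),  # smoking status differs from case2
    Participant("ctl6", 49, "M", True),
]
matched = match_controls(cases, controls, k=2)
print(matched)
```

Each additional matching variable shrinks the pool of eligible controls, which is one practical reason to match only on variables where confounding is a genuine concern.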
32.5 Statistical Analysis

A case-control study seeks to understand whether some exposure (disease, procedure, condition, or patient characteristic) has any effect on the probability of developing an outcome of interest [7, 9]. When reviewing the history of the cases and controls, the presence or absence of an exposure should be obtained from the medical record or via participant survey/interview. Then, each case (outcome positive) and control (outcome negative) can be classified as having been exposed (exposure positive) or not (exposure negative). This allows us to create a simple 2 × 2 table or contingency table to illustrate the difference between the exposed and unexposed (Fig. 32.1). We will discuss some simple statistics that can be used with a 2 × 2 table but recommend the researcher follow up with a biostatistician for more complex, multivariable analysis. Additionally, methods for sensitivity analysis and issues related to sample size estimation are available but are beyond the scope of this chapter [3, 4].

Typically, contingency tables are used to calculate a risk ratio or relative risk (RR) to convey how exposure to a predictor puts someone at risk for developing the outcome of interest. This is done by dividing the population incidence of the condition of interest in those exposed [i.e., the proportion of exposed individuals who developed the condition (e.g., number of exposed cases/number of exposed individuals, or a/(a + b))] by the population incidence of the condition of interest in nonexposed individuals [the proportion of unexposed individuals who developed the condition (e.g., number of unexposed cases/number of nonexposed individuals, or c/(c + d))]. However, the relative risk is not appropriate in a case-control study because the selection of cases and controls does not accurately reflect the true population incidence [11].

In the case-control study, an odds ratio (OR) can be used. Note that this is a less precise method used in smaller samples where the true incidence of the condition of interest is rare (generally less than 5%) but still produces an estimate of risk [4, 11]. An odds ratio is determined with the following formula:

$$\mathrm{OR} = \frac{a/c}{b/d} = \frac{ad}{bc}$$

Because the formula reduces down to a multiplication of opposite corners, it is sometimes referred to as a cross product. An odds ratio or relative risk greater than 1 would indicate that exposure increases the risk of developing the outcome of interest, while an odds ratio or relative risk less than 1 would indicate that exposure decreases the risk of developing the outcome of interest (i.e., the exposure is protective).

32.6 Reporting of Case-Control Studies

When reporting results of a case-control study, and even in the planning of a case-control study, researchers are recommended to consider the checklist from the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) consortium, a standardized guideline for the reporting of epidemiological studies in medicine [12, 13]. The STROBE statement has guidelines for report titles, abstracts, introduction, methods, results, and discussion sections.

32.7 Summary

Case-control studies should be used to retrospectively determine the role of an exposure in the etiology of an outcome or condition of interest

Fig. 32.1 Standard 2 × 2 table format

             Has the Condition of Interest    Does not have the Condition of Interest
Exposed      a                                b
Unexposed    c                                d
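
As a rough illustration of the 2 × 2 layout in Fig. 32.1 and the cross-product odds ratio ad/bc, the sketch below computes the odds ratio for made-up counts. The counts, the function names, and the approximate 95% confidence interval (the log-based Woolf method, which the chapter does not discuss) are assumptions added for illustration. The last lines show why a relative risk computed from case-control counts is unreliable: doubling the number of sampled controls changes the apparent relative risk but leaves the odds ratio unchanged.

```python
import math


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2 x 2 table laid out as in Fig. 32.1."""
    return (a * d) / (b * c)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% CI via the log (Woolf) method; an added assumption."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


def relative_risk(a, b, c, d):
    """a/(a + b) divided by c/(c + d); only meaningful with true incidence data."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: a = exposed cases, b = exposed controls,
# c = unexposed cases, d = unexposed controls.
a, b, c, d = 30, 20, 10, 40
or_value = odds_ratio(a, b, c, d)
low, high = odds_ratio_ci(a, b, c, d)
print(f"OR = {or_value:.2f}, approximate 95% CI {low:.2f} to {high:.2f}")

# Sampling twice as many controls (b and d doubled) shifts the apparent
# relative risk, while the odds ratio stays the same.
rr_original = relative_risk(a, b, c, d)
rr_more_controls = relative_risk(a, 2 * b, c, 2 * d)
or_more_controls = odds_ratio(a, 2 * b, c, 2 * d)
print(f"RR {rr_original:.2f} -> {rr_more_controls:.2f}; OR stays {or_more_controls:.2f}")
```

A table containing a zero cell would make this calculation fail, so real analyses typically need a correction or an exact method chosen together with a biostatistician.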
that is rare or takes a long time to develop. Because of the retrospective nature, case-control studies can be completed relatively quickly and at a lower cost than a prospective observational study. However, the retrospective nature may introduce multiple types of bias into the data set, and results must be considered in light of the limitations of the retrospective study design. Participants are chosen based on whether they have the condition of interest (cases) or do not (controls) without regard to exposure. The exposure is ascertained via chart review, retrospective patient survey, or patient interview. An odds ratio is used to determine whether exposure increases risk (i.e., odds ratio greater than 1.0) or is protective (i.e., odds ratio less than 1.0). Despite the relative ease with which these studies can be completed, significant rigor should be put into place when designing these studies.

32.8 Useful Resources

STROBE Statement Web Site: http://www.strobe-statement.org

References

1. Bhandari M, Morshed S, Tornetta P 3rd, Schemitsch EH. Design, conduct, and interpretation of nonrandomized orthopaedic studies: a practical approach. (All) evidence matters. J Bone Joint Surg Am. 2009;91(Suppl 3):1. https://doi.org/10.2106/JBJS.H.01747.
2. Busse JW, Obremskey WT. Principles of designing an orthopaedic case-control study. J Bone Joint Surg Am. 2009;91(Suppl 3):15–20. https://doi.org/10.2106/JBJS.H.01570.
3. Gilbert R, Martin RM, Donovan J, Lane JA, Hamdy F, Neal DE, Metcalfe C. Misclassification of outcome in case-control studies: methods for sensitivity analysis. Stat Methods Med Res. 2016;25:2377–93. https://doi.org/10.1177/0962280214523192.
4. Joseph L, Belisle P. Bayesian sample size determination for case-control studies when exposure may be misclassified. Am J Epidemiol. 2013;178:1673–9. https://doi.org/10.1093/aje/kwt181.
5. Mayo NE, Goldberg MS. When is a case-control study a case-control study? J Rehabil Med. 2009;41:217–22. https://doi.org/10.2340/16501977-0341.
6. Mayo NE, Goldberg MS. When is a case-control study not a case-control study? J Rehabil Med. 2009;41:209–16. https://doi.org/10.2340/16501977-0343.
7. Morshed S, Tornetta P 3rd, Bhandari M. Analysis of observational studies: a guide to understanding statistical methods. J Bone Joint Surg Am. 2009;91(Suppl 3):50–60. https://doi.org/10.2106/JBJS.H.01577.
8. Nesvick CL, Thompson CJ, Boop FA, Klimo P Jr. Case-control studies in neurosurgery. J Neurosurg. 2014;121:285–96. https://doi.org/10.3171/2014.5.JNS132329.
9. Portney LG, Watkins MP. Foundations of clinical research: applications to practice. 2nd ed. Upper Saddle River, NJ: Prentice Hall; 2000.
10. Sackett DL. Bias in analytic research. J Chron Dis. 1979;32:51–63. https://doi.org/10.1016/0021-9681(79)90012-2.
11. Schulz KF, Grimes DA. Case-control studies: research in reverse. Lancet. 2002;359:431–4. https://doi.org/10.1016/S0140-6736(02)07605-5.
12. Vandenbroucke JP, et al. Strengthening the reporting of observational studies in epidemiology (STROBE): explanation and elaboration. PLoS Med. 2007;4:e297. https://doi.org/10.1371/journal.pmed.0040297.
13. von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP, STROBE Initiative. The strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. PLoS Med. 2007;4:e296. https://doi.org/10.1371/journal.pmed.0040296.

33 Level 4 Evidence: Clinical Case Series

Mitchell I. Kennedy and Robert F. LaPrade

M. I. Kennedy
The Steadman Philippon Research Institute, Vail, CO, USA
e-mail: mkennedy@sprivail.org

R. F. LaPrade (*)
The Steadman Clinic, Vail, CO, USA

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_33

33.1 Introduction

There are many approaches to conducting clinical research for determining trends associated with variable conditions, ranging from incidence and causality to outcomes following treatment. Studies are assessed by the level of evidence presented in their findings and proceed as follows: controlled trials (Level I), cohort studies (Level II), case-control studies (Level III), case series (Level IV), and expert opinion (Level V). All clinical research studies fall into two broad categories, being classified as either analytic or descriptive [10]. Analytic studies are premised on confirming/disproving a hypothesis that aims to determine a causal relation between an exposure and the outcomes of a condition [10]. They can be designed as observational or controlled, resulting in either the direct selection or randomization of treatment, respectively, by the principal investigator, for the patients of the study [10]. Analytic studies include case-control or cohort (observational) and randomized controlled trials (controlled). Case reports and case series are “descriptive,” observational studies that describe general disease characteristics associated with patients, places, and time [10]. The level of evidence is primarily distinguished by the effects that variable biases inflict upon the overall validity of the study’s findings.

Reporting Novel Strategies
• Aid in developing hypotheses for studies of greater level of evidence
• Exclusively report outcomes following a novel treatment procedure of a study population

Case series are unlike cohort and case-control studies in that they do not test a hypothesis or make use of a comparison group to determine the efficacy of a treatment. Rather, case series follow a group of patients over a period of time who have a similar diagnosis or are being treated with the same procedure [3, 10]. Case series are vital for analyzing unusual occurrences of a disease or designing a hypothesis for a potential prospective study [9]. This chapter is structured for defining the process of clinical research using a case series, primarily focusing on the process of design, reporting of outcomes, and the strengths and limitations of conducting research.
33.2 Design

Before any study may begin, it must be approved in accordance with the privacy law with approval from the respective Institutional Review Board (IRB). This law essentially protects the protected health information of any individual, living or deceased, and the confidentiality of data that is collected [12]. Additionally, an individual’s identifiable data may not be used without prior written approval [12]. A waiver may be granted for prior written approval required by the Privacy Rule if the following criteria are met: the project poses a minimal risk to the privacy of the individuals, there is an adequate plan to protect identifiers, the identifiers will be destroyed at the earliest opportunity, the project cannot be practically conducted without the specified protected health information, and the project could not be conducted without a waiver [12]. The approval by the respective local IRB usually comes following a thorough description of the approach and ultimately the goals for conducting the study. The key features of a case series that should be explicitly stated and focused around are a noteworthy collection of clinical occurrences that potentially display a combination of signs/symptoms and the subsequent novel treatment protocol that may suggest causality [7].

Repeatability
• Highly descriptive techniques are necessary for comparison/analyses of future prospective studies
• Unambiguously stated inclusion/exclusion criteria for comparisons among differing institutions
• Explicitly stated indications for creating a consistent patient group

Case series are best utilized for reporting novel diagnostic/therapeutic strategies when the alternative option of delaying for comparative evidence is less likely [10]. Due to the lack of hypothesis or comparative groups, case series are unable to make causal inferences between a treatment method and its outcomes, but they can aid in developing hypotheses for studies of a greater level of evidence [10]. Instead of typical clinical research studies comparing the effectiveness of relative procedures, case series exclusively report the outcomes following a novel treatment procedure for a specific study population [10]. Repeatability is very important regarding case series studies, because the incidence of novel procedures subsequent to unusual disease patterns is rarely standardized, and highly descriptive techniques are necessary for comparisons/analyses of future prospective studies. Unambiguously stated inclusion/exclusion criteria are also necessary for physicians of differing institutions to make comparisons with their own patient populations; defined and short inclusion periods are highly recommended to reduce the incidence of known and unknown changes that may occur more frequently in patients seen over varying time periods [10]. Lastly, indications should be explicitly stated for purposes of creating a consistent patient group [10].

Analyzing Disease Patterns
• Aid in formulating a reinforced hypothesis for future prospective studies
• Provide information regarding natural history, recovery, and prognostic factors
• Postulate relevant measures of interest: sample size, relevant covariates, and length of follow-up

In addition to a reinforced hypothesis, case series can provide viable information for disease patterns regarding the natural history, recovery, and prognostic factors which can further postulate relevant measures of interest including sample size, relevant covariates, and/or length of follow-up [1, 10].

33.3 Reporting Outcomes

By a retrospective approach, ease of accessibility to long-term follow-up and lack of a comparison group make the ideal outcomes primarily
33  Level 4 Evidence: Clinical Case Series 303

consist of treatment safety and diagnostic accuracy [10]. Case series are descriptive, and therefore findings should be presented exclusively with descriptive statistics; comparative tests supported by p-values are irrelevant in this setting and should be avoided, along with conclusive reports, because the treatment of interest and its relative efficacy would be supported by an immaterial hypothesis. The potential for bias should also be stated explicitly so that future prospective studies can ascertain the validity of the treatment [10].

The most important outcomes that can be determined by clinical research are measures of physical function and well-being. These are most often deemed significant by their measures of validity and reliability. Validity and reliability refer to the ability of the study to measure the variable of interest and the extent to which repeated measures display similar results, respectively. The Western Ontario and McMaster Universities (WOMAC) Osteoarthritis Index score is an example of an outcome that correlates well with previously described instruments of pain, stiffness, and physical function [2].

A classification scheme for variable measures of outcomes has previously been proposed by Wilson and Cleary, in which the higher-level outcomes were increasingly influenced by outside, uncontrollable factors of individuals, environments, and/or nonmedical factors that increase the difficulty in measurement and ultimately the definition for the patient population [16]. These five levels of outcomes were suggested as biological and physiological variables (Level I), symptom status (Level II), functional status (Level III), general health perceptions (Level IV), and overall quality of life (Level V) [16]. Instruments for reporting outcomes in orthopedic research can take many forms, including mixed clinician-based and functional outcome; system-specific, disease-specific, and general health-related quality of life; and overall health-related quality of life.

Mixed outcomes occasionally yield clouded findings due to the obstacle of interobserver reliability across physical examinations [14]. Additionally, the broad report of measures into a summarized score detracts from the specificity of patient expectations by range of motion, stiffness, and/or pain, thereby failing to discriminate between improved and worsened patients [13].

System-specific outcomes are localized to a specific body region, while disease-specific outcomes are localized to the specific disease encountered by the patient [13]. Frequently, patients can be evaluated by both approaches. Patients suffering from osteoarthritis of the knee can be assessed by instruments related to well-being of the knee and/or instruments related to well-being of osteoarthritis, increasing the available comparative measures and sensitivity to change [13].

General and overall health-related quality-of-life instruments mainly pertain to the health and happiness, respectively, of a patient in daily activities. The former is a multifactorial concept comprising physical, mental, and social factors that deal with a broad range of activities including work, hobbies, and social interactions; these factors thus measure the patient’s ability to carry out everyday life [13]. The latter instrument is similar but primarily concerns the patient’s satisfaction with their ability to participate in daily activities [13].

33.4 Level IV Case Series Example

A great example of case series research is a study performed by Geeslin and LaPrade, in which they aimed to report objective stability and subjective outcomes for a prospective series of patients with an acute grade III posterolateral corner (PLC) knee injury treated with anatomic repair and/or reconstruction of all injured structures [6]. At that time, there was a lack of current reports regarding surgical treatment and outcomes of acute PLC injuries, with most of the literature having been published a decade prior [5, 15].

Upon completion of this case series, the authors were able to report significantly improved objective stability, relative to preoperative conditions, that resulted from treatment of grade III PLC injuries by acute repair of avulsed fractures, reconstruction of midsubstance tears, and concurrent reconstruction of any cruciate ligament
304 M. I. Kennedy and R. F. LaPrade

tears [6]. This case series adds significant value to the literature in showing the high yield of outcomes by acute repair of the PLC structures.

33.5 Strengths and Pitfalls

Case series, and all observational studies, are uncontrolled, meaning the physician does not choose the treatment for the research participant. This method carries both a positive and negative aspect in the research process. Results of uncontrolled studies more closely resemble routine clinical practice than those of randomized trials and can often be better applied to clinical practice [1, 8]. With a diverse range of patients, a high external validity brings greater clinical relevance to differing medical centers and can better represent the population of interest; strict inclusion criteria established by randomized controlled trials severely reduce the extent to which findings can be generalized to common practice [4, 11]. Furthermore, from the financial and ethical standpoint, the lack of comparison and randomization yields a cost-effective study design, and choice of treatment by the patient and physician maintains a consistency with common standards of orthopedic practice [10].

However, an absent comparison group does impose limitations. Causal inferences cannot be made, restricting findings to apparent relationships, because the study design lacks an independent variable to differentiate outcomes attributable to the treatment protocol from those attributable to patient characteristics [10]. Comparing a condition of interest against a control group provides possible correlations with either the presence or degree of exposure to a specific risk factor, and data collectors may take measurements unaware of the patient characteristics, reducing the risk of bias [9].

Study design for case series studies that lack a comparison group eliminates the potential for confounding bias but increases the potential for selection and measurement bias, which can depend on whether the study is prospective or retrospective. Prospective studies more regularly embody a consistent manner of study design ranging from inclusion and data collection to patient follow-up measures, and measurement bias may exist with the absence of a standardized protocol [10]. Patients are expected to display worse outcomes when they have died or switched to another hospital, which creates an instance of selection bias [10]. Measurement bias also becomes prevalent when differing methods of outcome measurement are used in a study [10].

Take-Home Message
• Case series provide a basis for designing further research studies.
• In the instance of unusual occurrences of a disease, case series are effective for designing hypotheses for future prospective studies, although no hypothesis is tested within the case series itself.
• It is important to remember that case series are unable to make comparisons; rather they exclusively report on the outcome findings relative to the treatment of the patients involved in the series through descriptive statistics.

References
1. Audige L, Hanson B, Kopjar B. Issues in the planning and conduct of non-randomised studies. Injury. 2006;37(4):340–8. https://doi.org/10.1016/j.injury.2006.01.026.
2. Bellamy N. Pain assessment in osteoarthritis: experience with the WOMAC osteoarthritis index. Semin Arthritis Rheum. 1989;18(4 Suppl 2):14–7.
3. Carey TS, Boden SD. A critical guide to case series reports. Spine (Phila Pa 1976). 2003;28(15):1631–4. https://doi.org/10.1097/01.BRS.0000083174.84050.E5.
4. Dalziel K, Round A, Stein K, Garside R, Castelnuovo E, Payne L. Do the findings of case series studies vary significantly according to methodological characteristics? Health Technol Assess. 2005;9(2):iii–v, 1–146.
5. DeLee JC, Riley MB, Rockwood CA Jr. Acute posterolateral rotatory instability of the knee. Am J Sports Med. 1983;11(4):199–207. https://doi.org/10.1177/036354658301100403.
6. Geeslin AG, LaPrade RF. Outcomes of treatment of acute grade-III isolated and combined posterolateral knee injuries: a prospective case series and surgical technique. J Bone Joint Surg Am. 2011;93(18):1672–83. https://doi.org/10.2106/JBJS.J.01639.
7. Hanson BP. Designing, conducting and reporting clinical research. A step by step approach. Injury. 2006;37(7):583–94. https://doi.org/10.1016/j.injury.2005.06.051.
8. Hartz A, Marsh JL. Methodologic issues in observational studies. Clin Orthop Relat Res. 2003;(413):33–42. https://doi.org/10.1097/01.blo.0000079325.41006.95.
9. Hess DR. Retrospective studies and chart reviews. Respir Care. 2004;49(10):1171–4.
10. Kooistra B, Dijkman B, Einhorn TA, Bhandari M. How to design a good case series. J Bone Joint Surg Am. 2009;91(Suppl 3):21–6. https://doi.org/10.2106/JBJS.H.01573.
11. Lloyd-Williams F, Mair F, Shiels C, Hanratty B, Goldstein P, Beaton S, Capewell S, Lye M, McDonald R, Roberts C, Connelly D. Why are patients in clinical trials of heart failure not like those we see in everyday practice? J Clin Epidemiol. 2003;56(12):1157–62.
12. McMaster WC, Sale K, Andersson GB, Bostrom MP, Gebhardt MC, Trippel SB, Clark DC. The conduct of clinical research under the HIPAA Privacy Rule. J Bone Joint Surg Am. 2006;88(12):2765–70. https://doi.org/10.2106/JBJS.F.00794.
13. Poolman RW, Swiontkowski MF, Fairbank JC, Schemitsch EH, Sprague S, de Vet HC. Outcome instruments: rationale for their use. J Bone Joint Surg Am. 2009;91(Suppl 3):41–9. https://doi.org/10.2106/JBJS.H.01551.
14. Pynsent P, Fairbank J, Carr A. Outcome measures in orthopaedics and orthopaedic trauma. New York: Oxford University Press; 2004.
15. Stannard JP, Brown SL, Farris RC, McGwin G Jr, Volgas DA. The posterolateral corner of the knee: repair versus reconstruction. Am J Sports Med. 2005;33(6):881–8. https://doi.org/10.1177/0363546504271208.
16. Wilson IB, Cleary PD. Linking clinical variables with health-related quality of life. A conceptual model of patient outcomes. JAMA. 1995;273(1):59–65.
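
The recommendation in Sect. 33.3 that case series findings be presented exclusively with descriptive statistics can be illustrated with a short sketch. The example below is only a hypothetical illustration: the outcome scores, the complication count, and the helper name `describe` are invented for demonstration and are not data from any study cited in this chapter. It summarizes one outcome by its mean, standard deviation, median, interquartile range, and range, and reports a complication as a count with a proportion, with no comparative test or p-value.

```python
from statistics import mean, median, quantiles, stdev


def describe(values):
    """Return descriptive statistics for one outcome measured in a case series."""
    # Quartiles computed with the "inclusive" method (data treated as the full series)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": round(mean(values), 1),
        "sd": round(stdev(values), 1),
        "median": median(values),
        "iqr": (q1, q3),
        "range": (min(values), max(values)),
    }


# Hypothetical postoperative scores (0-100 scale) for eight patients; not real data
scores = [62, 70, 75, 78, 81, 84, 88, 93]
summary = describe(scores)
print(summary)

# A complication is reported as a count and a proportion, without a p-value
complications = 1  # hypothetical count
print(f"Complications: {complications}/{len(scores)} ({complications / len(scores):.0%})")
```

Reporting the median and interquartile range alongside the mean is a design choice for small series, where a single extreme patient can distort the mean.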
How to Perform a Clinical Study:
Level 4 Evidence—Case Report
34
Andrew J. Sheean, Gregory V. Gasbarro,
Nasef M. N. Abedelatif, and Volker Musahl

A. J. Sheean (*) · G. V. Gasbarro · V. Musahl
Department of Orthopaedic Surgery, University of Pittsburgh, Pittsburgh, PA, USA

N. M. N. Abedelatif
Beni Suef University, Beni Suef, Egypt

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_34

34.1 The Case for Case Reports

The advent of evidence-based medicine has revolutionized orthopaedic clinical research. Surgeons, now more than ever, seek answers to diagnostic and therapeutic dilemmas in the pages of medical journals, which have raised the bar for what constitutes a “publishable” study. Study methodology is now routinely scrutinized with various quality assessment tools, and, depending upon the type of study, authors are required to describe their methods using standardized flow diagrams and predetermined checklists. Better methods yield more precise results, which support stronger conclusions. These conclusions are more likely to alter practice habits and improve care. Moreover, authors recognize readers’ (and journal editorial boards’) appetites for higher levels of evidence and now increasingly strive to complete prospective initiatives that routinely compare unique patient cohorts or different treatment strategies. These observations are substantiated by a recent systematic review of available randomized controlled trials (RCT) pertaining to anterior cruciate ligament (ACL) reconstruction, which noted that 60% of the 412 relevant studies have been published in the last 10 years [2]. Considering the current state of orthopaedic literature, one cannot help but ask: “Is there still a place for case reports?”

The utility of case reports is derived from the novelty of both musculoskeletal conditions and their treatments. New conditions are discovered, previously described conditions present themselves in unique ways, and modifications to existing therapies or new therapies altogether can enhance patient care. The publication of a well-conceived case report need not be at odds with contemporary evidence-based medicine [1, 6]. Rather, the case report should occupy a special place within the orthopaedic literature. The case report allows its author to alert readers to a novel aspect of medicine, which can improve the clinician’s diagnostic acumen and/or stimulate larger scale initiatives to better understand the benefits of a particular innovation. The purpose of this chapter is to provide a blueprint for the publication of a case report that is focused, informative, and capable of reaching the widest audience possible.

34.2 Case Report Taxonomy

The case report is a retrospective analysis of one, two, or three clinical cases. These types of studies in the orthopaedic literature can generally be classified as one of three types based upon the
focus of the report: a novel musculoskeletal condition, a novel presentation of a previously described musculoskeletal condition, or a novel treatment. The following clinical vignettes provide an example of each of the three types of case reports and demonstrate how the authors successfully explain the implications of their observations.

Clinical Vignette 1: A Novel Condition
A 22-year-old collegiate rugby player with a history of recurrent, anterior glenohumeral instability and a large bony Bankart lesion was indicated for bony fixation and capsulorrhaphy [7]. After an unsuccessful attempt at arthroscopic fixation, an arthrotomy and open Bankart repair was performed through a deltopectoral approach. During the exposure, an aberrant muscle belly originating from the long head of the biceps brachii and inserting on the lesser tuberosity, proximal to the subscapularis, was noted. Furthermore, the anterior humeral circumflex vessels were found to lie superficial to the aberrant muscle belly. This variation was recognized, and fixation of the bony Bankart lesion and capsulorrhaphy were completed. The authors emphasized awareness of this anatomic variation for two reasons. First, exposure and mobilization of the subscapularis was possible without releasing the anterior humeral circumflex vessels. Second, adequate visualization and management of the subscapularis was contingent upon the intraoperative recognition of the aberrant muscle belly.

Clinical Vignette 2: A Novel Presentation of a Previously Described Musculoskeletal Condition
A 32-year-old man presented to the emergency department with involuntary loss of bowel and bladder function with tetraparesis after sustaining a hyperextension-flexion injury of the neck while riding a roller coaster [3]. Magnetic resonance imaging (MRI) of the cervical spine demonstrated a synovial cyst associated with an os odontoideum causing severe spinal stenosis. The patient subsequently underwent needle aspiration of the synovial cyst, C2–C3 decompression, laminectomy, and instrumented posterior spinal fusion from C1 to C3. Synovial cysts in the spine are commonly found in the lumbar spine, and the authors pointed out the rarity of the synovial cyst location at the atlanto-axial junction in this case. The authors emphasized awareness of the possibility of such a lesion found in association with an os odontoideum and recommended that those patients with os odontoideum be screened for atlanto-axial instability and for subtle clinical findings suggestive of a cervical myelopathy.

Clinical Vignette 3: A Novel Treatment
A 26-year-old woman with a history of two anterior hip dislocations following an arthroscopic labral repair was treated with revision hip arthroscopy and capsular plication before experiencing a third anterior hip dislocation while jogging [4]. After a thorough radiographic work-up, her hip instability was attributed to the combined effect of relative acetabular anteversion and focal anterior acetabular deficiency. She subsequently underwent a periacetabular osteotomy (PAO) to negate the effects of what was described as an overzealous anterior acetabuloplasty. At 1 year postoperative, the patient had not experienced any further anterior hip dislocations and demonstrated markedly improved patient-reported outcomes (PRO). The authors emphasized the importance of a thorough preoperative assessment of acetabular version prior to undertaking arthroscopic acetabuloplasty and described the utility of PAO to effectively treat iatrogenic anterior hip instability.

34.3 The Process

The process of writing a successful case report should proceed in a stepwise fashion (Table 34.1) with a review of the pertinent literature after the recognition of a unique clinical scenario. Once it has been confirmed that the clinical scenario is of sufficient novelty to warrant further efforts, an appraisal of the medical record should be undertaken to confirm that all relevant details are available and of sufficient quality to be included in a manuscript suitable for publication. Incomplete or missing clinical notes, undocumented laboratory data, and/or imaging studies of poor quality can all significantly hamper the process of presenting a case in an informative and thorough manner. For case reports describing a novel treatment approach, clinical follow-up of at least 1 year is preferable, and the inclusion of pre- and post-treatment patient-reported outcomes (PRO) strengthens conclusions pertaining to utility of the proposed treatment. The patient’s consent should be obtained in accordance with local institutional review board standards. Furthermore, proof of consent is frequently required by journals considering case reports for publication.

Table 34.1  Steps for writing a successful case report
1. Identify a novel clinical scenario
2. Perform preliminary literature review
3. Obtain written patient consent
4. Appraise the available medical records for quality and thoroughness
5. Select target journal and review manuscript guidelines
6. Draft manuscript

Prior to writing the case report, it is advisable to select a target journal to which the manuscript will be submitted. This should be done based upon the relevance of the case report to the subject matter typically published by a particular journal. The author should ask himself or herself: “Would this case report be of interest to this journal’s readership?” Additionally, case reports are not uniformly accepted for publication by all journals, and it is important to select a target journal based upon a precedent for the publication of case reports. Fact Box 34.1 provides a synopsis of the historical record of a collection of orthopaedic journals for publishing case reports.

Fact Box 34.1: Last Published Case Report in Orthopaedic Subspecialty Journals. Impact Factor According to the 2016 Journal Citation Report (Clarivate Analytics)

Orthopaedic journal name                                     Impact factor  Last published case report
American Journal of Sports Medicine                          5.673          April 2014
Journal of Bone and Joint Surgery                            4.840          June 2012
Arthroscopy: Journal of Arthroscopic and Related Surgery     4.292          January 2012
Knee Surgery, Sports Traumatology and Arthroscopy            3.227          June 2017
Journal of Arthroplasty                                      3.055          March 2013
The Spine Journal                                            2.962          November 2017
Bone and Joint Journal                                       2.953          July 2010
The Journal of the American Academy of Orthopaedic Surgeons  2.782          August 2017
Clinical Orthopaedics and Related Research                   2.765          February 2016
Journal of Shoulder and Elbow Surgery                        2.730          December 2017
Journal of Orthopaedic Trauma                                2.251          June 2015
Foot and Ankle International                                 1.872          February 2017
Journal of Pediatric Orthopaedics                            1.695          September 2017
Journal of Hand Surgery                                      1.606          August 2017
34.4 The Structure

Brevity is a critical feature of a successful case report, and the final manuscript should be a distillation of the key features of a novel clinical scenario. Limitations on word count, number of figures, and number of references vary somewhat from journal to journal, which underscores the importance of a complete review of manuscript guidelines prior to beginning to write (Fact Box 34.2). Case reports are not uncommonly limited to between 1000 and 1500 words [1]. While the composition of the case report varies based upon the target journal, the general structure follows a relatively consistent format (Table 34.2).

Table 34.2  Key components of a case report
Abstract
  – Summarize the clinical scenario
  – State the one or two purposes of the case report
Introduction
  – Provide a context for the case report with a brief synopsis of the relevant literature
  – Describe the novel aspects of the clinical scenario
  – State the one or two purposes of the case report
Case presentation
  – Provide chronologic description of case details
  – Catalog pertinent physical examination details, laboratory values, and imaging results
  – Describe clinical and radiographic follow-up
Discussion
  – Summarize key observations
  – Describe the novel aspects of the clinical scenario
  – Review the relevant literature
  – Provide concluding remarks that restate the one or two purposes of the case report

Fact Box 34.2: Example of Author Instructions for Journals That Have Published a Case Report After January 2017 and Still Currently Accept Submissions

Each orthopaedic journal name below is followed by its instructions for authors.

Knee Surgery, Sports Traumatology and Arthroscopy
• Short description of one (or few) case(s) that are more or less unique in their presentation and/or a new unique form of treatment
• Please note that we only publish unique and rare case reports and generally discourage them
• Specifications: 100-word abstract, 1000-word text including references
• http://www.kssta.org/authors/instructions/casereport/

The Journal of the American Academy of Orthopaedic Surgeons
• Pre-submission approval with a proposal or formal invitation is not required
• Summarize pertinent unusual or unexpected elements of case with discussion reviewing scientific/educational value
• Informed consent must be obtained from participating subject(s)
• Specifications: 150-word abstract, 1500-word text, 3 figure panels, and 10 references
• http://edmgr.ovid.com/jaaos/accounts/authinst.pdf

Journal of Shoulder and Elbow Surgery
• Encourage submission to JSES open access, a quarterly, online publication
• $750 submission fee
• Minimum 2-year follow-up is not required
• Specifications: No abstract, include keywords at the end of the introduction, and 2250-word text
• http://www.jshoulderelbow.org/content/authorinfo

Foot and Ankle International
• Very few case reports are accepted for publication
• Case reports must offer either new information that has been previously unpublished and offer completely new information or information that will change the current practice patterns of our readers
• Entities that are unique in and of themselves bizarre, or common, will not be accepted as case reports
• Sections should include introduction, case report, discussion, and summary/conclusion
• Specifications: no abstract
• https://us.sagepub.com/en-us/nam/journal/foot-ankle-international#CaseReports

Journal of Hand Surgery
• To be worthy of publication, a case report must have extraordinary teaching value to the readers
• Typically, cases where two findings are associated are not accepted since the findings are often coincidentally rather than causally related
• Sections should include introduction, case report, and discussion
• Specifications: one-paragraph description of manuscript contents, 150-word abstract, 1500-word text, and 10 references
• http://www.jhandsurg.org/content/authorinfo
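
The limits listed in Fact Box 34.2 lend themselves to a simple pre-submission check. The following sketch is an illustration under stated assumptions, not a tool provided by any journal: the names `Limits`, `JOURNAL_LIMITS`, and `check_manuscript` are hypothetical, words are counted naively by splitting on whitespace, and the numbers are transcribed from Fact Box 34.2 and should be confirmed against each journal's current author instructions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Limits:
    """Submission limits for case reports; None means no limit is listed."""
    abstract_words: Optional[int]
    text_words: Optional[int]
    references: Optional[int]


# Values transcribed from Fact Box 34.2 (hypothetical keys; verify before use).
# Simplification: the KSSTA 1000-word limit includes references, which this
# sketch does not model.
JOURNAL_LIMITS = {
    "KSSTA": Limits(abstract_words=100, text_words=1000, references=None),
    "JAAOS": Limits(abstract_words=150, text_words=1500, references=10),
    "J Hand Surg": Limits(abstract_words=150, text_words=1500, references=10),
}


def check_manuscript(journal: str, abstract: str, text: str, n_references: int) -> List[str]:
    """Return readable problems; an empty list means every listed limit is met."""
    limits = JOURNAL_LIMITS[journal]
    checks = [
        ("abstract words", len(abstract.split()), limits.abstract_words),
        ("text words", len(text.split()), limits.text_words),
        ("references", n_references, limits.references),
    ]
    return [
        f"{name}: {actual} exceeds limit of {allowed}"
        for name, actual, allowed in checks
        if allowed is not None and actual > allowed
    ]


draft_abstract = "word " * 160  # placeholder abstract of 160 words
draft_text = "word " * 1200     # placeholder body text of 1200 words
for problem in check_manuscript("JAAOS", draft_abstract, draft_text, n_references=12):
    print(problem)
```

Keeping the limits in one table makes it easy to re-run the same check after choosing a different target journal, which mirrors the advice to select the journal before drafting.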

34.4.1 Title

The title should be concise and informative. Keywords relevant to the clinical scenario should be included so as to maximize the likelihood of being captured in subsequent literature searches.

34.4.2 Abstract

If allowed, abstracts are generally limited to between 150 and 200 words and should provide a snapshot of the clinical scenario. The emphasis of the abstract should be on defining the novelty of the case so as to capture the readers’ interest. The abstract should include the one or two “take-home messages” of the case report.

34.4.3 Introduction

Although not uniformly required, the introduction provides the author with an opportunity to elaborate on what the abstract has already stated in terms of the novelty of the clinical scenario. The introduction should clearly explain what is unique about the clinical scenario being presented and why the case report is worthy of the readers’ interest. The focus of the introduction should be on efficiency, as most of the manuscript should consist of the case presentation and the discussion sections. References to prior relevant studies should only be made to underscore the unique features of the clinical scenario described, and a more thorough review of the existing literature can be reserved for the discussion section.

34.4.4 Case Presentation

The case should be presented in chronological order, beginning with the patient’s initial presentation. The patient’s subjective complaints, components of the physical examination, pertinent laboratory values, and imaging studies should be provided devoid of the authors’ analysis and/or inferences so as to allow the reader to establish his or her own conclusions about the case’s validity [5]. Given the frequent restrictions on figures and/or graphics, only the most relevant clinical photographs, radiographs, pathology specimens, or advanced imaging results should be included for review and be of the requisite quality to allow for the readers’ thorough interpretation. For case reports detailing a novel treatment or surgical technique, a thorough account of the operative technique should be included. Here again, clinical photographs and diagrams may be of particular utility to clearly explain unique intraoperative findings or a novel surgical technique.

The importance of adequate post-treatment follow-up cannot be overstated for case reports presenting novel treatment approaches. The success of a treatment is best substantiated by PRO and/or imaging results obtained at least 1 year post-treatment. Without adequate data, it is difficult, if not impossible, to assert the utility of a particular treatment. Moreover, case reports
espousing the utility of novel treatments without sufficient clinical follow-up are unlikely to compel readers to seriously consider the implementation of the proposed therapy.

34.5 Discussion

The discussion section summarizes the key aspects of the clinical scenario that explain its worthiness of a case report. For anatomic variations, “normal” should be defined based upon prior epidemiologic descriptions, and the implications of the variation should be discussed. For entirely new conditions, the clinical presentation should be detailed and possible treatment strategies proposed. Novel presentations of known musculoskeletal conditions should be discussed in relation to the modalities typically used to make the diagnosis. Was there something unique about the current case that obscured the diagnosis, and, if so, how was the diagnosis made? For new treatments, the rationale for a unique approach should be explained. Does the new treatment address deficiencies in more traditional therapies? Additionally, the proposed advantages of the new treatment should be clearly described.

In each of these circumstances, a focused review of the relevant literature is critical in order to place the case report in a broader context. The literature review serves the purpose of explaining how the case report differs from what has already been described. However, only the most pertinent references should be cited, as most case reports are limited to including between 10 and 15 references.

34.6 Conclusion

The publication of case reports facilitates the discussion of novel musculoskeletal conditions, clinical presentations, and treatment approaches. Through a concise, focused description of a unique clinical scenario, authors can employ the case report to share important details about diagnosis and treatment. The dissemination of this information has the potential to sharpen diagnostic practices and catalyze treatment innovation. In these ways, the case report remains a worthwhile addition to the orthopaedic literature.

Take-Home Message
• In the era of evidence-based medicine, the importance of case reports stems from their utility in describing new conditions, communicating a novel presentation of a previously described condition, or detailing new treatment approaches.

References
1. Friedman JN. The case for ... writing case reports. Paediatr Child Health. 2006;11:343–4.
2. Kay J, Memon M, Sa D, Simunovic N, Musahl V, Fu FH, et al. A historical analysis of randomized controlled trials in anterior cruciate ligament surgery. J Bone Joint Surg Am. 2017;99:2062–8.
3. Schmitz MR, Jenne J. Acute tetraparesis caused by a cervical spine synovial cyst associated with an os odontoideum: a case report. JBJS Case Connect. 2012;2:e17.
4. Sheean AJ, Barrow AE, Burns TC, Schmitz MR. Iatrogenic hip instability treated with periacetabular osteotomy. J Am Acad Orthop Surg. 2017;25:594–9.
5. Sun Z. Tips for writing a case report for the novice author. J Med Radiat Sci. 2013;60:108–13.
6. Vandenbroucke JP. In defense of case reports and case series. Ann Intern Med. 2001;134:330–4.
7. Warner JJ, Paletta GA, Warren RF. Accessory head of the biceps brachii. Case report demonstrating clinical relevance. Clin Orthop Relat Res. 1992;179:181.
35 Level 5: Evidence

Seán Mc Auliffe and Pieter D’Hooghe

S. Mc Auliffe · P. D’Hooghe (*)
Aspetar Orthopaedic and Sports Medicine Hospital, Doha, Qatar
Aspire Zone, Doha, Qatar
e-mail: sean.auliffe@aspetar.com; pieter.dhooghe@aspetar.com

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_35

Fact Box 35.1
Level 5 evidence within the levels of evidence framework refers to the use of opinions of respected authorities, consensus statements, descriptive studies, or reports of expert committees involving information “without explicit critical appraisal or based on economic theory or first principles” (Oxford Centre for Evidence-based Medicine Levels of Evidence, May 2001).

Fact Box 35.2
Expert consensus is seen as a suitable method to validate hierarchies of evidence or indeed other components of the evidence-based medicine enterprise, such as the CONSORT Statement for reporting randomized controlled trials and principles for developing practice guidelines.

Fact Box 35.3
The Delphi technique involves the gathering of expert opinion by means of structured or semi-structured rounds of questioning subsequently leading to a consensus of opinion on a particular topic.

35.1 Introduction

Clinicians are often faced with difficult decisions and uncertainty when patients need a certain treatment. They routinely rely on scientific literature in addition to their knowledge, experience, and patient preferences to make informed decisions. However, what if no such scientific literature exists, and what if there are no available RCTs or cohort studies to aid their clinical decisions? In such cases, level 5 evidence methods provide a necessary and important starting point for clinicians where other, more robust evidence is unavailable, overcoming some of the limitations within evidence-based medicine. In essence, level 5 evidence methods embody the principle of practice-based evidence by providing results that are relevant to the local population and culture and are readily implementable within the healthcare system. This approach fits with the call for more “practice-based evidence,”
where the evidence is gathered in real-life clinical settings and there is greater emphasis on the external validity of the evidence (generalizability) rather than on its internal validity (validity of causal inference) in order to develop better interventions [1, 2].

Despite the obvious benefit of expert consensus and guidelines, these methods are not without their limitations.

Firstly, the methods of developing expert opinions or consensus statements are often not reported clearly, and thus, one cannot be certain how the evidence has been collected or assessed. For example, the recent “World Heart Federation Expert Consensus Statement on Antiplatelet Therapy in East Asian Patients with acute coronary syndrome (ACS) or undergoing percutaneous coronary intervention (PCI)” summarized the latest data on the use of antiplatelet therapy in East Asian populations with ACS or for whom PCI is performed [3]. The document was written by an expert panel with a wide array of experience and knowledge, and yet the methodology used to summarize the evidence cited in this statement remains unreported [4].

A second limitation of level 5 evidence is that recommendations and consensus statements are influenced by the opinions, clinical experience, and composition of the group that develops them. The beliefs and opinions to which experts subscribe can be based on misconceptions and personally biased recollections that misrepresent population norms or may not reflect the view of their respective profession as a whole [5, 6].

Finally, in contrast to a formal evidence-based medicine (EBM) approach, consensus statements and guidelines may be unduly influenced by ascertainment bias (inclusion of subjects not representative of the general population or selective analysis of groups at opposite extremes) or by an emotional case study (more recent or more publicized stories or successful patient encounters), either of which can sway the final recommendations. Higher-level evidence-based methodologies are usually free of such influences, owing to requirements and guidelines for eligibility criteria, protocol registration, and classification of research quality, to name a few [7].

Despite the obvious limitations of level 5 evidence-based methods, we can recommend their use in contributing to informed clinical decision-making and informing clinical practice. An important counter-question to ask is: How were some types of evidence assigned to be higher in the hierarchy than others? The answer is “expert consensus.” Expert consensus is seen as a suitable method to validate hierarchies of evidence or indeed other components of the evidence-based medicine enterprise, such as the CONSORT Statement for reporting randomized controlled trials and principles for developing practice guidelines [8, 9]. More generally, consensus has an important role in the scientific process. New theories gain ground as more members of the scientific community see a new theory as giving a better account of the evidence than older ones. Consensus thus acts as a fundamental starting point in the evidence-based medicine approach.

35.2 Consensus Group Methods: Definition and Rationale

Level 5 evidence-based methods may be achieved through many formal methods, the most common of which are expert opinions or consensus group methods such as the Delphi or nominal group technique (NGT) [2]. The goal of these methodologies is to establish how well experts agree on a particular topic or issue, with the premise that accurate and reliable assessment can be best achieved by consulting a panel of experts and generating a group consensus on that particular issue. Consensus group methods have demonstrated acceptable construct validity and reliability [6, 10]. Utilizing these methods may assist in the development of an evidence base that can guide decisions, policy makers, and practitioners, thus ensuring practitioners can move beyond relying on their own individual experiences by drawing on the accumulated
body of knowledge of a larger, expert group [2]. Common features necessary for carrying out consensus group methods include anonymity, iteration, controlled feedback, statistical group response, and structured interaction, all of which differentiate formal consensus methods from informal group meetings [11]. Consensus techniques such as the NGT and Delphi technique are similar to focus groups. A key strength of consensus methods is the balanced participation from group members, unlike a focus group, whereby the facilitator must control for, and minimize the risk of, a dominant participant influencing the discussion [12].

35.3 The Delphi Method

The Delphi technique involves the gathering of expert opinion by means of structured or semi-structured rounds of questioning subsequently leading to a consensus of opinion on a particular topic [13]. The Rand Corporation first developed the Delphi technique in 1953 [14]. There is no standard method to calculate a panel size for the Delphi technique, with a sample of about 15 commonly accepted in the literature [12].

Carrying out a Delphi study involves a series of steps and choices. The first step in the Delphi technique involves a sufficient review of the available literature in order to generate and clarify a clear research question. The next step involves the selection of a suitable expert panel or individuals with expertise relevant to the question. Following the formation of a suitable panel of individuals, a questionnaire is constructed by the coordinator, which will then be used to gather opinions from the expert panel.

The expert panel is then provided with an initial questionnaire (round one) that uses open-ended questions to generate a list of ideas or concepts related to the research question. The first-round questionnaire will present a series of statements that the respondent is asked to rate on a clearly defined Likert scale. Often a nine-point Likert scale is used for the rating, although three-point, five-point, and seven-point scales have also been used [14–17]. Respondents are asked both to rate items and to compose free-text comments that, for example, explain their rating or express disagreement with the statement’s relevance [18]. While systematic literature searches and focus groups are often used for the initial Delphi questionnaire, expert panel members are given the opportunity to suggest additional items when completing the questionnaire [19].

Members of the research team analyze, collate, and compile an inclusive list of responses for subsequent resubmission to the expert panel in the form of a second-round questionnaire. In all subsequent rounds of the Delphi process, the panel members are provided with feedback pertaining to the individual responses provided and those of the other panelists. The process of questionnaire circulation and controlled opinion feedback is continued until consensus has been reached. Ordinarily the Delphi process requires between two and four rounds to gather a consensus of judgment, opinion, or choice.

The results of the multiple rounds of a Delphi study may be difficult to communicate due to their complexity. One common method involves the use of flowcharts showing the fate of items at each round. The simplest output from a Delphi study is a list of accepted items; where the number of accepted items is small, simply reporting the list may be sufficient [19].

Some methodological concerns regarding the Delphi method include the definition of an expert panel member, the potential bias of selecting panel members, anonymity of individual responses, questionnaire design, and scoring methods. All of these should be considered and addressed prior to undertaking the study [20].

A graphical representation of the multiple stages involved in the Delphi process is outlined in Fig. 35.1.
Fig. 35.1 Delphi study protocol

Clarify the research question or problem
Develop research protocol to be used
Recruit experts for Delphi panel
Collect and analyse data (R1) – Explore
Collect and analyse data (R2) – Evaluation
Collect and analyse data (R3) – Evaluation
Document the research process
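
The rating and feedback loop of a Delphi round can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed method: the nine-point scale follows the text, but the function name `summarize_round`, the cut-off of 7 for an "agree" rating, and the 70% agreement threshold are assumptions chosen for the example, since definitions of consensus differ between studies.

```python
from statistics import median


def summarize_round(ratings, agree_min=7, threshold=0.70):
    """Summarize one Delphi round of nine-point Likert ratings.

    ratings maps each statement to the list of panel ratings (1-9).
    agree_min and threshold are illustrative assumptions, not fixed rules.
    """
    summary = {}
    for statement, scores in ratings.items():
        if not scores or any(not 1 <= s <= 9 for s in scores):
            raise ValueError(f"invalid ratings for {statement!r}")
        # Share of panelists whose rating falls in the "agree" band.
        agreement = sum(s >= agree_min for s in scores) / len(scores)
        summary[statement] = {
            "median": median(scores),
            "agreement": agreement,
            "consensus": agreement >= threshold,
        }
    return summary


# Hypothetical second-round ratings from a five-member panel.
round_two = {
    "Statement A": [7, 8, 9, 9, 6],
    "Statement B": [3, 5, 7, 8, 2],
}
feedback = summarize_round(round_two)
```

In this sketch, the group median and the level of agreement for each statement would form the controlled feedback returned to panelists, and statements that have not reached the chosen threshold would be recirculated in the next round.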

35.4 Nominal Group Technique (NGT)

The NGT method incorporates a structured face-to-face group interaction. The NGT involves a four-stage process, namely, silent generation, round robin, clarification, and voting (ranking or rating) [21].

To initiate the NGT process, one to two questions are sent to participants in advance of a face-to-face group meeting. Subsequently, at the beginning of the meeting, participants are given time to reflect on and record their individual ideas in response to a question, in a process known as “silent generation.” The next stage of the NGT process is known as the “round robin” process, in which the facilitator requests one participant at a time to state a single idea to the group. Participants are afforded as much time as required until no further ideas are generated, with ideas recorded verbatim on, for example, a flipchart or white board. The third stage of the NGT process is known as the “clarification stage” [21]. This stage provides the opportunity for clarification of the ideas generated in previous rounds, as well as providing ample opportunity for a grouping step, where similar ideas are grouped together with agreement from all participants. Participants are also afforded the opportunity to exclude, include, or alter ideas, as well as generate grouping themes. The round robin and clarification phases may each take up to 30 min [12].

The next stage of the NGT method is known as the “voting or ranking” stage, where participants are provided with a ranking sheet requiring them to identify their top preferences from the ideas generated in the previous stages, with larger numbers reflecting greater importance. The number of items chosen by participants may vary according to the topic of interest, but the ranking of five ideas is common in the literature [12, 22–27]. The final stage of the NGT process is referred to as the “discussion stage,” where the scores for each idea are summed and presented to the group for discussion. The timing for this stage is likely to depend on a number of factors, including the complexity of the topic and how many items need to be prioritized. A graphical illustration of the NGT method is outlined in Fig. 35.2 [12].
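
The summation of scores before the discussion stage can be illustrated as follows. This is a minimal sketch, assuming only what the text states (each participant scores their preferred ideas, and larger numbers reflect greater importance); the function name `tally_rankings`, the alphabetical tie-break, and the example ideas are hypothetical, and the example uses a top-three ranking for brevity.

```python
from collections import Counter


def tally_rankings(ballots):
    """Sum the scores each idea received across all ranking sheets.

    Each ballot maps an idea to the score one participant gave it;
    larger scores reflect greater importance.
    """
    totals = Counter()
    for ballot in ballots:
        totals.update(ballot)  # adds this participant's score to each idea
    # Highest total first; ties broken alphabetically for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical ranking sheets from three participants.
ballots = [
    {"early mobilization": 3, "patient education": 2, "bracing": 1},
    {"patient education": 3, "early mobilization": 2, "imaging": 1},
    {"early mobilization": 3, "imaging": 2, "patient education": 1},
]
ranked = tally_rankings(ballots)
```

The resulting ordered list of ideas and summed scores is what would be presented back to the group for discussion.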
Fig. 35.2 NGT method. Diagram labels: Silent Generation; Idea generation (Literature or Surveys); Round Robin; Clarification; Voting (Ranking); Discussion; In the original group; Re-ranking process (one, two, or until no changes); Survey; Forum event.

Clinical Vignette: The Practical Implementation of Level 5 Evidence in Clinical Orthopedic Medicine

To illustrate the use of expert consensus research methods in orthopedic medicine, the International Society on Cartilage Repair of the Ankle recently held a consensus meeting in Pittsburgh, Pennsylvania, USA, in order to develop a consensus opinion on key topics within cartilage repair of the ankle. Using the Delphi consensus group method described previously, this consensus meeting assembled orthopedic surgeons, physical therapists, radiologists, and basic scientists to provide evidence-based and/or expert recommendations in the field of cartilage repair of the ankle.

The consensus meeting, which took place on the 17th–18th of November 2017 at the University of Pittsburgh (Fig. 35.3), represents a culmination of the Delphi process methodology which began in early 2017 and ultimately aims to help clinicians provide appropriate treatment of ankle articular cartilage injuries based on expert-level opinion.

35.5 Conclusion

In essence, level 5 evidence methods embody the principle of practice-based evidence by providing results that are relevant to the local population and culture and are readily implementable within the healthcare system. This approach fits with the call for more “practice-based evidence,” where the evidence is gathered in real-life clinical settings and there is greater emphasis on the external validity of the evidence (generalizability) rather than on its internal validity (validity of causal inference) in order to develop better interventions.

Take-Home Message
• By using expert opinion and consensus statements to develop an evidence base, practitioners can move beyond relying on their own experience and can draw on the accumulated experience of a larger cohort of practitioners in their respective fields.
• Level 5 evidence may also facilitate the development of clinical practice guidelines or assist healthcare stakeholders with decision making, thus highlighting opportunities and areas for future research.
• Although at the lower end of the evidence-based medicine hierarchy, level 5 evidence methods have the added advantage of being feasible with limited resources where it may not be feasible to carry out randomized controlled trials, population surveys, or cohort studies.
Fig. 35.3  A practical example on the Delphi method used at a surgical meeting in Pittsburgh on cartilage repair of the
ankle

35.6 Website

• Oxford Centre for Evidence-based Medicine Levels of Evidence (May 2001). http://www.cebm.net/oxford-centre-evidence-based-medicine-levels-evidence-march-2009/

References

1. Green LW. Making research relevant: if it is an evidence-based practice, where’s the practice-based evidence? Fam Pract. 2008;25(Suppl 1):20–4.
2. Minas H, Jorm AFO. Where there is no evidence: use of expert consensus methods to fill the evidence gap in low-income countries and cultural minorities. Int J Ment Health Syst. 2010;4:33. https://doi.org/10.1186/1752-4458-4-33.
3. Levine G, Jeong Y, Goto S, Anderson J, Huo Y, Mega J, Taubert K, Smith S. Expert consensus document: World Heart Federation expert consensus statement on antiplatelet therapy in East Asian patients with ACS or undergoing PCI. Nat Rev Cardiol. 2014;11(10):597–606. https://doi.org/10.1038/nrcardio.2014.104. Epub 2014 Aug 26.
4. Kwong JSW, Chen H, Sun X. Development of evidence-based recommendations: implications for preparing expert consensus statements. Chin Med J (Engl). 2016;129:2998–3000.
5. Kane RL. Creating practice guidelines: the dangers of over-reliance on expert judgment. J Law Med Ethics. 1995;23:62–4.
6. Pacini D, Murana G, Leone A, Di Marco L, Pantaleo A. The value and limitations of guidelines, expert consensus, and registries on the management of patients with thoracic aortic disease. Korean J Thorac Cardiovasc Surg. 2016;49(6):413–20. https://doi.org/10.5090/kjtcs.2016.49.6.413.
7. Echemendia RJ, Giza CC, Kutcher JS. Developing guidelines for return to play: consensus and evidence-based approaches. Brain Inj. 2015;29(2):185–94.
8. Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomized trials. BMJ. 2010;340:32.
9. Shekelle PG, Woolf SH, Eccles M, Grimshaw J. Developing clinical guidelines. West J Med. 1999;170(6):348–51.
10. Hutchings A, Raine R, Sanderson C, Black N. A comparison of formal consensus methods used for developing clinical guidelines. J Health Serv Res Policy. 2006;11(4):218–24.
11. Jones J, Hunter D. Consensus methods for medical and health services research. BMJ. 1995;311:376–80.
12. McMillan SS, Kelly F, Sav A, Kendall E, King MA, Whitty JA, et al. Using the nominal group technique: how to analyse across multiple groups. Health Serv Outcomes Res Method. 2014;14:92–108.
13. O’Neill S, Watson PJ, Barry S. A Delphi study of risk factors for achilles tendinopathy- opinions of world tendon experts. Int J Sports Phys Ther. 2016;11(5):684–97.
14. Linstone HA, Turoff M. The Delphi survey: method techniques and applications. Reading: Addison-Wesley; 1975.
15. Cantrill JA, Sibbald B, Buetow S. Indicators of the appropriateness of long term prescribing in general practice in the United Kingdom: consensus development, face and content validity, feasibility and reliability. Qual Health Care. 1998;7:130–5.
16. Cassar Flores A, Marshall S, Cordina M. Use of the Delphi technique to determine safety features to be included in a neonatal and paediatric prescription chart. Int J Clin Pharmacol. 2014;36(6):1179–89.
17. Chan A, Tan SH, Wong CM, Yap KY, Ko Y. Clinically significant drug–drug interactions between oral anticancer agents and nonanticancer agents: a Delphi survey of oncology pharmacists. Clin Ther. 2009;31(Pt 2):2379–86; 59(4):334–41.
18. McMillan SS, King M, Tully MP. How to use the nominal group and Delphi techniques. Int J Clin Pharmacol. 2016;38:655.
19. Jorm A. Using the Delphi expert consensus method in mental health research. Aust N Z J Psychiatry. 2015;49(10):887–97.
20. Thangaratinam S, Redman CW. The Delphi technique. Obstet Gynaecol. 2005;7:120–5. https://doi.org/10.1576/toag.7.2.120.27071.
21. Delbecq AL, van de Ven AH, Gustafson DH. Group techniques for program planning, a guide to nominal group and Delphi processes. Scott, Foresman and Company: Glenview; 1975.
22. Aljamal M, Ashcroft DM, Tully MP. Development of indicators to assess the quality of medicines reconciliation at hospital admission: an e-Delphi study. Int J Pharm Pract. 2016;24(3):209–16.
23. Cross H. Consensus methods: a bridge between clinical reasoning and clinical research? Int J Lepr Other Mycobact Dis. 2005;73(1):28–32.
24. Dening KH, Jones L, Sampson EL. Preferences for end-of-life care: a nominal group study of people with dementia and their family careers. Palliat Med. 2012;27(5):409–17.
25. Kuhn TS. The structure of scientific revolutions. 2nd ed. Chicago: University of Chicago Press; 1970.
26. Sandrey MA, Bulger SM. The Delphi Method: an approach for facilitating evidence based practice in athletic training. Athl Train Educ J. 2008;3(4):135–42.
27. Humphrey-Murto S, Varpio L, Gonsalves C, Wood TJ. Using consensus group methods such as Delphi and Nominal Group in medical education research, Medical Teacher; 2016.

Part VI
How to Perform a Review Article?

36 Type of Review and How to Get Started

Matthew Skelly, Andrew Duong, Nicole Simunovic, and Olufemi R. Ayeni

M. Skelly · A. Duong · N. Simunovic · O. R. Ayeni (*)
Department of Surgery, McMaster University, 1200 Main St W, 4E15, Hamilton, ON, Canada
e-mail: duonga@mcmaster.ca; simunon@mcmaster.ca

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_36

36.1 Introduction

The advent of evidence-based medicine has greatly improved the quantity and quality of research published today. However, with this increased output of high quality research, academics and clinicians are inundated with new reports that require time and resources to appropriately read and review [1, 10, 11]. This might result in many relevant research articles going unread.

One answer to keeping abreast of this increase in literature is the synthesis of information in the form of a review. Reviews attempt to summarize and present the available literature on a given topic. In doing so, they establish whether treatment effects are consistent across studies, strengthen the power and precision of estimated treatment effects and eliminate biases that can be associated with individual studies. This provides healthcare professionals and policy makers a basis upon which they may make evidence-based decisions. Reviews also take stock of what information is available and can be useful in identifying gaps in our current knowledge as well as potential avenues for future research.

36.2 Types of Review Articles

There are two main types of review articles: systematic reviews and narrative reviews. Both types aim to abstract information from available resources to answer research questions but differ in their approach.

Systematic reviews follow a planned and reproducible process of searching and identifying relevant articles to answer a proposed question. For example: in adults undergoing a primary hip arthroscopy, is the supine or lateral approach more effective at lowering post-operative pain and narcotic usage and improving hip function at 90 days following surgery [13]? Systematic reviews are explicit as to what types of resources they include; the exact procedures by which the primary studies were searched, screened and data abstracted; and how their findings were reached [17]. It is often recommended that researchers design their research questions with the PICO framework (Population, Intervention, Control, Outcome). For the example above, the population is adults undergoing primary hip arthroscopy, the intervention and control are the supine and lateral approach, and the outcomes include post-operative pain, narcotic usage and hip function at 90 days.
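
For readers who keep their review protocols in a structured form, the PICO breakdown of the hip arthroscopy example can be captured as a small record. This is only a sketch: the class name `PicoQuestion`, its field names, and the `as_sentence` helper are hypothetical conveniences, not part of the PICO framework itself.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PicoQuestion:
    """A research question split into its four PICO components."""

    population: str
    intervention: str
    control: str
    outcomes: Tuple[str, ...]

    def as_sentence(self) -> str:
        # Reassemble the components into a single readable question.
        outcomes = ", ".join(self.outcomes)
        return (
            f"In {self.population}, how does {self.intervention} compare "
            f"with {self.control} with respect to {outcomes}?"
        )


# The worked example from the text, expressed as PICO components.
hip_question = PicoQuestion(
    population="adults undergoing primary hip arthroscopy",
    intervention="the supine approach",
    control="the lateral approach",
    outcomes=("post-operative pain", "narcotic usage", "hip function at 90 days"),
)
question_text = hip_question.as_sentence()
```

Writing the question out component by component makes it easier to check that each element is specific enough to drive the search strategy and the eligibility criteria.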

Narrative reviews, also known as literature reviews, provide an overview of the current state of knowledge on a given topic. For example, a narrative review may look at the broad topic of humeral fractures. They do not necessarily follow a systematic methodology and typically focus on key articles or even expert opinions in a given topic area. Narrative reviews sometimes use non-peer reviewed sources such as editorials, book chapters and interviews. Narrative reviews are not required to describe the methodology by which authors assembled the literature cited in the review, and their findings can depend heavily on the literature that the authors chose to include. This might introduce a significant selection bias and creates the potential for independent authors to arrive at different conclusions, despite seeking to answer the same research question.

• Systematic reviews follow a stringent, planned and reproducible process of searching and identifying relevant articles to answer their proposed research question.
• Narrative reviews do not necessarily follow a systematic methodology and typically focus on key articles or even expert opinions in a given topic area.

36.2.1 Practical Example

In 1998, a narrative review was performed to determine what patient-related factors affect the functional outcome of total hip arthroplasty [19]. This review conducted a brief literature search of one database and included other additional articles identified as relevant by the authors. The authors concluded that the best functional outcomes were reported by patients between the ages of 45 and 75, who weighed less than 70 kg and who had a better preoperative functional status, with few to no baseline comorbidities [19]. The review also indicated that women had better functional outcomes and prosthesis survival rates than men but stated that this may be the result of confounding factors [19]. In 2004 a systematic review was published on the same topic. It specifically explored factors impacting the health-related quality of life of patients undergoing total hip and knee arthroplasty [5]. It not only examined which patient group had the best functional outcomes but also took into account confounding factors. Among its conclusions were that patient age was not an obstacle to effective surgery, men improved more than women from these total arthroplasties, and after a follow-up time of 1 year, there was no difference in health-related quality of life between weight groups [5]. This helped to dispel prior thoughts of refusing to perform hip arthroplasty on obese patients on the basis of weight [5]. The systematic review also used its data to examine the effect of comorbidities, levels of preoperative function, wait time for surgery and procedure type [5]. The difference in findings and the extent to which each factor was explored can be directly attributed to methodological and source differences between narrative and systematic reviews [8].

Overall it could be said that systematic reviews fall on the more objective end of the spectrum, while narrative reviews are often more subjective [8, 16, 17].

36.3 Why Conduct a Narrative Review over a Systematic Review?

Narrative reviews can be written relatively quickly and provide readers with current knowledge about a certain topic. They tend to be written by authors who are experts in the field and are therefore able to elaborate on their conclusions through their own personal experiences, theories or models and educated opinions. This additional insight can be invaluable in new areas of research that are lacking a sufficient body of literature. In comparison, systematic reviews are time-intensive to perform, making them most useful when there is a large body of primary research studies available to address a specific research question (Table 36.1). In the event that the reviewed studies share a common outcome, a
meta-analysis may be performed. A meta-analysis involves the careful consideration of the quality and strength of each study included in the systematic review to create a larger and more accurate pooled estimate of the effect of a treatment.

Table 36.1 A comparison of narrative reviews and systematic reviews

Feature | Narrative review | Systematic review
Research question | Broad overview of a topic | Specific well-defined research question examining a specific aspect of a greater topic
Searching for studies | Searches are not exhaustive and do not guarantee capture of all available literature | Attempt to capture all available published literature and work in progress in a well-documented process. Based on a predefined protocol
Study selection | Reasons for the inclusion/exclusion of studies are not required and are not commonly explained | Reasons for the inclusion/exclusion of studies are explicit and geared towards the research question
Assessment of the quality of included studies | Quality of included studies is not typically evaluated | Systematic assessment of quality of all included studies using established scales, tools, or guidelines
Interpretation and conclusions | Based in part on the resources gathered and in part on the author’s intuition/opinion | Based solely on the data gathered

• Narrative reviews can be written quickly, they tend to be written by experts in the field they concern, and they are invaluable in new areas of research with little available research.
• Systematic reviews are more appropriate when there is a large body of evidence that could benefit from summarization or pooling of results to answer the research question.

36.4 How to Get Started

36.4.1 Background Research

The first step before choosing both a research topic/question and type of review to address it is to get a preliminary sense of the available literature. Background research can inform how best to identify more sources of information and what search terms may be relevant. The amount and quality of literature can also help determine whether one should write a narrative review or systematic review.

Most authors begin with a literature search. PubMed, Embase, MEDLINE and Google Scholar are databases for scientific articles where most medical published studies can be found. In addition, authors can reach out to subject matter experts (SME) for their thoughts on the topic. Although difficult to cite, SMEs have an understanding of current evidence and the latest ongoing research and can provide feedback regarding the planned search and direct one towards appropriate resources.

One often overlooked source for understanding emerging topics of interest in a given field is the ‘grey literature’. This includes conference proceedings, reports and other documents that are not published in scientific journals. In most cases these sources are not peer-reviewed, and thus their level of quality can be quite varied. Conference proceedings can be reviewed prior to acceptance, but they can be based on incomplete data, as the authors can choose to present preliminary findings before the study is complete [12].

36.4.2 The Outline

After deciding on an appropriate topic, the next step is to design an outline for the review. There is no set structure for designing a narrative review, but typically narrative reviews of medical literature include introduction and discussion sections [7]. In contrast, systematic reviews have a well-established structure. They require a condensed abstract, an introduction, a reproducible description of the methodology, a summary of the available literature in the results and a discussion section drawing overall conclusions from the findings. Guidelines, such as the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) or the Consolidated Standards of Reporting Trials (CONSORT), provide reporting standards that many journals now require be implemented in a published systematic review to aid in their critical appraisal and interpretation [14, 15, 18]. More recently, journals have encouraged the registration of systematic reviews. The Cochrane Library (http://cochranelibrary.com) and PROSPERO (http://crd.york.ac.uk/prospero/) offer repositories for ongoing and completed reviews.

Regardless of how the review is structured, the outline should be as detailed as possible. It should address the planned structure of the paper and what information will be addressed in each section. Creating an outline early on allows the paper to be revised without interruption to the literary flow and helps to coordinate what key messages are to be conveyed throughout each section. Ideally, when using this outline to write the article, the author should be able to complete the paper without having to conduct additional research. More details about conducting a systematic review can be found in the next chapter.

36.5 Writing the Review

With the outline completed, the next step is to write the review. While this may seem like a dif-

Another strategy is the order in which the review is written. The important sections of a narrative review of medical literature include the figures and tables (when appropriate), the abstract, the title and the main text. The main text of literature reviews can be further broken down into the introduction and discussion sections, with some authors choosing to include methods and/or results sections. In comparison, systematic reviews are required to report all of these components in order to remain transparent.

Figures and tables can also be drafted early on to help organize the structure of the review and focus the main findings. It is essential to create these early in the writing process, as they will be referred to throughout the review. Systematic reviews typically include a flow diagram depicting the screening process (following PRISMA guidelines), a study characteristics table and the detailed search terms in an appendix to ensure the results can be reproduced. Narrative reviews may or may not include these figures or tables.

36.6 Reviews in the Orthopaedic Literature: A Cautionary Tale

Narrative reviews are not typically relied upon within orthopaedics to guide clinical decisions. This weight falls upon systematic reviews, which are historically cited more than their nonsystematic counterparts [2]. In the past, surgical literature, including orthopaedic literature, has been found to be lacking in quality [4]. Bhandari et al. applied the Detsky scale to assess the reporting quality of 72 randomized controlled trials (RCTs)
ficult task at first, there are strategies that can published in the Journal of Bone and Joint
make the job easier. Surgery from 1988 to 2000. Only 32 (43%) of
The first is the use of reference management those studies were found to be of high quality [3].
software that can be found online [e.g. Mendeley The poor quality of this literature was attributed
(Elsevier, 2017), BibMe (Chegg, 2017)]. to two reasons. The first was a reliance by sys-
Systematic reviews in particular often require an tematic reviews on low-quality evidence, such as
extensive number of articles to be screened and case studies and case series. These are inherently
referenced in the final paper. The software can act limited by their retrospective nature, potential for
as an organized repository of information during bias and lack of comparative groups. The second
manuscript preparation. The majority of refer- is that high quality systematic reviews regularly
ence management software can also automate failed to sufficiently report the study parameters,
formatting of references to save time when it including methodological parameters of the
comes time to submit the review to a journal. study, their sources of funding and of potential

conflicts, as well as the quality of their evidence [4]. In the last decade, there has been a dramatic growth in the number of randomized controlled trials and systematic reviews published in orthopaedic literature. Publications in orthopaedics on the whole are estimated to have doubled between 2000 and 2011 [9]. However, the quality of these publications and reviews was still poor, despite publication in top journals [6]. This reduces the impact of published systematic reviews and their ability to aid in clinical decision making. By using the strategies described in this chapter to carefully plan any type of review, authors can minimize bias and ensure adequate reporting to demonstrate higher quality.

36.7 Conclusion

Narrative reviews can be written relatively quickly and often provide the most current information on a chosen topic. They are commonly written by authors who are experts in the discussed topic, allowing those experts to put forward their educated opinions and ideas. Systematic reviews are conducted using a planned methodological structure; they attempt to encompass all available literature for a chosen topic, and they provide reproducible and typically less biased results. This chapter has elucidated the differences between these two types of reviews and serves to provide a starting point for any author thinking of conducting a narrative review of their own.

References

1. Alper B, Hand J, Elliott S. How much effort is needed to keep up with the literature relevant for primary care. J Med Libr Assoc. 2004;92:429.
2. Bhandari M, Montori VM, Devereaux PJ, Wilczynski NL, Morgan D, Haynes RB, Hedges Team. Doubling the impact: publication of systematic review articles in orthopaedic journals. J Bone Joint Surg Am. 2004;86:1012–6.
3. Bhandari M, Richards RR, Sprague S, Schemitsch EH. The quality of reporting of randomized trials in the Journal of Bone and Joint Surgery from 1988 through 2000. J Bone Joint Surg Am. 2002;84:388–96.
4. Chaudhry H, Mundi R, Singh I, Einhorn TA, Bhandari M. How good is the orthopaedic literature? Indian J Orthop. 2008;42:144–9.
5. Ethgen O, Bruyère O, Richy F, Dardennes C, Reginster J-Y. Health-related quality of life in total hip and total knee arthroplasty. A qualitative and systematic review of the literature. J Bone Joint Surg Am. 2004;86:963–74.
6. Gagnier JJ, Kellam PJ. Reporting and methodological quality of systematic reviews in the orthopaedic literature. J Bone Joint Surg Am. 2013;95:e771–7.
7. Green B, Johnson C, Adams A. Writing narrative literature reviews for peer-reviewed journals: secrets of the trade. J Chiropr Med. 2006;5:101–17.
8. Hitchcock M. Review vs systematic review vs ETC. In: LibGuides Nurs. Resour. 2017. http://researchguides.ebling.library.wisc.edu/c.php?g=293229&p=1953452. Accessed 6 Oct 2017.
9. Hui Z, Yi Z, Peng J. Bibliometric analysis of the orthopedic literature. Orthopedics. 2013;36:e1225–32.
10. Hurwitz S, Slawson D, Shaunessy A. Orthopaedic information mastery: applying evidence-based information tools to improve patient outcomes while saving orthopaedists’ time. J Bone Joint Surg Am. 2000;82(6):888–94.
11. Hussain N, Turvey S, Bhandari M. Keeping up with best evidence: what resources are available? J Postgrad Med Edu Res. 2012;46:4–7.
12. Kay J, Memon M, Rogozinsky J, de Sa D, Simunovic N, Seil R, Karlsson J, Ayeni OR. The rate of publication of free papers at the 2008 and 2010 European Society of Sports Traumatology Knee Surgery and Arthroscopy congresses. J Exp Orthop. 2017;4:15.
13. Miller LE, Gondusky JS, Bhattacharyya S, Kamath AF, Boettner F, Wright J. Does surgical approach affect outcomes in total hip arthroplasty through 90 days of follow-up? A systematic review with meta-analysis. J Arthroplasty. 2017;33(4):1296–302.
14. Moher D. Consort: an evolving tool to help improve the quality of reports of randomized controlled trials. JAMA. 1998;279:1489–91.
15. Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, Shekelle P, Stewart LA. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015;4:1.
16. Riemsma RP, Pattenden J, Bridle C, Sowden AJ, Mather L, Watt IS, Walker A. Systematic review of the effectiveness of stage based interventions to promote smoking cessation. Br Med J. 2003;326:1175–7.
17. Rother ET. Systematic literature review X narrative review. Acta Paul Enferm. 2007;20:5–6.
18. Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMC Med. 2010;8:18.
19. Young NL, Cheah D, Waddell JP, Wright JG. Patient characteristics that affect the outcome of total hip arthroplasty: a review. Can J Surg. 1998;41:188–95.
Part VII
How to Perform a Systematic Review or Meta-analysis?

37 What Is the Difference Between a Systematic Review and a Meta-analysis?

Shakib Akhter, Thierry Pauyo, and Moin Khan

S. Akhter (*) · M. Khan
Department of Orthopaedic Surgery, McMaster University, Hamilton, ON, Canada
Department of Health, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
e-mail: akhterms@mcmaster.ca

T. Pauyo
Department of Orthopaedic Surgery, McGill University, Montreal, QC, Canada

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_37

37.1 Hierarchy of Evidence

The hierarchy of evidence serves as the foundation for evidence-based practice and provides a top-down descriptive visualization of the best available evidence. The level of evidence is proportional to reliability, quality, and validity; the higher these factors, the higher the study lies in the hierarchy of evidence [2, 27]. Clinicians and scientists seek higher levels of evidence as these studies have the greatest potential impact for clinical practice [3]. Experimental study designs, such as randomized controlled trials, occupy the highest level of evidence (level 1 evidence) in research, followed by observational study designs, such as cohort studies (level 2 evidence) [2, 3, 27]. These research designs are followed by case control studies (level 3 evidence), case studies and case series (level 4 evidence), and finally expert opinions (level 5 evidence) [2, 3, 27]. Clinicians should cautiously apply low-quality evidence such as the latter designs given their poor reliability, reproducibility, validity, and clinical impact [2, 3, 27]. Higher-quality research evidence (such as level 1 and level 2) is readily brought from bench to bedside as their superior methods increase study validity. But the question arises: where do systematic reviews and meta-analysis fall in this hierarchy of evidence? The answer is that they each fall in every level. A systematic review and meta-analysis of well-done high-quality randomized controlled trials are considered the pinnacle of evidence-based research. It is important to understand and appreciate that systematic reviews and meta-analyses are not always atop the hierarchy of evidence; if a researcher conducts a systematic review of level 3 evidence, the review will be considered level 3 evidence. Conversely, a systematic review of level 1 evidence will remain level 1 evidence. Accordingly, the study design and level of evidence are directly proportional. Systematic reviews and meta-analyses remain confined to the level of evidence they are used with but provide invaluable results and have tremendous clinical care implications.

37.2 Why Perform a Systematic Review or Meta-analysis?

With over 50 million scholarly articles published to date, difficulty exists among clinicians and scientists in organizing and understanding the vast amounts of available literature [10]. Arguably,

the most effective and efficient method in synthesizing available data in order to make evidence-based decisions is by conducting systematic reviews and meta-analyses. These methods identify, critically appraise, and evaluate several studies pertaining to a single prespecified research question [21]. Both methodologies allow researchers to combine large portions of literature and produce results that are widely generalizable. Additional benefits include the ability to limit biases found in individual studies, which increases the reliability of the findings. Although the systematic review and meta-analysis methodologies synergistically analyse numerous studies, they serve different yet complementary roles. Systematic reviews are the cornerstone in evidence-based medicine as the meticulous, exhaustive, systematic, and structured approach is effective in summarizing and critically analysing the findings of relevant literature pertinent to a research question. A meta-analysis may be conducted in conjunction with a systematic review to enhance the credence of findings by increasing the level of evidence. Through combining and analysing several studies, researchers aim to extrapolate results that more accurately reflect true effects. Synthesizing data from multiple studies allows researchers to achieve a greater level of statistical power than individual studies can reach. Researchers practising this methodology are encouraged to follow criteria outlined by research quality improvement bodies, such as the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [16].

37.3 Systematic Review

37.3.1 What Is a Systematic Review?

Literature reviews are commonly classified as either a narrative or systematic review. A narrative review is a synthesis of literature in a specific field constructed using a specific contextual or theoretical point of view. The method is not nearly as robust or rigorous, as no methodological approaches, such as the search strategy or eligibility criteria, are reported. Narrative reviews employ a qualitative approach to critically appraise literature from a specific context or theoretical perspective. A narrative review is highly susceptible to bias and lacks reproducibility. Researchers supplement narrative reviews with expert intuitive and experiential evidence, reflecting the qualitative approach [20]. The lack of a systematic process predisposes this type of review to bias, specifically subjective selection bias, which has implications for the validity and generalizability of the study.

A systematic review is governed by carefully constructed stages which guide the cumulative search method, screening, reviewing, cataloguing, and reporting process of the selected studies to answer a research question of interest. The search is intended to provide an exhaustive review of the current literature, which captures all appropriate studies, ensuring the research question is answered to its fullest extent. The carefully crafted and exhaustive methodology minimizes bias and allows reliable and accurate conclusions to be drawn. It has facilitated an improved dissemination of information to healthcare providers, the public, policymakers, and the scientific community. A notable benefit is the resulting mitigation in the delay in bringing research from bench to bedside, leading to well-informed policy decisions and direction for future research. Although systematic reviews rank high in the hierarchical chain of evidence, they are not without flaws. Depending on the construction of the search and screening criteria, a researcher may inadvertently exclude relevant studies. However, given the systematic, explicit, and comprehensive methodology of the review, the risk of missing relevant studies is decreased in comparison to a narrative review. Additionally, the generalizability of the findings may be limited as patients included in the review must be homogeneous to the target population to which the researcher attempts to generalize. Heterogeneity is a marked concern among researchers as a balance between internal and external validity must be prioritized; although stringent eligibility criteria produce homogeneous data, generalizability is affected as patients with differing characteristics

may be excluded [28]. The systematic review methodology is indicated in this context as it can provide a descriptive and analytical summary of heterogeneous and dissimilar studies, which usually preclude the use of a meta-analysis [28].

37.3.2 A Guide to Performing a Systematic Review

Six essential steps in conducting a systematic review are described.

Step 1: The Research Question—The first and foremost step in the process of performing quality research is formulating appropriate research questions. Commonly underestimated, formulating the research question requires careful consideration of a multitude of factors and can potentially consume a notable amount of time. The research question should include the problem or question of interest, population of interest, intervention, and comparator, along with the outcomes of interest [1]. Using the population, intervention, comparator, and outcomes (PICO) format helps ensure the question is direct and relevant. Although the question is sometimes loosely stated, authors should explicitly present it in a structured format as it allows consumers to directly understand the goal of the project [12].

Step 2: Constructing Eligibility Criteria—Selecting appropriate eligibility criteria is fundamental in ensuring the most relevant and appropriate studies are included in the review. Researchers can include all key components needed to comprehensively answer their question if the criteria are reflective of the PICO points identified in Step 1. Additionally, researchers should clearly identify which type of studies they are interested in (e.g. randomized trials, cohort studies) and other operational factors relevant to the studies including, but not limited to, year published, language, and number of participants [25].

Step 3: Constructing the Search Strategy—Using appropriate search terms is essential in ensuring an exhaustive search of literature relevant to the research question. A search that is not comprehensive can exclude important studies, which may have significant implications for the validity of the review. An extensive search without language restrictions is recommended [12]. A comprehensive search often has three or more online databases included. Examples of commonly used databases include MEDLINE, EMBASE, CENTRAL, CINAHL, PUBMED, and others. These platforms allow the researcher to conduct a comprehensive and exhaustive search of literature published on the World Wide Web, download results, and perform a citation analysis [17]. A step that is not necessary but highly recommended is seeking a professional librarian to assist in the search strategy development and implementation [25]. The terms should also be reflective of the PICO format to produce an effective search strategy that identifies all relevant studies and balances sensitivity and specificity [25]. In this context, good sensitivity and specificity are reached when the search produces a high number of relevant studies and a low number of irrelevant studies, respectively. With an increasing trend in technological dominance in research, many authors choose to solely search electronic databases, but searches of reference lists on articles and specific journals can also be done [25]. A search of not only published data via online databases but also of ‘grey literature’ decreases the risk for potential publication bias of results. Grey literature refers to scholarly literature that is not formally published and can include dissertations, policy documents, book chapters, conference abstracts, research reports, and unpublished research data [9].

Step 4: Screening, Selection, and Extraction—This process begins upon completion of the search. A list of all article abstracts that appeared in the search undergoes an abstract screening, in which researchers make decisions about including or excluding studies based on their relevance to the eligibility criteria, as judged from the abstract. A recommended practice to establish inter-rater reliability and increase validity of the study is to have two reviewers conduct this process independently. Often, reviewers may disagree on whether to include or exclude studies. Attempts to reach agreements between the two reviewers should be made, but

if unsuccessful, another study researcher should be consulted. Researchers should log all activities in this process, such as which studies were included and excluded and the reasons for each [25]. This log will help researchers create a flow diagram to visually represent articles included. Figure 37.1 is an example taken from the full-text article of Clinical Vignette 2. Once all titles and abstracts are screened and a significantly shorter list of studies remains, a second full-text screening process should be conducted by the same reviewers to ensure reliability and consistency in study methods. This process will exclude all remaining irrelevant studies and identify which have extremely high potential to be included in the review. Finally, researchers should create a data table or a standardized data extraction form to systematically organize and extract data. This table or form is unique to each systematic review and may include patient clinical and demographic characteristics, author names, type of study design, and outcomes. It is most commonly electronic (e.g. a table constructed in Excel) given its associated considerable efficiency and mitigation of data errors due to mismanagement, but paper also may be used [28]. The data extraction process also serves as the final screening process, as the extracted data will guide final decisions regarding which studies will be included or excluded [25, 28]. Researchers must also be prepared to face studies that have missing or incomplete data, in which case they must contact the individual study authors to obtain them [16].

Articles identified by electronic search n = 944
  Duplicates excluded n = 410
Titles and abstracts screened n = 534
  Articles excluded n = 528
    • No meniscal pathology n = 146
    • No arthroscopic intervention n = 88
    • No nonoperative comparison n = 139
    • Not degenerative tears n = 13
    • Not RCTs n = 142
Full texts screened n = 6
  Manual search of references
    • Articles included n = 2
  Article excluded
    • Lack of outcomes reported n = 1
Studies included n = 7

Fig. 37.1 Article selection flow diagram

Step 5: Quality Assessment—Appraising the study quality, although not required, is highly recommended to produce a comprehensive and highly valid systematic review. A standardized operational definition of ‘study quality’ is elusive but generally refers to the confidence a researcher holds in the study’s design and methods to minimize bias [19, 21, 28]. In other words, a quality assessment is a critical appraisal to determine if the design, conduct, and methods of a study will reduce systematic error and bias, which in turn has implications for its internal and external validity [4, 28]. Bias is defined by the Cochrane Collaboration as a deviation from truth or a systematic error that can lead to either an under- or overestimation of the true effect [6]. There are multiple types of bias (selection, detection, reporting, attrition, performance biases) that range from small to substantial and can markedly limit the generalizability of a research study [6]. However, researchers should be cautious in the interpretation of a quality assessment as the process is inherently limited by factors such as a lack of information provided by the author as well as the absence of a ‘gold standard’ to reference level of quality [21, 26]. Nonetheless, recommended guidelines such as the five-point Oxford Quality Rating Scale or the Consolidated Standards of Reporting Trials (CONSORT statement) are available for comprehensive quality assessments and should be used when appropriate. It is also recommended that at least two independent reviewers conduct the quality assessment process to increase inter-rater reliability and study validity [8]. A consensus within available literature outlines four main biases that affect study quality: performance, selection, attrition, and detection bias [4, 13, 24, 28]. The quality assessment is a critical step in the systematic review process as these

biases can lead to under- or overestimations of the true effect, and the resulting spurious associations may have detrimental implications for clinical practice [12, 28].

Quality assessment tools depend on what type of evidence is being synthesized. The Cochrane risk of bias (ROB) tool is considered the gold standard for the quality assessment of randomized controlled trials, whereas the methodological index for non-randomized studies (MINORS) is one of many available measures for assessing observational studies. The Cochrane ROB tool ensures a systematic and critical appraisal of all possible bias domains including selection, performance, detection, attrition, reporting, and other forms of biases [29]. Assessing for these potential biases ensures the researcher takes into account systematic differences in baseline characteristics, provision of care, outcome ascertainment, participant withdrawals, and reported and unreported findings within the studies. The MINORS is a 12-item tool that is used for, but not limited to, quality assessment of systematic reviews that employ observational studies (Fig. 37.2) [23]. This tool has demonstrated strong external validity through inter-reviewer agreement, reliability via the kappa coefficient, and consistency via Cronbach’s alpha coefficient [23]. The kappa coefficient is a commonly used statistical measure of inter-rater reliability, and Cronbach’s alpha coefficient elicits the consistency and correlation of the items of an instrument, such as a survey, to determine its overall reliability [18, 22].

Overall confidence in the estimate of effect of a particular treatment or intervention is increasingly being rated using the Grading of Recommendations, Assessment, Development and Evaluations (GRADE) criteria. This GRADE method provides a reliable estimate of the effect of the evidence across all studies for outcomes, as opposed to individual study evaluation. An example of one of the factors critiqued by GRADE is shown in Fig. 37.3. Methodological flaws, treatment effects, consistency, and generalizability comprise the main assessment domains seen in this tool [5]. Great care should be taken to select the appropriate tool for quality assessment, and the tool should be reflective of the type of studies being reviewed.

Methodological items for non-randomized studies (Score†)

1. A clearly stated aim: the question addressed should be precise and relevant in the light of available literature
2. Inclusion of consecutive patients: all patients potentially fit for inclusion (satisfying the criteria for inclusion) have been included in the study during the study period (no exclusion or details about the reasons for exclusion)
3. Prospective collection of data: data were collected according to a protocol established before the beginning of the study
4. Endpoints appropriate to the aim of the study: unambiguous explanation of the criteria used to evaluate the main outcome which should be in accordance with the question addressed by the study. Also, the endpoints should be assessed on an intention-to-treat basis.
5. Unbiased assessment of the study endpoint: blind evaluation of objective endpoints and double-blind evaluation of subjective endpoints. Otherwise the reasons for not blinding should be stated
6. Follow-up period appropriate to the aim of the study: the follow-up should be sufficiently long to allow the assessment of the main endpoint and possible adverse events
7. Loss to follow up less than 5%: all patients should be included in the follow up. Otherwise, the proportion lost to follow up should not exceed the proportion experiencing the major endpoint
8. Prospective calculation of the study size: information of the size of detectable difference of interest with a calculation of 95% confidence interval, according to the expected incidence of the outcome event, and information about the level for statistical significance and estimates of power when comparing the outcomes

Additional criteria in the case of comparative study

9. An adequate control group: having a gold standard diagnostic test or therapeutic intervention recognized as the optimal intervention according to the available published data
10. Contemporary groups: control and studied group should be managed during the same time period (no historical comparison)
11. Baseline equivalence of groups: the groups should be similar regarding the criteria other than the studied endpoints. Absence of confounding factors that could bias the interpretation of the results
12. Adequate statistical analyses: whether the statistics were in accordance with the type of study with calculation of confidence intervals or relative risk

† Each domain is scored from 0-2. Items will either be scored as 0 (not reported), 1 (reported but inadequate), or 2 (reported and adequate) [29]. The global ideal score is 16 for non-comparative studies and 24 for comparative studies [29].

Fig. 37.2 The validated and updated version of the MINORS questionnaire [29]
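
The scoring rule in Fig. 37.2 amounts to summing item scores of 0, 1 or 2 over 8 items (non-comparative studies) or 12 items (comparative studies). A minimal sketch follows; the function name and the example scores are hypothetical, and the code only totals the items rather than judging what counts as an acceptable score.

```python
def minors_score(item_scores, comparative):
    """Return (total, ideal maximum) for one study's MINORS item scores."""
    expected_items = 12 if comparative else 8
    if len(item_scores) != expected_items:
        raise ValueError(
            f"expected {expected_items} item scores, got {len(item_scores)}"
        )
    if any(score not in (0, 1, 2) for score in item_scores):
        raise ValueError("each item must be scored 0, 1 or 2")
    # Each item contributes at most 2 points, so the ideal total is 2 per item.
    return sum(item_scores), 2 * expected_items

# Hypothetical scores for a non-comparative case series (items 1 to 8).
total, maximum = minors_score([2, 1, 0, 2, 0, 2, 1, 0], comparative=False)
print(f"MINORS {total}/{maximum}")
```

Reporting the total alongside the ideal maximum keeps scores from non-comparative and comparative studies from being compared on the wrong scale.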

Columns: Risk of bias | Across studies | Interpretation | Considerations | GRADE assessment of study limitations

Low risk of bias.
  Across studies: Most information is from studies at low risk of bias.
  Interpretation: Plausible bias unlikely to seriously alter the results.
  Considerations: No apparent limitations.
  GRADE assessment of study limitations: No serious limitations, do not downgrade.

Unclear risk of bias.
  Across studies: Most information is from studies at low or unclear risk of bias.
  Interpretation: Plausible bias that raised some doubt about the results.
  Considerations (a): Potential limitations are unlikely to lower confidence in the estimate of effect.
  GRADE assessment of study limitations (a): No serious limitations, do not downgrade.
  Considerations (b): Potential limitations are likely to lower confidence in the estimate of effect.
  GRADE assessment of study limitations (b): Serious limitations, downgrade one level.

High risk of bias.
  Across studies: The proportion of information from studies at high risk of bias is sufficient to affect the interpretation of results.
  Interpretation: Plausible bias that seriously weakens confidence in the results.
  Considerations (a): Crucial limitation for one criterion, or some limitations for multiple criteria, sufficient to lower confidence in the estimate of effect.
  GRADE assessment of study limitations (a): Serious limitations, downgrade one level.
  Considerations (b): Crucial limitation for one or more criteria sufficient to substantially lower confidence in the estimate of effect.
  GRADE assessment of study limitations (b): Very serious limitations, downgrade two levels.

Adopted from the Cochrane Handbook of Systematic Reviews of Interventions, this is an extension of factor 1 of 5 of GRADE that extends the risk of bias assessment to make conclusions regarding the limitations relating to outcomes [6].

Fig. 37.3 Factor 1 of 5 in GRADE assessment [6]
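
The downgrading column of Fig. 37.3 can be read as a small lookup: a judgement about study limitations maps to a number of levels by which confidence is lowered. The sketch below assumes the usual four GRADE certainty levels (high, moderate, low, very low), which this chapter does not list; the names used are illustrative and cover only this one factor.

```python
# Levels to downgrade for study limitations, keyed by the wording in Fig. 37.3.
DOWNGRADE_LEVELS = {
    "no serious limitations": 0,
    "serious limitations": 1,
    "very serious limitations": 2,
}

# Assumed certainty scale, ordered from lowest to highest.
CERTAINTY = ["very low", "low", "moderate", "high"]

def downgrade_for_risk_of_bias(starting_certainty, judgement):
    """Lower a certainty rating by the levels in Fig. 37.3, stopping at 'very low'."""
    start = CERTAINTY.index(starting_certainty)
    levels = DOWNGRADE_LEVELS[judgement]
    return CERTAINTY[max(0, start - levels)]

print(downgrade_for_risk_of_bias("high", "serious limitations"))  # moderate
```

A full GRADE rating would combine this with the remaining factors; the sketch handles risk of bias only.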

A comprehensive quality assessment adds to the review’s generalizability and validity and therefore leads to a greater clinical impact.

Step 6: Data Analysis and Interpretation of Results—The final steps of data analysis and interpretation should be conducted following a comprehensive quality assessment [4, 13, 24, 28]. This step requires the researcher to create a concise and simple descriptive summary of each included study [28]. Many researchers choose to display study characteristics in a tabular format. An example can be found in Table 1 in the full article of Clinical Vignette 1. The descriptive table is a comprehensive summary of study characteristics and therefore should include, but is not limited to, author name, year published, study methods, biases, quality assessment, total number of patients in treatment and control groups, intervention(s), control, outcomes, and any other relevant information. The inclusion of items in the table should reflect the research question [20]. Interestingly, these information-rich tables provide sufficient information for the researcher to determine if statistically pooling the data for a meta-analysis is possible [20]. If the study includes sufficient data to be meta-analysed, that approach will be adopted as it is a higher level of evidence. As a result, a systematic review can be, and is commonly, accompanied by a meta-analysis (Clinical Vignette 2).

When interpreting results, the researcher should carefully extrapolate conclusions based
37  What Is the Difference Between a Systematic Review and a Meta-analysis? 337

on the summarization and analysis of the included studies. In this section, the findings must be briefly reiterated, and the final impressions from this synthesis of the best available evidence should be explicitly stated. Here, the researcher should highlight the strengths and limitations of their study and indicate how these findings may or may not have clinical implications. Suggestions for future directions of research should also be included, as they are valuable statements in systematic reviews, given that this methodology summarizes the available evidence pertaining to a specific field and highlights areas where evidence is lacking. Systematic reviews directly improve patient care as they provide information that is a requisite for evidence-based decision-making.
Clinical Vignette 1 displays a systematic review on the efficacy of antifibrinolytic therapy in reducing patient transfusions in orthopaedic surgery. The authors clearly and explicitly state their research question of interest in the background section (Step 1). The eligibility criteria and the use of two independent reviewers were highlighted (Step 2). Four databases were searched producing 283 articles, of which 29 were included following the screening process (Step 4). Results and figures for study and patient characteristics, along with results for study quality and clinical outcomes, were reported (Step 5). This information was also visually displayed using histograms. The results of the individual studies were reported cumulatively as a single body of evidence, without the use of statistical methods. Finally, the authors critically interpreted the results in the discussion section (Step 6). The authors clearly report their methods, allowing reproducibility and demonstrating strong internal validity in their review.

Clinical Vignette 1: Systematic Review [11]


338 S. Akhter et al.

Fact Check
–– A systematic review is a comprehensive and exhaustive review of relevant literature specific to a topic or research question.
–– Systematic reviews may employ qualitative or quantitative methods in reviewing and reporting homogeneous or heterogeneous studies.
–– Compared to the meta-analysis method, a systematic review can be completed readily and requires less formal training, but is of a lower level of evidence.
–– The six steps in performing a comprehensive systematic review include (1) formulating an appropriate research question; (2) formulating appropriate eligibility criteria; (3) forming and conducting an appropriate search strategy; (4) screening, selection, and extraction of relevant results; (5) quality assessment of the included studies; and (6) critical appraisal to summarize and interpret the findings.

37.4 Meta-analysis

37.4.1 What Is a Meta-analysis?

A meta-analysis (Clinical Vignette 2), much like a systematic review and often an extension of one, also hinges on a systematic and exhaustive search of the literature. A meta-analysis differs from a systematic review in that instead of simply collecting and analysing the data, it employs statistical methods to quantitatively synthesize the results from multiple studies [15, 28]. This design aims to expose true effects buried within data by analysing patterns to compare and contrast findings of several studies. A meta-analysis has many inherent benefits compared with a systematic review. A notable advantage is that pooling studies increases statistical power, otherwise unattainable in individual studies, leading to more meaningful results, for example, by detecting modest associations. This design also provides researchers a comprehensive understanding of the true effect. Clinical, methodological, and statistical heterogeneities are issues of variability that impose limitations on the systematic review and meta-analysis methodologies. With limited heterogeneity, more data provides a more precise estimate of effect. Moreover, the utilization of statistical methods addresses notable concerns of generalizability and bias, which are inherent in the systematic review alone. With respect to heterogeneity, the meta-analysis quantifies between-group differences (differences between studies) and also explains them (e.g. using tools such as a meta-regression). However, the presence of excessive heterogeneity may result in faulty and misleading conclusions. Statistical methods to test for heterogeneity are discussed below.
Although this methodology occupies the top of the hierarchy of evidence, like every research design, it has limitations [7]. The nature of the meta-analysis requires a large number of studies in order for an effect to be seen. Additionally, although it can combine the data of various studies, the meta-analysis is not capable of adjusting for poor methodology of the included studies, which may skew the outcomes. It is important for the researcher to create specific eligibility criteria, so that the literature search and review process is maximally exhaustive and produces methodologically sound studies.

37.4.2 A Guide to Performing a Meta-analysis

The six steps in performing a meta-analysis are the same as those in performing a systematic review. The only difference is that additional methods are utilized to perform a meta-analysis in the data analysis in Step 6. The steps follow this chronological order: (1) formulating an appropriate research question; (2) formulating appropriate eligibility criteria; (3) forming and conducting an appropriate search strategy; (4) screening, selection, and extraction of relevant results; (5) quality assessment of the

included studies; and (6) analysing results and interpreting the findings.
Step 6: Data Analysis and Interpretation of Results. Calculation of the effect sizes and reporting them with a 95% confidence interval (CI) are common practice with meta-analyses [25]. Numerous statistical programmes that conduct the meta-analysis processes are widely available, including the Cochrane-endorsed Review Manager programme (RevMan), MIX 2.0, and MetaStat. The results (effect sizes and CIs) should be reported graphically and quantitatively [25]. A common graphical representation of meta-analysis results is a forest plot (Fig. 37.4). In Fig. 37.4, the forest plot visually depicts each study as a square where the middle is the effect size (SMD) and each end point of the corresponding line represents the upper and lower CI limits. In the convention described in [25], the right portion of the plot (>0) favours the control or comparator, whereas the left (<0) favours the intervention; because the direction depends on how the outcome is scored, the axis labels of each plot should be checked (in Fig. 37.4, values to the right favour surgical treatment and values to the left favour conservative treatment). The large diamond at the bottom represents the pooled effect of all the individual studies. When the left side of the graph favours the intervention, researchers hope to see this diamond, or pooled effect, below 0, indicating efficacy of the intervention. Calculation of inter-rater agreement and tests of heterogeneity are also done in this phase. Measurement of inter-rater reliability is a key component in the validity of a study as it represents how well the data collected are accurate representations of the variables of interest by quantitatively measuring the extent to which raters (data collectors) agree in their independent measurements [18]. A multitude of statistical methods to test for inter-rater reliability exist, including percent agreement, the contingency coefficient, Pearson’s r, the correlation coefficient, the concordance correlation coefficient, and the most commonly used Cohen’s kappa for two raters or Fleiss kappa for three or more raters [18]. Although heterogeneity can be judged from graphical representations of data, such as looking at the error bars in forest plots, researchers should conduct a statistical test of heterogeneity to address concerns of dissimilarities in study results within the meta-analysis. This test determines if the variation in the study results is due to genuine measurable differences (heterogeneity) or chance alone (homogeneity) but has the inherent limitation of sensitivity to the number of included trials [8]. The I² statistic is commonly used to quantitatively measure the variability between results (effect sizes of each study), expressed as the percentage of variation across studies that is attributable to heterogeneity rather than chance. This assessment of consistency is critical as it directly relates to generalizability; the more consistent the studies, the more generalizable the pooled result is [18]. The statistic Cochran’s Q is commonly used to evaluate the null hypothesis that all included studies evaluate the same effect [8]. This test statistic is calculated as the weighted sum of the squared deviations of the individual study estimates from the overall pooled result [14]. Finally, the Q statistic is compared with a chi-squared (χ²) distribution with k − 1 degrees of freedom, where k is the number of studies, to obtain P values [9].

                     Surgical treatment            Conservative treatment
Study                Mean ± SD     No. patients    Mean ± SD     No. patients    SMD (95% CI)
                                   or knees                      or knees
Herrlin et al.38     93.5 ± 20     47              90 ± 11.9     49              0.21 (–0.19 to 0.61)
Katz et al.39        80.9 ± 17.8   161             80.7 ± 17.9   169             0.01 (–0.20 to 0.23)
Sihvonen et al.40    82.2 ± 16     70              83.4 ± 13.8   76              –0.08 (–0.40 to 0.24)
Vermesan et al.43    36.1 ± 3.6    60              34.7 ± 3.8    60              0.38 (0.01 to 0.74)
Yim et al.41         83.2 ± 12     50              84.3 ± 10.5   52              –0.10 (–0.49 to 0.29)

Overall                            388                           406             0.07 (–0.10 to 0.23)

Heterogeneity: I² = 20%
Horizontal axis: SMD (95% CI), from –1 to 1; values below 0 favour conservative treatment, values above 0 favour surgical treatment.

Fig. 37.4  Meta-analysis forest plot
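The heterogeneity statistics described for Step 6 can be illustrated with the study-level values printed in Fig. 37.4. The following is a minimal sketch, not the method used in the cited meta-analysis: it assumes a fixed-effect inverse-variance model, recovers each standard error from the rounded 95% CI limits, and uses a closed-form chi-squared tail probability that holds only for an even number of degrees of freedom. The function name `heterogeneity` is hypothetical. Because of rounding and the choice of model, the output only approximates the published pooled SMD and I².

```python
import math

# Study-level SMDs with lower and upper 95% CI limits, as printed in Fig. 37.4.
studies = {
    "Herrlin":  (0.21, -0.19, 0.61),
    "Katz":     (0.01, -0.20, 0.23),
    "Sihvonen": (-0.08, -0.40, 0.24),
    "Vermesan": (0.38, 0.01, 0.74),
    "Yim":      (-0.10, -0.49, 0.29),
}

def heterogeneity(data):
    """Return (pooled effect, Cochran's Q, I-squared in percent, P value)."""
    effects, weights = [], []
    for smd, lower, upper in data.values():
        se = (upper - lower) / (2 * 1.96)  # standard error recovered from the 95% CI
        effects.append(smd)
        weights.append(1.0 / se ** 2)      # inverse-variance weight
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1                  # k - 1 degrees of freedom
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Upper-tail chi-squared probability; this closed form is valid for even df only.
    if df % 2 != 0:
        raise ValueError("this sketch only handles an even number of degrees of freedom")
    half = q / 2
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return pooled, q, i2, p

pooled, q, i2, p = heterogeneity(studies)
print(f"pooled SMD = {pooled:.2f}, Q = {q:.2f}, I2 = {i2:.0f}%, P = {p:.2f}")
```

With these inputs the sketch gives an I² close to the 20% shown in the figure; the pooled SMD differs slightly from the printed 0.07, which may reflect a different pooling model in the original analysis.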



When interpreting the results, the researcher should practise caution similar to that outlined in the systematic review process. Summarizing findings, making evidence-based conclusions, highlighting patient care implications, and suggesting future directions for research should be included.
Clinical Vignette 2 [14] displays a meta-analysis on arthroscopic surgery for degenerative tears of the meniscus. The authors systematically searched three databases to ensure all relevant literature was captured. With a focus on randomized controlled trials, the search was conducted and studies screened and assessed for eligibility by two independent reviewers. To ensure a systematic and consistent process, the same reviewers independently conducted risk of bias assessments followed by data extraction using an electronic extraction form. The authors employed statistical methods to report interobserver agreement and outcomes. All steps of the statistical methods are expertly reported and can be readily reproduced. A test of heterogeneity was also performed to determine if variability was a result of chance or inter-study heterogeneity. Subgroup analyses were outlined a priori, and sensitivity analyses were done to elucidate the consequences of missing data and studies at risk for bias. Figures of the study selection process and study descriptions were provided. Results for the search, each individual outcome, adverse events, and sensitivity analysis were reported and interpreted. The authors followed the PRISMA statement for reporting findings. The authors concluded with limitations, implications, and a conclusion.

Clinical Vignette 2: Meta-analysis



Fact Check
–– A meta-analysis involves a comprehensive and exhaustive review of relevant literature specific to a topic or research question and can be viewed as an extension to a systematic review.
–– Meta-analyses employ statistical methods to quantitatively synthesize the results of pooled studies.
–– Performing a meta-analysis is more difficult and time-consuming than performing a systematic review but is of a higher level of evidence.
–– The six steps in performing a comprehensive systematic review and meta-analysis are the same with one addition. Only step 6 involves an additional data analysis via statistical software such as the Cochrane Collaboration-endorsed ‘RevMan’.

Take-Home Message
• Both the systematic review and meta-analysis are considered the highest level of research evidence.
• A meta-analysis employs statistical methods to quantitatively synthesize results of different studies and, when pooling high-quality randomized controlled trials, is considered the highest level of evidence.
• This design recognizes and adjusts results for biases, increasing the internal and external validity of the study.
• Conversely, a systematic review is a nonstatistical, formal, and structured process governed by carefully constructed stages which guide the cumulative search method, screening, reviewing, cataloguing, and reporting processes of selected studies to answer a research question of interest.
• Researchers must carefully consider which method is indicated for the question of interest as specific factors preclude specific designs, as seen with heterogeneity precluding the use of a meta-analysis.
• Lastly, researchers should appraise current reviews and meta-analyses for their methods as well as adhere to best practice guidelines to produce research that maximizes both internal and external validity, resulting in clinically relevant implications.

References

1. Aslam S, Emmanuel P. Formulating a researchable question: a critical step for facilitating good clinical research. Indian J Sex Transm Dis AIDS. 2010;31(1):47.
2. Brighton B, Bhandari M, Tornetta P, Felson DT. Hierarchy of evidence: from case reports to randomized controlled trials. Clin Orthop Relat Res. 2003;413:19–24.
3. Burns PB, Rohrich RJ, Chung KC. The levels of evidence and their role in evidence-based medicine. Plast Reconstr Surg. 2011;128(1):305–10.
4. Egger M, Davey-Smith G, Altman D. Systematic reviews in health care: meta-analysis in context. Somerset: Wiley; 2013.
5. GRADE Working Group. Grading quality of evidence and strength of recommendations. BMJ. 2004;328(7454):1490.
6. Green S, Higgins JP. Preparing a Cochrane review. In: Cochrane handbook for systematic reviews of interventions; 2012. p. 11–30.
7. Guyatt GH, Sackett DL, Sinclair JC, Hayward R, Cook DJ, Cook RJ. Users’ guides to the medical literature. IX. A method for grading health care recommendations. Evidence-Based Medicine Working Group. JAMA. 1995;274(22):1800–4.
8. Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557–60.
9. Hopewell S, Mcdonald S, Clarke M, Egger M. Grey literature in meta-analyses of randomized trials of health care interventions. Cochrane Database Syst Rev. 2007;(2):MR000010.
10. Jinha AE. Article 50 million: an estimate of the number of scholarly articles in existence. Learned Publ. 2010;23(3):258–63.
11. Kagoma YK, Crowther MA, Douketis J, Bhandari M, Eikelboom J, Lim W. Use of antifibrinolytic therapy to reduce transfusion in patients undergoing orthopedic surgery: a systematic review of randomized trials. Thromb Res. 2009;123(5):687–96.
12. Khan KS, Kunz R, Kleijnen J, Antes G. Five steps to conducting a systematic review. J R Soc Med. 2003;96(3):118–21.
13. Khan KS, Kunz R, Kleijnen J, Antes G. Systematic reviews to support evidence-based medicine: how to review and apply findings of healthcare research. London: Royal Society of Medicine Press; 2003.

14. Khan M, Evaniew N, Bedi A, Ayeni OR, Bhandari M. Arthroscopic surgery for degenerative tears of the meniscus: a systematic review and meta-analysis. Can Med Assoc J. 2014;186(14):1057–64.
15. Lau J, Ioannidis JP, Schmid CH. Quantitative synthesis in systematic reviews. Ann Intern Med. 1997;127(9):820.
16. Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Clarke M, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. Ann Intern Med. 2009;151(4):W65–94.
17. Matthew EF, Eleni EP, George AM, Georgios P. Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses. Fed Am Soc Exp Biol. 2015;20 Sep 2007.
18. Mchugh ML. Interrater reliability: the kappa statistic. Biochem Med. 2012;22(3):276–82.
19. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials. 1995;16:62–73.
20. Pae C-U. Why systematic review rather than narrative review? Psychiatry Investig. 2015;12(3):417.
21. Russell RM. Issues and challenges in conducting systematic reviews to support development of nutrient reference values: workshop summary. Rockville: U.S. Dept. of Health and Human Services, Agency for Healthcare Research and Quality; 2009.
22. Santos JRA. Cronbach’s alpha: a tool for assessing the reliability of scales. J Ext. 1999;37:2.
23. Slim K, Nini E, Forestier D, Kwiatkowski F, Panis Y, Chipponi J. Methodological index for non-randomized studies (MINORS): development and validation of a new instrument. ANZ J Surg. 2003;73(9):712–6.
24. Torgerson C. Systematic reviews. London: Continuum; 2003.
25. Uman LS. Systematic reviews and meta-analyses. J Can Acad Child Adolesc Psychiatry. 2011;20(1):57–9.
26. Verhagen AP, Vet HCD, Bie RAD, Boers M, Brandt PAVD. The art of quality assessment of RCTs included in systematic reviews. J Clin Epidemiol. 2001;54(7):651–4.
27. Weil RJ. The future of surgical research. PLoS Med. 2004;1(1):e13.
28. Wright RW, Brand RA, Dunn W, Spindler KP. How to write a systematic review. Clin Orthop Relat Res. 2007;455:23–9.
29. Zeng X, Zhang Y, Kwong JS, Zhang C, Li S, Sun F, et al. The methodological quality assessment tools for preclinical and clinical studies, systematic review and meta-analysis, and clinical practice guideline: a systematic review. J Evid Based Med. 2015;8(1):2–10.
38  Reliability Studies and Surveys
Kelsey L. Wise, Brandon J. Kelly, Michael L. Knudsen, and Jeffrey A. Macalena

38.1 Reliability Studies

38.1.1  Introduction

Measurements and their reliability comprise an essential part of orthopedic research [5, 10, 13, 23, 24, 32, 36, 49]. The value of a measurement lies in its ability to be compared. There are no perfect measurement instruments, and each has a certain amount of error. Measurement error refers to how well a particular instrument performs within a given population. Less measurement error yields more precise data. The primary factor in determining acceptable measurement error is the expected range of measurements [22, 26].
Reliability refers to the reproducibility of a scale and is defined as the relationship between the expected distribution of measurements, the actual distribution of measurements, and the resulting measurement error [26, 34]. Reliability and agreement differ. If a test is performed and the result does not change regardless of rater, subject, or other variables, there would be perfect agreement [22]. However, such a test is of little value to the clinician, because a result that never changes cannot distinguish one subject from another. Reliability evaluates the spectrum of measurements [22]. The difference between agreement and reliability is important because reliability is better able to evaluate the usefulness of a scale, measure, or tool [22]. No survey or measurement instrument has perfect reliability. The reliability of measurements must be scrutinized when critically evaluating studies [26].
Reliability is vital for an instrument to be clinically useful and applicable. Once reliability has been established, validity, feasibility, and acceptability must be assessed. Validity is characterized by how well the instrument achieves the intended measure. Feasibility refers to time, availability of required resources, sample size, and lack of fatal flaws in study design [34]. Acceptability is the utility of the instrument in clinical practice. A reliability study is only useful if the subjects, raters, and testing conditions in the study mirror, or are at least similar to, those variables in clinical practice or research [22].

Fact Box 38.1
Key definitions
Term                 Definition
Reliability          Reproducibility of scale
Agreement            Reproducibility of measurement
Validity             Instrument’s ability to achieve intended measure
Feasibility          Resources, availability, time, sample size, study design flaws
Acceptability        Utility of instrument in clinical practice
Measurement error    Instrument’s performance in given population

K. L. Wise · B. J. Kelly · M. L. Knudsen · J. A. Macalena (*)
Department of Orthopaedic Surgery, University of Minnesota, Minneapolis, MN, USA
e-mail: maca0049@umn.edu

© ISAKOS 2019 343


V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_38
344 K. L. Wise et al.

38.1.2  Types of Reliability Studies

In a reliability study, investigators must choose the type of measurement, the tool used to obtain measurement, and how the measure will be used in a clinical or research setting [22]. Internal consistency reliability, test-retest reliability, intraobserver reliability, interobserver reliability, and alternate-form reliability are the most utilized measures in reliability studies [22].

38.1.2.1  Internal Consistency

Internal consistency indicates the uniformity of a scale, measuring the similarity of an individual’s responses across several items [34]. It is a gauge of how well different items measure the same thing [26]. Cronbach’s coefficient alpha is a numeric value of internal consistency [26]. It is a statistic that illustrates the homogeneity of the scale [26]. It is derived by associating the score range of each component scale with each independent observer score and then comparing to the total variance of all items.

Clinical Vignette 1
Dr. Sessions measures the internal consistency of a newly formulated survey that includes five
questions about the quality of life of patients with below-knee amputations. The hypothesis is
that individuals of similar age, gender, mental functioning, and pain ratings have comparable
level of function, self-image, mental health, and pain ratings at a mean follow-up of 3 years.
For responses, yes = 1 and no = 0.
Patient Item 1 Item 2 Item 3 Item 4 Item 5 Summed scale score
1 1 0 1 1 1 4
2 0 1 0 1 0 2
3 1 1 1 1 1 5
4 0 1 0 1 0 2
5 1 1 1 0 1 4

Percentage positive 3/5 = 0.6 4/5 = 0.8 3/5 = 0.6 4/5 = 0.8 3/5 = 0.6

The sample mean and sample variance are calculated first.
The sample mean is:
$$\bar{x} = \frac{4 + 2 + 5 + 2 + 4}{5} = 3.4$$
The sample variance is:
$$s^2 = \frac{(4 - 3.4)^2 + (2 - 3.4)^2 + (5 - 3.4)^2 + (2 - 3.4)^2 + (4 - 3.4)^2}{5} = 1.44$$
Coefficient alpha for a series of dichotomous items is
$$\alpha = \left[\frac{k}{k - 1}\right]\left[1 - \frac{\sum_{i=1}^{k} (\%\,\text{Positive}_i)(\%\,\text{Negative}_i)}{s^2}\right]$$
where k is the number of items in the scale.
Coefficient alpha is
$$\alpha = \left[\frac{5}{5 - 1}\right]\left[1 - \frac{(0.6)(0.4) + (0.8)(0.2) + (0.6)(0.4) + (0.8)(0.2) + (0.6)(0.4)}{1.44}\right] = 0.35$$
The internal consistency, measured with Cronbach’s coefficient alpha, is 0.35, indicating a
very poor correlation between items.
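The calculation in Clinical Vignette 1 can be reproduced in a few lines. This is a minimal sketch under the vignette's own conventions: the variance of the summed scores is divided by the number of patients (not n − 1), and each item's contribution is the product of its proportion of positive and negative responses. The function name `coefficient_alpha_dichotomous` is hypothetical.

```python
# Responses from Clinical Vignette 1: rows are patients, columns are items (yes = 1, no = 0).
responses = [
    [1, 0, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1],
]

def coefficient_alpha_dichotomous(rows):
    """Coefficient alpha for yes/no items, following the vignette's formula."""
    n = len(rows)       # number of patients
    k = len(rows[0])    # number of items in the scale
    totals = [sum(row) for row in rows]                       # summed scale scores
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n       # divides by n, as in the vignette
    sum_pq = 0.0
    for item in range(k):
        positive = sum(row[item] for row in rows) / n         # proportion answering yes
        sum_pq += positive * (1 - positive)
    return (k / (k - 1)) * (1 - sum_pq / variance)

alpha = coefficient_alpha_dichotomous(responses)
print(round(alpha, 2))
```

Using n − 1 in the variance, as many statistical packages do, would give a somewhat different value, so the divisor should be stated when alpha is reported.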

For group comparison, reliability statistics measured should exceed 0.70, and for individual comparison, they should exceed 0.90 [34]. Internal consistency can be improved by adding questions or measurements, clarifying existing questions to reduce response bias, and reducing variability in the respondents [26].

Fact Box 38.2
Internal consistency reliability statistic interpretation
Population               Cronbach’s coefficient alpha    Interpretation
Individual comparison    ≥0.90                           Acceptable agreement
Group comparison         ≥0.70                           Acceptable agreement

A major drawback of internal consistency is that it only requires a single administration of an instrument; thus possible differences between raters, timing, and situation can result in measurement error that a single administration does not capture [22]. Test-retest reliability, intraobserver reliability, and interobserver reliability mitigate this source of measurement error by incorporating multiple measurements into the calculations [22].

38.1.2.2  Test-Retest

Test-retest reliability measures response stability over time [34]. Participants respond to a survey at two distinct time points to measure response stability. Subsequently, correlation coefficients (r values) are determined to compare responses [26, 34]. Values of r are considered strong if they equal or exceed 0.70 [26]. A source of error in this type of study is a change in the characteristics of the test setting across multiple administrations.

Clinical Vignette 2
Dr. Siljander wishes to evaluate postoperative pain scores using a visual analog scale (VAS) in
patients undergoing open reduction internal fixation (ORIF) of the distal radius. Fifteen patient
responses are assessed at 2 and 4 h postoperatively. Responses to the VAS are scaled 0–100, and
the response at 2 h postoperatively (time 1) is compared to 4 h postoperatively (time 2). The
correlation coefficient between the two sets of data is then calculated. The r value is 0.98,
which indicates an excellent test-retest reliability at 2 and 4 h postoperatively.
Time 1 Time 2
Patient 1 71 68
Patient 2 75 70
Patient 3 78 70
Patient 4 84 83
Patient 5 81 77
Patient 6 85 88
Patient 7 90 83
Patient 8 22 21
Patient 9 44 36
Patient 10 52 48
Patient 11 50 44
Patient 12 66 61
Patient 13 38 30
Patient 14 84 82
Patient 15 85 73
Σ 1005 934
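Correlation coefficient, r:

$$r = \frac{n\left[\sum (\text{Time 1})(\text{Time 2})\right] - \left[\sum (\text{Time 1})\right]\left[\sum (\text{Time 2})\right]}{\sqrt{\left[n \sum (\text{Time 1})^2 - \left(\sum \text{Time 1}\right)^2\right]\left[n \sum (\text{Time 2})^2 - \left(\sum \text{Time 2}\right)^2\right]}}$$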

r = 0.98
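The r value in Clinical Vignette 2 can be checked directly from the tabulated scores. The sketch below uses the standard computational form of the Pearson correlation coefficient (sums of products and squares, with the square root of the product of the two bracketed terms in the denominator); the function name `pearson_r` is hypothetical.

```python
import math

# VAS scores from Clinical Vignette 2 (patients 1 to 15).
time1 = [71, 75, 78, 84, 81, 85, 90, 22, 44, 52, 50, 66, 38, 84, 85]
time2 = [68, 70, 70, 83, 77, 88, 83, 21, 36, 48, 44, 61, 30, 82, 73]

def pearson_r(x, y):
    """Pearson correlation coefficient from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(time1, time2)
print(f"r = {r:.2f}")
```

The column sums computed by the sketch match the totals in the table (1005 and 934), which is a quick check that the scores were entered correctly.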

38.1.2.3  Intraobserver

Intraobserver reliability, while similar to test-retest reliability, measures the reliability of repeated measurements made by a single rater on the same data. This test has been used in the evaluation of imaging modalities [17, 35, 46]. Intraobserver reliability tends to result in a higher reliability compared to the test-retest and interobserver study designs, as time is the only variable.

Clinical Vignette 3
Dr. Todhunter desires to evaluate the intraobserver reliability of three-dimensional computed
tomography (3D CT) for measuring femoral torsion. He performs this measurement five times
a week, for 5 weeks, on the same imaging study. Dr. Todhunter calculates the variance between
all 25 images to be 3.72° and variance between weekly samples to be 5.30°. With an instrument-
recorded measurement error of 1.33°, he derives an intraobserver reproducibility of 0.96:

$$\frac{\text{between-subject SD}^2 + \text{between-observer SD}^2}{\text{between-subject SD}^2 + \text{between-observer SD}^2 + \text{measurement error SD}^2}$$

$$\frac{(3.72)^2 + (5.30)^2}{(3.72)^2 + (5.30)^2 + (1.33)^2} = 0.96$$

38.1.2.4  Interobserver

Interobserver reliability measures agreement of two or more raters performing the same measurements. This is the best test for reliability, as this is the broadest of the five. When interobserver reliability is high, this test is able to stand alone. However, with low interobserver reliability, it can be helpful to include test-retest reliability and/or intraobserver reliability to provide information on potential sources of error.

Clinical Vignette 4 (Continued from Previous Vignette)


Dr. Todhunter also evaluates the interobserver reliability of 3D CT femoral torsion measurements
by having ten of his colleagues perform this measurement five times. He then compares each of
their sets of five measurements with his final week’s set of five measurements. He calculates a
variance of 3.11° among all 55 images, 1.11° between the 11 observers, and 1.33° for the instrument-recorded measurement error. He denotes an interobserver reliability measure of 0.76.

$$\frac{\text{between-subject SD}^2}{\text{between-subject SD}^2 + \text{between-observer SD}^2 + \text{measurement error SD}^2}$$

$$\frac{(3.11)^2}{(3.11)^2 + (1.11)^2 + (1.33)^2} = 0.76$$
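Both of Dr. Todhunter's calculations (Clinical Vignettes 3 and 4) are ratios of squared standard deviations, differing only in which components count toward the numerator. The sketch below follows the two formulas as printed; the helper name `variance_ratio` is hypothetical and is not taken from the chapter.

```python
def variance_ratio(numerator_sds, error_sds):
    """Share of the total variance (sum of squared SDs) carried by the numerator components."""
    signal = sum(sd ** 2 for sd in numerator_sds)
    noise = sum(sd ** 2 for sd in error_sds)
    return signal / (signal + noise)

# Vignette 3 (intraobserver): between-subject and between-observer SDs are in the numerator.
intra = variance_ratio([3.72, 5.30], [1.33])

# Vignette 4 (interobserver): only the between-subject SD is in the numerator.
inter = variance_ratio([3.11], [1.11, 1.33])

print(f"intraobserver = {intra:.2f}, interobserver = {inter:.2f}")
```

Moving the between-observer term from the numerator to the error side of the ratio is what lowers the interobserver value relative to the intraobserver one.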

38.1.2.5  Alternate Form

Alternate-form reliability offers a solution to the problem of the practice effect, as it involves rewording items to measure the same element. Items are administered to the same group of individuals at different times, and correlation coefficients are calculated. Changing the order of answer choices in a survey is a simple way to test alternate-form reliability [26].

Clinical Vignette 5
Dr. Williamson investigates how often patients performed their home physical therapy exercises after total shoulder arthroplasty. Two different surveys are utilized.
Item 1: How many times per day did you complete home exercises in the last week?
1. 1 time per day
2. 2 times per day
3. 3–4 times per day
4. 5–8 times per day
Item 2: During the last week, how often did you complete daily home exercises?
1. Every 24 h
2. Every 12 h
3. Every 6–8 h
4. Every 2–5 h

Fact Box 38.3
Reliability studies
Types of reliability    Measure                                      Statistic
Internal consistency    Homogeneity of scale                         Cronbach’s coefficient alpha
Test-retest             Stability of scale                           Correlation coefficient (r value)
Intraobserver           Response stability of individual rater      Intraclass correlation coefficient
Interobserver           Response stability of separate raters       Intraclass correlation coefficient
Alternate form          Stability of response to survey variation   Intraclass correlation coefficient

38.1.3  Participant Selection and Rating Assignment

38.1.3.1  Rater Selection

Individual raters can introduce variation in measurement. In a self-administered survey, the rater is also the subject. When more than one rater contributes measurements, there are two points of variability: rater expertise and rater practice setting. In general, more reliable ratings are seen with more experienced raters. However, diversity among raters with regard to level of training and practice setting is valuable in situations where raters using the measuring tool have different levels of expertise and practice settings. Rater expertise and practice setting must be disclosed in reliability studies [22].

38.1.3.2  Subject Selection

Study subjects should be representative of clinical practice. To strengthen reliability, include subjects with a wide range of clinical conditions or variables for the measurement focus [22]. Homogeneous groups will have stronger raw agreement, whereas heterogeneous groups will have stronger reliability because of increased sample variability [22]. Reliability is increased with greater subject variability relative to the measurement error [22].

Fact Box 38.4
Participant selection
Rater selection
• Rater expertise and practice setting should reflect audience
• Rater expertise improves reliability
Subject selection
• Heterogeneous groups have stronger reliability
• Homogeneous groups have stronger raw agreement

38.1.4  Data Evaluation

There are many techniques and statistical tests to describe or measure reliability. Calculating or reporting more than one statistical estimate may be beneficial.

38.1.4.1  Categorical Data
Categorical data are discrete and qualitative variables. For example, categories are nominal (e.g., “absent,” “present”) or ordinal (e.g., “mild,” “moderate,” or “severe”) [43].

38.1.4.2  Continuous Data
In some situations, ratings are on a continuous scale. Examples would be hip range of motion or the number of steps a patient can take.

38.1.4.3  Kappa Coefficient
The kappa coefficient is used for categorical data. It measures agreement and compares observed agreement with possible agreement beyond chance [12, 43]. Kappa is the most frequently reported statistic in reliability studies examining orthopedic fractures [4]. It takes the form:

$$\kappa = \frac{\text{observed agreement} - \text{chance agreement}}{1 - \text{chance agreement}}$$

In symbols, this is:

$$\kappa = \frac{P_O - P_C}{1 - P_C}$$

$P_O$ is the proportion of observed agreements, and $P_C$ is the proportion of agreements expected by chance [43].
The value for perfect agreement is 1.0, and the value for no agreement beyond chance is 0.0. Negative values indicate that the measured agreement is worse than chance alone [22]. Kappa is often used in interobserver reliability (two or more clinicians rate the same patient) or intraobserver reliability (a single clinician rates the same patient two or more times). Kappa is not as useful when the distribution of responses is skewed because agreement above chance is less likely.

38.1.4.4  Phi Statistic
The phi statistic can also be used for categorical data, and it has the benefit of measuring agreement independent of chance [22]. Despite being useful when the distribution of responses is skewed, it is used infrequently.

38.1.4.5  Pearson Correlation
The Pearson correlation coefficient (PCC), also referred to as Pearson’s “r,” measures linear correlation with a range of +1 (perfect correlation) to −1 (perfect but negative correlation), with 0 signifying no relationship [2]. It is often used with continuous data. The limitation of Pearson correlation is that two groups of measurements may still be in poor agreement, despite perfect correlation. As a result, in reliability studies, the Pearson correlation may poorly describe two-variable relationships.

38.1.4.6  Intraclass Correlation Coefficient
Reliability represents the reproducibility of a scale, and intraclass correlation coefficients (ICC) determine values closest to this definition of reliability. Intraclass correlation coefficients are calculated through repeated-measures analysis of variance [16]. The intraclass correlation coefficient is statistically defined as the ratio of the variance of interest over the sum of variance plus error [41]. Continuous and categorical data can be analyzed with the intraclass correlation coefficient, but it is most commonly used with continuous data.

Fact Box 38.5
Data evaluation
Data types         Variables                 Example
Categorical data   Discrete or qualitative   “Absent” or “present”
Continuous data    Numeric or quantitative   Radiographic measurements

Statistic                                  Ideal data type                                Measurement
Kappa coefficient                          Categorical data                               Agreement
Phi statistic                              Categorical data                               Agreement independent of chance
Pearson correlation                        Continuous data                                Linear correlation
Intraclass correlation coefficient (ICC)   Continuous (most common) or categorical data   Reproducibility
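As a worked illustration of the kappa formula in Sect. 38.1.4.3, the following minimal Python sketch computes an unweighted kappa for two raters and maps the result onto the agreement bands of Table 38.1. The function names (`cohens_kappa`, `interpret_kappa`) and the ratings of ten hypothetical radiographs are assumptions made for illustration only. The label returned for negative values is also an assumption, based on the statement that negative kappa indicates agreement worse than chance.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted kappa for two raters scoring the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of subjects")
    n = len(rater_a)
    # P_O: proportion of subjects on which the two raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # P_C: agreement expected by chance, from each rater's marginal counts.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_c = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    if p_c == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_c) / (1 - p_c)


def interpret_kappa(kappa):
    """Map kappa onto the bands of Table 38.1 (bands are stated to two decimals)."""
    k = round(kappa, 2)
    if k < 0:
        return "Worse than chance"  # assumed label; not a row of Table 38.1
    if k == 0:
        return "Absent"
    bands = [(0.20, "Slight"), (0.40, "Fair"), (0.60, "Moderate"),
             (0.80, "Good"), (1.00, "Excellent")]
    for upper, label in bands:
        if k <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Hypothetical ratings of ten radiographs by two clinicians.
rater_a = ["present"] * 5 + ["absent"] * 5
rater_b = ["present"] * 4 + ["absent"] * 5 + ["present"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f} ({interpret_kappa(kappa)})")
```

In this example the raters agree on 8 of 10 radiographs ($P_O = 0.8$) and chance agreement is $P_C = 0.5$, so kappa is about 0.60, which falls in the "Moderate" band of Table 38.1.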
38  Reliability Studies and Surveys 349
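The limitation of the Pearson correlation noted in Sect. 38.1.4.5 (perfect correlation despite poor agreement) can be illustrated with a short Python sketch. The helper names, the hip flexion values, and the constant 15° offset between raters are hypothetical. The one-way ICC shown here is one common form of the intraclass correlation coefficient and is an assumption; the chapter does not specify which ICC model should be used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def icc_oneway(ratings):
    """One-way random-effects ICC; rows are subjects, columns are ratings."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical hip flexion (degrees) in five patients; rater B reads 15 degrees higher.
rater_a = [100, 105, 110, 115, 120]
rater_b = [a + 15 for a in rater_a]

r = pearson_r(rater_a, rater_b)
icc = icc_oneway(list(zip(rater_a, rater_b)))
print(f"Pearson r = {r:.2f}, one-way ICC = {icc:.2f}")
```

Because rater B differs from rater A by a constant offset, the two sets of measurements are perfectly correlated (r = 1) while the one-way ICC is close to 0.05, reflecting the systematic disagreement that Pearson's r does not capture.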

38.1.5  Sample Size

The number of raters and subjects is controlled in reliability studies. To increase precision, increasing the number of subjects carries more weight than increasing the number of raters (particularly when there are more than four raters) [22]. However, increasing either subjects or raters will narrow the confidence interval. In an effective study design, the number of raters is selected first based on study generalizability and feasibility, and then subject number for desired precision is selected [22]. Generalizability of the raters is determined by their characteristics. Feasibility of the raters or ratings depends more on the subjects. Radiographs can be rated multiple times, whereas patients may not tolerate more than one examination. Sample size calculations are performed once the number of raters is decided.
For intraclass correlation coefficients, the number of subjects is determined by the minimum acceptable reliability [48]. Alternatively, the desired precision of the reliability estimate can be used to determine subject number [18].

38.1.6  Interpretation of Results

The majority of reliability statistics use a 0.0–1.0 scale. A value of 1.0 indicates all variability is due to true subject differences; a value of 0.0 indicates all variability is due to error [22]. Table 38.1 shows the agreement associated with certain kappa values. In reality, most studies have reliability that falls between 0.3 and 0.7 [4]. Ultimately, readers must determine how applicable study design, raters, subjects, and measurement tools are to their practices.

Table 38.1  Interpretation of the agreement with kappa coefficient
Kappa value   Agreement
0             Absent
>0–0.20       Slight
0.21–0.40     Fair
0.41–0.60     Moderate
0.61–0.80     Good
0.81–1.00     Excellent

38.1.7  Summary

Reliability testing of scales and items is a valuable tool in orthopedic surgery, as it provides quantitative data on the performance of an instrument [26]. Types of reliability studies include internal consistency, test-retest, intraobserver, interobserver, and alternate form. Categorical data typically use the kappa statistic, while continuous data tend to utilize the intraclass correlation coefficient. A correlation coefficient, or r value, with a value of 0.70 or greater is generally accepted and indicates good reliability.

38.2 Surveys

38.2.1  Introduction

Surveys are useful instruments in orthopedic research for gathering data on the demographics, practices, or ideas of a group of individuals at a specific time or to compare changes over time [11, 14, 21, 33, 38, 44]. Constructing surveys involves writing questions that may ultimately be converted into numbers to allow for statistical analysis [30]. The results of surveys give insight into the adoption of existing literature and help guide clinical practice and future research [44]. Surveys are administered by telephone, mail, fax, or electronically.
Surveys allow for convenient and inexpensive research that can cover a large population over a short period of time [21]. Surveys may be the only method of conducting certain types of research, such as understanding current viewpoints or attitudes of orthopedic surgeons. Additionally, surveys may act as preliminary studies for successive research ideas [21].
Survey validity is jeopardized by low response rates. Response rates have been as low as 15% among surgeons [44]. This has been attributed to various factors, including busy work schedules, an increasing number of commercial requests, increased paperwork, and low priority [6, 7, 25, 40, 44]. Low responses can result in nonresponse bias, as individuals who respond can have meaningful differences from those who do not. This

can lead to a nonrepresentative sample and a subsequent loss of survey credibility [44]. Biases tend to be limited by achieving response rates of at least 70% [33]. In addition to increasing the validity of surveys, a higher response rate may also eliminate the costs needed for resending surveys [37].
A systematic approach to survey design, development, testing, and administration, together with strategies for raising the response rate, is imperative to decrease bias and increase generalizability [11].

38.2.2  Survey Design

38.2.2.1  Determining the Objective
A well-defined objective is important for a successful survey [11]. This involves careful consideration of the topic and target audience [21]. Strong research questions are specific, simple, meaningful, interesting, and answerable [33].

38.2.2.2  Develop the Survey Instrument
After a clearly stated research question has been formulated, investigators may modify an existing survey or develop a new instrument [33]. The survey instrument acts as the interface between study objectives and responses of individuals surveyed [21]. Strategies to identify item content include conducting focus groups of potential subjects, discussing with an expert panel, or utilizing the Delphi technique, a process by which items are selected and ranked by a group of experts until a consensus is reached [33]. Regardless of which strategy is employed, a team of colleagues should be involved in the development process to lend face validity to the instrument before it is tested.

38.2.2.3  Identifying the Sampling Frame
It is difficult to administer a survey to all individuals within a target population as a result of the population size or the hardship of identifying and contacting all possible respondents [11, 38]. As a result, a fraction of the target population is surveyed. The “sampling frame” comprises the target population for the survey, while the “sampling element” includes the respondents from whom data is gathered and evaluated [1].
Sample selection may be random (probability design) or nonrandom (nonprobability design) [1]. Probability designs include simple random sampling, systematic random sampling, stratified sampling, and cluster sampling [11].

– Simple random sampling: Each person has the same probability of being selected. Selection takes place through techniques such as a lottery process or a random-number generator [45]. This technique requires little prior knowledge of a population but may not capture certain groups or be efficient [1].
– Systematic random sampling: A starting point is arbitrarily chosen on a list. Subjects are then selected in a methodical manner (e.g., every fifth subject) [11]. This technique has high precision and allows ease of analyzing data and measuring sampling errors [1]. However, the ordering of the list of subjects in the sample frame can create biases, certain groups may be excluded, and there may be a lack of efficiency [1].
– Stratified random sampling: Possible respondents are divided into defined groups. Within these groups, individuals are randomly sampled by either simple or systematic sampling [11]. This technique allows disproportionate sampling and has high precision [1]. A disadvantage is that it requires advanced knowledge of a population and can result in more complex analysis.
– Cluster sampling: The population is broken down into heterogeneous clusters, and specific clusters are sampled [11]. This method has lower costs and allows sampling of groups if individuals are not available [1]. However, this method is more complex when analyzing data and evaluating sample errors, and it has low precision.

In nonprobability sampling, individuals in a population have an unequal chance of being involved in the sample [11]. Nonprobability sampling designs include purposive sampling, quota


sampling, chunk sampling, and snowball sampling [1].

– Purposive sampling: Selected participants meet specific criteria (e.g., they are hand surgeons) [11].
– Quota sampling: A set number of respondents are selected based on specific characteristics (e.g., a researcher sets a quota of 20% female spine surgeons between the ages of 30 and 50 for the sample) [11].
– Chunk sampling: Participants are chosen based on convenience (e.g., attendees of a conference) [11].
– Snowball sampling: Investigators find individuals who meet certain criteria, who then go and find more individuals who meet the criteria [11].

The degree of generalizability of a survey is determined by similarity of respondents to nonrespondents [11]. It is rarely feasible to obtain data on how respondents and nonrespondents differ. The best way to overcome this setback is to strive for a high response rate [11].

Fact Box 38.6
Study Design
Determine Objective → Develop Survey Instrument → Identify Sampling Frame

Sample selection
Sample selection type        Design (probability or nonprobability)   Features
Simple random sampling       Probability                              All individuals equal chance of selection
Systematic random sampling   Probability                              Individuals selected at predetermined interval from list
Stratified random sampling   Probability                              Individuals categorized and randomly sampled through simple or systematic random sampling
Cluster sampling             Probability                              Population separated into heterogeneous clusters and specific clusters are sampled
Purposive sampling           Nonprobability                           Criteria-based selection
Quota sampling               Nonprobability                           Select number of respondents based on specific qualities
Chunk sampling               Nonprobability                           Individual availability-based selection
Snowball sampling            Nonprobability                           Selected individuals who meet the criteria select more participants

38.2.3  Development

38.2.3.1  Question Stems
Each question stem included in a survey should be relevant to the overarching purpose of the survey and ideally relate to recent events or common knowledge to produce high-quality survey data [44]. Questions must be succinct and clear and contain 20 or fewer words [11, 33]. Question types that should be avoided include double-barreled questions (combining two questions), the halo effect (referencing influential sources that can influence a response), loaded questions, double negatives, questions with no comparator, modifiers (e.g., “almost everyone” or “usually”), and complex vocabulary [33, 38, 42]. Other techniques for providing proper question stem structure include using complete sentences and softening the impact of potentially controversial questions [1, 44].

38.2.3.2  Question Responses
Responses should be concise and impartial. Response formats can be either open or closed [11, 33]. Open responses allow subjects to answer questions in free text. This format offers the advantage of minimizing limitations in responses. However, free responses are more time-consuming, and results are more difficult to convert into analyzable data [33]. Closed response formats include nominal, ordinal, interval, binary, and ratio measurements [11].

– Nominal responses: These responses involve a list of mutually exclusive items (e.g., physicians, nurses, medical students) that reflect qualitative distinctions [11].
– Ordinal responses: These responses imply a ranked order. The traditional Likert scale, with answers ranging from “strongly disagree” to “strongly agree,” is commonly used in surveys [33]. This format allows the grouping of like-minded attitudes and viewpoints. An odd number of points allows a neutral response, while an even number of points forces a commitment [33]. In a phenomenon called the “floor” or “ceiling” effect, respondents tend to choose responses clustered at the bottom or top of a scale. Increasing the number of points on a scale may exacerbate this phenomenon [33].
– Interval and ratio measurements: These measurements show continuous responses. A ratio scale has an absolute zero (e.g., height and weight), while interval scales do not have a true zero (e.g., time of day; 1 PM to 2 PM is the same interval as 5 PM to 6 PM because both are 1 h increments).

It is helpful to involve a biostatistician to ensure that answer choices are in a format where the data can easily be analyzed.
Question responses that should be avoided include vague answer choices (e.g., “other” or “unknown”), absolute terms (e.g., “always” or “never”), abbreviations, complex answers, and asking respondents to rank responses [11, 21]. There should be an equal number of positive and negative option choices for scaled questions.

38.2.3.3  Chronology
The chronology of the questions can have a major influence on the response rate of surveys. It is helpful to begin with demographic questions, as these are simple and nonthreatening [33]. The first non-demographic question should be clearly stated, the most applicable, and noteworthy [44]. Items can be grouped based on content to help with respondents’ thought process and memory [33]. The filter technique allows responders to omit questions based on prior responses, thus limiting frustration with irrelevant or non-applicable questions [50].

38.2.3.4  Survey Length
Surveys should be succinct. One study found significantly lower response rates above a threshold length of 1000 words [20]. Though this threshold was not directly aimed at orthopedic surgeons, this study suggests that response rate may be considerably affected by minor modifications in questionnaire length.

38.2.3.5  Survey Format
The format of a survey is a vital part of presentation. There should be explicit instructions throughout, as opposed to solely at the beginning of surveys. Arrows or symbols to provide visual navigation are beneficial for assistance with survey completion [44]. The size and style of the font should be easy to read [11]. Emphasizing important words or phrases, numbering questions, providing appropriate spacing, and listing answers vertically instead of horizontally are all helpful techniques [1, 44]. It is helpful to create a polished, yet unique, appearance that allows a survey to stand out from other questionnaires responders may receive.

38.2.3.6  Cover Letters
Cover letters provide the initial opportunity to persuade readers to complete surveys [44, 50]. The letter should clearly state the purpose of the survey and why participants were selected [33]. Confidentiality and optional completion of the survey should be conveyed [1, 50]. Details should be included regarding when the survey should be completed and who to contact if participants have questions. Finally, the cover letter should contain an affirmation that the recipient’s participation is vital to the success of the survey and an expression of gratitude to the recipient for his or her time [37].
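Returning to the probability designs of Sect. 38.2.2.3 and Fact Box 38.6, the four sampling strategies can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the 60-person sampling frame, and the `subspecialty` and `hospital` fields are all hypothetical, and a seeded `random.Random` instance stands in for the lottery or random-number generator mentioned in the text.

```python
import random
from collections import defaultdict


def simple_random_sample(frame, n, rng):
    """Every individual has the same probability of selection."""
    return rng.sample(frame, n)


def systematic_sample(frame, step, rng):
    """Arbitrary starting point, then every `step`-th individual on the list."""
    start = rng.randrange(step)
    return frame[start::step]


def stratified_sample(frame, key, n_per_stratum, rng):
    """Divide the frame into defined groups, then sample randomly within each."""
    strata = defaultdict(list)
    for person in frame:
        strata[key(person)].append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


def cluster_sample(frame, key, n_clusters, rng):
    """Select whole clusters at random and keep everyone inside them."""
    clusters = defaultdict(list)
    for person in frame:
        clusters[key(person)].append(person)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]


# Hypothetical sampling frame of 60 surgeons.
frame = [
    {"id": i,
     "subspecialty": ["hand", "spine", "sports"][i % 3],
     "hospital": f"H{i % 5}"}
    for i in range(60)
]
rng = random.Random(38)  # fixed seed so the sketch is reproducible

srs = simple_random_sample(frame, 10, rng)
every_fifth = systematic_sample(frame, 5, rng)
by_subspecialty = stratified_sample(frame, lambda p: p["subspecialty"], 4, rng)
two_hospitals = cluster_sample(frame, lambda p: p["hospital"], 2, rng)
```

The stratified design guarantees that each subspecialty is represented, whereas the cluster design returns every surgeon from the selected hospitals and none from the others, which mirrors the precision and cost trade-offs described above.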

Fact Box 38.7
Survey development
Survey parts        Features
Question stem       Succinct, clear, avoid objectionable questions
Question response   “Open” (free text) or “closed” (structured)
Chronology          Demographics first, personal questions last
Survey length       Concise (as short as possible to convey idea)
Survey format       Polished and unique with instructions throughout
Cover letter        Recruits and attracts participants

38.2.4  Pilot Testing

Pilot testing allows a trial of the methods of sampling, data collection, and survey administration. Investigators ask colleagues or other participants who are similar to desired respondents to evaluate questions and provide feedback [11]. This is important to check for mistakes, gauge the time needed for completion, understand whether the survey conveys the proposed message, and test whether the survey grabs the reader’s interest [50].

38.2.5  Methods of Survey Administration

Surveys can be given over the phone, through postal mail, via fax, or electronically. The selected administration technique depends on the targeted audience, the type of information desired, financial limitations, and whether test properties were constructed.

38.2.5.1  Telephone
Telephone surveys offer the advantage of increasing accuracy, as the administrator can ensure question comprehension and complete response from the responder. Drawbacks include the expense, difficulty in achieving a high response rate, and the susceptibility to distortion and interview bias [33, 44].

38.2.5.2  Postal
Postal delivery is the traditional method of survey administration. Postal surveys allow responders to complete questionnaires in privacy, and interviewer distortion is minimized [6, 44, 50]. Postal surveys may also increase validity by giving responders time to formulate answers and utilize various sources [44]. Disadvantages of postal surveys include the cost of supplies, the time necessary for the labor of collecting responses, and the potential decrease in accuracy when manually recording responses as opposed to responses automatically being entered into a database [29, 44].

38.2.5.3  Fax
Respondents of fax surveys may return the survey through fax or by postal mail. Data can be collected manually or with utilization of optical character recognition by some fax machines [29]. Character recognition may enhance the accuracy of the recorded data. Costs of fax surveys are similar to costs of postal surveys [44].

38.2.5.4  Electronic
The current mainstay of survey administration is through electronic distribution [9, 21, 25, 47]. Electronic surveys may be web-based, where respondents fill out a questionnaire on a website. Alternatively, surveys can be conducted by email, either as an attachment or embedded in the text of the email.
Advantages of electronic surveys include ease of completion, ability to reach a large audience, cost-effectiveness, and the immediate availability of responses [9, 21, 25, 29, 44, 47]. Simple descriptive statistics usually are embedded, allowing concurrent analysis for researchers, while more complex statistical analysis is generally accomplished by exporting data to statistical software [47]. This reduces time and resources, as well as the possibility of human error. Electronic surveys allow control of question order, thus preventing respondents from changing prior answers [9]. Finally, email surveys have been shown to be associated with a lower number of unanswered questions [29].
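As a rough illustration of the simple descriptive statistics that can be run on exported electronic survey data, the sketch below tallies responses to a single five-point Likert item and flags possible floor or ceiling effects (Sect. 38.2.3.2). The function name, the example responses, and in particular the 15% cutoff used to flag clustering at either end of the scale are assumptions chosen for illustration; the chapter does not define a numerical threshold.

```python
from collections import Counter

LIKERT_SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def summarize_likert(responses, scale=LIKERT_SCALE, threshold=0.15):
    """Share of responses per scale point, plus floor/ceiling flags."""
    if not responses:
        raise ValueError("no responses to summarize")
    counts = Counter(responses)
    n = len(responses)
    shares = {label: counts[label] / n for label in scale}
    return {
        "n": n,
        "shares": shares,
        # Flag clustering at the bottom or top of the scale (assumed cutoff).
        "floor_effect": shares[scale[0]] > threshold,
        "ceiling_effect": shares[scale[-1]] > threshold,
    }


# Hypothetical exported answers to one survey item.
exported = ["agree"] * 4 + ["strongly agree"] * 5 + ["neutral"]
summary = summarize_likert(exported)
for label, share in summary["shares"].items():
    print(f"{label:>17}: {share:.0%}")
```

Here half of the ten respondents chose the top category, so the item is flagged for a possible ceiling effect and might be reworded or rescaled during pilot testing.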

There are many possible sources of weakness with electronic surveys. Potential selection bias is a challenge, as online surveys may not necessarily be suitable for certain groups of participants. This could lead to an inaccurate or incomplete cross section of the population [30]. Email surveys may be dismissed as junk mail [29, 47]. It can be difficult to access the desired mailing addresses. McPeake et al. [30] found that almost 10% of emails sent using a 1-year-old contact list were returned as undeliverable.
VanDenKerkhof and colleagues [47] discuss the following strategies as methods to aid with ease of online surveys:

1. Provide drop-down lists with “select one” as the default option rather than an actual response, as the latter may create bias.
2. Even if only one response is required, participants should be given the option of selecting multiple responses or providing a comment in a text box.
3. A progression bar should be provided for respondents to gauge progress.
4. One question per screen simplifies the visual field and makes the survey more user-friendly.
5. With a rising number of email surveys, response can increase by sending an initial contact (e.g., a postcard).

Overall, the accessibility and quick data collection of electronic surveys make these favorable options for survey administrators. However, postal and fax surveys have been reported to have higher response rates than telephone or electronic surveys [19, 25, 28, 29]. Possible reasons for lower response rates with electronic surveys include lack of familiarity with the Internet, inconsistent Internet access, lack of trust submitting confidential information over the Internet, and survey saturation [30]. Using mixed-mode designs can lead to increased response rate [1]. For example, an electronic survey could be distributed first, and a postal mail survey could be utilized for the second distribution.

Fact Box 38.8
Methods of survey administration
Method       Advantage                                                                       Disadvantage
Telephone    Increased accuracy of response                                                  Difficult to obtain high response rate
Postal       Confidentiality, anonymity, convenience, minimize interviewer distortion       Time to collect, cost of supplies, decreased accuracy from manual entry
Fax          Confidentiality, anonymity, convenience, minimize interviewer distortion       Time to collect, decreased accuracy from manual entry
Electronic   Ease, cost, immediate responses                                                 Selection bias, junk mail

38.2.6  Response Rate

High response rates increase validity, enhance the precision of parameter estimates, and reduce the risk of selection bias. Low response rates increase the chance that respondents and nonrespondents differ in a meaningful way, which may undermine the results of survey responses [11]. Investigators can report the actual response rate or the analyzable response rate. The actual response rate includes respondents with partially completed questionnaires and opt-out responses, mirroring the sampling element. However, the analyzable response rate reflects a proportion of the sampling frame based on partial or full completion of questionnaires [11]. A response rate of at least 70% has been viewed as acceptable for external validity [50]. However, response rates between 60% and 70% and sometimes less than 60% (e.g., for controversial topics) may be acceptable [42]. There are a number of techniques that enhance the response rate of surveys.

38.2.6.1  Convenience
Orthopedic surgeons are busy individuals; thus enhancing the convenience of survey participation is crucial for increasing participation. The estimated time for completion of a survey should be included in the cover letter (e.g., “less than 10

min”) [30]. For email surveys, embedding the link of the survey in the text rather than including the survey as an attachment is likely to increase response rate. If utilizing postal surveys, providing stamped return envelopes, as opposed to franked return envelopes or no envelope at all, is a simple, low-cost, and effective method of increasing response rate [40].

38.2.6.2  Precontacts
Informing participants of an upcoming survey may enhance questionnaire participation. There have been mixed reports on increasing response rate with prenotification letters [1, 40, 50], though prenotification is thought to be particularly important for email surveys, so that these questionnaires are not viewed as “junk mail” and deleted [47]. A prenotification letter should be personalized, concise, and positive. It should also aim at building anticipation as opposed to providing excessive detail about the survey [1].

38.2.6.3  Multiple Contacts
Dillman et al. described a method of contacting nonrespondents numerous times with the intention of increasing response rate [1]. Each approach is slightly different in this method, allowing multiple opportunities to catch more responders. For example, multiple attempts may be made with a mail survey before attempting a telephone survey.

38.2.6.4  Incentives
A multitude of incentives, such as money or prizes, have been used with the aim of increasing survey completion [3, 8, 19]. The efficacy of incentives does not have strong support in the literature; thus the cost-effectiveness of this strategy is questionable [44].

38.2.6.5  Reminders
Reminders can increase response rates to surveys [31]. With postal surveys, each additional mailed reminder may increase response by up to 30–50% from the initial response rate [42]. Dillman and colleagues [1] proposed sending up to 3 follow-up reminders for nonrespondents: 1 sent a week after the initial mailing and 2 sent between 3 and 7 weeks after the initial mailing. The use of reminders has also been shown to be effective with electronic surveys [15, 39]. If known, the current response rate should be included in each reminder, acting as an incentive for readers to participate.

38.2.6.6  Principal Investigator Contact
Finally, a personal gesture that may increase response is having the principal investigator (PI) reach out to subjects directly.

Fact Box 38.9
How to enhance response rate?
• Convenience: decrease time and increase availability
• Precontacts: build anticipation and awareness
• Multiple contacts: Dillman approach [35] for Internet and postal surveys
• Incentives: infrequently cost-effective but may increase responses
• Reminders: increased responses in dose-dependent relationship
• PI contact: the PI contacts subjects her/himself

38.2.7  Reporting Surveys

Study findings are reported with an introduction, methods, results, and discussion. The hypothesis should be clearly stated. A brief summary of pretesting, instrument revision, and method of administration should be included. It is very important to include the response rate. If applicable, commentary should be given on how respondents and nonrespondents differ, limitations, and sources of potential bias [21]. Information gathered from the survey and how these data relate to the original hypothesis should be discussed. Finally, significance of results should be highlighted, as well as directions for future research and implications for research, education, and clinical care.

38.2.8  Costs Associated with Surveys

Survey costs include labor costs, pretesting, survey administration, repeat survey administration for nonresponders, and data analysis [44]. Some of these costs are eliminated with the utilization of Internet surveys.
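Because the response rate must be reported (Sect. 38.2.7) and can be expressed in more than one way (Sect. 38.2.6), a small Python sketch of the arithmetic may help. The function names and the example counts are hypothetical. Using the number of surveys sent as the denominator for both rates is an assumption, since the chapter describes the two rates only in terms of the sampling element and sampling frame; the bands in `acceptability` follow the 70% and 60% figures quoted in Sect. 38.2.6.

```python
def response_rates(sent, complete, partial, opt_out):
    """Return (actual, analyzable) response rates as proportions of surveys sent."""
    if sent <= 0:
        raise ValueError("at least one survey must have been sent")
    # Actual rate: every returned response, including partial and opt-out.
    actual = (complete + partial + opt_out) / sent
    # Analyzable rate: only partially or fully completed questionnaires.
    analyzable = (complete + partial) / sent
    return actual, analyzable


def acceptability(rate):
    """Rough reading of a response rate against the thresholds in Sect. 38.2.6."""
    if rate >= 0.70:
        return "acceptable"
    if rate >= 0.60:
        return "may be acceptable"
    return "may be acceptable only in special cases (e.g., controversial topics)"


# Hypothetical survey: 200 sent, 120 complete, 20 partial, 10 opt-out replies.
actual, analyzable = response_rates(sent=200, complete=120, partial=20, opt_out=10)
print(f"actual {actual:.0%}, analyzable {analyzable:.0%}: {acceptability(analyzable)}")
```

In this example the actual response rate is 75% and the analyzable response rate is 70%, so both meet the 70% level cited for external validity; stating which of the two rates is being reported avoids overstating the usable sample.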

38.2.9  Ethical Considerations

Research should be approved by an ethics review committee prior to the administration of a survey [27]. The voluntary nature of a survey and confidentiality should be emphasized in the cover letter [42]. Confidentiality can be obtained by using codes, destroying questionnaires quickly after data collection, or removing identifying information on the questionnaire. Furthermore, participants should be given comprehensive information regarding the purpose of the study, the research sponsors, and who to contact for further questions [1, 27].

38.2.10  Limitations of Surveys

Surveys are limited as research tools. They rely on respondents’ memory, honesty, and understanding of the survey text, all of which are difficult to measure [21]. Surveys, by nature, can be restrictive, unless interviews or open-ended questions are used. However, free responses and personal interviews have the drawback that data are more difficult to convert into an analyzable form. While surveys can establish a relationship between variables, a further downside is the inability of surveys to establish causality [21]. Finally, surveys are limited by the response rate, which can potentially cast doubt on the accuracy of survey data.

38.2.11  Summary

A survey is a relatively easy, quick, and inexpensive form of research that has the capability of reaching a large population of individuals. Surveys directed at orthopedic surgeons can provide valuable contributions to the literature, as they assist in furthering understanding of current orthopedic ideas and practices. Each stage of survey development and administration should be carefully planned to ensure study objectives are answered, bias is minimized, and generalizability is increased.

Take-Home Message

• Reliability studies gauge the reproducibility of a measuring instrument, while surveys allow rapid and cost-effective data collection of a large population. An understanding of the components and administration of both studies will allow optimal utilization and appraisal of these study types within orthopedic literature.

References

1. Aday LA, Cornelius LJ. Designing and conducting health surveys: a comprehensive guide. 3rd ed. San Francisco: Jossey-Bass; 2006.
2. Adler J, Parmryd I. Quantifying colocalization by correlation: the Pearson correlation coefficient is superior to the Mander’s overlap coefficient. Cytometry A. 2010;77(8):733–42.
3. Asch DA, Christakis NA, Ubel PA. Conducting physician mail surveys on a limited budget. A randomized trial comparing $2 bill versus $5 bill incentives. Med Care. 1998;36(1):95–9.
4. Audigé L, Bhandari M, Kellam J. How reliable are reliability studies of fracture classifications? A systematic review of their methodologies. Acta Orthop Scand. 2004;75(2):184–94.
5. Avery DM, Matullo KS. Distal radial traction radiographs: interobserver and intraobserver reliability compared with computed tomography. J Bone Joint Surg Am. 2014;96(7):582–8.
6. Baron G, De Wals P, Milord F. Cost-effectiveness of a lottery for increasing physicians’ responses to a mail survey. Eval Health Prof. 2001;24(1):47–52.
7. Bergk V, Gasse C, Schnell R, Haefeli WE. Mail surveys: obsolescent model or valuable instrument in general practice research? Swiss Med Wkly. 2005;135(13–14):189–91.
8. Bhandari M, Devereaux PJ, Swiontkowski MF, Schemitsch EH, Shankardass K, Sprague S, et al. A randomized trial of opinion leader endorsement in a survey of orthopaedic surgeons: effect on primary response rates. Int J Epidemiol. 2003;32(4):634–6.
9. Braithwaite D, Emery J, De Lusignan S, Sutton S. Using the Internet to conduct surveys of health professionals: a valid alternative? Fam Pract. 2003;20(5):545–51.
10. Bruinsma WE, Guitton TG, Warner JJP, Ring D, Science of Variation Group. Interobserver reliability of classification and characterization of proximal humeral fractures: a comparison of two and three-dimensional CT. J Bone Joint Surg Am. 2013;95(17):1600–4.

11. Burns KEA, Duffett M, Kho ME, Meade MO,


vey reliability and validity. Thousand Oaks: Sage;
Adhikari NKJ, Sinuff T, et al. A guide for the design 1995. p. 5–32.
and conduct of self-administered surveys of clini- 27. Mailey SK.  Increasing your response rate for mail
cians. CMAJ. 2008;179(3):245–52. survey data collection. SCI Nurs. 2002;19(2):78–9.
12. Cohen J.  Weighted kappa: nominal scale agreement 28. Mavis BE, Brocato JJ.  Postal surveys versus elec-
with provision for scaled disagreement or partial tronic mail surveys. The tortoise and the hare revis-
credit. Psychol Bull. 1968;70(4):213–20. ited. Eval Health Prof. 1998;21(3):395–408.
13. Corona J, Sanders JO, Luhmann SJ, Diab M, Vitale 29. McMahon SR, Iwamoto M, Massoudi MS, Yusuf HR,
MG.  Reliability of radiographic measures for infan- Stevenson JM, David F, et al. Comparison of e-mail,
tile idiopathic scoliosis. J Bone Joint Surg Am. fax, and postal surveys of pediatricians. Pediatrics.
2012;94(12):e86. 2003;111(4 Pt 1):e299–303.
14. Duffett M, Burns KE, Adhikari NK, Arnold DM,
30. McPeake J, Bateson M, O’Neill A.  Electronic

Lauzier F, Kho ME, et al. Quality of reporting of sur- surveys: how to maximise success. Nurse Res.
veys in critical care journals: a methodologic review. 2014;21(3):24–6.
Crit Care Med. 2012;40(2):441–9. 31. Nakash RA, Hutton JL, Jørstad-Stein EC, Gates
15. Fischbacher C, Chappel D, Edwards R, Summerton S, Lamb SE. Maximising response to postal ques-
N. Health surveys via the Internet: quick and dirty or tionnaires—a systematic review of randomised tri-
rapid and robust? J R Soc Med. 2000;93(7):356–9. als in health research. BMC Med Res Methodol.
16. Fisher R. Statistical methods for research workers. 5th 2006;6:5.
ed. Edinburgh: Oliver and Boyd Ltd.; 1925. 32. Pappas N, Lawrence JT, Donegan D, Ganley T, Flynn
17. Gaumétou E, Quijano S, Ilharreborde B, Presedo
JM.  Intraobserver and interobserver agreement in
A, Thoreux P, Mazda K, et  al. EOS analysis of the measurement of displaced humeral medial epi-
lower extremity segmental torsion in children condyle fractures in children. J Bone Joint Surg Am.
and young adults. Orthop Traumatol Surg Res. 2010;92(2):322–7.
2014;100(1):147–51. 33. Passmore C, Dobbie AE, Parchman M, Tysinger

18. Giraudeau B, Mary JY.  Planning a reproducibility J.  Guidelines for constructing a survey. Fam Med.
study: how many subjects and how many replicates 2002;34(4):281–6.
per subject for an expected width of the 95 per cent 34. Penson DF, Wei JT. Clinical research methods for sur-
confidence interval of the intraclass correlation coef- geons. 1st ed. Totowa: Humana Press; 2006.
ficient. Stat Med. 2001;20(21):3205–14. 35. Pomerantz ML, Glaser D, Doan J, Kumar S, Edmonds
19. Hocking JS, Lim MSC, Read T, Hellard M.  Postal EW.  Three-dimensional biplanar radiography as a
surveys of physicians gave superior response rates new means of accessing femoral version: a com-
over telephone interviews in a randomized trial. J Clin parative study of EOS three-dimensional radiogra-
Epidemiol. 2006;59(5):521–4. phy versus computed tomography. Skelet Radiol.
20. Jepson C, Asch DA, Hershey JC, Ubel PA.  In a
2015;44(2):255–60.
mailed physician survey, questionnaire length had a 36. Richards BS, Sucato DJ, Konigsberg DE, Ouellet

threshold effect on response rate. J Clin Epidemiol. JA. Comparison of reliability between the Lenke and
2005;58(1):103–5. King classification systems for adolescent idiopathic
21. Jones D, Story D, Clavisi O, Jones R, Peyton P. An scoliosis using radiographs that were not premea-
introductory guide to survey research in anaesthesia. sured. Spine (Phila Pa 1976). 2003;28(11):1148–56;
Anaesth Intensive Care. 2006;34(2):245–53. discussion 1156–7.
22. Karanicolas PJ, Bhandari M, Kreder H, Moroni A, 37. Roberts LM, Wilson S, Roalfe A, Bridge P.  A ran-
Richardson M, Walter SD, et  al. Evaluating agree- domised controlled trial to determine the effect on
ment: conducting a reliability study. J Bone Joint Surg response of including a lottery incentive in health
Am. 2009;91(Suppl 3):99–106. surveys [ISRCTN32203485]. BMC Health Serv Res.
23. Lee KM, Chung CY, Park MS, Lee SH, Cho JH, Choi 2004;4(1):30.
IH. Reliability and validity of radiographic measure- 38. Rubenfeld GD. Surveys: an introduction. Respir Care.
ments in hindfoot varus and valgus. J Bone Joint Surg 2004;49(10):1181–5.
Am. 2010;92(13):2319–27. 39. Schleyer TK, Forrest JL.  Methods for the design

24. Lee KM, Lee J, Chung CY, Ahn S, Sung KH, Kim TW, and administration of web-based surveys. J Am Med
et al. Pitfalls and important issues in testing reliability Inform Assoc. 2000;7(4):416–25.
using intraclass correlation coefficients in orthopaedic 40. Shiono PH, Klebanoff MA. The effect of two mailing
research. Clin Orthop Surg. 2012;4(2):149–55. strategies on the response to a survey of physicians.
25. Leece P, Bhandari M, Sprague S, Swiontkowski MF, Am J Epidemiol. 1991;134(5):539–42.
Schemitsch EH, Tornetta P, et  al. Internet versus 41.
Shrout PE, Fleiss JL.  Intraclass correlations:
mailed questionnaires: a controlled comparison (2). J uses in assessing rater reliability. Psychol Bull.
Med Internet Res. 2004;6(4):e39. 1979;86(2):420–8.
26. Litwin MS.  How to measure survey reliability and 42. Sierles FS. How to do research with self-administered
validity. In: Litwin MS, editor. How to measure sur- surveys. Acad Psychiatry. 2003;27(2):104–13.
43. Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther. 2005;85(3):257–68.
44. Sprague S, Quigley L, Bhandari M. Survey design in orthopaedic surgery: getting surgeons to respond. J Bone Joint Surg Am. 2009;91(Suppl 3):27–34.
45. Sudman S. Applied sampling. In: Rossi PH, Wright JD, Anderson AB, editors. Handbook of survey research. San Diego: Elsevier; 1983. p. 145–94.
46. Thelen P, Delin C, Folinais D, Radier C. Evaluation of a new low-dose biplanar system to assess lower-limb alignment in 3D: a phantom study. Skelet Radiol. 2012;41(10):1287–93.
47. VanDenKerkhof EG, Parlow JL, Goldstein DH, Milne B. In Canada, anesthesiologists are less likely to respond to an electronic, compared to a paper questionnaire. Can J Anaesth. 2004;51(5):449–54.
48. Walter SD, Eliasziw M, Donner A. Sample size and optimal designs for reliability studies. Stat Med. 1998;17(1):101–10.
49. Wright RW, MARS Group. Osteoarthritis classification scales: interobserver reliability and arthroscopic correlation. J Bone Joint Surg Am. 2014;96(14):1145–51.
50. Zelnio RN. Data collection techniques: mail questionnaires. Am J Hosp Pharm. 1980;37(8):1113–9.
39  Registries

R. Kyle Martin, Andreas Persson, Håvard Visnes, and Lars Engebretsen

This chapter will explore the expanding role of registries in orthopaedic surgery with a focus on why these databases are so important. It will include examples from the Norwegian Knee Ligament Register and tips on how to maximize the usefulness of a register.

R. K. Martin (*) · L. Engebretsen
Division of Orthopedics, Department of Orthopaedic Surgery, Oslo University Hospital, Oslo, Norway
Faculty of Medicine, University of Oslo, Oslo, Norway
Oslo Sports Trauma Research Center, The Norwegian School of Sport Sciences, Oslo, Norway
e-mail: ummart36@myumanitoba.ca

A. Persson
Division of Orthopedics, Department of Orthopaedic Surgery, Oslo University Hospital, Oslo, Norway
The Norwegian Knee Ligament Registry, Orthopedic Department, Haukeland University Hospital, Bergen, Norway

H. Visnes
Division of Orthopedics, Department of Orthopaedic Surgery, Oslo University Hospital, Oslo, Norway
Oslo Sports Trauma Research Center, The Norwegian School of Sport Sciences, Oslo, Norway
The Norwegian Knee Ligament Registry, Orthopedic Department, Haukeland University Hospital, Bergen, Norway
Department of Orthopaedic Surgery, Haukeland University Hospital, Bergen, Norway

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_39

Clinical Vignette: Part I
Sally is a 17-year-old female who has sustained an isolated injury to her non-dominant knee and presents for a surgical consultation with Dr. Hansen. Examination and imaging reveal an isolated complete tear of her anterior cruciate ligament (ACL). She is an elite handball player who hopes to return to her sport in the future. Her diagnosis is explained, and surgical reconstruction of her ACL is recommended.
Sally responds by saying, “I had a feeling it might be my ACL. I spoke with two of my friends who had their ACLs reconstructed; one had her hamstring used as a graft and the other used her patella tendon. Which one do you recommend? Is there a difference in failure rate?”

39.1 Background/Introduction

Information from registry data can be used to guide and improve patient care and help answer questions like these that are commonly seen in clinical practice [16]. In medicine, a registry is a
standardized database prospectively collecting information on a patient population with a common disease or intervention and followed over time [6].

The first known medical registry was the National Leprosy Registry of Norway, which was established in 1856 [14, 15]. More than a century later, the Swedish Knee Arthroplasty Registry became the first national registry in orthopaedic surgery in 1975 [17]. Finland (1980), Norway (1987), and Denmark (1995) also created arthroplasty registers, which were soon expanded to include all joint replacements. The primary objective of the first arthroplasty registers was the early detection of inferior results based on implant revision data. This proved successful in 1995 when two studies identified inferior implants at an early stage, a determination made possible through the arthroplasty registry [8, 11, 12].

Today, the use of registries in orthopaedic surgery has expanded to include many subspecialties other than arthroplasty, and several national, regional, and local registries have been developed. Building on the early experience of the arthroplasty registers, important changes have also been made to the data collection and outcome measures that are captured. The first knee ligament surgery registry was created in Norway in 2004, followed in 2005 by Sweden and Denmark [9]. While revision surgery and conversion to total knee arthroplasty were determined to be important outcome measures, it was also recognized that inferior results and graft failures may not always go on to further surgery and therefore may go undetected by the registry [8]. To account for this, the knee ligament registries also include patient-reported outcome measures (PROMs), specifically the Knee Injury and Osteoarthritis Outcome Score (KOOS) [28], preoperatively and at standard postoperative time points. Thus, the ability to detect inferior results and early failures is improved without relying on subsequent surgery as the sole end point.

In 2014 the International Society of Arthroplasty Registries created a PROM Working Group [26, 27]. The working group outlined the rationale for the inclusion of PROMs in the arthroplasty registries and noted that several have incorporated PROMs into their data collection [27]. They also made several recommendations regarding the use of these outcome measures in arthroplasty registers [26].

Fact Box 39.1
Drolet and Johnson developed a definition of a medical registry based on the presence of five characteristics which they called MDR-OK. The information must be mergeable (M) into a centralized data set, use standardized data (D), and must follow a protocol outlining rules (R) for systematic and prospective data collection. Additionally, the observation (O) of patient data is followed over time to collect knowledge (K) about patient outcomes and results. The authors state that all five characteristics need to be present in order to differentiate a medical data registry from a non-registry database [6].

Clinical Vignette: In the Operating Room
Dr. Ng is just beginning a standard ACL reconstruction in a 26-year-old soccer player. The initial injury occurred 6 months ago, and the clinical evaluation and MRI both suggested an isolated ACL rupture. In the preoperative waiting room, the patient stated that he had recently experienced a giving-way episode with immediate joint swelling. Upon insertion of the arthroscope, Dr. Ng finds a grade four cartilage lesion which likely developed during the recent instability episode. He has previously managed similar lesions with microfracture treatment, but he recalls a recent registry study that found microfracture at the time of ACL reconstruction
was associated with worse patient-reported outcomes than debridement [29]. He therefore chooses to change his clinical practice and performs a debridement of the lesion and then continues on with ACL reconstruction.

39.2 Importance of Registries

The goal of orthopaedic registries is to improve healthcare delivery via prospective surveillance of surgical outcomes. Continuous feedback to hospitals and surgeons allows comparison to national averages and can identify best clinical practices [8]. This, in turn, encourages improvement by setting and continually updating the standard of care [19, 20]. Procedures and devices that result in early failure can be identified by following revision or reoperation rates and deterioration in patient-reported outcome measures [8, 31]. Prognostic variables associated with good and poor outcomes can be ascertained via large cohort studies performed on the data contained within the registry [8]. Epidemiological trends and the burden of disease can also be followed for changes over time.

One of the biggest advantages of national registries is the ability to include a high volume of data over time. This creates a large database of short- and long-term follow-ups from which cohort studies can be undertaken. This confers several important strengths: (1) there is little or no selection bias to influence these large data sets; (2) the data already exists at the onset of a study, making the analysis less time-consuming and more cost-effective; (3) if subsequent surgery on included patients can be linked with a personal health identification number, there is no risk for attrition bias; (4) data is collected independently of future research questions, so there is no differential misclassification; and (5) the high volume of cases and data collected also allows the study of several end points and exposures at the same time.

While randomized controlled trials (RCTs) are considered the gold standard research design for evaluating and comparing interventions, they are often not practical or possible to perform owing to ethical, financial, procedural, or other barriers. RCTs also often have stringent inclusion and exclusion criteria which limit the external validity and the generalizability of the results to patients seen in clinical practice. In contrast, well-designed observational studies from registry data can offer similar results regarding treatment effects while largely avoiding the hurdles faced by RCTs. These studies can be complementary to RCTs by assessing real-world applicability of the experimental findings and may serve to generate new ideas for future RCTs [1, 4, 5, 13, 22, 24].

Registries in orthopaedic surgery also have their limitations. Nonrandomized cohort studies are subject to bias from confounding variables which must be corrected for, either by selection of homogeneous subgroups or through multiple regression analysis [10]. Even when possible risk factors are considered in a regression model, there is always a risk of unmeasured confounding in observational studies due to variables that are either not considered in the model or not yet recognized or collected. Compliance is essential for an effective registry database and can be difficult to achieve, especially in the initial stages if no prior database existed. Further, the need for a high response rate can conflict with the goal of optimizing the amount of useful data that is to be collected. To encourage form completion by the surgeons and patients, details regarding demographics, diagnosis, surgical procedure, and implants, along with objective and subjective outcome measures, must be streamlined to avoid survey fatigue and non-compliance. Finding the optimal balance between details collected and the surgeon’s precious time can prove to be difficult. Consequently, the relevance of every requested input has to be carefully considered by the steering board members to keep compliance high.
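The confounding problem described in this section can be made concrete with a small worked example. The sketch below uses invented counts, not registry data: a hypothetical graft type is used more often in a younger age group that has a higher revision risk regardless of graft. It compares the crude risk ratio with a Mantel-Haenszel risk ratio pooled across the age strata, which is one simple way of analysing homogeneous subgroups. All names and numbers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Counts within one level of the confounder (hypothetical data)."""
    exposed_events: int      # revisions among patients with the graft of interest
    exposed_total: int       # all patients with the graft of interest
    unexposed_events: int    # revisions among patients with the comparison graft
    unexposed_total: int     # all patients with the comparison graft


def crude_risk_ratio(strata):
    """Risk ratio after collapsing all strata, ignoring the confounder."""
    a = sum(s.exposed_events for s in strata)
    n1 = sum(s.exposed_total for s in strata)
    c = sum(s.unexposed_events for s in strata)
    n0 = sum(s.unexposed_total for s in strata)
    return (a / n1) / (c / n0)


def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel pooled risk ratio across strata of the confounder."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        n = s.exposed_total + s.unexposed_total
        numerator += s.exposed_events * s.unexposed_total / n
        denominator += s.unexposed_events * s.exposed_total / n
    return numerator / denominator


# Invented example: within each age group the revision risk is identical
# for both grafts (10% in the younger group, 4% in the older group).
example = [
    Stratum(exposed_events=20, exposed_total=200, unexposed_events=5, unexposed_total=50),
    Stratum(exposed_events=2, exposed_total=50, unexposed_events=8, unexposed_total=200),
]

print(round(crude_risk_ratio(example), 2))
print(round(mantel_haenszel_risk_ratio(example), 2))
```

With these invented counts the crude ratio is about 1.69, while the stratum-adjusted ratio is 1.0: the apparent graft effect comes entirely from the age imbalance. As the text notes, this kind of adjustment can only address confounders that were actually recorded.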
Clinical Vignette: The Surgeon’s Perspective
Dr. Zhang has just left the operating room following a routine ACL reconstruction with quadrupled hamstring tendon autograft. With him, he has the barcode stickers from the femoral suspension fixation device and from the interference screw he used on the tibia. He sits down with these stickers and begins filling out the one-page National Knee Ligament Registry form (Fig. 39.1). The stickers are affixed to the corresponding locations on the back of the form. The completed form will be forwarded to the central registry database, and the written patient consent and a copy of the form will remain archived locally at the patient’s hospital. The entire process took less than 2 min, and Dr. Zhang is ready to see his next patient.

39.3 Research Questions

When creating a registry, it is important to understand the research questions that can be answered through the database as this will dictate what information should be recorded. Generally speaking, registries can be used to track epidemiology and provide quality assessment of treatment protocols, devices, and outcomes for a defined patient population. Regional variations within a country can also be identified.

The Norwegian Knee Ligament Registry (NKLR) provides an example of how this information can be used to provide quality control. Epidemiological data gives an actual number of ACL reconstructions and revision surgeries performed on an annual basis. Using this data, the NKLR established that a failure of a specific device is suspected if only 14 patients with this device are identified as having failed, based on recorded outcome measures [8]. This early warning system is ongoing, and problems may be identified long before they would have been uncovered by traditional methods such as RCTs. While causality of failure may not always be evident in these cases, it raises flags and directs further assessment and research.

Selecting which outcome measures and variables to record has been alluded to above. Revision surgery and consequent conversion to arthroplasty are two end points that are clear and indisputable. In addition, patient-reported outcome measures offer a subjective end point useful in identifying inferior results that may not proceed to further surgery. Selection of patient-based subjective outcome measures to include should take several variables into consideration. These include validation for use in the target population, availability in multiple languages, cost, and ease of completion, which should be self-explanatory and fast (ideally less than 10 min) [8, 26].

Data recorded by the surgeon should be minimal and necessary, generally limited to a one-page reporting system. As it can be beneficial to compare data between registries, it is also important to use a core minimum data set that is in use across several different registries. In the development of the NKLR, this data was chosen based on the following three criteria [8]:

1. Can the question addressed be clearly specified and justified?
2. Is the question clinically relevant?
3. Can the item be completed postoperatively while dictating the surgery notes, not needing to seek information from other sources?

Finally, as medical practice is dynamic and ever-changing, variables should be re-evaluated on a regular basis and changes made according to current practice and evolving literature.

Fact Box 39.2
Avoid the desire to collect as much information as possible. An inverse relationship between the amount of information requested and the quality of the data obtained has been demonstrated. Completeness and accuracy of the data are where the value lies, so do not make the data set exhaustive [25]. Incomplete data is often useless data.
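The early-warning rule described in Sect. 39.3 can be sketched in a few lines. The threshold of 14 failed patients comes from the text; everything else here (the record layout, the device names, and the function name) is hypothetical, and a real registry would derive "failed" from its own recorded outcome measures.

```python
from collections import Counter

# From the text: failure of a device is suspected once 14 patients
# with that device are identified as having failed.
FAILURE_THRESHOLD = 14


def devices_to_review(records, threshold=FAILURE_THRESHOLD):
    """Return device IDs whose number of failed patients meets the threshold.

    records: iterable of (device_id, failed) pairs, one pair per patient.
    """
    failures = Counter(device for device, failed in records if failed)
    return sorted(device for device, count in failures.items() if count >= threshold)


# Invented records: one device reaches the threshold, the other does not.
example_records = (
    [("screw_A", True)] * 14
    + [("screw_A", False)] * 300
    + [("button_B", True)] * 3
    + [("button_B", False)] * 250
)

print(devices_to_review(example_records))
```

A fixed count is easy to apply continuously as forms arrive, which fits the ongoing surveillance the text describes. A flagged device is a prompt for further assessment, not proof that the device caused the failures.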

Fig. 39.1  Norwegian National Knee Ligament Registry Perioperative surgeon registration form
Clinical Vignette: The Patient Perspective
Robert is a 23-year-old student who ruptures his ACL playing recreational basketball and is referred to see an orthopaedic surgeon. His diagnosis and options are discussed, and he agrees to proceed with ACL reconstruction. On the day of surgery, he is given two forms for completion in the preoperative holding area. The first is an informed consent form for inclusion in the knee ligament registry. It includes information about the NKLR, the type of information recorded, data protection, and the follow-up procedure and informs Robert that he may be invited to participate in future research projects. He also learns that he may withdraw from the NKLR at any time and that confidentiality is ensured. The second form is a Knee Injury and Osteoarthritis Outcome Score form, which he completes in approximately 10 min.
After surgery, Robert progresses through his rehabilitation and experiences a regular postoperative course. At 2, 5, and 10 years after surgery, he receives a KOOS form by mail or e-mail for completion. Three additional questions are included at the end of the form inquiring (1) whether he has sustained any new injuries or had further surgery on the operated knee; (2) if so, what was injured; and (3) if he re-ruptured his ACL, how was the diagnosis made (MRI, arthroscopy, physician examination, or examination by other healthcare professional). If Robert forgets to submit his KOOS, he will receive a reminder after 3 months.

39.4 Structure of a Registry

The registry collects prospective information on all patients in a defined population. Patient consent to participate may or may not be required, based on the specific national privacy legislation. Information is obtained from the patient and from the surgeon at prespecified time points. In general, baseline subjective data is obtained from the patient preoperatively and repeated at defined intervals postoperatively. Information related to the diagnosis, surgical findings, procedure(s), implants, and complications is recorded by the surgeon immediately following the surgery [8] (Fig. 39.2). Barcode stickers from implanted devices can often be scanned or included with the report form to ensure accurate tracking [8, 10]. Some registries also include clinical and/or radiological follow-up data at subsequent postoperative visits [18, 21].

All documents are sent to a central location to be checked for completeness and stored in the registry database. Incomplete forms are returned to the sender for completion to assure data quality and completeness. A copy of the form is also retained in the patient’s hospital chart. In Norway, a small staff is responsible for the day-to-day operations of the NKLR, overseen by an advisory board. This includes a secretary, a computer engineer, and an administrative head of the registry. Each participating hospital employs secretarial assistance, and the NKLR also has access to experienced statisticians for registry studies [8]. In 2017, the operating budget for the central NKLR office was 1,800,000 NOK (approximately $218,000 USD), which includes the salary of additional staff involved in NKLR-based research projects.

Orthopaedic registries are most often publicly financed either through direct funding or through grants. The Norwegian, Swedish, and Danish knee ligament registries are financed through the national public health system. Similarly, the Canadian Joint Replacement Registry receives funding from federal and provincial government sources, via the Canadian Institute for Health Information [3]. Privately owned and financed registries also exist, for example, in New Zealand. Collaboration with the local and national orthopaedic association is important in all cases to advocate for continued financing and support ongoing activities of the registry. In general, it is advisable for registries to avoid industry sponsorship to maintain objectivity and avoid bias which may affect research and clinical practice.
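The follow-up schedule in the patient vignette (a KOOS form at 2, 5, and 10 years after surgery, with a reminder 3 months later if the form is not returned) can be sketched as a small date calculation. The time points come from the text; the function names and the choice to clamp 29 February to 28 February in non-leap years are assumptions made for illustration, not a description of how the NKLR schedules its mailings.

```python
import calendar
from datetime import date

FOLLOW_UP_YEARS = (2, 5, 10)   # KOOS time points named in the text
REMINDER_AFTER_MONTHS = 3      # reminder interval named in the text


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def follow_up_schedule(surgery_date):
    """Return (years, due_date, reminder_date) for each follow-up time point."""
    schedule = []
    for years in FOLLOW_UP_YEARS:
        due = add_months(surgery_date, 12 * years)
        reminder = add_months(due, REMINDER_AFTER_MONTHS)
        schedule.append((years, due, reminder))
    return schedule


for years, due, reminder in follow_up_schedule(date(2017, 3, 15)):
    print(years, due.isoformat(), reminder.isoformat())
```

Deriving every date from the surgery date keeps the schedule reproducible, so a central office could regenerate outstanding reminders at any time from the stored surgery form alone.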
[Figure: flow diagram. Labels recovered from the diagram: Surgeon; Patient; Database; Researchers; Statistics Norway; Norwegian Patient Registry; Surgery form; Incomplete form; KOOS; Reminders; Emigration/death data; Data for compliance assessment; Data request for research (note 1); Requested data (note 2); Annual reports; Analyses in brief.]
Note 1: Protocol approved by the NKLR board members; subject to regional ethics evaluation and applicable laws.
Note 2: Deidentified.

Fig. 39.2  Norwegian National Knee Ligament Registry flow diagram
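Figure 39.2 shows the Norwegian Patient Registry supplying data for compliance assessment. As a minimal sketch, assume that compliance for a hospital is the number of registry forms received divided by the number of eligible procedures recorded in an independent source over the same period. That definition, the hospital names, and the counts below are assumptions for illustration only.

```python
def compliance_rates(forms_received, procedures_performed):
    """Map each hospital to forms received / procedures performed.

    forms_received: dict of hospital -> number of registry forms received.
    procedures_performed: dict of hospital -> number of procedures recorded
        in the independent source (the denominator).
    Hospitals with no recorded procedures are skipped because the rate
    is undefined for them.
    """
    rates = {}
    for hospital, performed in procedures_performed.items():
        if performed > 0:
            rates[hospital] = forms_received.get(hospital, 0) / performed
    return rates


def overall_compliance(forms_received, procedures_performed):
    """Pooled rate across all hospitals that appear in the denominator."""
    total_performed = sum(procedures_performed.values())
    total_forms = sum(forms_received.get(h, 0) for h in procedures_performed)
    return total_forms / total_performed


# Invented counts for three hypothetical hospitals.
forms = {"Hospital A": 95, "Hospital B": 40}
procedures = {"Hospital A": 100, "Hospital B": 80, "Hospital C": 10}

for hospital, rate in sorted(compliance_rates(forms, procedures).items()):
    print(hospital, f"{rate:.0%}")
print("Overall", f"{overall_compliance(forms, procedures):.0%}")
```

Using an external procedure count as the denominator matters: a hospital that submits no forms at all (Hospital C here) would otherwise be invisible to the registry rather than appearing with a compliance rate of zero.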

39.5 Data Reporting

Data contained in the registry can be released for evaluation via annual reports, hospital-specific data requests, or more extensive research projects. Annual reports present descriptive information including compliance rates and are often broken down into regional subsets for comparison to the national and historical data. Tables and figures are used to present overall and regional epidemiology, survivorship curves, complications, and other information (Fig. 39.3). These reports are frequently published online for public viewing.

Fig. 39.3  Kaplan-Meier survivorship curve of primary ACL reconstruction from the Norwegian National Knee Ligament Registry 2017 annual report. [Panel title: Primary Isolated ACL Reconstruction. Vertical axis: percent without reoperation (60 to 100). Horizontal axis: years after primary operation (0 to 14). Curves: 2004-2006, 2007-2009, 2010-2016.]

Since attaining and maintaining a high compliance rate are paramount to the success of a registry, it follows that accurate determination of this rate is of equal importance. Hospital, regional, or national databases that capture information related to the number of procedures performed are often used as the denominator in determining the compliance rate of registries. For example, in Norway, hospitals track their procedures and send the information to the central office for reimbursement according to the diagnosis-related groups schedule. This data is then compared with the information received by the registry over the same time period to determine the compliance rate. Any discrepancies are included in the annual report as hospital-specific compliance rates.

Registries may report only national and regional data, keeping surgeons unidentified, or they may use the information to produce surgeon-specific reports [8]. In one Canadian province, annual surgeon-specific reports are generated for all hip and knee arthroplasties, hip fractures, rotator cuff repairs, and knee meniscectomies. These report cards allow hospitals and individual surgeons to compare their data with that of their peers. Further, the regional Standards and Quality Committee also reviews the reports and notifies surgeons who fall below the regional averages. If outcomes remain below the average for 2 or more years, the committee meets with the surgeon to review the data and to develop a plan to correct the identified issues. This initiative is possible owing to the mandatory nature of regional registry participation, strong leadership of the steering group, and legislation protecting the reports from court subpoena [2, 30]. Concerns over surgeon confidentiality, legal ramifications, and the potential effect on compliance rate limit the effectiveness of this model in other jurisdictions. One must also be aware that end points in a register, when used as an indicator of healthcare quality, might be influenced by differences in patient populations between hospitals/clinics that affect the outcome measure. For example, surgical centres that perform more complex procedures or treat high-risk patients may have outcomes that differ from the national average.

Access to information in addition to the annual report is also encouraged. Requests for hospital-specific data often require approval by the advisory board and are made by the official hospital contact person. This potentially patient- and surgeon-identifiable information can be used by the hospital for evaluation and local quality improvement projects. More extensive requests for information to be used in research projects can be applied for in writing to the governing body, and additional Health Research Ethics Board approval may also be required [8].

Clinical Vignette: Obtaining Data for a Study
Dr. Perrin wishes to compare the revision rates between ACL reconstructions performed using autograft and allograft over the past 5 years, using the national registry data. To request this data, a written application is made to the steering committee. In it, Dr. Perrin includes:

1. A complete project proposal, including the name of the principal investigator who bears responsibility for the project
2. A description of the problem
3. The data selection requested and the variables to be used in the analysis
4. The publication plan and timeline

Dr. Perrin also submits an application to the local Health Research Ethics Board. The steering committee reviews the application and determines if the purpose of the study is aligned with the purpose of the national registry. The application is further evaluated on the professional relevance of the project, the capacity of the investigator to produce a satisfactory analysis using the data, and with consideration of other ongoing projects that may be similar. Once the committee is satisfied, approval is granted and the data is released for the study.

39.6 Pearls

The strength of a registry lies in the completeness and accuracy of the data [25], and therefore compliance is crucial. This can be difficult to achieve for several reasons that have been described previously. Factors that can improve compliance include the way in which data is collected, rules and laws governing its collection, and perception regarding the importance and uses of the registry.

Whether paper-based or electronic forms are used, they should be user-friendly and not time-consuming. Only the necessary data should be included, and if possible, as much of the data as possible should be recorded in advance (e.g. operating room nurses can enter the operative information and scan the implant barcodes during the surgery [30]). Missing data should be flagged immediately and a notice sent to the surgeon to address the deficiency before too much time has passed after the surgery.

The collection of personal information is and always should be an important discussion. Several jurisdictions require informed consent prior to enrollment in a registry, while others do not. Clearly this can play a role in compliance, with one national registry reporting that 31% of the submitted forms were missing the corresponding consent [3]. Strict adherence to confidentiality standards including secure data storage and limited authorized access may bridge the gap between privacy concerns and data collection. Decisions related to the necessity of obtaining consent must ultimately involve regional Health Research Ethics Boards and established legislation.

Legislation can influence compliance with national registries in other ways. In Denmark, the hospitals do not receive reimbursement for cruciate ligament surgeries that have not been reported to their national knee ligament registry. Creating a user-friendly data submission environment for the surgeon and introducing policy mandating compliance is a powerful way to maintain a high rate of data completion on a national level.

Perception of the registry in the eyes of the public and the orthopaedic community also plays a role in compliance. This begins on the first encounter with each patient, which should include a discussion on the importance of the registry and an overview of their contribution. It is also an opportunity to build a good rapport that may influence their future compliance. The regular publication of reports highlighting trends and important findings, both positive and negative, provides a constant reminder that the data is being collected for a purpose. Major publications that may change clinical practice further reinforce the importance of participation in the registry. Finally, data should be used to foster best medical practice, rather than seeking to identify and punish individual surgeons who may fall below this current standard. A supportive community approach to the registry should be sought to ensure that all surgeons feel comfortable participating and remain open to the feedback it provides.

39.7 Future Directions

Moving forward, registries will continue to play an important role in orthopaedic surgery. As more national registries are established, there is a vision to create a common international knee ligament registry in Europe. The development of a common software program to collect and store the data is an additional goal that could be used
by those nations who will not join an international database for legal or other reasons. Standardizing data collection across several nations would have several benefits including increased power for large studies and the ability to directly compare data from one part of the world with another. Finally, there is ongoing effort to expand registries to include the nonoperative management of orthopaedic conditions such as ACL deficiency which are not currently captured in most databases [7].

Clinical Vignette: Part II
Sally is still in Dr. Hansen’s office and he is reflecting on her questions. Regarding the differences between hamstring tendon and patella tendon autografts, he discusses the pros and cons of each with respect to donor site morbidity, anterior knee pain, skin incisions, and complications including infection rates. He advises her that either one is a perfectly acceptable option but also mentions a recent review of the Norwegian Knee Ligament Registry suggesting a revision rate of hamstring tendons that was twice that of patella tendon grafts [23]. As Dr. Hansen is a surgeon in Norway, his patients were included in that cohort study. Since its publication he now recommends patella tendon autograft to patients like Sally while still presenting all options. Sally is satisfied and agrees to proceed with ACL reconstruction using a patella tendon autograft.

39.8 Useful/Inexpensive Resources

1. National Institutes of Health: https://www.nih.gov/health-information/nih-clinical-research-trials-you/list-registries
2. Norwegian National Advisory Unit on Arthroplasty and Hip Fractures: http://nrlweb.ihelse.net/eng/

Take-Home Message
• Registries are an indispensable research tool in orthopaedic surgery that can improve patient care and lead to significant changes in clinical practice.
• A thorough understanding of the strengths and limitations of these databases is important for those involved in registry-based study design and the interpretation of results.
• In the future, increased international collaboration, focus on patient-reported outcome measures, and the inclusion of non-surgical management strategies will further enhance the value of these registries.

References

1. Benson K, Hartz AJ. A comparison of observational studies and randomized, controlled trials. N Engl J Med. 2000;342:1878–86. https://doi.org/10.1056/NEJM200006223422506.
2. Bohm ER. Personal communication with Dr. E. Bohm, Professor of Surgery, University of Manitoba; Chair, Canadian Joint Replacement Registry Advisory Committee; Chair, Manitoba Provincial Orthopaedic Standards and Quality Committee. 2017.
3. Bohm ER, Dunbar MJ, Bourne R. The Canadian Joint Replacement Registry: what have we learned? Acta Orthop. 2010;81:119–21. https://doi.org/10.3109/17453671003685467.
4. Comber H, Perry IJ. Observational studies for intervention assessment. Lancet. 2001;357:2141–2. https://doi.org/10.1016/S0140-6736(00)05219-3.
5. Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med. 2000;342:1887–92. https://doi.org/10.1056/NEJM200006223422507.
6. Drolet BC, Johnson KB. Categorizing the world of registries. J Biomed Inform. 2008;41:1009–20. https://doi.org/10.1016/j.jbi.2008.01.009.
7. Engebretsen L, Forssblad M, Lind M. Why registries analysing cruciate ligament surgery are important. Br J Sports Med. 2015;49:636–8. https://doi.org/10.1136/bjsports-2014-094484.
8. Granan L-P, Bahr R, Steindal K, Furnes O, Engebretsen L. Development of a national cruciate ligament surgery registry: the Norwegian National Knee Ligament Registry. Am J Sports Med. 2008;36:308–15. https://doi.org/10.1177/0363546507308939.
9. Granan L-P, Forssblad M, Lind M, Engebretsen L. The Scandinavian ACL registries 2004-2007: baseline epidemiology. Acta Orthop. 2009;80:563–7. https://doi.org/10.3109/17453670903350107.
10. Havelin LI, Engesaeter LB, Espehaug B, Furnes O, Lie SA, Vollset SE. The Norwegian Arthroplasty
Register: 11 years and 73,000 arthroplasties. Acta Orthop Scand. 2000;71:337–53. https://doi.org/10.1080/000164700317393321.
11. Havelin LI, Espehaug B, Vollset SE, Engesaeter LB. The effect of the type of cement on early revision of Charnley total hip prostheses. A review of eight thousand five hundred and seventy-nine primary arthroplasties from the Norwegian Arthroplasty Register. J Bone Joint Surg Am. 1995;77:1543–50.
12. Havelin LI, Espehaug B, Vollset SE, Engesaeter LB. Early aseptic loosening of uncemented femoral components in primary total hip replacement. A review based on the Norwegian Arthroplasty Register. J Bone Joint Surg Br. 1995;77:11–7.
13. Horan FT. Judging the evidence. J Bone Joint Surg Br. 2005;87:1589–90. https://doi.org/10.1302/0301-620X.87B12.17247.
14. Irgens LM. The origin of registry-based medical research and care. Acta Neurol Scand. 2012;126:4–6. https://doi.org/10.1111/ane.12021.
15. Irgens LM, Bjerkedal T. Epidemiology of leprosy in Norway: the history of The National Leprosy Registry of Norway from 1856 until today. Int J Epidemiol. 1973;2:81–9.
16. Ivers N, Jamtvedt G, Flottorp S, Young JM, Odgaard-Jensen J, French SD, O’Brien MA, Johansen M, Grimshaw J, Oxman AD. Audit and feedback: effects on professional practice and healthcare outcomes. In: The Cochrane Collaboration, editor. Cochrane Database of Systematic Reviews. Chichester: Wiley; 2012.
17. Knutson K, Lewold S, Robertsson O, Lidgren L. The Swedish knee arthroplasty register. A nation-wide study of 30,003 knees 1976-1992. Acta Orthop Scand. 1994;65:375–86.
18. Lind M, Menhert F, Pedersen AB. The first results from the Danish ACL reconstruction registry: epidemiologic and 2 year follow-up results from 5,818 knee ligament reconstructions. Knee Surg Sports Traumatol Arthrosc. 2009;17:117–24. https://doi.org/10.1007/s00167-008-0654-3.
19. Malchau H, Herberts P, Eisler T, Garellick G, Söderman P. The Swedish total hip replacement register. J Bone Joint Surg Am. 2002;84-A(Suppl 2):2–20.
20. Maloney WJ. National Joint Replacement Registries: has the time come? J Bone Joint Surg Am. 2001;83-A:1582–5.
21. Mygind-Klavsen B, Grønbech Nielsen T, Maagaard N, Kraemer O, Hölmich P, Winge S, Lund B, Lind M. Danish Hip Arthroscopy Registry: an epidemiologic and perioperative description of the first 2000 procedures. J Hip Preserv Surg. 2016;3:138–45. https://doi.org/10.1093/jhps/hnw004.
22. Naylor CD, Guyatt GH. Users’ guides to the medical literature. X. How to use an article reporting variations in the outcomes of health services. The Evidence-Based Medicine Working Group. JAMA. 1996;275:554–8.
23. Persson A, Fjeldsgaard K, Gjertsen J-E, Kjellsen AB, Engebretsen L, Hole RM, Fevang JM. Increased risk of revision with hamstring tendon grafts compared with patellar tendon grafts after anterior cruciate ligament reconstruction: a study of 12,643 patients from the Norwegian Cruciate Ligament Registry, 2004-2012. Am J Sports Med. 2014;42:285–91. https://doi.org/10.1177/0363546513511419.
24. Pocock SJ, Elbourne DR. Randomized trials or observational tribulations? N Engl J Med. 2000;342:1907–9. https://doi.org/10.1056/NEJM200006223422511.
25. Robertsson O. Knee arthroplasty registers. J Bone Joint Surg Br. 2007;89:1–4. https://doi.org/10.1302/0301-620X.89B1.18327.
26. Rolfson O, Bohm E, Franklin P, Lyman S, Denissen G, Dawson J, Dunn J, Eresian Chenok K, Dunbar M, Overgaard S, Garellick G, Lübbeke A, Patient-Reported Outcome Measures Working Group of the International Society of Arthroplasty Registries. Patient-reported outcome measures in arthroplasty registries: Report of the Patient-Reported Outcome Measures Working Group of the International Society of Arthroplasty Registries Part II. Recommendations for selection, administration, and analysis. Acta Orthop. 2016;87:9–23. https://doi.org/10.1080/17453674.2016.1181816.
27. Rolfson O, Eresian Chenok K, Bohm E, Lübbeke A, Denissen G, Dunn J, Lyman S, Franklin P, Dunbar M, Overgaard S, Garellick G, Dawson J, The Patient-Reported Outcome Measures Working Group of the International Society of Arthroplasty Registries. Patient-reported outcome measures in arthroplasty registries: Report of the Patient-Reported Outcome Measures Working Group of the International Society of Arthroplasty Registries Part I. Overview and rationale for patient-reported outcome measures. Acta Orthop. 2016;87:3–8. https://doi.org/10.1080/17453674.2016.1181815.
28. Roos EM, Roos HP, Lohmander LS, Ekdahl C, Beynnon BD. Knee Injury and Osteoarthritis Outcome Score (KOOS): development of a self-administered outcome measure. J Orthop Sports Phys Ther. 1998;28:88–96. https://doi.org/10.2519/jospt.1998.28.2.88.
29. Røtterud JH, Sivertsen EA, Forssblad M, Engebretsen L, Årøen A. Effect on patient-reported outcomes of debridement or microfracture of concomitant full-thickness cartilage lesions in anterior cruciate ligament-reconstructed knees: a nationwide cohort study from Norway and Sweden of 357 patients with 2-year follow-up. Am J Sports Med. 2016;44:337–44. https://doi.org/10.1177/0363546515617468.
30. Singh J, Politis A, Loucks L, Hedden DR, Bohm ER. Trends in revision hip and knee arthroplasty observations after implementation of a regional joint replacement registry. Can J Surg. 2016;59:304–10. https://doi.org/10.1503/cjs.002916.
31. de Steiger RN, Miller LN, Davidson DC, Ryan P, Graves SE. Joint registry approach for identification of outlier prostheses. Acta Orthop. 2013;84:348–52. https://doi.org/10.3109/17453674.2013.831320.
Part VIII
How to Perform an Economic Health Care
Study?
40 How to Perform an Economic Healthcare Study

Jonathan Edgington, Xander Kerman, Lewis Shi, and Jason L. Koh

J. Edgington (*) · L. Shi
Department of Orthopaedic Surgery and Rehabilitation Medicine, University of Chicago Medical Center, Chicago, IL, USA
e-mail: Jonathan.Edgington@uchospitals.edu; lshi@bsd.uchicago.edu

X. Kerman
Pritzker School of Medicine, The University of Chicago, Chicago, IL, USA
e-mail: Alexander.Kerman@uchospitals.edu

J. L. Koh
NorthShore Orthopaedic Institute, NorthShore University HealthSystem, Evanston, IL, USA

40.1 Introduction

Research in orthopedic surgery focuses on finding treatments and diagnostic tests that provide the best outcomes for patients, but the concept of what is best for the patient is constantly evolving. The focus has broadened from objective outcome measures to include patients’ subjective appreciation of those outcomes and is expanding to include the costs of the interventions as well. Rising healthcare costs are an impetus to study how healthcare dollars are spent with a goal of delivering high-quality healthcare [3, 7], especially as rapid advances in orthopedic technology with hope of improved outcomes create questions of whether advances provide improved value. For these reasons, it will become increasingly important for orthopedic surgeons to understand how to appropriately perform economic healthcare studies. This chapter reviews the methods of performing economic healthcare studies and relates them to clinical cases in orthopedic surgery.

40.1.1 What Is Healthcare Economics?

Economics is a social science that studies production and distribution of goods and services in the context of scarcity. Broadly, there are two types of economics: positive economics and normative economics [7]. Positive economics is objective and primarily concerned with evaluating the present (e.g., the United States spends $8.2 billion on orthopedic surgery annually) [14], while normative economics is subjective, inherently prescriptive, and forward-looking (e.g., the United States should spend less than $8.2 billion on orthopedic surgery annually). This distinction is important as healthcare economic studies often focus on positive economics in order to objectively study the relationships between costs, outcomes, and distributions in the delivery of healthcare. These subjects are important for assessing the efficiency, value, and equity of medical care and can then be used to make informed normative economic conclusions.
© ISAKOS 2019 373


V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_40
40.1.2 What Do Efficiency, Value, and Equity Mean in Healthcare Economics?

Several core economic definitions are relevant to healthcare economics. Technical efficiency means that no additional output can be achieved given a certain amount of inputs (e.g., no surgical materials wasted). Production efficiency means that inputs have been optimized for a given level of production (e.g., operating rooms consistently at capacity but not over/under-booked). Allocative efficiency means that resources have been distributed to those who will benefit the most from those resources, thus maximizing utility (e.g., patients who most need surgery receive surgery first). Generally, the term efficiency in healthcare economics refers to the cost of care to achieve a certain measure of quality.

Value is a closely related but crucially different concept from efficiency, because it measures a stakeholder’s subjective appreciation of an outcome. This stakeholder is generally the patient, but value can also be considered from the perspectives of payors, providers, or society as a whole. A common definition of value in healthcare is Value = (Outcomes)/(Costs) [7, 9].

Equity concerns the allocative efficiency of distribution of finite resources among a group of people. Equity may not line up with efficiency, in which case these conflicting concepts must be reconciled with the help of economic healthcare studies [12]. These definitions are further summarized in Fact Box 40.1.

Fact Box 40.1: Healthcare Economic Terms Defined

Technical efficiency: producing maximal output from the minimal quantity of inputs
Production efficiency: a situation in which production of one good cannot be increased without decreasing production of another good
Allocative efficiency: the point at which the marginal cost of producing an additional good equals the marginal benefit of that good
Value: the perceived benefit of a good or service
Equity: resource distribution that is fair among a group

40.1.3 What Are the Goals of Healthcare Economic Studies in Orthopedics?

Orthopedic care often focuses on increasing quality of life or increasing patient mobility through improving function of the musculoskeletal system. These tasks have proved costly, with Medicare estimates of orthopedic expenditures surpassing $8.2 billion annually [14]. These high costs provide an incentive for clinicians to prove the value of their services. Economic studies in orthopedics should not limit themselves to simply identifying the cheapest intervention, but should rather focus on understanding the costs of achieving improved outcomes for patients, payors, and society by combining clinical and economic data. With value data in hand, physicians can provide better care for their patients.

40.2 Research Question

As in any research project, the first step of an economic orthopedic study is to identify a study question and develop a hypothesis. The problem or question often arises in one of two ways: as follow-up on previously published work or as original concepts following an observation. Before initiating a study, the concepts in the following section should be explored to establish a clear overview of the study plan.

40.2.1 What Is the Question?

The research question is the foundation of all parts of the study, and it is worth putting additional time and effort into crystallizing it before beginning work. The question may arise out of personal interest, gaps in literature, or translation of topics from other fields. Often deep thought about the topic leads to a plethora of further questions or ambiguity about the purpose of the study. When
dealing with potentially ambiguous topics like costs and benefits in economic healthcare studies, it is important to clearly define the question in order to guide both planning and execution.

40.2.2 What Is the State of Current Knowledge on the Topic?

Knowing what other researchers have learned on the topic is important in order to prevent wasted effort duplicating results and to further focus the research question. While mainstream medical journals increasingly publish healthcare economic studies, economic- and quality improvement-focused publications often feature these types of projects as well. Finding literature that addresses the same question does not preclude the investigator from performing a study but is useful for comparison and learning from previous shortcomings. Review of current literature will focus the question and justify prioritizing this question.

40.2.3 Why Is the Question Important?

Having a solid understanding of the current literature and knowledge helps define the deficiencies in evidence and establish the importance of the question. This will add purpose and meaning to the study. Answering the question "Why would it be helpful to know the answer to this question?" establishes how the study will help future clinicians, researchers, and policymakers.

40.2.4 Hypothesis

With this information the clinician can form an explicit and concise hypothesis. The hypothesis should be a theory or prediction of what the outcome of the study may show. The hypothesis should be established well before data is collected or analyzed, meaning researchers should avoid collecting data and then retrospectively developing the hypothesis. It should be obvious how the hypothesis will be tested given the circumstances of the study. The hypothesis should be valuable not only if found to be true but also if found to be false. With a clear research question and hypothesis, one can proceed with appropriate selection of an economic evaluation for a given study.

40.3 Economic Evaluation and Application to Research Question

Once a research question and a hypothesis have been identified, the next step is identifying the appropriate economic analytic method (Table 40.1). As seen in Fact Box 40.2, there are four common analytic approaches in healthcare economics, each with advantages and disadvantages: cost-minimization (CMA), cost-effectiveness (CEA), cost-utility (CUA), and cost-benefit (CBA) analyses. All four rely on accurately quantifying costs but vary in their considerations of outcomes. Deciding on which best suits the study is based on the research question, hypothesis, and purpose of the study, as well as the data available.

Cost may be quantified in a variety of ways depending on the perspective of interest. Common perspectives include total costs to society (societal costs), costs to patients (patient costs), and costs to third-party payors (payor costs). Direct costs may measure medical or non-medical costs associated with treatment. Other

Table 40.1 Potential results following cost-effectiveness analysis

              More effective             Less effective
Higher cost   Consider this option       Do not choose this intervention
Lower cost    Choose this intervention   Consider this option


Table 40.2 Multiple sources of possible financial and cost data with examples in parentheses
Source of cost or financial data
Individual patient records
Practice records
Hospital records
Healthcare system records (e.g., Kaiser Permanente)
Healthcare consortium databases (e.g., the National Surgical Quality Improvement Program, Vizient)
State or regional databases (e.g., New York State, California)
National government databases (e.g., in the United States, the National Inpatient Sample)
Private insurance databases (e.g., PearlDiver)

Fact Box 40.2: Methods of Performing Economic Analysis

Method                               Outcomes
Cost-minimization analysis (CMA)     Assumed equivalent
Cost-effectiveness analysis (CEA)    Natural units (objective, quantity, OR quality)
Cost-utility analysis (CUA)          Utility scale (subjective + objective, quantity, AND quality)
Cost-benefit analysis (CBA)          Monetary scale (subjective + objective, quantity, AND quality)

Cost (all methods): dependent on perspective of interest (patient, payor, societal)
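The summary measures behind the four methods in Fact Box 40.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: the function names are hypothetical, the QALY calculation simply multiplies a utility score between 0 and 1 by a number of years, and every figure in the usage lines is invented for demonstration.

```python
def cost_difference(cost_a: float, cost_b: float) -> float:
    """CMA: outcomes are assumed equivalent, so only the cost difference matters."""
    return cost_a - cost_b


def cost_per_unit_gained(cost_a: float, cost_b: float,
                         outcome_a: float, outcome_b: float) -> float:
    """CEA (natural units) or CUA (QALYs): extra cost per extra unit of outcome."""
    if outcome_a == outcome_b:
        raise ValueError("outcomes are equal; the ratio is undefined")
    return (cost_a - cost_b) / (outcome_a - outcome_b)


def qalys(utility: float, years: float) -> float:
    """Quality-adjusted life years: utility on a 0 to 1 scale times years."""
    if not 0 <= utility <= 1:
        raise ValueError("utility must lie between 0 and 1")
    return utility * years


def net_benefit(monetary_value_of_outcomes: float, cost: float) -> float:
    """CBA: outcomes and costs are both expressed in monetary terms."""
    return monetary_value_of_outcomes - cost


# Hypothetical figures for interventions A and B
qaly_a = qalys(0.75, 10)
qaly_b = qalys(0.5, 10)
print(cost_difference(30000, 10000))                       # CMA
print(cost_per_unit_gained(30000, 10000, qaly_a, qaly_b))  # CUA, cost per QALY
print(net_benefit(50000, 30000))                           # CBA
```

The same ratio function serves CEA and CUA; the two differ only in what is passed as the outcome, a single natural unit for CEA or QALYs for CUA.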

intangible costs, such as opportunity costs, may be considered in the quantification depending on what is pertinent to the research question [7]. For example, when a patient elects to undergo surgery, the time spent recovering cannot be spent earning income, and therefore lost wage income is an opportunity cost associated with the surgery.

Outcomes may be quantified in a variety of ways as well. Objective clinical outcomes can be defined very broadly based on the research topic, ranging from functional benefits and improved outcome scores to morbidity. Subjective clinical outcomes can also be measured in a variety of ways using validated instruments to assess patients’ perspectives of their health (e.g., patient-reported outcome tools). Objective and subjective outcomes can also be combined in a variety of ways. One commonly used combined measure is the quality-adjusted life year (QALY), which provides a standardized score based on not only remaining quantity of life but also the quality, or health state, of those years. A variety of instruments exist to broadly measure health state on a 0 to 1 scale with 0 being death and 1 being perfect health. Another form of combined subjective and objective outcome measurement is the monetary value that patients would be willing to pay to achieve different outcomes. The outcome and cost measures used by each method can be seen in Fact Box 40.2.

40.3.1 Cost-Minimization Analysis

Cost-minimization analysis (CMA) is used when the evaluation targets a difference in cost of interventions with the assumption that outcomes are equivalent. In order for this technique to be valid, equivalent outcomes must be clearly and comprehensively demonstrated. The advantage of a cost-minimization analysis is simplicity. The disadvantage of CMA is that establishing complete equivalency is often difficult or impossible. Even if outcomes are believed to be equivalent, there is always the possibility that the apparent equivalence is actually a failure to detect a difference between two interventions. If outcomes cannot be proved equivalent, cost-minimization analysis is not an appropriate economic evaluation technique. Some economists argue that CMA techniques are very seldom appropriate, with possible exceptions in rare cases where randomized controlled trials have already demonstrated equivalent outcomes [2]. CMAs are most frequently relevant
for policy or third-party payor decisions on resource allocations where concerns of specific outcomes come secondary to cost considerations. If one assumed that hyaluronic acid injections and corticosteroid injections were equally effective, one could estimate the cost difference with a CMA evaluation.

40.3.2 Cost-Effectiveness Analysis

Cost-effectiveness analysis (CEA) is a method used to compare the costs and outcomes of one intervention versus the costs and outcomes of an alternative. This type of evaluation provides a value of additional units of cost per additional unit of benefit. The set of outcomes must be measured with the same units of effectiveness between both interventions. As seen in Table 40.1, this leads to four possible outcomes. The advantage of CEA is the ability to directly compare costs and outcomes of two interventions. The disadvantage of CEA is that only one outcome may be measured even if the intervention has multiple relevant outcomes. For example, a total hip arthroplasty may provide measurable outcomes such as relief of pain, increased range of motion, and decreased narcotic usage, but only one can be considered at a time in CEA. Because of this shortcoming, CEA does not fully provide an answer to the value question: Is the additional collective benefit gained worth the extra cost? An excellent example of this type of evaluation is demonstrated by Rajan et al. (2018) in their study on surgical fixation of distal radius fractures [10].

40.3.3 Cost-Utility Analysis

Cost-utility analysis (CUA) differs from CEA in that CUA outcomes are measured using a health index that attempts to quantify all health outcomes associated with an intervention. Most often the index of choice is quality-adjusted life years (QALY), which attempts to objectify length of life as well as quality of life related to the intervention. A variety of instruments exist to calculate QALYs by broadly estimating health quality states on a 0 to 1 scale using a multi-attribute utility model (MAU, 0 = no utility of life/death and 1 = perfect health) and multiplying this number by the life expectancy benefit (LEB) of a procedure: QALY = (MAU) * (LEB). Several MAUs are commonly used including the EuroQoL-5 Dimensions, Health Utilities Index, Short Form-6 Dimensions, Assessment of Quality of Life, 15 Dimensions, and Quality of Well-Being [13]. This type of analysis is highly appropriate in orthopedics, as research in this field typically looks at increased quality of life rather than simply increasing length of life. CUA using QALYs from the societal perspective is recommended by the Panel on Cost-Effectiveness in Health and Medicine convened by the US Public Health Service in 1996 and 2016 [15]. CUA has been effectively used to evaluate total hip arthroplasty using QALYs [3–5].

40.3.4 Cost-Benefit Analysis

Cost-benefit analysis (CBA) is an analysis that places a monetary value on both the cost of the intervention and the outcomes of the intervention. This allows for an analysis that establishes a net benefit of an intervention. This evaluation attempts to comprehensively consider what patients value in their healthcare. The advantage of this analysis is that it yields an objective net benefit, which can answer the question: Is the additional benefit gained worth the additional cost? The disadvantage of this type of evaluation is that assigning monetary values to clinical outcomes is often difficult and sometimes controversial, requiring social science research expertise and validation. An example of CBA in orthopedics would be assigning dollar values to the net benefits gained following total hip arthroplasty, which

Table 40.3 An example of how comparator choice may impact conclusions

                       Cost   Time    Accuracy (%)
Osteotome and mallet   $500   2 min   80
Hand saw               $25    5 min   95
Hand rasp              $25    4 h     99.9
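Using the figures in Table 40.3, the effect of comparator choice can be sketched in Python. The sketch first removes any tool that is both more expensive and less accurate than another tool, and then reports the trade-off between the remaining options. The data structure, the function name, and the decision to ignore time in the first step are assumptions made for illustration; time is converted to minutes for the comparison.

```python
# Rows of Table 40.3; time is converted to minutes (4 h = 240 min).
TOOLS = {
    "Osteotome and mallet": {"cost": 500, "minutes": 2, "accuracy": 80.0},
    "Hand saw": {"cost": 25, "minutes": 5, "accuracy": 95.0},
    "Hand rasp": {"cost": 25, "minutes": 240, "accuracy": 99.9},
}


def dominated_tools(tools: dict) -> list:
    """Names of tools that are costlier AND less accurate than some other tool.

    Assumption of this sketch: time is ignored in this first screening step.
    """
    dominated = []
    for name, tool in tools.items():
        for other_name, other in tools.items():
            if (other_name != name
                    and other["cost"] < tool["cost"]
                    and other["accuracy"] > tool["accuracy"]):
                dominated.append(name)
                break
    return dominated


rejected = dominated_tools(TOOLS)
remaining = {name: t for name, t in TOOLS.items() if name not in rejected}

# Trade-off between the two tools that survive the screening step
saw, rasp = remaining["Hand saw"], remaining["Hand rasp"]
extra_accuracy = round(rasp["accuracy"] - saw["accuracy"], 1)  # percentage points
extra_minutes = rasp["minutes"] - saw["minutes"]

print(rejected)
print(extra_accuracy, extra_minutes)
```

Leaving the hand saw out of TOOLS would make the hand rasp look like the clear choice, since it is both cheaper and more accurate than the osteotome and mallet. The time penalty only becomes visible once the hand saw is included as a comparator.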
would allow a patient-centric conclusion about the value of THA.

40.4 Perspective

The perspective of the study is the lens through which the study is viewed and drives the assessments of cost and benefit. Relevant perspectives in healthcare economic studies include the patients’, providers’, hospitals’, insurers’, or a comprehensive societal perspective. The perspective can be thought of as the target audience for any conclusions drawn from the study. It is important as the priorities of the different groups likely differ significantly. Changing the perspective changes both the numerator and denominator of the value equation because different elements must be included depending on whose perspective is under consideration. For example, if an evaluation is executed from an insurer’s perspective, the definition of cost would likely exclude patients’ out-of-pocket expenses. In certain instances, it may be valuable to include multiple perspectives in each study, in which case results of evaluations should be reported separately from each perspective.

40.5 Challenges

Healthcare economic studies present several types of challenges not frequently encountered in other projects. These challenges include difficulty defining and acquiring data on costs and benefits, as well as the underlying issue of what interventions are compared to one another. The conclusions reached in healthcare economic studies also may be difficult to apply in practice or be limited in their generalizability to other procedures or practice settings.

40.5.1 Defining Costs

Financial or cost data can be obtained from multiple sources, from the individual practice level to national databases. An example of the variability in calculating costs depending on the methodology used can be found in Palsis et al. (2018) [8]. Several examples of cost or financial data are listed in Table 40.2. In many cases, it can be difficult or expensive to obtain cost or financial data.

For investigators, identifying an accurate source of information is critical. Typically, data that is derived more directly from clinical records is more accurate than that derived from administrative databases, which can be subject to errors in coding. Nevertheless, valuable information can still be obtained from administrative databases.

Additionally, data that is obtained exclusively from one clinical site (e.g., a hospital) may not accurately reflect the total costs of care, including items such as costs related to postoperative rehabilitation. As financial analysis of healthcare-related costs becomes more sophisticated, there is increasing ability to more fully understand costs. In addition, allocation of costs can be done in multiple ways: they can be related to charges, known costs (such as for implants), or activity-based costing.

Finally, the economic data must be tied to clinical outcomes for patients. Assessment of clinical outcomes is a critically important area and is further discussed in other chapters. Relating economic and outcome data should be performed with appropriate statistical rigor.

Table 40.4 Steps to performing an economic healthcare study
Identify the relevant question
Understand current knowledge
Formulate hypothesis
Obtain relevant financial information
Obtain relevant outcome information
Perform appropriate statistical analysis
Report results
Discuss results with respect to question of interest

40.5.2 Defining Outcomes

Compared to defining costs, outcomes may be even more challenging to capture completely. Outcome data is often gathered from clinical research and prior literature. The most reliable
40  How to Perform an Economic Healthcare Study 379

benefit or outcome information would be from a double-blind randomized controlled trial (RCT). Few orthopedic RCTs comparing surgical outcomes exist, and therefore much of the orthopedic outcomes data is observational [1, 6, 11]. Observational studies can be pooled or meta-analyses can be used; however, these introduce bias or error. Outcome data should be as accurate and applicable as possible, and as above, methods of obtaining the data should be explained in detail.

40.5.3 Comparator Choice

Choosing the correct interventions to compare to one another presents further challenges in healthcare economic studies. Appropriate comparator choice helps provide correct conclusions following analysis. For example, consider tools for making cuts during a total knee arthroplasty as seen in Table 40.3. The hand saw is more effective and cheaper than an osteotome and mallet, so the latter can immediately be rejected. The decision then is whether the additional 4.9% accuracy is worth the additional 3 h and 55 min of time spent. If one were to compare only an osteotome and mallet combination to a hand rasp, the erroneous conclusion of the hand rasp as the best tool will be made. Choosing appropriate comparators can be quite difficult in orthopedics and vastly impacts conclusions drawn following analysis.

40.5.4 Clinical Applicability

As many economic evaluations may take on the broad perspectives of society or a hospital system, applying the conclusions of these studies clinically may be difficult. Translating “macro” results into the perspectives of the provider or patient may be difficult. Consider a study executed from a societal perspective that shows hyaluronic acid injections are not cost-effective compared to cortisone injections for knee arthritis. Cost may seem irrelevant to a patient who is considering trying another type of injection versus undergoing a major surgery such as a total knee replacement. Studies may have limited clinical applicability in cases where cost differences are statistically different but not clinically significant. For example, suppose a study showed cost savings of $50 per patient for anterior versus posterior hip arthroplasty. This may not be significant to the patient or payors if total costs are in the $10,000–$20,000 range.

40.5.5 Generalizability

Many economic evaluations rely on multiple assumptions and estimations that may not be universally true. Accepting assumptions and estimations allows the study to provide results and conclusions. However, this shortcoming means many evaluations cannot be generalized to other settings or populations. The vast difference in cost for services throughout the healthcare field makes generalization statements extremely difficult.

Take-Home Message
• Given the rising healthcare costs, the role of economic healthcare studies in orthopedics will certainly increase over the coming years. Just as evidence-based medicine brought about changes in clinical decision-making in years past, economic healthcare studies will likely present themselves as necessary knowledge.
• We suggest that researchers follow a step-by-step progression in performing an economic healthcare study as seen in Table 40.4.
• This chapter has provided a brief overview for the practicing orthopedic surgeon who may be interested in performing such a study or understanding results of these types of studies.

References

1. Brauer CA, Rosen AB, Olchanski NV, Neumann PJ. Cost-utility analyses in orthopaedic surgery. J Bone Joint Surg Am. 2005;87:1253–9. https://doi.org/10.2106/JBJS.D.02152.
2. Briggs AH, O’Brien BJ. The death of cost-minimization analysis? Health Econ. 2001;10:179–84. https://doi.org/10.1002/hec.584.
3. Chang RW, Pellisier JM, Hazen GB. A cost-effectiveness analysis of total hip arthroplasty for osteoarthritis of the hip. JAMA. 1996;275:858–65.
4. Daigle ME, Weinstein AM, Katz JN, Losina E. The cost-effectiveness of total joint arthroplasty: a systematic review of published literature. Best Pract Res Clin Rheumatol. 2012;26:649–58. https://doi.org/10.1016/j.berh.2012.07.013.
5. Givon U, Ginsberg GM, Horoszowski H, Shemer J. Cost-utility analysis of total hip arthroplasties. Technology assessment of surgical procedures by mailed questionnaires. Int J Technol Assess Health Care. 1998;14:735–42.
6. Maniadakis N, Gray A. Health economics and orthopaedics. J Bone Joint Surg Br. 2000;82:2–8.
7. Napper M, Newland J. Health economics information resources: a self-study course: U.S. National Library of Medicine; 2016.
8. Palsis JA, Brehmer TS, Pellegrini VD, Drew JM, Sachs BL. The cost of joint replacement: comparing two approaches to evaluating costs of total hip and knee arthroplasty. J Bone Joint Surg Am. 2018;100:326–33. https://doi.org/10.2106/JBJS.17.00161.
9. Porter ME. What is value in health care? N Engl J Med. 2010;363:2477–81. https://doi.org/10.1056/NEJMp1011024.
10. Rajan PV, Qudsi RA, Dyer GSM, Losina E. The cost-effectiveness of surgical fixation of distal radial fractures: a computer model-based evaluation of three operative modalities. J Bone Joint Surg Am. 2018;100:e13. https://doi.org/10.2106/JBJS.17.00181.
11. Rajan PV, Qudsi RA, Wolf LL, Losina E. Cost-effectiveness analyses in orthopaedic surgery: raising the bar. J Bone Joint Surg Am. 2017;99:e71. https://doi.org/10.2106/JBJS.17.00509.
12. Reidpath DD, Olafsdottir AE, Pokhrel S, Allotey P. The fallacy of the equity-efficiency trade off: rethinking the efficient health system. BMC Public Health. 2012;12:S3. https://doi.org/10.1186/1471-2458-12-S1-S3.
13. Wisløff T, Hagen G, Hamidi V, Movik E, Klemp M, Olsen JA. Estimating QALY gains in applied studies: a review of cost-utility analyses published in 2010. Pharmacoeconomics. 2014;32:367–75. https://doi.org/10.1007/s40273-014-0136-z.
14. Trends in orthopaedics: an analysis of medicare claims, 2000–2010. https://www.healio.com/orthopaedics/journals/ortho/2013-3-36-3/%7Bed4e1047-4f2f-46bc-9f01-65f36a2efd58%7D/trends-in-orthopaedics-an-analysis-of-medicare-claims-20002010. Accessed 13 Feb 2018.
15. Recommendations from the second panel on cost-effectiveness in health and medicine | Guidelines | JAMA | The JAMA Network. https://jamanetwork.com/journals/jama/article-abstract/2552214?redirect=true. Accessed 15 Feb 2018.
Part IX
Multi-Center Study: How to Pull It Off?
Conducting Multicenter Cohort
Studies: Lessons from MOON
41
José F. Vega and Kurt P. Spindler

41.1 Introduction

The Multicenter Orthopaedic Outcomes Network (MOON) is the largest prospective longitudinal anterior cruciate ligament (ACL) reconstruction cohort with at least 80% follow-up in the United States. MOON is a collaboration of 17 surgeons from 7 different large academic medical centers (Cleveland Clinic Foundation, Vanderbilt Orthopaedic Institute, The Ohio State University, University of Iowa, Washington University in St. Louis, Hospital for Special Surgery, and University of Colorado) that has collected >80% follow-up at 2, 6, and 10 years post-op of more than 4400 ACL reconstructions [5, 13, 19, 27]. To date, the MOON cohort has produced over 40 peer-reviewed publications; possibly more importantly, it has created a template for conducting high-quality, prospective orthopedic research that has been utilized by other large, high-quality, prospective, multicenter studies [4, 18, 19, 34]. This chapter chronicles the planning, development, and execution of MOON.

41.2 The Early Years: Envisioning MOON and Laying the Foundation

The first MOON patient was enrolled in January of 2002, but the story of MOON actually begins a decade earlier, in the early 1990s, with a much smaller prospective cohort.

At the time (the late 1980s and early 1990s), it was clear that ACL repair was an ineffective strategy for treating ACL rupture, as failure rates were high and patients did poorly, especially after the first and second postoperative years [8, 21]. It was also established that chronic ACL deficiency led to poor outcomes and early-onset posttraumatic osteoarthritis [1, 17]. However, new and encouraging data suggested that “augmented” ACL reconstruction (primary repair with the assimilation of a bone-patellar tendon-bone [BTB] autograft into the repair) led to improved knee function and outcomes [7].

Thus, the senior author (KPS) assembled a cohort of 54 patients that underwent acute (within 3 months of injury) ACL reconstruction. Patients were enrolled by the senior author, who was completing a sports medicine fellowship at the Cleveland Clinic at the time. This cohort was assembled with the intention to follow the participants for 10 years in order to better understand the importance of bone bruising and articular cartilage lesions on ACL reconstruction outcomes [11, 30].

J. F. Vega
Cleveland Clinic Lerner College of Medicine, Cleveland, OH, USA
e-mail: vegaj@ccf.org

K. P. Spindler (*)
Cleveland Clinic Sports Health Center, Garfield Heights, OH, USA
e-mail: spindlk@ccf.org, stojsab@ccf.org

© ISAKOS 2019 383


V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_41
384 J. F. Vega and K. P. Spindler

However, it did not take long to appreciate that the outcome of an ACL reconstruction is influenced by a myriad of factors beyond bone bruising and articular cartilage lesions, and, as a result, 54 patients would not be enough. This was also a period of transition, as the use of primary ACL repair was nearly abandoned, while the utilization of ACL reconstruction was growing in popularity. Thus, new questions were appearing at a rate much greater than we could answer.

As the sun began to rise high in the midmorning sky, the senior author and one of his Cleveland Clinic colleagues (JTA) found themselves cycling through the elevation changes of Sun Valley, Idaho, mulling over potential ways to address the seemingly endless list of questions surrounding ACL reconstruction.

It was then that they admitted, reluctantly, that the only feasible way to answer so many questions would be to assemble another cohort, this time with 10 or 20 times the number of patients they had enrolled in Cleveland. Naturally, the discussion then turned to how they could possibly enroll and follow hundreds (and more likely thousands) of patients undergoing a procedure which, at the time, was being performed fewer than 90,000 times per year across the entire country [3]. The answer, of course, was to develop a multicenter network.

41.2.1 The Vanderbilt Sports Medicine (VSM)-Cleveland Clinic Foundation (CCF) ACL Reconstruction Registry

Having finished his fellowship at the Cleveland Clinic, the senior author accepted a position as an assistant professor at Vanderbilt University Medical Center. This transition came with exciting potential, as two sites could certainly enroll more patients than either site alone, and, thus, the VSM-CCF ACL Reconstruction Registry was born.

Between the fall of 1991 and spring of 1998, the three surgeons (KPS, JTA, and RP) involved in the VSM-CCF ACL Reconstruction Registry enrolled a total of 1201 patients and followed them until the fifth postoperative year. Around the same time, the VSM-CCF ACL Reconstruction Registry gained a new partner in the form of The Ohio State University, which was maintaining its own ACL reconstruction database. This three-institution cohort, containing 2286 ACL reconstructions performed over a 10-year period, yielded a total of eight publications on a wide range of topics [2, 9, 10, 15, 22, 23, 29, 31].

What made this multicenter cohort so unique was not only its size (at the time, it was one of the largest prospective ACL reconstruction cohorts in the country) but also its use of patient-reported outcome measures (PROMs) as primary endpoints. The collection of preoperative and intraoperative variables, along with the follow-up PROMs, allowed for complex multivariable regression analysis to identify clinically relevant predictors of outcomes at 5 years [31]. Although difficult to imagine given the state of today’s literature, use of PROMs as a primary endpoint was tantamount to clinical research heresy in the early and mid-2000s [12, 35]. Nonetheless, the developers of this registry adopted two recently validated PROMs to measure outcomes following ACL reconstruction: the International Knee Documentation Committee Subjective Knee Form (IKDC-SKF) and the Knee injury and Osteoarthritis Outcome Score (KOOS).

Fact Box 41.1
The use of patient-reported outcome measures has exploded over the last two decades. In a retrospective review of 4 major orthopedic surgery journals, Siljander et al. found that 94 publications used patient-reported outcome measures in 2004, compared to 228 in 2016 [26].

Aside from the hesitancy of the scientific community to accept the use of PROMs as primary endpoints, the utilization of the IKDC and the KOOS created an additional problem: missing baseline data, as the KOOS was not developed until 1998, and the IKDC debuted 3 years later, in 2001 [14, 25]. As a result, this cohort was unable to capture the true impact of ACL reconstruction (or the prognostic significance of pre- and intraoperative variables), as there was no baseline PROM data, which would be needed in order to observe change over time. Thus, once again, there was a need for a new cohort, one that would be larger, comprised of patients from more centers, and from which more variables could be collected.

41.3 Designing MOON

41.3.1 Assembling the Clinical Research Team

The MOON principal investigator (PI) and co-investigators recognized that designing a prospective longitudinal cohort would require a large team comprised of individuals with clinical research expertise beyond orthopedic surgeons. They were aided by one of orthopedic surgery’s original clinician-scientists, Sandy Kirkley, who had specialized training in clinical research long before it became commonplace (although still rare today, a master’s in public health [MPH] or MPH equivalent was virtually unheard of in orthopedics in the 1990s). In addition, they had several experts in prospective trial and cohort design, an epidemiologist, and a PhD biostatistician with specialized training in multivariable analysis. Furthermore, they hired a program manager and several research assistants. Without this expertise, all outside of orthopedics, MOON would have certainly not been successful at both achieving NIH funding and maintaining high rates of longitudinal follow-up.

41.3.2 Aims

The first step in designing the MOON cohort was to identify which questions were to be answered by MOON. Among the many options, the co-investigators settled on three primary aims:

1. What preoperative and intraoperative variables predict both short- and long-term outcomes following ACL reconstruction?
2. What predicts graft failure?
3. What is the natural progression of posttraumatic osteoarthritis, and what variables modify the trajectory?

41.3.3 Sample Size

Because the co-investigators wanted to assess long-term outcomes, as well as the development of osteoarthritis, this new cohort would need to be followed for at least 10 years. Since they also wanted to identify predictors of graft failure, this new cohort would need to be large. Considering that failure was seen in roughly 10% of the three-institution ACL reconstruction registry, and that multivariable logistic regression was to be done to identify significant predictors of graft failure, MOON would need to enroll at least 2250 participants (15 participants per variable included in the logistic regression, multiplied by 15 variables to be included, multiplied by 10 because failure would likely occur in only 10% of the cohort).

To enroll such a large number of patients in a reasonable period of time, the designers of MOON set out to assemble a team of surgeons and various high-volume ACL reconstruction institutions that could enroll roughly 600 patients per year. Building off of the VSM-CCF ACL reconstruction registry with the help of personal connections, the final tally of MOON surgeons and sites came to 17 surgeons at 7 institutions (Cleveland Clinic Foundation, Vanderbilt Orthopaedic Institute, The Ohio State University, University of Iowa, Washington University in St. Louis, Hospital for Special Surgery, and University of Colorado).

41.3.4 Outcomes

In the same fashion as the co-investigators’ three-institution ACL reconstruction registry, the decision was made to use PROMs as a primary endpoint. Again, they opted to utilize the IKDC and the KOOS to track PROMs from baseline until the completion of the study (tentatively projected to be 10 years post-op).

The rationale for using PROMs as a primary endpoint was twofold. First, PROMs had been
demonstrated to be valid methods of measuring outcomes over time and were directly relevant to both clinicians and patients; second, the use of PROMs would allow the investigators to follow a large cohort without incurring insurmountable expense or significant losses to follow-up, as participants could complete PROMs from the comforts of their own homes.
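
The enrollment target described under Sample Size (Sect. 41.3.3) is a rule-of-thumb calculation rather than a formal power analysis. A minimal Python sketch of that arithmetic follows, using only the figures quoted in the text: 15 participants per variable (read here, as an assumption, as 15 participants with the outcome per variable), 15 variables, an expected failure rate of 10%, and roughly 600 enrollments per year. The function and variable names are illustrative and are not part of the MOON protocol.

```python
def required_enrollment(per_variable: int, n_variables: int,
                        event_rate_percent: int) -> int:
    """Total participants needed so that the expected number of events
    (here, graft failures) covers the planned number of predictors."""
    events_needed = per_variable * n_variables
    # Ceiling division in integer arithmetic avoids floating-point rounding.
    return -(-events_needed * 100 // event_rate_percent)


# Figures quoted in the text.
target = required_enrollment(per_variable=15, n_variables=15,
                             event_rate_percent=10)

# Hypothetical helper value: time to reach the target at ~600 patients/year.
years_to_enroll = target / 600

print(f"Enrollment target: {target} participants")
print(f"Approximate enrollment period: {years_to_enroll} years")
```

Under these assumptions the target is 2250 participants, which matches the figure given in the text, and the stated enrollment rate would reach it in a little under 4 years.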

41.3.5 Data Collection

The next challenge to overcome was to develop a data collection system that would not be overly burdensome to patients and/or surgeons. With some guidance from their late friend (SK), the designers compiled a series of paper questionnaires that captured all of the pertinent variables and trialed the Compaq iPAQ (Fig. 41.1) as a means for electronic data capture. This was no small feat, especially considering that the Apple iPhone would not debut for another 6 years. Luckily, electronic databases such as REDCap are now available and make excellent options for collecting large amounts of data rapidly across multiple sites.

Fig. 41.1 Compaq iPAQ, 2004

Clinical Vignette
In the early stages of MOON, the typical work flow involved completion of paper PROMs which were then faxed to a central location for data collection. As mentioned above, the iPAQ was trialed briefly, but, because of difficulties with reliable syncing and data upload, was abandoned quickly.

41.3.6 Agreement Studies

While having so many fellowship-trained surgeons was undoubtedly an advantage, managing a group as large and experienced as that was not without its own challenges. The most obvious hurdle facing the group was how to come to an agreement on how certain variables would be classified (e.g., meniscal tear location, depth, type, etc.) and then treated (e.g., meniscectomy vs. meniscal repair). The group performed a pair of inter-rater agreement studies to demonstrate, in a scientific way, that their methodology for classifying each particular variable (specifically, meniscus tears and articular cartilage lesions) was reliable across the group [6, 20].

A similar challenge emerged with respect to each surgeon’s technique for performing a typical ACL reconstruction. Again, the group conducted two studies in which they demonstrated that the differences between the way in which each surgeon performed his preferred ACL reconstruction were minimal and likely clinically irrelevant [32, 33]. In the first study, 12 surgeons each performed 6 ACL reconstructions on cadaveric knees using their technique of preference (e.g., two-incision, trans-tibial, or anteromedial). The 72 knees were then imaged in order to identify any significant differences in tunnel characteristics (of which none were present) [33]. Four surgeons then went on to duplicate the same study design in MOON patients, complete with postoperative CT scans to assess each tunnel. Again, no differences were identified [32].

The last obstacle to overcome with regard to agreement was which graft type to use. Rather than attempt to convince all of the participating surgeons to perform their ACL reconstructions with BTB or hamstring (which would have been
impossible), the group showed, through a systematic review, that no clinically relevant difference exists between BTB and hamstring autograft [28]. In hindsight, this was actually a judicious approach, as the inclusion of both BTB and hamstring ACL reconstructions only provided further data to conclusively demonstrate that no clinically relevant difference exists between the two approaches when done correctly.

41.3.7 Funding

As expected, enrolling and following such a large cohort would come with significant cost (e.g., the estimated annual cost of operationalizing MOON at Vanderbilt University alone was nearly $200,000).

MOON enrolled and followed patients for nearly 3 full years before submitting its first application for a National Institutes of Health (NIH) R01 grant in February 2004 (which, by the way, was not selected for funding). By the end of 2005 (still before MOON was funded by the NIH), MOON had enrolled 2340 patients. At 2 years post-op, 93% had phone follow-up, and 85% had returned completed PROMs. Of course, achieving such high follow-up rates for a cohort as large as that came with a large price tag. MOON had four full-time employees devoted to seeking follow-up (one employee could follow roughly 600 patients per year), and, still, MOON surgeons had to personally call 25% of their patients in order to get them to follow up. The nested cohort within MOON (aimed at understanding the development of posttraumatic osteoarthritis) also came with significant expense, which included training radiology personnel at three separate sites, specialized bilateral weight-bearing radiographs, and blinded evaluation by surgeons and physical therapists. It was not until 2006, after three submissions and three rejections, that MOON received NIH funding.

In total, MOON cost roughly $1.4 million to operate between 2001 (when the first patient was enrolled) and 2006 (when MOON was first awarded grant funding). Of the $1.4 million, roughly half was provided by a combination of unrestricted gifts (Smith & Nephew [$450,000] and Aircast [$200,000]) and a grant from the National Football League (NFL) Charities ($125,000). The other half was covered by an internal tax placed on one of MOON’s designers (KPS) and two of his partners (ECM, JK) at Vanderbilt University ($450,000) and the remaining money from an OREF Prospective Clinical Research Grant ($149,000).

Fact Box 41.2
The award rate (defined as the number of awards given in a fiscal year divided by the total number of applications reviewed, including new applications and resubmissions from the previous review period) for NIH grant proposals between 1990 and 2014 fluctuated between 15 and 30%, although it has not exceeded 20% since 2005 [24].

41.4 MOON Becomes a Reality

Finally, in January of 2002, MOON was ready to enroll its first patient. But the challenges did not stop there. It was only because of teamwork and dedication that MOON was able to become a reality. New and/or ongoing concerns were discussed during monthly conference calls (a call that has taken place on the second Monday of each month for the last 17 years and continues to this day). Each site continues to maintain an active institutional review board (IRB) application.

Since it received initial funding in 2006, MOON has successfully renewed its funding on three separate occasions, allowing for 6- and 10-year patient follow-up. Currently, MOON has >4400 ACL reconstructions in its database and has achieved >80% follow-up at 2, 6, and 10 years post-op [5, 13, 27].

To date, MOON has produced more than 40 peer-reviewed publications and has laid the foundation for other multicenter orthopedic research groups [4, 18, 19, 34].
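
The pre-NIH funding figures quoted under Funding (Sect. 41.3.7) can be checked with simple arithmetic. The short Python sketch below sums the sources listed in the text and compares follow-up staffing capacity with enrollment at the end of 2005. The grouping into two dictionaries and all variable names are illustrative assumptions; the dollar amounts and counts come from the text.

```python
# Funding sources as listed in the text (US dollars).
first_half = {
    "Smith & Nephew unrestricted gift": 450_000,
    "Aircast unrestricted gift": 200_000,
    "NFL Charities grant": 125_000,
}
second_half = {
    "Internal tax at Vanderbilt University": 450_000,
    "OREF Prospective Clinical Research Grant": 149_000,
}

first_total = sum(first_half.values())    # 775,000
second_total = sum(second_half.values())  # 599,000
grand_total = first_total + second_total  # 1,374,000, i.e. roughly $1.4 million

print(f"Gifts and NFL grant: ${first_total:,}")
print(f"Internal tax and OREF grant: ${second_total:,}")
print(f"Total: ${grand_total:,}")
print(f"Share from gifts and NFL grant: {first_total / grand_total:.1%}")

# Follow-up staffing: four employees, each following roughly 600 patients/year.
staff_capacity = 4 * 600        # 2,400 patients per year
enrolled_end_2005 = 2340
print(f"Capacity covers enrollment: {staff_capacity >= enrolled_end_2005}")
```

The itemized sources sum to $1,374,000, consistent with the rounded $1.4 million in the text, and the two groups are only approximately equal halves.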
Fact Box 41.3
Many of the clinical sites that participated in MOON went on to participate in other well-known studies including the Meniscus Tear in Osteoarthritis Research (MeTeOR) Trial, the Multicenter ACL Revision Study (MARS), and an upcoming randomized controlled trial.

Take-Home Message
• The story of MOON is one of vision, perseverance, and teamwork.
• Conducting prospective multicenter orthopedic research is possible, but it requires significant buy-in from all participants, from the PI to the research coordinators and even the study participants themselves.
• When multiple investigators are involved, inter-rater agreement studies are crucial to demonstrate the validity of the data being collected.
• Arguably the greatest key to success is regular and open communication.
• Do not be afraid to dream big. In the words of Norman Peale, “Shoot for the moon. If you miss it, you will still land among the stars” [16].

References

1. Andersson C, Odensten M, Good L, Gillquist J. Surgical or non-surgical treatment of acute rupture of the anterior cruciate ligament. A randomized study with long-term follow-up. J Bone Joint Surg Am. 1989;71(7):965–74.
2. Bowers AL, Spindler KP, McCarty EC, Arrigain S. Height, weight, and BMI predict intra-articular injuries observed during ACL reconstruction: evaluation of 456 cases from a prospective ACL database. Clin J Sport Med. 2005;15(1):9–13.
3. Buller LT, Best MJ, Baraga MG, Kaplan LD. Trends in anterior cruciate ligament reconstruction in the United States. Orthop J Sports Med. 2015;3(1):2325967114563664. https://doi.org/10.1177/2325967114563664.
4. Dunn WR, Kuhn JE, Sanders R, et al. 2013 Neer Award: predictors of failure of nonoperative treatment of chronic, symptomatic, full-thickness rotator cuff tears. J Shoulder Elbow Surg. 2016;25(8):1303–11. https://doi.org/10.1016/j.jse.2016.04.030.
5. Dunn WR, Spindler KP, MOON Consortium. Predictors of activity level 2 years after anterior cruciate ligament reconstruction (ACLR): a Multicenter Orthopaedic Outcomes Network (MOON) ACLR cohort study. Am J Sports Med. 2010;38(10):2040–50. https://doi.org/10.1177/0363546510370280.
6. Dunn WR, Wolf BR, Amendola A, et al. Multirater agreement of arthroscopic meniscal lesions. Am J Sports Med. 2004;32(8):1937–40.
7. Engebretsen L, Benum P, Fasting O, Mølster A, Strand T. A prospective, randomized study of three surgical techniques for treatment of acute ruptures of the anterior cruciate ligament. Am J Sports Med. 1990;18(6):585–90. https://doi.org/10.1177/036354659001800605.
8. Feagin JA, Curl WW. Isolated tear of the anterior cruciate ligament: 5-year follow-up study. Am J Sports Med. 1976;4(3):95–100. https://doi.org/10.1177/036354657600400301.
9. Fox JA, Nedeff DD, Bach BR Jr, Spindler KP. Anterior cruciate ligament reconstruction with patellar autograft tendon. Clin Orthop. 2002;402:53–63.
10. Graham SM, Parker RD. Anterior cruciate ligament reconstruction using hamstring tendon grafts. Clin Orthop. 2002;402:64–75.
11. Hanypsiak BT, Spindler KP, Rothrock CR, et al. Twelve-year follow-up on anterior cruciate ligament reconstruction: long-term outcomes of prospectively studied osseous and articular injuries. Am J Sports Med. 2008;36(4):671–7. https://doi.org/10.1177/0363546508315468.
12. Heckman JDME-C. Are validated questionnaires valid? [Letter]. J Bone. 2006;88(2):446.
13. Hettrich CM, Dunn WR, Reinke EK, MOON Group, Spindler KP. The rate of subsequent surgery and predictors after anterior cruciate ligament reconstruction: two- and 6-year follow-up results from a multicenter cohort. Am J Sports Med. 2013;41(7):1534–40. https://doi.org/10.1177/0363546513490277.
14. Irrgang JJ, Anderson AF, Boland AL, et al. Development and validation of the international knee documentation committee subjective knee form. Am J Sports Med. 2001;29(5):600–13. https://doi.org/10.1177/03635465010290051301.
15. Kaeding CC, Pedroza AD, Parker RD, Spindler KP, McCarty EC, Andrish JT. Intra-articular findings in the reconstructed multiligament-injured knee. Arthroscopy. 2005;21(4):424–30. https://doi.org/10.1016/j.arthro.2004.12.012.
16. Kandel B. Peale still positive; Words he lives by. USA Today. 1988:2A.
17. Kannus P, Järvinen M. Conservatively treated tears of the anterior cruciate ligament. Long-term results. J Bone Joint Surg Am. 1987;69(7):1007–12.
18. Katz JN, Brophy RH, Chaisson CE, et al. Surgery versus physical therapy for a meniscal tear and osteoarthritis. N Engl J Med. 2013;368(18):1675–84. https://doi.org/10.1056/NEJMoa1301408.
19. Lynch TS, Parker RD, Patel RM, et al. The impact of the Multicenter Orthopaedic Outcomes Network (MOON) research on anterior cruciate ligament reconstruction and orthopaedic practice. J Am Acad Orthop Surg. 2015;23(3):154–63. https://doi.org/10.5435/JAAOS-D-14-00005.
20. Marx RG, Connor J, Lyman S, et al. Multirater agreement of arthroscopic grading of knee articular cartilage. Am J Sports Med. 2005;33(11):1654–7. https://doi.org/10.1177/0363546505275129.
21. Odensten M, Lysholm J, Gillquist J. Suture of fresh ruptures of the anterior cruciate ligament. A 5-year follow-up. Acta Orthop Scand. 1984;55(3):270–2.
22. Paul JJ, Spindler KP, Andrish JT, Parker RD, Secic M, Bergfeld JA. Jumping versus nonjumping anterior cruciate ligament injuries: a comparison of pathology. Clin J Sport Med. 2003;13(1):1–5.
23. Piasecki DP, Spindler KP, Warren TA, Andrish JT, Parker RD. Intraarticular injuries associated with anterior cruciate ligament tear: findings at ligament reconstruction in high school and recreational athletes. An analysis of sex-based differences. Am J Sports Med. 2003;31(4):601–5. https://doi.org/10.1177/03635465030310042101.
24. Rockey S. What are the chances of getting funded? NIH Extramur Nexus. June 2015. https://nexus.od.nih.gov/all/2015/06/29/what-are-the-chances-of-getting-funded/. Accessed Feb 1 2018.
25. Roos EM, Roos HP, Lohmander LS, Ekdahl C, Beynnon BD. Knee Injury and Osteoarthritis Outcome Score (KOOS)—development of a self-administered outcome measure. J Orthop Sports Phys Ther. 1998;28(2):88–96. https://doi.org/10.2519/jospt.1998.28.2.88.
26. Siljander MP, McQuivey K, Fahs A, Galasso L, Serdahely K, Karadsheh MS. Current trends in patient reported outcome measures in total joint arthroplasty: a study of four major orthopedic journals. J Arthroplasty. 2018. https://doi.org/10.1016/j.arth.2018.06.034.
27. Spindler KP, Huston LJ. O’Donoghue sports injury award 10 year outcomes and risk factors after ACL reconstruction: a multicenter cohort study. Orthop J Sports Med. 2017;5(7 Suppl 6):2325967117S00247. https://doi.org/10.1177/2325967117S00247.
28. Spindler KP, Kuhn JE, Freedman KB, Matthews CE, Dittus RS, Harrell FE. Anterior cruciate ligament reconstruction autograft choice: bone-tendon-bone versus hamstring: does it really matter? A systematic review. Am J Sports Med. 2004;32(8):1986–95.
29. Spindler KP, McCarty EC, Warren TA, Devin C, Connor JT. Prospective comparison of arthroscopic medial meniscal repair technique: inside-out suture versus entirely arthroscopic arrows. Am J Sports Med. 2003;31(6):929–34. https://doi.org/10.1177/03635465030310063101.
30. Spindler KP, Schils JP, Bergfeld JA, et al. Prospective study of osseous, articular, and meniscal lesions in recent anterior cruciate ligament tears by magnetic resonance imaging and arthroscopy. Am J Sports Med. 1993;21(4):551–7. https://doi.org/10.1177/036354659302100412.
31. Spindler KP, Warren TA, Callison JC, Secic M, Fleisch SB, Wright RW. Clinical outcome at a minimum of five years after reconstruction of the anterior cruciate ligament. J Bone Joint Surg Am. 2005;87(8):1673–9. https://doi.org/10.2106/JBJS.D.01842.
32. Wolf BR, Ramme AJ, Britton CL, Amendola A, MOON Knee Group. Anterior cruciate ligament tunnel placement. J Knee Surg. 2014;27(4):309–17. https://doi.org/10.1055/s-0033-1364101.
33. Wolf BR, Ramme AJ, Wright RW, et al. Variability in ACL tunnel placement: observational clinical study of surgeon ACL tunnel variability. Am J Sports Med. 2013;41(6):1265–73. https://doi.org/10.1177/0363546513483271.
34. Wright RW, Huston LJ, Haas AK, et al. Effect of graft choice on the outcome of revision anterior cruciate ligament reconstruction in the Multicenter ACL Revision Study (MARS) Cohort. Am J Sports Med. 2014;42(10):2301–10. https://doi.org/10.1177/0363546514549005.
35. Zarins B. Are validated questionnaires valid? J Bone Joint Surg Am. 2005;87(8):1671–2. https://doi.org/10.2106/JBJS.E.00554.
42 MARS: The Why and How of It

Rick W. Wright, Amanda K. Haas, and Laura J. Huston

R. W. Wright (*) · A. K. Haas
Department of Orthopaedic Surgery, Washington University School of Medicine, St. Louis, MO, USA
e-mail: rwright@wustl.edu; ahaas22@wustl.edu

L. J. Huston
Vanderbilt Orthopaedic Institute, Nashville, TN, USA
e-mail: laura.huston@Vanderbilt.Edu

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_42

42.1 Why MARS

Anterior cruciate ligament (ACL) reconstruction remains the treatment of choice for ACL-deficient active individuals involved in sports or activities that require quick starts and stops, cutting, jumping, and abrupt changes of direction. Based on industry and implant evidence, there are approximately 400,000–500,000 ACL reconstructions per year in the United States. Fortunately, primary reconstructions typically do well, but they can fail at a low but significant rate [8, 11, 14]. While highly successful in the short term, primary reconstructions can have problems including loss of motion, extensor dysfunction with certain grafts, arthritis, and graft failure. This treatise will address graft failure and the multicenter, multi-surgeon group assembled to study the issues surrounding revision ACL reconstruction.

A variety of studies have evaluated graft failure in the primary ACL reconstruction setting and have found the rates to range from ~1 to 8% in the standard patient and graft setting. In the Multicenter Orthopaedic Outcomes Network (MOON), the graft failure rate in primary reconstructions was 3% in the ipsilateral knee and 3% in the contralateral knee at 2-year follow-up [11]. In a systematic review evaluating hamstring vs. patellar tendon autografts, Spindler et al. reported a 3.7% overall failure rate (95% CI: 1.5–5.7%) [8]. At minimum 5-year follow-up, Wright et al. found in a systematic review of ipsilateral vs. contralateral failure a rate of 5.8% in the ipsilateral and 11.8% in the contralateral knee [14].

These are reasonably low levels of failure, but the question remains what happens to those that unfortunately do fail. These patients may have inappropriate expectations of their current and future knee health. They may desire to return once again to the activities that resulted in ACL failure and not realize the potential for poor results. In their mind it is like changing the oil or tires and you’re good to go again. A better understanding of revision results would allow us to better counsel patients as to expected outcomes. What outcomes can patients expect and what outcomes truly matter? This became the impetus for our current Multicenter ACL Revision Study (MARS) Group. There was consensus among these surgeons that revision ACL reconstruction typically resulted in worse outcomes compared to primary reconstructions. Revision ACL reconstruction was the strongest predictor for worse KOOS scores in a mixed ACL reconstruction
cohort early in the MOON experience. In a series of ACL reconstructions reported at minimum 5-year follow-up, revision was the strongest predictor for worse outcome across the board, but in the editing process the journal reviewers and editor requested that the revisions be removed from the study [9].

Unfortunately, little Level 1 or 2 evidence existed to help us confirm this discrepancy in outcomes between primary and revision ACL-reconstructed patients. In a mixed model meta-analysis of 21 studies with minimum 2-year follow-up after revision reconstruction, these worse results were demonstrated [13]. Of the 21 studies, however, only 4 were Level 1 or 2, while 1 was Level 3, and 16 were Level 4 studies. Objective failure (defined as a re-revision, KT-1000 >5 mm, or a positive pivot shift) occurred in 13.7 ± 2.7%, much higher than typical primary failure rates. Patient-reported outcomes were worse than expected compared with primary ACL results and usually exceeded the known clinically important difference for these outcome scores.

Given these findings, the Multicenter Orthopaedic Outcomes Network (MOON) Group reviewed their prospectively collected cohort, which began as a mixed primary and revision ACL reconstruction cohort [10, 12]. One working hypothesis for the study was that revision ACL reconstruction results in worse outcome compared to primary reconstruction, as measured by validated patient-based outcome measures including the Marx activity level, Knee injury and Osteoarthritis Outcome Score (KOOS), and International Knee Documentation Committee Subjective form (IKDC). A total of 487 ACL reconstructions met the following inclusion/exclusion criteria: (1) all meniscal/chondral treatment included and (2) osteotomy, posterior cruciate ligament (PCL), or collateral ligament surgery excluded. Of these, 408/487 (84%) were available at minimum 2-year follow-up, with 39/47 (83%) revisions available at 2-year follow-up. At 2 years, median Marx scores had dropped from 12 to 9 points in the primary reconstructions vs. 10 to 6 points in revisions (p = 0.009). While the minimal clinically important difference (MCID) is not known for the Marx score, it is assumed on a 16-point scale that 2 points (representing more than a 10% change or difference) would be clinically significant. The IKDC at 2 years was 85.6 for primary ACL reconstructions and 79.6 for revisions, which was statistically different (p = 0.005), but not clinically significant (MCID: 11.5 points). For the 5 KOOS subscales, a difference of 8–10 points is clinically significant. In this study, the KOOS Knee-related Quality of Life (KRQOL) subscale was lower in revisions (75 vs. 62.5) at 2 years (p < 0.001). The KOOS Sports and Recreation subscale also was worse for revisions (85 vs. 75; p = 0.004) at 2 years. KOOS Pain was lower in revisions and potentially clinically significant (91.7 vs. 83.3). KOOS Symptoms (85.7 vs. 78.6) and ADLs (98.7 vs. 97.1) did not demonstrate a clinically significant difference between primary and revision reconstructions.

Thus, this represented a prospectively collected cohort of primary and revision ACL reconstructions evaluated identically by validated patient-reported outcome measures with worse scores across the board for revisions, but with no obvious factors contributing to these worse outcomes. Also, revisions represented only 10% of the cohort. To perform multivariable analysis requires ~10–15 subjects per variable assessed, and with the multiple factors (50–75 or more) potentially contributing to revision ACL reconstruction outcomes, it would require quick assimilation of 750–1000 patients. It became apparent the MOON Group, with fewer than 20 members, could not enroll an adequate number of patients quickly enough for this type of study. As we jokingly say, “a simple moon could not get it done, we needed a planet.” With this in mind, we set out to establish a larger group of interested sports medicine surgeons.

We felt the basic approach utilized by the MOON Group would be appropriate for evaluating the revision patient but realized there were different factors involved in revision surgeries that would need to be captured. A small group developed a standard operating procedure (SOP) manual that outlined rules of engagement for surgeons and patients. We very early on engaged the American Orthopaedic Society for Sports
Medicine (AOSSM), through the late Bart Mann, PhD, the research director. He, the AOSSM Research Committee, and the AOSSM society were excited about the chance to offer the research opportunity to their members. Based on this we utilized their website and email to advertise to members to participate. Once interested members were identified, we set up three meetings to educate surgeons and describe how the study would proceed. We also engaged these attendees in designing forms and determining the variables we would collect. Over 100 members originally expressed interest, and we currently have 83 surgeons participating at 52 IRB-approved sites. The surgeons are a near 50/50 mix of academic and private practice surgeons, adding to our generalizability of results [5].

In determining study design, we debated over a prospective cohort design versus a randomized trial. Ultimately, we believed we did not know a single critical variable to randomize and felt a cohort to determine predictors would best serve a revision series with its rich number of potential factors. As compared to primary reconstructions, additional variables involved all of the previous reconstruction issues such as tunnel position, widening, graft choice, fixation, meniscus and chondral procedures, etc. We thus chose a prospective longitudinal cohort similar to the Framingham study for cardiovascular disease many years ago [1].

Surgeon inclusion was based on AOSSM membership, attendance at an introduction meeting, and a willingness to follow the procedural issues identified in the SOP. This included utilizing a Musculoskeletal Transplant Foundation graft if allograft was chosen. In the introductory meetings, we determined that radiographs would be a critical data point and made a required and recommended x-ray list. Required baseline radiographs included a bilateral standing AP view and a full extension lateral view. Additional recommended views, if they were taken as part of their standard clinical care, included a bilateral bent knee weight-bearing view at 45° (Rosenberg), a bilateral patellofemoral view (Sunrise or Merchant), and a bilateral long leg standing x-ray for alignment.

After obtaining informed consent, the patient filled out a 13-page questionnaire that included questions regarding demographics, sports participation, injury mechanism, comorbidities, and knee injury history. Within this questionnaire, each participant also completed a series of validated general and knee-specific outcome instruments, including the Knee injury and Osteoarthritis Outcome Score (KOOS), the International Knee Documentation Committee Subjective form (IKDC), and the Marx activity rating scale. Contained within the KOOS was the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC). This was filled out preoperatively and at planned postoperative follow-up points of 2, 6, and 10 years. Patients were paid $20 for filling out the questionnaire each time. There were no other patient incentives since the treatment was standard care for a revision ACL reconstruction. Surgeons filled out a 42-page questionnaire at the time of the revision surgery that included the impression of the etiology of the previous failure, physical exam findings, surgical technique utilized, the intra-articular findings, and surgical management of meniscal and chondral damage.

Two-year patient follow-up was completed by mail with readministration of the same questionnaire as the one they completed at baseline. Patients were also contacted by phone to determine whether any subsequent surgery had occurred to either knee since their initial revision ACL reconstruction. If so, operative reports were obtained, whenever possible, in order to document pathology and treatment.

Completed data forms were mailed from each participating site to the data coordinating center. Data from both the patient and surgeon questionnaires were scanned with Teleform™ software (Cardiff Software, Inc., Vista, CA) utilizing optical character recognition, and the scanned data was verified and exported to a master database.

Teleform paper forms were utilized so data was available in real time. If starting today, we would utilize electronic data capture, preferably REDCap, but in 2006 it was felt this was best practice. Data was housed at Vanderbilt similar to MOON. Our data capture was relatively amazing
with >99% of all patient and surgeon data points completed when assessed on our first 900 patients (Table 42.1). We relied on MOON personnel originally, but quickly identified the need for a national coordinator to assist Laura Huston, and Amanda Braun joined our team and was centered at the coordinating center at Washington University in St. Louis. Our coordinator was initially funded by personal research funds and industry support that had been donated and through the AOSSM. To truly run this study with the personnel necessary depended upon a larger sustained grant process such as the NIH or Department of Defense, which was the impetus to pursue NIH funding. Once 2-year follow-up began, we added a full-time follow-up coordinator who contacts all patients to send out and obtain questionnaires and perform phone follow-up. Each site manages their site with their personal health care personnel or research personnel. Beyond initial enrollment there is little personnel involvement for the sites. Staff developed an electronic newsletter (Fig. 42.1) that arrived monthly to all surgeons and coordinators and acted as an impetus to stimulate patient enrollment, listing total and monthly enrollment figures for surgeons. Everyone wanted to appear on the Top 10 list.

Table 42.1 Data completeness
Data completeness within a subset of variables (n = 900 cases)
Outcome variable      Questionnaire source form   # of variables   # missing/observations (%)   % complete
Marx activity level   Patient                     4                23/3600 (0.6%)               99.4
IKDC                  Patient                     19               38/17,100 (0.2%)             99.8
KOOS                  Patient                     42               259/37,800 (0.7%)            99.3
Current graft type    Surgeon                     1                1/900 (0.1%)                 99.9
Surgical technique    Surgeon                     1                2/900 (0.2%)                 99.8
Rehab factors         Surgeon                     6                35/5400 (0.7%)               99.3

Keys that helped us early on were frequently based on our experience with MOON and included the use of conference calls and the ability to communicate by email, which was relatively new at the time. The group’s experience with IRB helped new sites get approval, and the knowledge regarding grants helped immensely in obtaining funding.

We developed a Scientific Advisory Board for advice. This eight-member Board meets at least annually and provides advice and oversight for the study. This has included what research studies should be performed via an application form by members. This has been critical for governance issues. The makeup is balanced by geography, gender, and practice type (Table 42.2).

42.2 Findings of MARS

A total of 1215 patients were enrolled in the study. Median age was 26 with a range from 12 to 63 years, and 505 (42%) were female. For 87% it was their first revision [5], while 13% were undergoing their second or higher revision ACL reconstruction. Seventy-three percent were injured while playing a sport. The most common sports were soccer and basketball, which involve both genders. For the first decade of the modern ACL reconstruction with appropriate grafts and tunnel location, the etiology of ACL graft failure was felt to be technical issues [6] (Tables 42.3 and 42.4). In this cohort traumatic failure was felt to be more common, which may reflect two issues: (1) improved technical ability with improved training and education and (2) surgeons self-reporting on their own failures in this cohort. In 334 cases (28%), the failed reconstruction was the surgeon’s own. Associated surgeries included high tibial osteotomies (n = 21), medial meniscus transplants (n = 34), and lateral meniscus transplants (n = 10).

The cause of technical failure was most commonly felt to be due to the femoral tunnel (Table 42.5) with the tibial tunnel the second most common cause. Seldom was fixation felt to be an issue. Varus and valgus malalignment was
also seldom felt to be an issue. Prior approach was most commonly a single incision (80%) (Table 42.6). Two-incision or rear-entry approach was used in 17%. Prior graft utilized was autograft 68% of the time, allograft 29%, and a combination (autograft + allograft) 2%. The most common autograft was patellar tendon, utilized 40% of the time. Patellar tendon was the most common allograft utilized (11%) (Table 42.7). Current revision graft utilized was 26% patellar tendon autograft, 24% patellar tendon allograft, 22% soft tissue autograft, and 25% soft tissue allograft (Table 42.8).

Fig. 42.1 Electronic Newsletter

Table 42.2 Time from last ACL reconstruction
              #      %
<1 year       149    12
1–2 years     389    32
>2 years      654    55
Unknown       8      <1
Total         1200   100

Table 42.3 Cause of failure
              #      %
Traumatic     671    56
Technical     615    51
Biologic      325    27
Other         35     3
Blank         2      <1
Note: 36% (427/1200) of these responses listed a combination of these components as reason for failure. The denominator is >100% due to the multiple choice option of this question (surgeons were instructed to “check all that apply”)

Table 42.4 Venn diagram of cause of failure
[Venn diagram of three overlapping causes: TRAUMATIC (top), TECHNICAL (bottom left), BIOLOGICAL (bottom right). Region percentages as printed, top to bottom: 33%; 15%, 5%; 2%; 22%, 12%, 8%]

Table 42.5 Cause of technical failure
Femoral tunnel         595
Tibial tunnel          228
V/V malalignment       23
Femoral fixation       38
Tibial fixation        18
Autograft source       17
Allograft source       66
Posteromedial laxity   16
Posterolateral laxity  4

Table 42.6 Previous approach
                         #      %
One incision             972    80
Two incisions            201    17
Traditional arthrotomy   12     1
Mini arthrotomy          10     <1
Blank                    5      <1
Total                    1200   100
18 previous double bundle

Table 42.7 Previous graft source
             Autograft    Allograft
BTB          485 (40%)    133 (11%)
HS STG       285 (24%)    14 (1%)
HS ST        27 (2%)      3 (<1%)
Quad         4 (<1%)      -
ITB          3 (<1%)      -
Achilles     -            39 (3%)
Ant tib      -            61 (5%)
Post tib     -            13 (1%)
Other/unk    5 (<1%)      73 (6%)
Blank        -            1 (<1%)
Combined     6 (<1%)      8 (<1%)
Total        815 (68%)    345 (29%)

Table 42.8 Current graft choice
             Autograft    Allograft
BTB          314 (26%)    285 (24%)
Quad         18 (2%)      3 (<1%)
HS ST        20 (2%)      17 (1%)
HS STG       219 (18%)    4 (<1%)
Achilles     -            83 (7%)
Ant tib      -            137 (11%)
Post tib     -            52 (4%)
Other/unk    1 (<1%)      1 (<1%)
Blank        -            1 (<1%)
Combined     2 (<1%)      10 (<1%)
Total        574 (48%)    593 (49%)
Note: 3% used both auto- and allograft
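The percentages in Table 42.3 add up to more than 100% because surgeons could check every cause that applied to a case. The short Python sketch below is ours and not part of the MARS analysis; it simply recomputes each percentage against the 1200 cases listed in the table. The variable and function names are illustrative assumptions.

```python
# Counts copied from Table 42.3. Surgeons could check more than one cause,
# so the counts add up to more than the number of cases.
TOTAL_CASES = 1200
cause_counts = {
    "Traumatic": 671,
    "Technical": 615,
    "Biologic": 325,
    "Other": 35,
    "Blank": 2,
}


def percent_of_cases(count, total=TOTAL_CASES):
    """Percentage of cases (not of responses) in which a cause was checked."""
    return 100.0 * count / total


percentages = {cause: percent_of_cases(n) for cause, n in cause_counts.items()}
total_percent = sum(percentages.values())

for cause, pct in percentages.items():
    # Mirror the table's convention of reporting very small shares as "<1".
    label = "<1" if pct < 1 else f"{pct:.0f}"
    print(f"{cause:<10} {cause_counts[cause]:>4}  {label}%")
print(f"Sum of percentages: {total_percent:.0f}%")
```

Because each percentage uses the number of cases as its denominator, the column sums to roughly 137%, which is consistent with the note that about a third of responses listed a combination of causes.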
42.2.1 Graft Choice

We analyzed graft choice as a predictor for outcome as one of the specific aims of our NIH-funded grant [4]. The demographics of patients that received autograft and allograft can be seen in Table 42.9. Allografts were placed in older, less active patients on average. The form of treatment was known for each allograft and included no radiation, light total body irradiation <1.8 mrad or rarely terminal radiation (Table 42.10). The overall re-rupture rate at 2 years was 37/1112 (3.3%), including 12 autograft, 24 allograft, and 1 combination graft. Autografts were 2.78 times less likely to re-rupture (p = 0.047; 95% CI = 1.01, 7.69). No difference in re-rupture rate was found in soft tissue vs. BTB in either autograft or allograft. IKDC score improved with autograft reconstruction with an odds ratio of 1.33 (95% CI: 1.01–1.7; p = 0.045). KOOS Sports and Recreation subscale significantly improved with autograft (OR 1.33; CI: 1.02–1.73; p = 0.037), as did the KOOS Quality of Life (OR 1.33; CI: 1.03–1.73; p = 0.031). The KOOS ADL and Symptoms subscales were not affected by graft choice.

Table 42.9 Graft choice demographics
                                                            Autograft group   Allograft group   Auto + allo group
                                                            (n = 584)         (n = 601)         (n = 34)
Males (%)                                                   352 (60%)         337 (56%)         14 (41%)
Females (%)                                                 232 (40%)         264 (44%)         20 (59%)
Median age (25%, 75% quartile)                              24 (19, 32)       28 (21, 36)       22 (19, 31)
Median baseline Marx activity level (25%, 75% quartile)     12 (4, 16)        10 (3, 15)        11 (8, 16)
Median T2 Marx activity level (25%, 75% quartile)           8 (3, 12)         5 (1, 11)         10 (3, 15)

Table 42.10 Allograft treatment
Sterilization method       MARS allograft cohort   MARS allograft failure cohort
Aseptic                    247 (42%)               13 (52%)
Whole body 1.2–1.8 mrad    313 (53%)               11 (44%)
Terminal 0.7–1.0 mrad      31 (5%)                 1 (4%)

MARS Findings
Autograft 2.78 × less likely to re-rupture
91% of patients have meniscal or articular cartilage damage
Previous lateral meniscectomy and trochlear groove chondrosis strong predictors of worse outcome
Metal fixation predicted better outcome
Early weight bearing, ROM did not impact outcome

Many surgeons believed that graft choice was a predetermined fate in a revision setting and that the surgeon truly had no choice in determining what graft would be used for the patient. To analyze this belief, we performed a propensity analysis for graft choice (Fig. 42.2). Our analysis demonstrated that the surgeon performing the procedure was far and away the biggest factor in what graft type was chosen for the revision reconstruction, approximately five times more impactful than the second most common predictor (which was prior graft). Thus, surgeons truly did have a choice in what graft they utilized.

42.2.2 Meniscus and Articular Cartilage

Compared to primary reconstructions, the revision patient has a much higher chance of having meniscus or articular cartilage damage at the time of revision reconstruction. In our cohort, only 9% of the patients did not have a meniscus tear or grade 2 or worse articular cartilage damage. In total, 91% had meniscus or articular cartilage damage, and 60% had both (Table 42.11). We analyzed the impact of these meniscus tears
and articular cartilage damage on patient outcomes at 2 years [2]. A lateral meniscectomy performed before the revision was significantly associated with worse patient-reported outcomes (Table 42.12), and a previous medial meniscectomy less so. Grade 2 or worse articular cartilage damage also impacted patient-reported outcomes at 2 years. The most significant finding was the impact of trochlear groove chondrosis (Table 42.13).

42.2.3 Surgical Factors

Surgical factors were analyzed to determine impact on patient-reported outcome measures at 2 years [7]. A variety of factors were analyzed, and in many cases, it is difficult to determine intuitively why certain factors impacted outcome. With regard to surgical approach, having undergone a prior arthrotomy decreased IKDC scores (p = 0.037, OR = 2.43) and decreased all KOOS subscales (p < 0.05, OR range = 2.38–4.35).

Fig. 42.2 Propensity values
[Horizontal bar chart of predictors of graft choice; x-axis: χ2 – df, scale 0 to 80. Predictors in the order listed in the figure, top to bottom: DOCTOR.IMP, ACLREV.PRIOR.ACLGRAFT.IMP, ACLREV.NO.IMP, AGE, ACLREV.SURGEON.FAIL, ACLREV.PRIOR.SURG.TECH.IMP, WORK.STATUS.IMP, ACLREV.SURG.OPIN.FAIL.IMP, SEX, LEVELH.IK.IMP, ACLREV.PRIOR.TIB.FIX.IMP, MCLPM.REP, ACLREV.PRIOR.FEM.FIX.IMP, PCL.RECON, ACLREV.PRIOR.FEM.TUNNEL.POS.IMP, LEVEL1.IK.IMP, BMI, ACLREV.CAUSE.OF.FAIL.IMP, ACLREV.PRIOR.ACLGRAFT.SOURCE.IMP, LCLPL.REP, SMOKE, PreviousSurgeryContralateral, SCHOOL.YRS.IMP, ACLREV.PRIOR.FEM.TUNNEL.TECH, MARITAL.IMP, ACLREV.PRIOR.TIB.TUNNEL.POS.IMP, ETHNICITY.IMP, MARX.t0, SPORT1.IK]

Table 42.11 Meniscal and chondral damage
                                 Articular cartilage pathology
                                 Normal       Abnormal
Meniscal pathology   Normal      109 (9%)     146 (12%)
                     Abnormal    226 (19%)    719 (60%)
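The summary figures quoted for Table 42.11 (9% with neither lesion, 60% with both, 91% with at least one) can be checked directly from the four cell counts. The following Python sketch is illustrative only; the dictionary layout and names are our own assumptions, and the counts are copied from the table.

```python
# Cell counts copied from Table 42.11.
# Keys are (meniscal pathology, articular cartilage pathology).
counts = {
    ("normal", "normal"): 109,
    ("normal", "abnormal"): 146,
    ("abnormal", "normal"): 226,
    ("abnormal", "abnormal"): 719,
}

total = sum(counts.values())


def share(n):
    """Percentage of the whole cohort represented by n patients."""
    return 100.0 * n / total


neither = share(counts[("normal", "normal")])    # no meniscal or chondral damage
both = share(counts[("abnormal", "abnormal")])   # meniscal and chondral damage
at_least_one = 100.0 - neither                   # meniscal or chondral damage

print(f"Patients in table: {total}")
print(f"Neither lesion:    {neither:.0f}%")
print(f"Both lesions:      {both:.0f}%")
print(f"At least one:      {at_least_one:.0f}%")
```

The four cells sum to 1200 patients, and the rounded shares reproduce the percentages given in the text.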
Table 42.12 Meniscus impact on PROs
Columns: Structure; Marx; KOOS (Symptoms, Pain, ADL, Sports-rec, QoL); IKDC; WOMAC (Stiffness, Pain, ADL)
Values are listed in printed order; their column positions were not preserved.
Meniscus (previous pathology)
 Medial: 0.002, 0.035
 Lateral: 0.008, 0.042, <0.001, 0.038, 0.03, 0.032
Meniscus (current pathology)
 Medial: no values
 Lateral: no values

Table 42.13 Articular cartilage impact on PROs
Columns: Structure; Marx; KOOS (Symptoms, Pain, ADL, Sports-rec, QoL); IKDC; WOMAC (Stiffness, Pain, ADL)
Values are listed in printed order; their column positions were not preserved.
Articular cartilage (previous)
 Yes/no: no values
Articular cartilage (current)
 Medial femoral condyle: 0.018, 0.012
 Lateral femoral condyle: 0.048, 0.048
 Medial tibial plateau: 0.004
 Lateral tibial plateau: no values
 Patella: no values
 Trochlea: 0.034, <0.001, 0.011, 0.01, <0.001

A double femoral tunnel resulted in worse KOOS QOL scores (p = 0.027, OR = 3.13). An ideal tibial position that was not enlarged resulted in worse KOOS, WOMAC, and IKDC scores (p = 0.001–0.03, OR 1.19–2.68). Using a femoral tunnel declared “optimum” vs. drilling an entirely new femoral tunnel resulted in worse KOOS QOL scores (p = 0.025, OR 1.79). Undergoing a notchplasty decreased KOOS, IKDC, and WOMAC scores (p = 0.013–0.034, OR 1.40–1.49). Factors that did not impact outcome included blended tunnels and knee position at time of graft fixation.

Graft fixation as a surgical factor impacting outcome was also analyzed. Femoral fixation with a metal screw had better 2-year outcomes in KOOS and WOMAC scores (p = 0.01–0.05, OR 1.41–1.96) when compared to using a bioabsorbable screw, cross-pin, or combination. Tibial fixation other than metal screw resulted in worse IKDC (p = 0.017, OR 1.67) and WOMAC stiffness (p = 0.013, OR 1.72) scores.

Use of biologics and bone grafting was analyzed. Use of biologic enhancement in the revision setting resulted in lower 2-year Marx activity level scores (p = 0.025, OR 1.79). Utilizing femoral bone grafts resulted in lower Marx activity scores at 2 years (p = 0.048, OR = 2.04). Conversely, not bone grafting the tibia resulted in worse KOOS Pain scores (p = 0.046, OR 1.95) and WOMAC Pain scores (p = 0.004, OR 3.31).

42.2.4 Rehabilitation Factors

Rehabilitation factors were analyzed regarding their potential impact on revision ACL reconstruction outcomes [3]. Two rehabilitation factors predicted outcome: (1) patients who used an ACL derotation brace for return-to-sport had better KOOS sports/rec scores at 2 years (odds ratio = 1.50; 95% CI = 1.07–2.11; p = 0.019) and (2) patients who used an ACL derotation brace for the postoperative rehabilitation period were 2.3 times more likely to have a
subsequent surgery by 2 years (OR = 2.26; 95% CI = 1.11–4.60; p = 0.024). Use of an ACL derotation brace at the time of return-to-sport could not be determined to improve or decrease the graft re-rupture rate. Restricting or allowing all other factors did not predict outcome, including active range of motion, passive range of motion, immediate weight-bearing, and the use of rehabilitative postoperative bracing.

42.3 Challenges

Obviously, a study of this magnitude involving this many centers and surgeons faces multiple significant challenges.

Challenges
Maintaining IRB
Funding
Authorship
Patient follow-up

42.3.1 Institutional Review Boards (IRB)

Obtaining and maintaining IRB approval at 52 sites takes significant effort from the coordinators and sites. Each site required individual submissions, with their own forms and institutional requirements. No two forms or submission materials were identical. As such, this was quite time intensive in the beginning and continues to be a burden in ensuring that all sites maintain their active IRB status. A study of this magnitude initially needed at least one FTE to keep up with all 52 sites’ submissions (and any subsequent modifications that were requested by each IRB), and this was vastly underestimated by us. Recently the NIH has adopted a policy that a single IRB be coordinated from a single site, and we are hopeful that this will decrease the IRB burden for large multicenter studies of this type. Some sites require a payment to obtain and maintain their IRB for this study, and this was a challenge from a funding standpoint initially.

42.3.2 Funding

Funding a study of this magnitude is probably our biggest challenge. Initially we relied upon start-up funds from industry which were funneled through AOSSM to support our coordinators, but this was stopped when one of our partners believed the study might cast their products in a poor light. An additional industry partner provided unrestricted funds that also supported our coordinator efforts. Ultimately, we knew we would need a large grant to support these efforts. Fortunately, we were able to obtain NIH funding in 2011 and have successfully extended that funding with a competitive renewal grant through the middle of 2022. Department of Defense grants or larger sustainable industry or society grants would have been another option. Realistically, it takes those types of substantial grants to run these studies, but the findings can be more impactful than virtually any other type of effort for the societies, and an investment in this type of study at start-up can pay off for the society with recognition and ultimately a nationally funded study such as ours.

42.3.3 Variation of Surgeon Involvement

In hindsight one area that we would do differently with a future study would be to instill more rigorous rules from the start. I expected everyone to cooperate and work well together if they wanted to remain in the study and unfortunately did not establish enough ground rules for maintaining membership in MARS. It became apparent that there were surgeons that were not enrolling their patients after they received IRB approval from their institution. For example, they stated that they performed 25 revisions per year during the time of the surgeon training meetings but then had only enrolled 1 patient during their first 3 months after IRB approval. Because we had not established inclusion rules, it became difficult to police this type of activity or to prove that from an objective standpoint. In the future I would require all surgeon members to enroll at
least one patient and to require surgeon enrollment logs be submitted in order to verify that they were enrolling 80–90% of the patients in the study that were eligible. Additionally, I would ask and expect that they would assist in contacting and helping follow their patients.

42.3.4 Authorship

Authorship can be a cantankerous topic for members. Multiple issues exist for this topic. One issue is the assumption that when a few members take data that has been analyzed and write a manuscript reporting the findings, they should be the only recognized authors. This discounts the national coordinators and grant writers and Scientific Advisory Board members that have worked diligently behind the scenes to keep the study funded and up and running. Thus, a system where there would be four or five named authors and the MARS Group listed on the masthead became a problem. I therefore adopted corporate authorship with everyone listed as an author that met author criteria, and I was able to convince the journals we typically utilize to make all acknowledged authors PubMed searchable. The order of the acknowledged authors is based on scientific contribution with first, second, and last making the most significant contribution in planning, writing, data analysis, etc. This has been effective in decreasing the concerns about why some were named on the masthead and others were not. This type of research and handling of authorship needs to be recognized by department chairmen and promotion committees for academic members who need publications for academic progress. Chairmen need to realize that this is rigorous Level 1 or 2 research, that membership requires effort, and that participating in these studies is clinically meaningful and practice changing. We have maintained strict standards on authorship. When a manuscript is complete, it is sent to all members with a deadline for returning the edited or approved manuscript. If the deadline is missed, then you are not listed on the final acknowledged author list for publication. This can be challenging for the authors who then need to deal with multiple recommendations for improving the manuscript, but the final product is always improved.

42.3.5 Peer-Reviewed Journal Submissions

The reviews from orthopedic and sports medicine journals have been challenging. These are complex studies, and we have taken more than 80 potential reviewers out of commission with their MARS membership. It appears the concept of multivariable analysis is confusing. We try to explain that our variables are controlled in a way that they are independent predictors of outcome and not associated with other variables analyzed. For instance, if notchplasty is a predictor for worse outcome, then reviewers will comment it must be a surrogate for osteoarthritis or chondrosis in the rest of the knee. We explain that chondrosis was a variable and was controlled, but it comes up with every submission. We will continue to try to educate in our manuscripts.

42.3.6 Patient Follow-Up

Follow-up obviously is critical and the key to success of the study. It has been challenging to maintain this with each passing year. We currently have a full-time research assistant whose sole responsibility is reaching out to patients to keep up with their contact information and cajole them into filling out their follow-up questionnaires. We have found a simple mailing will achieve 40–50% return. Reaching out by the staff will get us to approximately 70% follow-up. To get above 80% follow-up requires individual surgeons reaching out to patients, which is obviously time and work intensive, but has allowed us to stay above 80% for our follow-up. A challenge has been the sites that won’t allow us to contact their patients for follow-up (due to the individual institution’s IRB regulations or any non-US-based site), and we must rely on the sites to do the work. Their personnel are typically involved in several studies,
and the MARS patients may not be their top priority. Personally, it has been difficult enough that I would never use sites again in a multicenter study that required site vs. central follow-up.

42.4 Conclusions

While there have been multiple challenges encountered in the MARS study, they have for the most part been surmountable. The level of research and the questions that can be asked and answered in a cohort of this size and type is unmatched by any other approach. We believe the study design and scaffolding we have developed for this type of truly multi-surgeon multicenter research can be a model for future groups.

References

1. Cook NR, Moons KG, Harrell FE Jr. Assessing predictive performance beyond the Framingham risk score. JAMA. 2010;303:1368–9; author reply 1369.
2. Group M. Meniscal and articular cartilage predictors of clinical outcome following revision anterior cruciate ligament reconstruction. Am J Sports Med. 2016;44:1671–9.
3. MARS Group (Wright RW corresponding author). Surgical predictors of clinical outcome after revision anterior cruciate ligament reconstruction. Am J Sports Med. 2017;45(11):2586–94.
4. Group M, Group M. Effect of graft choice on the outcome of revision anterior cruciate ligament reconstruction in the Multicenter ACL Revision Study (MARS) Cohort. Am J Sports Med. 2014;42:2301–10.
5. Group M, Wright RW, Huston LJ, Spindler KP, Dunn WR, Haas AK, et al. Descriptive epidemiology of the Multicenter ACL Revision Study (MARS) cohort. Am J Sports Med. 2010;38:1979–86.
6. Johnson DL, Swenson TM, Irrgang JJ, Fu FH, Harner CD. Revision anterior cruciate ligament surgery: experience from Pittsburgh. Clin Orthop Relat Res. 1996;100–9.
7. MARS Group, Allen CR, Anderson AF, Cooper DE, DeBerardino TM, Dunn WR, et al. Surgical predictors of clinical outcomes after revision anterior cruciate ligament reconstruction. Am J Sports Med. 2017;45:2586–94.
8. Spindler KP, Kuhn JE, Freedman KB, Matthews CE, Dittus RS, Harrell FE Jr. Anterior cruciate ligament reconstruction autograft choice: bone-tendon-bone versus hamstring: does it really matter? A systematic review. Am J Sports Med. 2004;32:1986–95.
9. Spindler KP, Warren TA, Callison JC Jr, Secic M, Fleisch SB, Wright RW. Clinical outcome at a minimum of five years after reconstruction of the anterior cruciate ligament. J Bone Joint Surg Am. 2005;87:1673–9.
10. Wright R, Spindler K, Huston L, Amendola A, Andrish J, Brophy R, et al. Revision ACL reconstruction outcomes: MOON cohort. J Knee Surg. 2011;24:289–94.
11. Wright RW, Dunn WR, Amendola A, Andrish JT, Bergfeld J, Kaeding CC, et al. Risk of tearing the intact anterior cruciate ligament in the contralateral knee and rupturing the anterior cruciate ligament graft during the first 2 years after anterior cruciate ligament reconstruction: a prospective MOON cohort study. Am J Sports Med. 2007;35:1131–4.
12. Wright RW, Dunn WR, Amendola A, Andrish JT, Flanigan DC, Jones M, et al. Anterior cruciate ligament revision reconstruction: two-year results from the MOON cohort. J Knee Surg. 2007;20:308–11.
13. Wright RW, Gill CS, Chen L, Brophy RH, Matava MJ, Smith MV, et al. Outcome of revision anterior cruciate ligament reconstruction: a systematic review. J Bone Joint Surg Am. 2012;94:531–6.
14. Wright RW, Magnussen RA, Dunn WR, Spindler KP. Ipsilateral graft and contralateral ACL rupture at five years or more following ACL reconstruction: a systematic review. J Bone Joint Surg Am. 2011;93:1159–65.

43 Multicenter Study: How to Pull It Off? The PIVOT Trial

Eleonor Svantesson, Eric Hamrin Senorski, Alicia Oostdyk, Yuichi Hoshino, Kristian Samuelsson, and Volker Musahl

43.1 Introduction

One of the most important characteristics for a researcher is to question current practice and formulate hypotheses for further research contributing to evidence-based medicine. However, even the most interesting research question may be of no value to evidence-based medicine if inadequate methodology is chosen to study the questions proposed. The higher the quality of the methods used in the study, the more trustworthy the results will be.

Randomized controlled trials (RCTs) have for decades been considered as providing the highest level of evidence of all study designs. However, RCTs have also been criticized for not reflecting reality since they are often conducted only in highly specialized centers [17] and on a very selected study population after applying strict exclusion criteria. A multicenter design might increase the external validity of not only RCTs but also prospective and retrospective cohort trials. In fact, multicenter cohort trials are valuable complements to single-center RCTs, and the performance of such trials should be encouraged.

Multicenter trials offer many advantages. They enable investigation of large populations with different ethnicities and demographic characteristics and offer a possibility to compare results among participating centers, all important factors for increasing the generalizability of the study. However, conducting a multicenter trial is challenging. The strengths of the study design risk becoming limitations if the cooperation between the centers is not ensured and if the study protocol is not carefully prepared before study start, with minimal room for discrepancies of study performance across centers. This chapter focuses on sharing the experiences of how the Prospective International Validation of Outcome Technology (PIVOT) trial, a prospective multicenter trial, was planned and performed across four international centers and on presenting some of the outcomes and learning points of the trial.

E. Svantesson (*)
Department of Orthopaedics, Institute of Clinical Sciences, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden

E. H. Senorski
Department of Health and Rehabilitation, Institute of Neuroscience and Physiology, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden

A. Oostdyk · V. Musahl
Department of Orthopaedic Surgery, University of Pittsburgh, Pittsburgh, PA, USA

Y. Hoshino
Department of Orthopaedic Surgery, Kobe University, Kobe, Japan

K. Samuelsson
Department of Orthopaedics, Institute of Clinical Sciences, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
Department of Orthopaedics, Sahlgrenska University Hospital, Mölndal, Sweden

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_43

Fact Box 43.1
Multicenter trials offer many advantages and have high external validity. The Prospective International Validation of Outcome Technology (PIVOT) trial was a prospective multicenter trial, planned and performed across four international centers.

43.2 The PIVOT Trial

The PIVOT trial was a multicenter study performed between four academic centers: the University of Pittsburgh (Pittsburgh, PA, USA), Istituto Ortopedico Rizzoli (Bologna, Italy), Sahlgrenska University Hospital (Gothenburg, Sweden), and Kobe University (Kobe, Japan). The purpose of the study was to provide novel in vivo information about the validity of quantitative measurements of the pivot shift test to aid in the assessment of anterior cruciate ligament (ACL) deficiency and the outcome after ACL reconstruction. The goal of the trial was to provide a foundation for utilizing quantitative analysis of the pivot shift for ACL-deficient knees before surgery and optimize ACL reconstruction for restoring dynamic knee function and ultimately improve patient outcome.

Two separate entities of the pivot shift test were quantified: tibial acceleration and lateral compartment tibial translation during the reduction phase of the pivot shift. For this purpose, two noninvasive technological devices were used. The tibial acceleration was quantified by an inertial sensor system (KiRA, Orthokey LLC, Lewes, DE, USA) consisting of a triaxial accelerometer that is held in place over Gerdy’s tubercle, on the lateral aspect of the tibia, by hypoallergenic skin straps (Fig. 43.1) [20, 21]. The sensor system communicates with a tablet PC via Bluetooth. The tibial translation was quantified via an image analysis system [7]. Three markers are placed on three specific landmarks at the lateral aspect of the knee, which are tracked via video recording using a commercial tablet (iPad, Apple Inc., Cupertino, CA) during the execution of the pivot shift test (Fig. 43.2). The specially designed software system subsequently analyzes the relative movement of the three markers, calculates the tibial translation relative to the femur, and displays this graphically [11].

Fig. 43.1  The KiRA inertial sensor system for quantifying lateral tibial acceleration during the pivot shift test

Fig. 43.2  Image analysis system on iPad for quantifying lateral tibial translation during the pivot shift test

Fact Box 43.2
The purpose of the study was to provide novel in vivo information about the validity of quantitative measurements of the pivot shift test to aid in the assessment of anterior cruciate ligament (ACL) deficiency and the outcome after ACL reconstruction. Two noninvasive technological devices were used, quantifying tibial acceleration and lateral compartment tibial translation during the pivot shift, respectively.

43.3 Preparation Prior to Study Start

The PIVOT trial was preceded by a phase of detailed study preparation, including a series of important steps to ensure performance of a high-quality trial. A cornerstone in the early study preparation was the investigators meeting held in Bologna in 2012 with investigators from all institutions present, where the Manual of Operations and Procedures was unanimously developed, which was fundamental for every aspect of the future trial. In Bologna, the execution of the pivot shift was practiced, the two devices for quantification of the pivot shift test were tested, and ACL reconstruction was performed together to assimilate the procedure (Fig. 43.3). The upcoming sections highlight specific parts of study preparation and are important sections to consider when developing a manual of operations and procedures.

Fig. 43.3 The PIVOT team practicing execution of the pivot shift test in Bologna, 2012

43.4 Establishing Aims and Specific Research Questions

When setting up a study, it is essential that the aim and hypothesis are clearly formulated together with the specific research questions prior to start of the study. These recommended points are all fundamental for how the process of study planning continues since the next steps include choosing an appropriate methodology to analyze the results and answer the predefined research questions. Unfortunately, studies that are not based on these fundamentals are at risk of being subjected to “fishing” for interesting results rather than relying on testing the hypothesis. As honest researchers, we need to be aware that this type of research is biased and that this type of approach to research surely exists among published literature.

In a multicenter trial, it is important to clearly frame the purpose of the study. If all participating centers are in agreement regarding the aims of the study and what is expected and relevant prior to study start, not only can unnecessary disputes and misunderstandings be avoided, but it also ensures that contribution to publication bias is limited. It is important that the purpose and research questions are not too vague, but clear and answerable. The agreement of the research questions in the PIVOT trial was accomplished through an open communication in which all centers participated and shared their opinions until a consensus was reached. Thereafter, the aims and research questions were documented in the manual of operations and procedures to function as the overall common guide during study performance. The PIVOT trial had three specific aims, all accompanied with a hypothesis and specific research questions.

Another important factor for a well-functioning collaboration was that it was decided what aim each center was primarily responsible for, i.e., each center was appointed to take the lead in the manuscript writing for specific aims from the beginning. To be clear about this before knowing the results was, from our point of view, important for promoting a collaborative environment and avoiding potential future friction in the group.

Fact Box 43.3
It is essential that the aim and hypothesis are clearly formulated together with the specific research questions prior to start of the study. The agreement of the research questions in the PIVOT trial was accomplished through an open communication in which all centers participated and shared their opinions until a consensus was reached.

43.5 Recruitment Plan and Study Population

As for all studies, the study protocol needs to be clear regarding the inclusion and exclusion criteria. Furthermore, a standardized way of recruiting patients needs to be established in order to avoid selection bias. It is important to acknowledge that each center may have different routines for this purpose, especially in a situation where centers from three continents are included.

Influencing factors may include legal policies, cultural factors, volume of the patients, and the participating researchers’ own preferences. Therefore, this part of the study was discussed thoroughly among the participating sites with focus on how to include a population that could answer our research questions while eliminating as many confounding factors as possible. Strict inclusion and exclusion criteria were written down in the manual of operation. Moreover, a standardized screening process was established and a power analysis, taking into account an anticipated loss to follow-up, was performed to yield the total number of patients to be recruited. In agreement among the collaborators, a definition of evaluable patients was established, and it was also decided how to handle patient withdrawal. As clinicians we are all aware of the hectic clinical work and the limited time for patient consultation, which could contribute to accidental deviation from the strict eligibility criteria. To minimize such events, and to further standardize the recruitment screening, special interview forms were developed as well as a checklist for inclusion and exclusion criteria.

Fact Box 43.4
A standardized way of recruiting patients needs to be established in order to avoid selection bias. The manual of operation included strict inclusion and exclusion criteria, a definition of evaluable patients, and how to handle patient withdrawal. To further standardize the recruitment screening, special interview forms were developed as well as checklists.

43.6 Time Line and Follow-Up Schedule

It was decided that the PIVOT trial would examine patients over a 24-month period. An important factor for the study purpose was that specific and clearly stated endpoints were formulated for each aim of the study. Without defined study endpoints, it is simply not possible to conduct a good study since potential bias may be introduced. It was decided that all patients would be evaluated at 3, 6, 12, and 24 months after ACL reconstruction, and a detailed section for scheduled follow-up evaluation was written. For each follow-up occasion, an exact list of what variables to collect and how to collect these was created. All sites obtained standardized questionnaires and protocols for collection of data. Moreover, specific information on all patient-reported outcome measures and clinical tests was distributed to the clinics. Due to the language differences among the countries in the PIVOT trial, forms to be completed by the patients were translated. A specific person was assigned at the lead site to periodically check the status of follow-up in each center and send notices for upcoming follow-up patients. This person plays a key role in ensuring that no patients are missed during the follow-up phase of the study.

Fact Box 43.5
Specific and clearly stated endpoints were formulated for each aim of the study. It was decided that all patients would be evaluated at 3, 6, 12, and 24 months after ACL reconstruction, and a detailed section for scheduled follow-up evaluation was written.

43.7 Standardization of Clinical Testing

A multicenter trial entails that several examiners are involved in the trial. Therefore, it is important to undertake every possible action to standardize the exams and to have a common approach to the clinical testing. When planning for the PIVOT trial, the main goal was to create a standardized approach for data collection. A lot of effort was undertaken to ensure that a strict training plan existed for clinicians that were to be involved in the examination of patients. The most challenging clinical testing facing the PIVOT trial was the execution of the pivot shift test.

The pivot shift test is a dynamic knee laxity test which has been referred to as the most specific test for detection of ACL deficiency [2]. However, there are several execution techniques described for the test [1, 14, 16], which entails that the test is limited by intra- and inter-examiner variability [8, 9]. In an attempt to overcome this, we described and analyzed the results of a standardized execution technique for the pivot shift test among surgeons prior to study start [6, 14]. Moreover, a separate section in the manual of operations and procedures was written to describe the training plan, clearly stating the training requirements for participation. When the PIVOT trial subsequently was started, each examiner was provided a video outlining the steps of the standardized pivot shift test as well as written instructions for the performance. Additionally, the investigators involved in the PIVOT study underwent training for the pivot shift test during the University of Pittsburgh Panther Global Summit on ACL Reconstruction held in Pittsburgh in August 2011. Apart from the specific training and standardization of the pivot shift test, all examiners received written instructions for how to perform all other examinations in a standardized manner. Finally, all investigators and team members of the PIVOT trial had to complete Good Clinical Practices Training prior to the start of the study.

Fact Box 43.6
A lot of effort was undertaken to ensure that clinical examination was standardized among clinicians that were to be involved in the examination of patients. All investigators received oral and written instructions of the clinical exams and also underwent training prior to study start.

43.8 Regulatory Responsibility and Trial Documentation

To conduct a clinical study, ethical concerns and the regulatory process must be addressed. Each institution has an internal process for reviewing research trials for ethical purposes. With a study across four international centers, the legal standards regarding research conducted in each country must be addressed as well. Additionally, with studies recruiting human subjects, issues of informed consent must be addressed. For the PIVOT trial, the lead site developed protocols, data collection forms, and an informed consent template to be used to obtain all necessary approvals at each site.

It cannot be underlined enough that ensuring a system for documentation prior to start of the study is absolutely crucial for a multicenter study. Not only is this a matter of legal and ethical responsibility, but it also involves the database of your collected data and the documentation of communication between centers. One central database was used for all data collection to ensure standardization of all forms completed for the study. Make sure that there are persons appointed to take responsibility and coordinate documentation, and also make sure that everyone involved in entering data is taught how the documentation system works, through training and written instructions. Furthermore, arrangements for how to document correspondence, teleconferences, and videoconferences among the sites need to be made. If everything is documented, the room for contretemps is minimized.

Fact Box 43.7
The legal standards regarding research conducted in each country were addressed, as well as issues of informed consent. One central database was used for all data collection to ensure standardization of all forms completed for the study. Furthermore, arrangement for how to document correspondence, teleconferences, and videoconferences among the sites was made.
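
The follow-up time points described in Sect. 43.6 (3, 6, 12, and 24 months after ACL reconstruction) lend themselves to a simple automated check of which visits are overdue, of the kind a coordinator at the lead site might run against a central database. The Python sketch below is an illustration under stated assumptions: the function names, the rule of clamping to the last day of a shorter month, and the example dates are hypothetical and are not taken from the PIVOT manual of operations and procedures.

```python
import calendar
from datetime import date

# Follow-up time points in months after surgery, as stated in Sect. 43.6
FOLLOW_UP_MONTHS = (3, 6, 12, 24)


def add_months(d, months):
    """Shift a date by whole months, clamping to the last day of the month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def follow_up_schedule(surgery_date):
    """Map each follow-up month to its due date."""
    return {m: add_months(surgery_date, m) for m in FOLLOW_UP_MONTHS}


def overdue_visits(surgery_date, completed_months, today):
    """List follow-up months whose due date has passed without a visit."""
    schedule = follow_up_schedule(surgery_date)
    return [m for m, due in schedule.items()
            if due < today and m not in completed_months]


# Hypothetical patient: surgery on 31 Jan 2013, only the 3-month visit done
surgery = date(2013, 1, 31)
print(follow_up_schedule(surgery)[3])                  # 2013-04-30
print(overdue_visits(surgery, {3}, date(2013, 9, 1)))  # [6]
```

In practice a visit window (for example, a tolerance of some weeks around each due date) would usually be defined in the protocol; the sketch deliberately leaves that out.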

43.9 Communication Plan

Due to the nature of an international multicenter research study, it is imperative for investigators to have an open and consistent communication. The investigators of the PIVOT trial discussed and reached consensus regarding how communication should primarily take form and established a policy around this. All investigators agreed on scheduling teleconferences or videoconferences at least monthly, as well as more frequent communication via email to address questions and issues as they emerged.

Investigator meetings were also a prioritized matter among the investigators. During the planning phase of the PIVOT trial, the study investigators met to discuss and finalize plans for implementation of the study. The meeting included review and approval of the manual of operations and procedures, including a final confirmation of what had been decided in terms of effective and culturally appropriate recruitment and retention strategies, standardized surgical and outcome measure procedures, plans for training, approval of study forms, discussion of translating study forms, regulatory requirements, and any other necessary administrative activities that were needed for the successful and timely startup of the PIVOT trial. It was also decided beforehand that the investigators would meet in person at least annually, which could, for example, be in conjunction with an international meeting (Fig. 43.4). All in-person meetings were documented and minutes distributed to all members of the group that were unable to attend. Moreover, over the course of four years, the research team got together once at each participating center (Fig. 43.5).

Fig. 43.4  PIVOT team meeting at the 15th ESSKA Congress in Geneva, 2012

Fig. 43.5  Annual PIVOT meeting in Kobe, Japan

Fact Box 43.8
The investigators of the PIVOT trial discussed and reached consensus regarding how communication should primarily take form and established a policy around this. All in-person meetings were documented and minutes distributed for all members of the group that were unable to attend.

Fact Box 43.9: Key Points for a Successful Performance of a Multicenter Trial in the Experience of the PIVOT Trial
• Frame the purpose of the study and establish research questions and hypotheses prior to study start.
• Define how specific responsibilities should be distributed across the participating sites and prepare a plan for manuscript writing early in the planning process.
• Apply a standardized methodology across all parts of study performance including recruitment of patients, the screening process, the intervention, and the clinical examination. The use of checklists and protocols promotes a standardized approach.
• Document everything! Have persons appointed to take responsibility and coordinate documentation.
• Have an open and consistent communication. A multicenter trial is a teamwork, and everyone must contribute to a collaborative environment.
• Follow all regulatory requirements of each institution and country, respectively.
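
The recruitment plan in Sect. 43.5 mentions a power analysis that took an anticipated loss to follow-up into account. As a rough illustration of that kind of calculation, the Python sketch below first estimates a per-group sample size for comparing two means with the usual normal approximation and then inflates it for attrition. All inputs (effect size, alpha, power, loss rate) and the function names are hypothetical assumptions chosen for the example; the source does not report the parameters or the method used in the PIVOT trial.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized effect size (assumed, not from the trial).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def inflate_for_loss(n_evaluable, loss_percent):
    """Patients to enroll so that n_evaluable remain after the expected loss.

    Integer ceiling division avoids floating-point rounding surprises.
    """
    return -(-n_evaluable * 100 // (100 - loss_percent))


evaluable = n_per_group(0.5)               # hypothetical medium effect size
enrolled = inflate_for_loss(evaluable, 15)  # hypothetical 15% loss to follow-up
print(evaluable, enrolled)                  # 63 75
```

The inflation step is the part that corresponds to the text: dividing the required number of evaluable patients by the expected retention rate gives the number that has to be recruited.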

43.10 The Results of the PIVOT Trial

43.10.1 Data Collection and Analysis

The collection of data was evaluated continuously during the enrollment phase. The enrollment phase ended once a total of 107 patients had been enrolled, who were subsequently followed for 24 months. As part of the study protocol, the statistical analysis for each study aim was planned in advance, and the setup of a common dataset facilitated data analysis. The principal site (University of Pittsburgh) took the lead in performing data analysis to increase consistency, ensuring that the investigators who performed the statistical analysis were well-familiar with the database. The PIVOT trial team followed the initial plan for which center was responsible for writing the manuscript of each specific study aim. Additionally, a progress report was updated successively to keep track of study progress, publications, and assignments. All adverse events, reoperations, and contralateral ACL ruptures during the study period were also documented.

Fact Box 43.10
The statistical analysis for each study aim was planned in advance, and the setup of a common dataset facilitated data analysis. A progress report was updated successively to keep track of study progress, publications, and assignments.

43.10.2 Presentation of the Results

The first results presented from the PIVOT trial came from a validation study of the noninvasive technology for quantitative pivot shift [13]. The paper concluded that both techniques were valid and that the technology was able to detect differences between clinically graded low- and high-grade pivot shift, findings which were indeed encouraging for continued analysis of the data. Since then, a total of four other papers have been published in various journals [3, 10, 18, 19], and additional manuscripts are either submitted or in preparation. Over 20 presentations of the PIVOT trial have been held at international meetings, and several abstracts and posters have been presented. The PIVOT trial has also received attention as newsletter articles [4, 5] and as a podcast episode by the American Journal of Sports Medicine [12]. In 2017, the book Rotatory Knee Instability: An Evidence Based Approach, edited by the principal investigator of each participating center, was published [15]. The book could be seen as a deepened overview of all major aspects of the assessment of rotatory knee instability, including the pivot shift phenomenon, and highlights current knowledge as well as future aspects that need to be addressed by further research.

Fact Box 43.11
A validation study of the noninvasive technology for quantitative pivot shift was primarily performed. Since then, a total of four other papers have been published and several manuscripts are in progress. Over 20 presentations of the PIVOT trial have been held at international meetings, and in 2017, the book Rotatory Knee Instability: An Evidence Based Approach was published.

Fact Box 43.12: Key Findings from the PIVOT Trial
• Both devices for quantification of the pivot shift were found valid and able to detect differences between clinically graded low- and high-grade pivot shift.
• Quantitative pivot shift detected a significantly higher tibial acceleration and lateral compartment translation in patients under anesthesia compared with in awake state.

• Preoperative tibial acceleration and lateral compartment translation were significantly reduced by anatomic single-bundle ACL reconstruction.
• Generalized joint laxity does not appear to correlate with quantitative pivot shift in the ACL-injured knee.
• Static anteroposterior knee laxity tests are poorly correlated to quantitative pivot shift in the ACL-injured knee. Static and rotatory knee joint laxity should be considered as separate entities of the knee examination.

43.11 Future Directions

During the course of the PIVOT trial, the collaboration among the centers has grown stronger, and the network has expanded. Conducting a multicenter trial has encouraged us to continue applying this methodology, i.e., quantitative evaluation of the pivot shift test, for high-quality research. Some of the preliminary research questions have been answered; however, they have also resulted in identification of new areas of research that need to be undertaken. For example, the results from the PIVOT trial have led to the setup of another study which aims to compare the outcomes and the rotatory laxity between ACL reconstruction and ACL reconstruction complemented by tenodesis.

Take-Home Message
• The multicenter PIVOT trial has provided novel knowledge about the use of quantitative pivot shift in ACL-injured patients across three continents.
• Performance of a multicenter trial requires teamwork, which is promoted by an open and consistent communication.
• A cornerstone for the successful execution of the PIVOT trial was the stringent preparation prior to study start, where consensus was reached across all participating sites for a standardized methodology regarding recruitment of patients, the screening process, the intervention, and the clinical examination.

References

1. Anderson AF, Rennirt GW, Standeffer WC Jr. Clinical analysis of the pivot shift tests: description of the pivot drawer test. Am J Knee Surg. 2000;13(1):19–23.
2. Benjaminse A, Gokeler A, van der Schans CP. Clinical diagnosis of an anterior cruciate ligament rupture: a meta-analysis. J Orthop Sports Phys Ther. 2006;36(5):267–88. https://doi.org/10.2519/jospt.2006.2011.
3. Hamrin Senorski E, Svantesson E, Sundemo D, Musahl V, Zaffagnini S, Kuroda R, et al. Preoperative knee laxity measurements predict the achievement of a patient-acceptable symptom state after ACL reconstruction: a prospective multicenter study. J ISAKOS: Joint Disord Orthop Sports Med. 2018;3:26–32.
4. Hoshino Y, Musahl V, Kuroda R, Zaffagnini S, Samuelsson K, Lopomo N, et al. PIVOT Study Group Project Update ISAKOS Newsletter. 2014.
5. Hoshino Y, Musahl V, Ryosuke K, Zaffagnini S, Samuelsson K, Lopoma N, et al. Quantified pivot shift test by accelerometer and iPad App. ESSKA Newsletter. 2015.
6. Hoshino Y, Araujo P, Ahlden M, Moore CG, Kuroda R, Zaffagnini S, et al. Standardized pivot shift test improves measurement accuracy. Knee Surg Sports Traumatol Arthrosc. 2012;20(4):732–6. https://doi.org/10.1007/s00167-011-1850-0.
7. Hoshino Y, Araujo P, Ahlden M, Samuelsson K, Muller B, Hofbauer M, et al. Quantitative evaluation of the pivot shift by image analysis using the iPad. Knee Surg Sports Traumatol Arthrosc. 2013;21(4):975–80. https://doi.org/10.1007/s00167-013-2396-0.
8. Kim SJ, Kim HK. Reliability of the anterior drawer test, the pivot shift test, and the Lachman test. Clin Orthop Relat Res. 1995;(317):237–42.
9. Kuroda R, Hoshino Y, Kubo S, Araki D, Oka S, Nagamune K, et al. Similarities and differences of diagnostic manual tests for anterior cruciate ligament insufficiency: a global survey and kinematics assessment. Am J Sports Med. 2012;40(1):91–9. https://doi.org/10.1177/0363546511423634.
10. Lopomo N, Signorelli C, Rahnemai-Azar AA, Raggi F, Hoshino Y, Samuelsson K, et al. Analysis of the influence of anaesthesia on the clinical and quantitative assessment of the pivot shift: a multicenter international study. Knee Surg Sports Traumatol Arthrosc. 2017;25(10):3004–11. https://doi.org/10.1007/s00167-016-4130-1.
11. Muller B, Hofbauer M, Rahnemai-Azar AA, Wolf M, Araki D, Hoshino Y, et al. Development of computer tablet software for clinical quantification of lateral knee compartment translation during the pivot shift test. Comput Methods Biomech Biomed Engin. 2016;19(2):217–28. https://doi.org/10.1080/10255842.2015.1006210.
12. Musahl V. Validation of quantitative measures of rotatory knee laxity. Am J Sports Med. 2016;44(9):2393–8.
43  Multicenter Study: How to Pull It Off? The PIVOT Trial 413

13. Musahl V, Griffith C, Irrgang JJ, Hoshino Y, Kuroda R, Lopomo N, et al. Validation of quantitative measures of rotatory knee laxity. Am J Sports Med. 2016;44(9):2393–8. https://doi.org/10.1177/0363546516650667.
14. Musahl V, Hoshino Y, Ahlden M, Araujo P, Irrgang JJ, Zaffagnini S, et al. The pivot shift: a global user guide. Knee Surg Sports Traumatol Arthrosc. 2012;20(4):724–31. https://doi.org/10.1007/s00167-011-1859-4.
15. Musahl V, Jón K, Kuroda R, Zaffagnini S. Rotatory knee instability: an evidence based approach: Springer International Publishing; 2017.
16. Noyes FR, Grood ES, Cummings JF, Wroble RR. An analysis of the pivot shift phenomenon. The knee motions and subluxations induced by different examiners. Am J Sports Med. 1991;19(2):148–55. https://doi.org/10.1177/036354659101900210.
17. Rosenberg W, Donald A. Evidence based medicine: an approach to clinical problem-solving. BMJ. 1995;310(6987):1122–6.
18. Sundemo D, Blom A, Hoshino Y, Kuroda R, Lopomo NF, Zaffagnini S, et al. Correlation between quantitative pivot shift and generalized joint laxity: a prospective multicenter study of ACL ruptures. Knee Surg Sports Traumatol Arthrosc. 2018;26(8):2362–70. https://doi.org/10.1007/s00167-017-4785-2.
19. Svantesson E, Hamrin Senorski E, Mårtensson J, Zaffagnini S, Kuroda R, Musahl V, et al. Static anteroposterior knee laxity tests are poorly correlated to quantitative pivot shift in the ACL-deficient knee: a prospective multicentre study. J ISAKOS: Joint Disord Orthop Sports Med. 2018;3:83–88.
20. Zaffagnini S, Lopomo N, Signorelli C, Marcheggiani Muccioli GM, Bonanzinga T, Grassi A, et al. Innovative technology for knee laxity evaluation: clinical applicability and reliability of inertial sensors for quantitative analysis of the pivot-shift test. Clin Sports Med. 2013;32(1):61–70. https://doi.org/10.1016/j.csm.2012.08.007.
21. Zaffagnini S, Signorelli C, Grassi A, Yue H, Raggi F, Urrizola F, et al. Assessment of the pivot shift using inertial sensors. Curr Rev Musculoskelet Med. 2016;9(2):160–3. https://doi.org/10.1007/s12178-016-9333-z.
44 Conducting a Multicenter Trial: Learning from the JUPITER (Justifying Patellar Instability Treatment by Early Results) Experience

Jason L. Koh, Shital Parikh, Beth Shubin Stein, and The JUPITER Group

JUPITER Group: Elizabeth A. Arendt, MD; Jackie Brady, MD; Dennis Crawford, MD; Diane L. Dahm, MD; Henry Ellis, MD; Matthew Halsey, MD; Peter Fabricant, MD; Jack Farr, MD; Dan Green, MD; Benton Heyworth, MD; Kosmas Kayes, MD; Dennis Kramer, MD; Aaron Krych, MD; Robert Magnussen, MD; Todd Milbrandt, MD; Matthew Milewski, MD; Charlie Popkin, MD; Lauren Redler, MD; David Roberts, MD; Verena Schreiber, MD; Seth Sherman, MD; Sabrina Strickland, MD; Marc Tompkins, MD; Eric Wall, MD; Philip Wilson, MD; Yi-Meng Yen, MD

J. L. Koh (*)
Department of Orthopaedic Surgery, NorthShore University Health System, Northshore Orthopedic Institute, Evanston, IL, USA
Pritzker School of Medicine, University of Chicago, Chicago, IL, USA

S. Parikh
Sports Medicine, Division of Pediatric Orthopaedic Surgery, Cincinnati Children’s Hospital, University of Cincinnati, Cincinnati, OH, USA

B. S. Stein
Hospital for Special Surgery, Weill Cornell Medical College, New York, NY, USA

44.1 Introduction

Multicenter trials are critically important in answering significant research questions in orthopedic surgery. Due to the nature of orthopedic injuries and treatment, it can be difficult for a single center to accumulate enough patients to have sufficient statistical power to address a clinical question. In addition, the patient population or the particular practice patterns of a single location can make generalizing the results of a single-site study difficult [3]. For example, in patellofemoral instability, surgical results based on an injury group consisting primarily of traumatic injuries to male military recruits in their 20s may not be applicable to atraumatic dislocations in skeletally immature female patients with trochlear dysplasia. A multicenter trial might help address diverse patient populations and practice patterns.

Multicenter trials can provide important information by collecting sufficient numbers and varieties of patients to allow significant statistical power and generalizability. Challenges are related to the geographic separation and different locations that make trial coordination more difficult than in a single-site study.

There are unique challenges related to multicenter trials. The obvious one is that investigators are geographically spread apart, resulting in increased difficulty in communication and potential for increased variability in conducting the

© ISAKOS 2019 415


V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_44
416 J. L. Koh et al.

study. What is just as critical is the dedication of the group of investigators to be willing to be collaborative and compromise to achieve a collective goal.

44.2 Initiation of Multicenter Trial

Typically, a multicenter trial begins with a small group of investigators who share a common interest in a clinical research question. This is often based on results of a single institution’s experience and the desire to identify how this may be further generalizable. In the JUPITER trial, the investigators have been motivated to initiate a multicenter study investigating the results of treatment of patellofemoral instability in a pediatric, adolescent, and young adult population. The standard of care for initial acute patellofemoral dislocation has historically been nonoperative [10]; however, it has been demonstrated that the rate of recurrent instability can be quite high and also that many patients have symptoms or loss of function related to the dislocation. In addition, recurrent instability has been associated with a significant rate of articular cartilage damage and long-term osteoarthritis [6]. Recent work done at several centers has identified specific risk factors for recurrent instability [1, 2, 5, 9]. Algorithms have been proposed for treatment [11], but questions still remain as to the natural history and results of treatment for different patients [7, 8]. JUPITER is a multicenter, multi-armed prospective cohort study aimed at addressing some of these questions, particularly which patients can do well with an isolated medial patellofemoral ligament reconstruction for stabilization and which patients need other procedures.

Initiation of the JUPITER trial was based on a pilot single-center study and was designed to identify risk factors for recurrent patella instability and treatment outcomes.

44.3 Discussion and Planning Phase

A discussion phase was initiated after some initial face-to-face and email contacts among members of a group of researchers interested in the area of patellofemoral instability, primarily as determined by attendance at the International Patellofemoral Study Group (IPSG) meeting in Chicago in 2015. Interest was gauged by a small group of initial investigators. In most multicenter trials, there is a project leader or leadership team that helps keep the trial on track. Key to the conduct of this and any multicenter study is significant commitment by this group of investigators.

Discussion and planning began with a small group. Essential statistical evaluation was performed, and a screening form was developed and circulated to evaluate site and investigator capabilities.

The research goals were identified, and after this was clarified, a statistician performed a power calculation for the study. A statistician or epidemiologist with statistical training is a critical partner in identifying the appropriate number of patients to ensure sufficient power to answer the proposed research question. Appropriate corrections should be made for patient dropout or loss to follow-up. The calculation of overall enrollment numbers will be compared to anticipated individual site enrollments for a given period of time to help determine the total number of sites and/or anticipated length of time for enrollment.

A screening form for potential sites interested in joining the study was developed and circulated. In this tool, sites and investigators provided information about level of interest, site and investigator experience and support (including financial support and research personnel such as research assistants/coordinators), estimated frequency of enrollment, and anticipated level of commitment. A sample screening form page from JUPITER is shown in Fig. 44.1 (Tables 44.1 and 44.2).
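The enrollment planning described in this section (correcting the powered sample size for dropout, then comparing it with anticipated per-site enrollment) reduces to simple arithmetic. The following is a minimal sketch only: the function names and all numbers are hypothetical illustrations, not JUPITER's actual figures, and the powered sample size itself is assumed to come from the statistician's power calculation.

```python
def ceil_div(a, b):
    """Integer ceiling division (avoids floating-point rounding)."""
    return -(-a // b)


def enrollment_target(n_analyzable, dropout_pct):
    """Patients to enroll so that n_analyzable remain after dropout.

    dropout_pct is the anticipated loss to follow-up in whole percent.
    """
    if not 0 <= dropout_pct < 100:
        raise ValueError("dropout_pct must be in [0, 100)")
    return ceil_div(n_analyzable * 100, 100 - dropout_pct)


def sites_needed(n_enroll, per_site_per_year, years):
    """Number of sites required to reach n_enroll within the given years."""
    return ceil_div(n_enroll, per_site_per_year * years)


def months_needed(n_enroll, n_sites, per_site_per_year):
    """Months of enrollment required with a fixed number of sites."""
    return ceil_div(12 * n_enroll, n_sites * per_site_per_year)


# Hypothetical example: 200 analyzable patients, 20% anticipated dropout,
# each site expected to enroll 25 patients per year.
target = enrollment_target(200, 20)
print(target, sites_needed(target, 25, 1), months_needed(target, 10, 25))
```

Per-site enrollment rates could be taken from the screening form responses; sites that enroll more slowly than anticipated lengthen the enrollment period accordingly.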
44  Conducting a Multicenter Trial: Learning from the JUPITER (Justifying Patellar Instability Treatment… 417

JUPITER (Justifying Pediatric Instability Treatment by Early Results)


SCREENING FORM

Name of surgeon:

Institution Name:

Affiliated University:

Phone No

Research Coordinator

Years in Practice: years

Type of Practice:

Average no of Patellar Instability treated non-operatively per year:

Average no of Patellar Instability treated operatively per year:

Average no of Medial-sided repair per year:

Average no of isolated MPFL reconstructions per year:

Average no of TTO (Elmslie-Trillat, AMZ, distalization) per year:

Average no of Osteochondral fracture Rx following patellar stabilization per year:

Average no alignment osteotomies (femur/tibia, coronal/rotational) per year:

1. Of all operative patellar stabilization, how often do you do knee arthroscopy?

Knee Arthroscopy %

2. Of all operative patellar stabilization, how often patients have open femoral physis?

Open Physis %

Fig. 44.1  JUPITER Screening Form

44.3.1 Protocol Development

Protocol development was initially performed by the executive committee based on a pilot study initiated at one site. Multiple questions need to be answered during protocol development, including eligibility criteria and assessment. Assessment for clinical projects can include history, physical examination, and radiographic studies, as well as patient-reported outcome scores and standardized evaluation tools.

The specific aims of JUPITER were to evaluate the safety and effectiveness of (1) nonoperative treatment, (2) isolated medial patellofemoral ligament (MPFL) reconstruction, and (3) MPFL reconstruction combined with bony procedures (osteotomy, trochleoplasty). Subject recruitment was planned for a 1 year time period at ten

Table 44.1  JUPITER authorship criteria (adapted from PRISM https://www.prismsports.org/)


Eligibility criteria for authorship in JUPITER manuscripts (adapted from PRISM)
I. All of the following criteria must be met to be considered for authorship:
  1. Maintain good standing in JUPITER, as defined in the manual of operations
  2. Respond with all of the following within 2 weeks for each manuscript
   (a) Comments/edits of manuscript (or an “all good” response)
   (b) Completion of all disclosure forms
   (c) Completion of all copyright transfer forms, etc.
II. In addition, investigators must have met a set of the following criteria (by receiving at least 3 points)
  1. Participated in protocol development and study design—1 point
  2. Participated in writing the original manuscript—2 points
  3. Reviewed a rough draft of the article with substantial suggestions and editing—1 point
  4. Patient enrollment with complete data used for this study:
   (a) 1–9% of patients in study with sufficient follow up data—1 point
   (b) 10–29% of patients in study with sufficient follow up data—2 points
   (c) ≥30% of patients in study with sufficient follow up data—3 points
  5. Participated in grant writing for study group funding—2 points
III. Authorship order will be determined by Executive Committee based upon:
  • Good standing in JUPITER
  • Amount and quality of manuscript drafting/edits/review
  • Number of patients entered into the registry
IV. If you wish to perform a research sub‐study utilizing the multicenter database, you need to complete a “Research
Proposal Form”
  • The form will be reviewed by the investigators and coordinators at Cincinnati Children’s Hospital and the
Hospital for Special Surgery to assure:
   – No conflict with existing study proposals
   – Compliance with the “FINER” criteria: Feasible, Interesting, Novel, Ethical, and Relevant
  • If the study is approved, all participating Investigators in the group will be notified about the study. Centers with
clean and complete data related to the topic will be invited to participate
  • Preliminary authorship criteria and order will be established with a PI and one other representative from the
proposing institution and a PI only from other participating institutions. The tag line “JUPITER study group” will be
added to the list of authors on all publications
JUPITER-Version 1—February 2018
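The point system in Section II of Table 44.1 can be expressed as a short calculation. This is a minimal sketch, not the group's actual tooling: the function and parameter names are hypothetical, and the handling of enrollment shares that fall between the listed bands (below 1%, or between 9% and 10%) is an assumption, since the table does not specify it.

```python
def enrollment_points(share_pct):
    """Points for the share of study patients with sufficient follow-up data.

    Assumption: shares below 1% earn no points, and shares between the
    listed bands (e.g., 9.5%) fall into the lower band.
    """
    if share_pct >= 30:
        return 3
    if share_pct >= 10:
        return 2
    if share_pct >= 1:
        return 1
    return 0


def authorship_points(protocol=False, wrote_manuscript=False,
                      substantial_review=False, enrollment_share_pct=0.0,
                      grant_writing=False):
    """Sum the Section II points from Table 44.1."""
    points = 0
    points += 1 if protocol else 0            # protocol development and study design
    points += 2 if wrote_manuscript else 0    # writing the original manuscript
    points += 1 if substantial_review else 0  # substantial review of a rough draft
    points += enrollment_points(enrollment_share_pct)
    points += 2 if grant_writing else 0       # grant writing for study group funding
    return points


def eligible(section_one_met, points):
    """Eligible only if all Section I criteria are met and at least 3 points earned."""
    return section_one_met and points >= 3


# Hypothetical investigator: helped design the protocol and enrolled 12% of patients.
pts = authorship_points(protocol=True, enrollment_share_pct=12)
print(pts, eligible(True, pts))
```

Authorship order (Section III) remains a judgment of the Executive Committee and is not captured by the point total.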

centers. Posttreatment outcome assessment was to be performed at 6, 12, and 24 months, including assessment of function, activity level, health-related quality of life, patellar stability, knee motion, and complications.

44.3.2 Clinical Assessment

In JUPITER, a draft assessment tool for data collection was developed. The initial tool was relatively lengthy, and multiple conference calls were made by the group of investigators to help further refine and develop the protocol. Critically, it was felt that it was important to simplify the initial form to minimize the burden on the investigators located at multiple sites. A simplified assessment tool also allows for improved patient compliance and reproducibility of data. It is important to use validated tools to make sure the data appropriately reflect the desired outcome evaluation; we use Pedi-IKDC, Kujala, HSS Pedi-FABS, Banff Patellofemoral Instability instrument 2.0, and KOOS Knee survey. Initially, the assessment tool was a paper document; however, specialty society grant funding was received allowing investigators to use a Web-based system for data collection and management (Oberd™, Columbia, Missouri, USA), and the study transitioned to this during the course of enrollment. Other investigators have used REDCap (Research Electronic Data Capture), which is a free research data management system sponsored by Vanderbilt University and supported by the National Institutes of Health. The advantages of using electronic databases are multiple: they can (1) allow for remote collection of patient-reported outcomes (PRO) by any electronic media, (2) send automated updates related to follow-up and incomplete data, and (3) help with data analysis due to advanced data output functions.

Protocol development was conducted by a team, and clinical and radiographic assessment tools were selected. Assessment is aided by use of Web-based data collection and management for clinical outcomes and centralized radiographic evaluation. Data is collected centrally.

Table 44.2  JUPITER institutions and investigators
Institutions: Investigators
Cincinnati Children’s (Coordinating Center): Shital Parikh, PI; Eric Wall
Hospital for Special Surgery (Coordinating Center): Beth Shubin Stein; Dan Green; Sabrina Strickland; Peter Fabricant
Boston Children’s: Yi-Meng Yen; Dennis Kramer; Benton Heyworth; Matthew Milewski
Columbia University: Charlie Popkin; Lauren Redler
Mayo Clinic: Diane L. Dahm; Todd Milbrandt; Aaron Krych
NorthShore University Health System: Jason L Koh; David Roberts; Verena Schreiber
OrthoIndy: Jack Farr; Kosmas Kayes
Oregon Health Sciences University: Jackie Brady; Dennis Crawford; Matthew Halsey
Ohio State University: Robert Magnussen
Texas Scottish Rite: Henry Ellis; Philip Wilson
University of Minnesota/TRIA: Elizabeth A. Arendt; Marc Tompkins
University of Missouri: Seth Sherman

44.3.3 Radiographic Evaluation

Regarding radiographic evaluation, an extensive amount of time was spent in investigator meetings to develop standard radiographic methods. Given the relative complexity of radiographic evaluation (e.g., for the measurement of anterior tibial tubercle–trochlear groove distance on MRI) in the JUPITER study, training to establish common standards for imaging evaluation was necessary. Training was performed by using a standard set of images reviewed at in-person meetings and also distributed electronically. Ultimately, concerns still remained about variability in image interpretation across sites, and part of the way through enrollment, the investigators were able to obtain sufficient funding to have images electronically sent to a central site to be interpreted by a specifically trained team of musculoskeletal radiologists. For this aspect, REDCap was used to collect and store the data. This evolution to centralized radiographic evaluation and repository for image assessment is expected to improve consistency of this aspect of evaluation and also reduce the time commitment of individual sites. If there is a potential for significant variability in radiographic interpretation, then centralized imaging analysis at a single site is preferred.

44.3.4 Centralized Data Repository

The data from multiple sites must be collected and aggregated at a centralized site. This data management can be time-consuming and expensive; however, it is critically important to be able to have the data in a secure location that remains accessible to appropriate researchers. This typically requires financial support, which can be provided either by the sponsoring institution or by external grants.

44.3.5 Funding and Grants

Once the research protocol has been established, it is often valuable to submit for research grants from various funding sources. In many cases, the initial pilot study is self- or institution-funded; however, extrapolating the study to multiple sites requires an additional level of funding. In most nonindustry-sponsored research, support for

research personnel at each individual site is typically that site’s responsibility. Individual sites may have limited resources to participate in the trial, and obtaining grant support may be critical to active site enrollment. Investigators should be active in pursuit of local support as well as from larger organizations. Using data from the pilot study, the JUPITER executive group successfully submitted an application for research funding from one of the orthopedic specialty societies to support patient-reported outcome data collection. Additional grant funding from university/departmental research funding allowed for single-site radiographic review. Ultimately, it is hoped that the JUPITER experience will allow competition for NIH-funded grants that provide multi-institutional support.

Grant funding plays an important role in supporting the research project.

44.3.6 Institutional Review Board (IRB) Approval

In a clinical trial, institutional review board (IRB) ethics approval for human research must be obtained. This can often be a complex and time-consuming effort. The process is even more complex when multiple sites are involved and data must be transmitted between different institutions. In JUPITER, the pilot site had developed an IRB-approved protocol that served as a template for IRB protocols at the other sites. This sped up site-specific protocol development; however, multisite approval resulted in delays in initiating the study at several sites, for many months in some cases. In the future, the authors would consider utilizing a central IRB for the trial for as many sites as possible, which would hopefully speed gaining the appropriate ethical approval for multiple sites and decrease time to full enrollment. Recently, the NIH has released a policy on using a single IRB for multicenter trials, which may help with this process. This can be found at https://grants.nih.gov/policy/clinical-trials/single-irb-policy-multi-site-research.htm.

Regulatory issues (such as IRB issues and safety monitoring) can be challenging and delay site initiation. The use of a centralized IRB may be helpful but may require extensive back and forth with a primary site. Standardized protocols and procedures help with consistent and safe data collection.

Notably, audits from IRB during trials are common, and one has to organize and prepare everything such that complete transparency and responsibility could be proven at any point during the study. Recently, one of our coordinating centers had their IRB audit the entire JUPITER study. There were some omissions and some minor lapses, which have since been corrected.

An additional component of any clinical trial for publication in most major journals is registration with a clinical trial database. In the United States, www.clinicaltrials.gov is free and is the most commonly used registration.

44.3.7 Data Safety and Monitoring Board

As part of most clinical trials, a data safety and monitoring board is often required by funding agencies. The purpose is to maintain subject safety and data integrity but also to recommend cessation of the trial for ethical reasons (e.g., failure to meet enrollment or interim analysis showing dramatic differences). Protocol deviations are also evaluated. Research coordinators at key sites can help monitor compliance and are in charge of data cleaning. Periodic audits at each institution would help ensure complete data collection.

44.3.8 Standardized Operating Procedures/Training

Once a clearly defined research protocol and standardized assessment tools have been created, a manual of operating procedures (MOP) can be developed. This can help with creating

JUPITER (adapted from MARS/MOON)

Sub-Study Proposal Sheet

Based upon “FINER” approach to clinical questions described in Designing Clinical Research (see below).
Feasible, Interesting, Novel, Ethical, Relevant

Name:
Phone Number:
E-Mail:
Date Submitted:

1. Study Title:
2. Authors/Investigators: 1. 2. 3.
2a. Reviewers: 1. 2.
3. Hypotheses:
4. Outcome Measures: A. B. C.
5. Significance/Previous Studies:
6. Data/Information Required From Coordinating Centers:
7. Power Analysis. Can The Cohort Answer The Question?
8. Statistical Analysis Required? Who Will Perform This?
9. Is The Current IRB Approval Adequate? [ ] Yes [ ] No [ ] Unsure
   If No or Unsure, Please List The Needed Amendments:
10. Will The Study Require Additional Funding? [ ] Yes [ ] No
   If Yes, Please State The Source.
11. Length of Time Needed To Complete This Study?

Based upon: Designing Clinical Research: An Epidemiologic Approach by Stephen B. Hulley, Steven R. Cummings, Warren S. Browner, Deborah G. Grady, Thomas B. Newman. Lippincott Williams & Wilkins. Third Edition, November 1, 2006.
Adapted from Vanderbilt University Sports Medicine and MOON Forms

Fig. 44.2  JUPITER Sub-Study Proposal Sheet

standardization of enrollment and assessment. It will also help with training of research personnel. During the course of enrollment, it is not unlikely that there may be some change of research personnel at several of the sites where they may lose their research coordinator. A manual can assist in helping sites when research assistants or coordinators change. Training can be performed either in person, through documents, or by phone or online. Regularly scheduled phone or in-person meetings can keep personnel updated.

44.3.9 Research Questions

During the course of the trial or afterward, it is common to have additional research questions emerge. How to prioritize these questions and choose which ones to pursue can be difficult. It is best practice to develop criteria for the governing committee to evaluate these proposals. The FINER (feasible, interesting, novel, ethical, relevant) criteria [4] are often used to evaluate proposed research questions. A modification of this has been used by the MOON (Multicenter Orthopedic Outcomes Network) and MARS (Multicenter ACL Revision Study) groups, and this was adopted by the JUPITER group to determine which studies to pursue (Fig. 44.2).

44.3.10 Presentation and Publication

One of the potentially more challenging aspects of multicenter trials is how results will be presented and published. It is best to address questions of authorship and publication credit and priority before study initiation and certainly before publication submission. We feel that authorship should follow International Committee of Medical Journal Editors (ICMJE) criteria, which are required by many of the premier medical journals. Authorship must meet several criteria, including significant contributions to study design, execution, assessment, and

writing and editing. Up-to-date criteria with additional detail are provided online at http://www.icmje.org/recommendations.

Research questions are evaluated using the FINER (feasible, interesting, novel, ethical, relevant) criteria. Presentation and publication guidelines should be developed in advance so that there is a clear understanding about authorship.

With respect to multicenter trials, authorship questions become more complex. Fortunately, several existing models for publication authorship and priority exist. Historically, many journals limited the number of named authors; however, with the advent of electronic publication and indexing, it has become easier to credit multiple authors on a publication. In many cases, papers will be published with several principal authors and the rest of the investigators credited as a group, with each individual investigator’s name listed and searchable electronically. For JUPITER, authorship criteria were first discussed among the executive committee and then circulated among the larger group of investigators for comment. Criteria were modeled after the PRISM (Pediatric Research in Sports Medicine) group criteria and included assessment of active participation in data collection, as well as ICMJE criteria. To encourage multiple author participation, it was recommended that principal authors from different sites be listed on each research paper. Publications would be submitted to the group for editorial review and input as required by ICMJE.

44.4 Execution

44.4.1 Communication and Coordination

Throughout the conduct of the trial, it is critical to keep investigators and sites engaged in the research process. Too often, multicenter trials lose focus and energy since geographic distance and limited face-to-face engagement can result in investigator focus being directed elsewhere. JUPITER has successfully addressed this with regularly scheduled monthly conference calls that include clinical research staff (such as research assistants and coordinators) as well as investigators. During these calls, critical operational updates can be provided to the group and especially the personnel that are typically performing much of the day-to-day enrollment and data collection activity. There is also time for investigators or coordinators to bring up and discuss questions or areas where further clarification is needed. Summary minutes provide valuable information that can provide updates to an existing standardized protocol.

Communication and monitoring are critical to study progress. The use of regularly scheduled meetings improves communication and consistency; transparency about site-specific trial milestones helps monitor progress and encourages continued investigator participation.

Another successful tool has been to send out frequent regular score cards indicating progress to specific trial milestones, including investigators and sites, IRB status, and their current enrollment numbers. Subject visit and follow-up compliance are additional measures to be potentially added. The score cards serve several functions. First, they update the entire group of investigators as to the current status with respect to overall enrollment in the project. It is motivating to see the progress being made across the different locations as the trial progresses. Secondly, it allows the group to identify and learn from the sites that are most successful in terms of enrollment. Finally, it can spur some friendly competition and additional engagement to increase enrollment activity.

Face-to-face group meetings are also valuable to engage the group and continue active participation. We have tried to have face-to-face meetings at major medical conferences where it is anticipated

that there will be multiple investigators available. This can be challenging since not all investigators will be at every conference, and even if investigators attend the conference, they may have other commitments that limit their availability. Some authors have suggested that separate investigator meetings help address this issue; however, this can be a significant time and expense burden.

44.4.2 Data Monitoring

Enrollment, ongoing participation, and continued follow-up need to be monitored during the course of the trial. In this way, accurate progress to milestones can be assessed and communicated. We recommend significant transparency throughout this process as there can be loss of trial participation at every step. Digital forms can significantly help in monitoring progress.

44.5 Publication

As previously noted, discussion of authorship and priority should be performed as early as possible, including in the planning phase of the study. Multiple papers typically emerge from a multicenter trial. Commonly, a methods paper is a first publication and is based primarily on the research protocol. Elements of the main paper (particularly the introduction and methods) can be written prior to obtaining complete data and analysis. Secondary studies can also be proposed prior to trial completion and evaluated as previously discussed using FINER criteria.

Multiple papers often arise from a multicenter trial. Authorship should follow guidelines. Initial components of paper writing (such as the methods section) can proceed in parallel with study recruitment. Authors should respond in a timely fashion.

The data analytics team should be notified when collection is complete so that they can begin to work in a timely fashion. Appropriate involvement of a biostatistician in planning the study design can make the analytical work more straightforward when the data has been collected. Trying to make sense of a pile of data after the fact without appropriate preparation can be challenging.

Paper drafts should be completed promptly, and coauthors should commit to providing rapid review and comments to the drafts, hopefully within 1–2 weeks. The main author/s should assess and appropriately incorporate these comments and prepare for submission. Determination of the appropriate journal to submit to should involve the lead authors.

Following submission, it is not uncommon for high-quality journals to either reject or request significant revisions to the article. The lead author/s should take the responsibility to respond to comments and revise the article for resubmission or submission to a new journal. It is appropriate for the group to celebrate after publication!

44.6 Tips for Multicenter Trials

Multicenter trials have unique challenges in that there are additional layers of complexity due to the multiple parties involved. It is critical to have a highly motivated core group of investigators that are committed to the project. An important aspect is the inclusion of an epidemiologist/statistician in study design and planning. To simplify the process, centralization is helpful. A central IRB may help decrease time to initiation of the trial in multiple centers. Centralized image analysis can improve consistency and decrease investigator burdens. Centralized data collection is critical to a successful trial.

It is important to be aggressive in seeking out funding opportunities. Multicenter trials typically require funding of the central coordinating site and also funding of resources at each of the contributing sites. Early preparation and submission of grants can help significantly in getting the research off the ground.
It is very helpful to take advantage of opportunities to talk with other groups that have initiated and successfully executed multicenter trials. In orthopedic sports medicine, the MARS and MOON groups have been very helpful and generous with their advice. The pediatric PRISM (Pediatric Research in Sports Medicine) group has also provided models of how to address some of the questions about authorship.

One should expect that there will be difficulties in conducting the study. Sites may have difficulty with IRB approval or, after beginning, may lose their coordinator. It is helpful to anticipate this and to build in additional sites and/or time for recruitment and enrollment.

Communication is critical throughout the process, for several reasons. It maintains the interest of geographically separated investigators. It improves the creation and conduct of the trial. Transparency, with score cards showing the completion of trial milestones, is also important to encourage ongoing enrollment.

Finally, multicenter trials are critically important for medicine, but they are also a great way to build collegiality and friendship with investigators across multiple institutions. A critical part of medicine is shared knowledge, and working together with like-minded, interested investigators builds the community of scholars that contributes to the advancement of clinical care.

Tips for Multicenter Trials
• Unique challenges due to multiple parties.
• Include epidemiologist/statistician in study design and planning.
• Centralized IRB and data collection and analysis are helpful.
• Funding can be challenging but is often needed to support the central coordinating site.
• Take advantage of others' knowledge; the MARS/MOON and PRISM multicenter trial groups were very helpful in setting up JUPITER.
• These trials are complex, so anticipate challenges to enrollment.
• Communication and transparency maintain consistency and encourage investigators.
• Multicenter trials have unique benefits: they not only answer research questions but also build collegiality and collaboration with investigators from multiple institutions.

Clinical Vignette/Case Study
The JUPITER (Justifying Patellar Instability Treatment by Early Results) study is an example of how a multicenter trial is initiated, developed, and executed. After an initial pilot study at a single institution, a small group of investigators was established to develop a protocol. Additional investigators were invited to participate, and the protocol was further developed, including the decision to centralize data collection and analysis. Input was received from investigators from other multicenter studies, such as MOON, MARS, and PRISM. It was decided to use the FINER criteria for evaluation of proposed research questions, and authorship guidelines were developed as well. Some sites encountered challenges in obtaining local IRB approval, and others had difficulty with research coordinator recruitment, but enrollment goals were ultimately achieved, and longitudinal data are now being gathered.

Take-Home Messages
• Multicenter trials have unique advantages in obtaining large numbers and increased generalizability of results but have coordination challenges.
• Careful research design, including an epidemiologist/statistician, is critical.
• Centralized data storage and analysis can improve consistency.
• Other orthopedic multicenter trials and their investigators are a valuable resource.
• Challenges are likely to arise during the IRB process and while conducting the study, so additional leeway should be included for possible delays.
• Communication between investigators is critical.
• Multicenter trials can build collegial relationships and further collaborations.
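The score cards recommended above for keeping enrollment transparent can be as simple as a per-site table of progress toward the agreed targets. The following Python sketch shows one hypothetical way to compute such a table; the `SiteProgress` fields, the site names and all counts are illustrative assumptions, not JUPITER data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SiteProgress:
    """Counts reported by one participating site (hypothetical fields)."""
    site: str
    target: int        # enrollment goal agreed for the site
    enrolled: int      # participants enrolled so far
    followed_up: int   # enrolled participants with a completed follow-up


def score_card(sites: list[SiteProgress]) -> list[dict]:
    """Return one row per site with enrollment and follow-up percentages."""
    rows = []
    for s in sites:
        rows.append({
            "site": s.site,
            # Guard against division by zero for sites without a target or enrollees.
            "enrolled_pct": round(100 * s.enrolled / s.target, 1) if s.target else 0.0,
            "follow_up_pct": round(100 * s.followed_up / s.enrolled, 1) if s.enrolled else 0.0,
        })
    # Sites furthest behind their enrollment goal are listed first.
    return sorted(rows, key=lambda r: r["enrolled_pct"])


if __name__ == "__main__":
    # Invented example counts for two sites.
    demo = [
        SiteProgress("Site A", target=40, enrolled=30, followed_up=24),
        SiteProgress("Site B", target=25, enrolled=5, followed_up=5),
    ]
    for row in score_card(demo):
        print(row)
```

Sorting by enrollment percentage puts the sites that most need attention at the top of the card, which is one way of using transparency to encourage ongoing enrollment.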
45 How to Organise an International Register in Compliance with the European GDPR: Walking in the Footsteps of the PAMI Project (Paediatric ACL Monitoring Initiative)

Daniel Theisen, Håvard Moksnes, Cyrille Hardy, Lars Engebretsen, and Romain Seil

D. Theisen (*) · C. Hardy
Sports Medicine Research Laboratory, Luxembourg Institute of Health, Luxembourg, Luxembourg
e-mail: daniel.theisen@lih.lu

H. Moksnes
Oslo Sports Trauma Research Center, Norwegian School of Sport Sciences, Oslo, Norway

L. Engebretsen
Oslo Sports Trauma Research Center, Norwegian School of Sport Sciences, Oslo, Norway
Division of Orthopedics, Oslo University Hospital, University of Oslo, Oslo, Norway

R. Seil
Sports Medicine Research Laboratory, Luxembourg Institute of Health, Luxembourg, Luxembourg
Department of Orthopaedic Surgery, Centre Hospitalier Luxembourg, Luxembourg, Luxembourg

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_45

45.1 Introduction

Consideration of ethical, legal and regulatory norms and standards for medical research involving human subjects has been emphasised for over 50 years. More recently, the protection of personal data has taken centre stage in the light of rapid technological developments and globalisation that have transformed human activities on an unprecedented scale. The facilitated cross-border flow of personal data has prompted the European Union (EU) to implement a strong and coherent legal framework to ensure adequate protection of personal data, the General Data Protection Regulation 2016/679, hereafter termed GDPR.¹ The GDPR applies to all organisations collecting, processing and holding personal data of data subjects residing in the EU, independently of whether the organisation is located within or outside of the EU. The GDPR also applies to research organisations and to the vast majority of their scientific activities. Researchers active in the field of clinical orthopaedics need to be aware of how this new EU regulation impacts the organisation of their research projects.

¹ Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Official Journal L 119, 4.5.2016, p. 1–88.

The aim of this chapter is to highlight some key points to be considered when implementing clinical orthopaedic research under the new European GDPR in general and when setting up an international register in particular. To this end, this contribution is organised in two parts. The first illustrates the GDPR requirements most relevant to clinical orthopaedic research. The second presents the organisational structure of PAMI, the ESSKA Paediatric
ACL Monitoring Initiative, as an illustrative example of an international register recently implemented in Luxembourg, Europe. The reader is cautioned not to consider the information provided in this chapter to be exhaustive in any way and is strongly advised to seek legal counsel before setting up his/her research project.

Fact Box 45.1: A New European Law on Data Protection
The European Union (EU) General Data Protection Regulation (GDPR) applies to all organisations that hold or process personal data of data subjects residing in the EU (whether they are EU citizens or not), independently of the organisation's location. It also applies to research organisations and concerns the vast majority of scientific activities. GDPR is a binding legislative act that must be applied in its entirety across the EU without requiring national enabling legislation. It became applicable on May 25, 2018. Any processing under way before that date should be made compliant with the GDPR within a period of 2 years.

45.2 Background and Aim of GDPR

The general aim of the GDPR is to guarantee a consistent and high level of protection of the rights and freedoms of natural persons within the EU, while at the same time facilitating the flow of personal data within the Union. These aspects are of particular interest in international clinical orthopaedic research involving human participants, where most of the collected data are related to health and are thus by their nature considered "sensitive".

Before May 2018, data protection laws in the EU were national implementations, by its member states and by members of the European Economic Area (EEA) such as Norway, of Directive 95/46/EC on "the protection of individuals with regard to the processing of personal data and on the free movement of such data".² Discrepancies in the specific national laws of EU member states resulted in a lack of harmonisation, which is why the EU adopted the GDPR 2016/679 in April 2016. The GDPR became applicable on May 25, 2018, thus replacing the previous directive. Being a regulation, GDPR is a binding legislative act that must be applied in its entirety across the EU without requiring national enabling legislation.

² Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data. Official Journal L 281, 23/11/1995 P. 0031–0050.

45.3 Some Important Definitions

In its article 4, the GDPR specifies a number of definitions that the researcher in clinical orthopaedics should be familiar with. The most important ones are presented hereafter.

Personal data: any information, electronic or not, that relates to an identified or identifiable natural person, i.e. a "data subject". Examples include the person's name, address, title and contact details but also a study identification number, IP address or a social security number. Conversely, compliance with the GDPR is not required if no link whatsoever can be established between the processed data and the person they belong to. This is, however, very rarely the case. All data pertaining to a person's health (clinical outcome questionnaire, treatment, biological sample, biometrical measurement, etc.) are considered sensitive. The processing of sensitive data requires that the data subject concerned has provided explicit consent, as will be further explained below.

Processing: any operation performed on personal data, automated or not, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction.
Pseudonymisation: the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.

Data controller: the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data. In general, the principal investigator bears data controller responsibility.

Data processor: a natural or legal person, public authority, agency or other body which processes personal data on behalf of the controller.

Consent of the data subject: any freely given, specific, informed and unambiguous indication of the data subject's wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her.

45.4 To Be or Not to Be GDPR Compliant

Clinical Vignette 1: Drs Frank and Stein Want to Team Up in an International Research Project
Dr Frank is an orthopaedic surgeon and researcher from Europe who has been invited by his colleague Dr Stein from Canada to join an international, multicentre research project on meniscus injuries in competitive high jumpers. The leadership lies with the institution of Dr Stein, who is the principal investigator of the project. Dr Frank has received a copy of the study protocol including the patient information sheet and the patient consent form. Reading through these documents, he realises that there is some ambiguity as to where the responsibilities lie regarding the handling of personal data, in particular as to the identification of the data controller and data processor. The dataset to be collected is detailed enough that a patient's identity could be traced even if the data were pseudonymised. Before joining the research project, Dr Frank decides to contact the data protection officer from his legal department to help him design a new set of patient documents that is compliant with the EU GDPR.

Both the data controller and the data processor are responsible for ensuring compliance with the GDPR when processing personal data and must be able to demonstrate it. In particular, they must ensure that "appropriate technical and organisational measures" are implemented, and they can be held accountable by regulatory authorities. Non-compliance with the GDPR may expose both parties to financial risks (administrative fines of up to 4% of annual corporate turnover or €20 million, whichever is higher), in addition to business and corporate image hazards.

Any data processing under way before the date on which GDPR came into force should be made compliant within a period of 2 years. If data processing in relation to a scientific study is based on consent compliant with Directive 95/46/EC, it is not necessary for the study participant to give his or her consent again, provided that the manner in which the consent has been given complies with the GDPR.

To be compliant with the GDPR within the framework of a scientific activity, a controller or processor must:

– Maintain a GDPR registry of their activities
– Perform a data protection impact assessment (DPIA) for each new activity that handles sensitive data
– Take organisational and technical measures to mitigate the remaining risks and to manage the data subjects' rights (e.g. pseudonymisation, data minimisation, storage limitation,
protection against accidental loss, destruction or damage, extent of data accessibility and processing, etc.)
– Adapt their agreements with stakeholders, subcontractors and data providers
– Adapt their study participant information and consent forms

Additional principles need to be considered when setting up a scientific research project involving human participants, although the following elements are in no way to be considered exhaustive.³ Study participants must be fully informed about the study purposes in an intelligible and easily accessible way, using plain and clear language, in accordance with the ICH Good Clinical Practice (GCP) guideline. The data controller and data processor should be clearly identified in the patient information form, and the data categories and legal framework should be described. The purpose of personal data processing should be explained, although this is not always possible at the moment of data collection. Therefore, study participants should be given the opportunity to provide their consent to certain areas of research or parts of research projects to the extent allowed by the intended purpose.

Explicit consent to participate in a study must be provided by the patient prior to participation for data processing to be lawful, and the data controller must be able to demonstrate that consent has been given. Participants must also be able to withdraw consent as easily as it was given and at any moment in time, a request that must be met by the data controller. Other participant rights include the right of data access, rectification or erasure (known as "the right to be forgotten"), the right to restriction of processing and the right to data portability. The latter concerns the right of a data subject to receive his or her personal data provided to a controller in a structured, commonly used and machine-readable format, so as to have the possibility to transmit these data to another controller.

³ The reader is kindly referred to Chapter 14 by Mrs C. Mouton et al. in this book.

Fact Box 45.2: Some Key Points of the GDPR for Scientific Research
• Applicability of its jurisdiction to research institutions outside the European Union
• Accountability of both the data controllers and the data processors
• Improved rights for study participants, including the right to access their data, the right "to be forgotten", the right to receive data breach notification and the right to data portability
• Principles of "privacy by default" and "privacy by design"
• Increased control and sanction regime

45.5 The Paediatric ACL Monitoring Initiative (PAMI)

Instability and functional impairments following anterior cruciate ligament (ACL) tears in skeletally immature patients represent a serious problem, which has received increasing recognition over the past years. There has been a rising number of publications on the treatment of ACL injuries in the skeletally immature population over the past decade [1, 2, 4, 5, 11, 14, 15]. Intrasubstance ACL ruptures are most worrisome due to the serious long-term health effects of potential early-onset osteoarthritis [9]. Furthermore, the open growth plates on both sides of the knee joint warrant particular caution before surgical interventions involving ACL reconstructions [3, 8]. Treatment algorithms for ACL ruptures in skeletally immature children differ around the world [10], and the only consensus currently available is that the treatment of these injuries is controversial [6, 7, 12].

To address this problem, the "Paediatric Anterior Cruciate Ligament Monitoring Initiative" (PAMI) was recently initiated [10, 13]. The main purpose of this project is to collect and analyse data from orthopaedic surgeons who are treating children and adolescents with ACL injury using a dedicated data collection system.
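Such a data collection system only needs to see pseudonymised records (see the definition in Sect. 45.3; the PAMI upload flow is shown in Fig. 45.1). The following is a minimal Python sketch of that idea, not the PAMI implementation: the class names `PseudonymisedRegister` and `SiteCoordinator`, the in-memory dictionaries, the 16-character hexadecimal ID format and the example values are all assumptions made for illustration.

```python
from __future__ import annotations

import secrets


class PseudonymisedRegister:
    """Central store (hypothetical): holds study IDs and clinical entries, never identities."""

    def __init__(self) -> None:
        self._records: dict[str, list[dict]] = {}

    def create_patient(self) -> str:
        """Create an empty patient record and return its random study ID."""
        study_id = secrets.token_hex(8)          # 16 hexadecimal characters
        while study_id in self._records:         # guard against an (unlikely) collision
            study_id = secrets.token_hex(8)
        self._records[study_id] = []
        return study_id

    def upload(self, study_id: str, entry: dict) -> None:
        """Append a pseudonymised entry (e.g. a questionnaire score) to a record."""
        if study_id not in self._records:
            raise KeyError(f"unknown study ID: {study_id}")
        self._records[study_id].append(entry)

    def entries(self, study_id: str) -> list[dict]:
        """Return a copy of the entries stored for one study ID."""
        return list(self._records[study_id])


class SiteCoordinator:
    """Local role (hypothetical): keeps the correspondence table apart from the register."""

    def __init__(self, register: PseudonymisedRegister) -> None:
        self._register = register
        self._correspondence: dict[str, str] = {}   # study ID -> identity, held locally only

    def enrol(self, patient_identity: str) -> str:
        """Request a new study ID and record the link to the patient locally."""
        study_id = self._register.create_patient()
        self._correspondence[study_id] = patient_identity
        return study_id

    def identity_of(self, study_id: str) -> str:
        """Re-identify a patient; only possible with the local table."""
        return self._correspondence[study_id]


if __name__ == "__main__":
    register = PseudonymisedRegister()
    coordinator = SiteCoordinator(register)
    sid = coordinator.enrol("example patient, local file 001")   # invented identity
    register.upload(sid, {"form": "Pedi-IKDC", "score": 80})      # invented score
    print(sid, register.entries(sid))
```

Keeping the correspondence table in a separate object mirrors the requirement that the additional information needed for re-identification be kept separately. A real system would also need access control, encryption and audit logging, all of which this sketch omits.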
The ultimate goals of the PAMI project are:

1. To describe current treatment options following a paediatric ACL injury
2. To analyse the associated short-, medium- and long-term clinical outcome
3. To extend the evidence base on optimal treatment choices
4. To propose international treatment guidelines

To reach these aims, an electronic data collection platform, the PAMI database, has been created to store systematic and standardised information relevant to the problem of paediatric ACL injuries in different European countries. Based on long-term patient follow-up, this project will provide important insights into the current outcomes of ACL-injured knees in children and will make it possible to distinguish patients who need operative treatment from those who benefit most from nonoperative treatment. Furthermore, large-scale objective outcome data will provide the knowledge base necessary for a first-ever proposal of international treatment guidelines.

45.6 PAMI Stakeholders

The PAMI project is initiated, promoted and financially supported by the international umbrella organisation ESSKA (European Society of Sports Traumatology, Knee Surgery and Arthroscopy, www.esska.org). Through a dedicated steering committee, ESSKA acts as the overall project coordinator and thus bears data controller responsibility.

Furthermore, the project is undertaken in collaboration with the Sports Medicine Research Laboratory of the Luxembourg Institute of Health (LIH, www.lih.lu), which has extensive scientific and technical experience regarding the project objectives. The LIH is responsible for the development, deployment, maintenance and security of the PAMI database and Web application and acts as the main data processor. Data communications to and from the PAMI platform are managed via a secured communication protocol allowing for authentication of the visited website as well as confidentiality and integrity of the data exchanged through encryption. To maximise data security, a two-factor authentication (2FA) solution has been implemented using SMS (short message service), thus adding a second level of authentication to the regular account login by project participants.

Finally, partner institutions/hospitals from different European countries providing treatment to the patient group of interest can engage in this project. They provide data on current surgical and non-surgical treatments, follow-up treatments and clinical outcome using the dedicated Web application and act as local data administrators and data processors. Only pseudonymised patient information is uploaded into the PAMI database, to ensure maximal data protection and avoid legal issues related to data transfer between different European countries. As a consequence, each site coordinator manages a correspondence table between patient ID information and a primary key generated by the system. Figure 45.1 depicts the data flow of the upload of pseudonymised patient data.

45.7 Patient Data Processing in the PAMI Project

Clinical Vignette 2: A New Patient Joining the PAMI Project
Nancy is a 14-year-old handball player who tore the ACL of her left knee during an international tournament 6 weeks ago. During the last medical visit, which she attended with her parents, ACL reconstruction was recommended. The hospital where Nancy is treated is a partner of the PAMI research project, and her doctor explained the importance of this research and what Nancy's participation would entail. After reading the patient information form and asking for some additional
explanations, her parents agreed to provide written informed consent for her participation in the research project. Data regarding Nancy's condition and the surgical procedures used will be uploaded into the PAMI database by the local data processor. In addition, Nancy will be asked to respond to two questionnaires on a yearly basis, one on her knee function (International Knee Documentation Committee Subjective Knee Form in Children, Pedi-IKDC) and one on her habitual physical and sports activities (Paediatric Activity Rating Scale).

[Figure 45.1: flowchart. Boxes, in order: "User authenticated on the platform"; "User requests the creation of a new patient without nominative data" or "User searches for a patient by its ID"; "The system: creates a new patient without nominative data, assigns the patient a random ID, gives the user the generated ID linked to the patient"; "The site coordinator uploads pseudonymised data to the PAMI database (questionnaires, arthroscopy reports, ...)"; "The site coordinator updates his personal correspondence table between IDs and the patient nominative data". Stage labels: "Patient creation and pseudonymisation" and "Data upload".]

Fig. 45.1 Upload dataflow within the PAMI project

Institutions/hospitals participating in the PAMI project are responsible for patient recruitment, patient information and collection of written consent for participation. Patients involved in the project will be the owners of their data. Partner institutions will act as their patients' data administrators. Every patient has the right to access their own data on request to the participating institution/hospital. A request can also be addressed directly by the patient to the data controller (ESSKA). Both stakeholders (data controller and local data processor) are clearly identified on the patient information form.

Institutions/hospitals willing to participate in the PAMI project must designate a single site coordinator (natural person) acting on their behalf within the PAMI project: this is the only
local person with access rights to the PAMI database. Since only pseudonymised data are collected within the PAMI project, each site coordinator manages a correspondence table between patient ID information and a primary key generated by the system. Site coordinators are also responsible for the long-term follow-up of patients by sending them annual electronic questionnaires on patient-reported clinical outcomes and physical activity/sport participation.

Prior to participation in the PAMI project, institutions/hospitals have to seek ethics clearance from their local or national ethics committee when applicable, in accordance with their national laws and regulations. Formal proof of ethics clearance must be provided to the data controller and is a prerequisite for participation in the PAMI project. Written consent will be sought by the participating institutions from both legal guardians for patients under the age of majority, which may differ between countries; when the patient reaches the age of majority, he/she will be contacted and asked to provide written consent himself/herself. It is the responsibility of each participating partner to obtain written consent from their participants and to archive all original hard copies. The PAMI steering committee, acting on behalf of the data controller, reserves the right to perform on-site audits of participating institutions/hospitals. The date of provided consent or, if applicable, the date of consent withdrawal will be stored in the patient's record within the PAMI database and be visible to the data controller. Every participating institution will retain the right to access the pseudonymised data concerning their patients.

Ideally, the PAMI database will enable very long-term follow-up of (originally) skeletally immature children and adolescents who have suffered an ACL injury. The foreseen timeframe of follow-up will be 30 years for each patient. At the end of the 30-year follow-up, the data will be stored for an additional 10 years before being deleted. Data will be recorded for as long as the patients are willing to comply with the annual data collection.

Take-Home Message
• A new General Data Protection Regulation of the European Union has been enforced since May 2018.
• Research organisations processing personal data of residents of the EU must comply with this regulation.
• Researchers in clinical orthopaedics should take these principles into account as part of the risk management of their research projects.
• The present chapter presents the most important aspects that need to be considered, as well as an illustrative example of an international register on paediatric ACL treatments.
• Researchers are strongly advised to take legal counsel with a qualified data protection officer as soon as the project planning phase starts.

References

1. Ardern CL, et al. 2018 International Olympic Committee consensus statement on prevention, diagnosis and management of paediatric anterior cruciate ligament (ACL) injuries. Knee Surg Sports Traumatol Arthrosc. 2018;26:989–1010. https://doi.org/10.1007/s00167-018-4865-y.
2. Astur DC, Cachoeira CM, da Silva Vieira T, Debieux P, Kaleka CC, Cohen M. Increased incidence of anterior cruciate ligament revision surgery in paediatric verses adult population. Knee Surg Sports Traumatol Arthrosc. 2018;26:1362–6. https://doi.org/10.1007/s00167-017-4727-z.
3. Caine D, DiFiori J, Maffulli N. Physeal injuries in children's and youth sports: reasons for concern? Br J Sports Med. 2006;40:749–60. https://doi.org/10.1136/bjsm.2005.017822.
4. Calvo R, et al. Transphyseal anterior cruciate ligament reconstruction in patients with open physes: 10-year follow-up study. Am J Sports Med. 2015;43:289–94. https://doi.org/10.1177/0363546514557939.
5. Dekker TJ, Godin JA, Dale KM, Garrett WE, Taylor DC, Riboh JC. Return to sport after pediatric anterior cruciate ligament reconstruction and its effect on subsequent anterior cruciate ligament injury. J Bone Joint Surg Am. 2017;99:897–904. https://doi.org/10.2106/JBJS.16.00758.
6. Frosch KH, et al. Outcomes and risks of operative treatment of rupture of the anterior cruciate ligament in children and adolescents. Arthroscopy. 2010;26:1539–50. https://doi.org/10.1016/j.arthro.2010.04.077.
7. Kaeding CC, Flanigan D, Donaldson C. Surgical techniques and outcomes after anterior cruciate ligament reconstruction in preadolescent patients. Arthroscopy. 2010;26:1530–8. https://doi.org/10.1016/j.arthro.2010.04.065.
8. Kocher MS, Saxon HS, Hovis WD, Hawkins RJ. Management and complications of anterior cruciate ligament injuries in skeletally immature patients: survey of the Herodicus Society and The ACL Study Group. J Pediatr Orthop. 2002;22:452–7.
9. Lawrence JT, Argawal N, Ganley TJ. Degeneration of the knee joint in skeletally immature patients with a diagnosis of an anterior cruciate ligament tear: is there harm in delay of treatment? Am J Sports Med. 2011;39:2582–7. https://doi.org/10.1177/0363546511420818.
10. Moksnes H, Engebretsen L, Seil R. The ESSKA paediatric anterior cruciate ligament monitoring initiative. Knee Surg Sports Traumatol Arthrosc. 2016;24:680–7. https://doi.org/10.1007/s00167-015-3746-x.
11. Pierce TP, Issa K, Festa A, Scillia AJ, McInerney VK. Pediatric anterior cruciate ligament reconstruction. Am J Sports Med. 2017;45:488–94. https://doi.org/10.1177/0363546516638079.
12. Reider B. A matter of timing. Am J Sports Med. 2015;43:273–4. https://doi.org/10.1177/0363546515570023.
13. Seil R, Theisen D, Moksnes H, Engebretsen L. ESSKA partners and the IOC join forces to improve children ACL treatment. Knee Surg Sports Traumatol Arthrosc. 2018;26:983–4. https://doi.org/10.1007/s00167-018-4887-5.
14. Shaw L, Finch CF. Trends in pediatric and adolescent anterior cruciate ligament injuries in Victoria, Australia 2005-2015. Int J Environ Res Public Health. 2017;14. https://doi.org/10.3390/ijerph14060599.
15. Wall EJ, Ghattas PJ, Eismann EA, Myer GD, Carr P. Outcomes and complications after all-epiphyseal anterior cruciate ligament reconstruction in skeletally immature patients. Orthop J Sports Med. 2017;5:232596711769360. https://doi.org/10.1177/2325967117693604.
Part X
Helpful Further Information
46 Common Scales and Checklists in Sports Medicine Research
Alberto Grassi, Luca Macchiarola, Marco Casali,
Ilaria Cucurnia, and Stefano Zaffagnini

A. Grassi
IIa Clinica Ortopedica e Traumatologica, IRCCS Istituto Ortopedico Rizzoli, Bologna, Italy
Dipartimento di Scienze Biomediche e Neuromotorie (DIBINEM), Università di Bologna, Bologna, Italy
SIGASCOT Arthroscopy Committee, Florence, Italy
e-mail: alberto.grassi@ior.it

L. Macchiarola (*) · M. Casali · I. Cucurnia
IIa Clinica Ortopedica e Traumatologica, IRCCS Istituto Ortopedico Rizzoli, Bologna, Italy

S. Zaffagnini
IIa Clinica Ortopedica e Traumatologica, IRCCS Istituto Ortopedico Rizzoli, Bologna, Italy
Dipartimento di Scienze Biomediche e Neuromotorie (DIBINEM), Università di Bologna, Bologna, Italy
e-mail: stefano.zaffagnini@unibo.it

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_46

46.1 Introduction

Since the main purpose of clinical studies, especially randomized controlled trials (RCTs), is to report or compare the effect of different treatments, the measurement methods of clinical outcomes are crucial. Therefore, during the early stage of study design, attention should be directed to choosing the appropriate outcomes and scales that evaluate a patient population. The calculation of the sample size of an RCT is primarily based on the primary outcome being evaluated. When dichotomous outcomes of rare events such as failures or complications are used, extremely large sample sizes are often required. This requirement may discourage the realization of the study or require an enormous amount of resources to reach adequate enrollment. Conversely, if continuous measures such as patient- or clinician-reported scales are chosen, a power analysis based on means and standard deviations usually yields a more feasible sample size.
However, care should be taken during the power analysis to ensure that it is based on a clinical score which is able to detect a real difference between the different treatments. In fact, building a clinical study or RCT on outcomes which are not completely appropriate to the study purpose, patient population, and treatment administered could compromise the utility and subsequent impact of the results. Therefore, the researcher should be very familiar with the main features of clinical scores and should also know the main characteristics of each scale in relation to the pathology or treatment that is being investigated.
Another important aspect in outcome selection is the global assessment of the patient. Traditionally, clinical outcomes in orthopedics consisted of measuring impairments such as range of motion, joint stability, strength, pain, and joint function. At times, surgeons are only marginally interested in the patient’s global disability and mental status; however, the patient’s perception of changes in health status is the most important indicator of the success of a treatment. Therefore, there are two approaches to measuring health-related quality of life in orthopedic and sports medicine conditions. The “generic measures”
pertain to the overall health of the patient, including physical, mental, and social well-being, and offer the advantage that they can be used to compare different diseases, severities, and interventions. However, since they represent a generic measure, their ability to detect small but important changes could be limited. On the other hand, the “disease-specific measures,” which pertain to a specific disorder treated in a patient, measure the physical, mental, and social aspects of health affected by the specific disorder. Therefore, they are able to detect small but important changes but have limited value for comparing health status across different diseases. For the aforementioned reasons, a complete picture of the treatment effect on a patient can be provided only by assessment through a “disease-specific measure” in combination with a “generic measure.”

Fact Box 46.1
Too often, surgeons show little interest in the patient’s global health self-perception and mental status; however, the patient’s perception of changes in health status and disability is the main indicator of the success of a treatment.

46.2 General Scale Characteristics

The main characteristics and features of clinical scales that should be known to choose the appropriate outcome measure are the following [54].
Construct validity: defined as the ability of an instrument to measure what it is supposed to measure. It depends on how well the items that make up the scale cover all relevant aspects of the pathology or disability that is measured. Convergent validity indicates how well the score correlates with other scores that measure the same construct. Meanwhile, predictive validity indicates whether the score can predict a patient’s score on a measure of some related construct.
Repeatability (test-retest reliability): defined as the agreement between observations on the same patients on two or more occasions separated by a time interval under stable health conditions. It is considered when raters are not involved or the raters’ effect is negligible. It can be assessed with the intraclass correlation coefficient (ICC) or Cohen’s kappa statistic.
Intra-rater reliability: defined as the agreement between two or more repeated score evaluations performed by a single rater. In this case too, the intraclass correlation coefficient (ICC) or Cohen’s kappa statistic can be used.
Inter-rater reliability: defined as the agreement between the scores obtained from the assessments of two or more raters. It measures how much consensus or heterogeneity there is in the ratings given by judges. Similarly, the intraclass correlation coefficient (ICC) or Cohen’s kappa statistic is employed.
Internal consistency: defined as the correlation between different items on the same test. It measures whether several items that purport to measure the same general construct produce similar scores. It is assessed using Cronbach’s alpha statistic.
Responsiveness to change: defined as the ability of an instrument to detect clinically important changes between the patient’s pre-intervention and post-intervention state, assuming all other factors remain constant.
Minimal detectable change (MDC): defined as the minimal change that falls outside the measurement error in the score of an instrument used to measure a symptom.
Minimal clinically important difference (MCID): defined as the minimal change in the score that is meaningful for patients or that is required for the patient to feel a difference in the variable that is measured.
Standard error of measurement (SEM): measures the range within which a score would likely fall in the case of re-measurement.
Standardized response mean (SRM): measures the responsiveness to change and is defined as the mean change in score divided by the standard deviation of the change scores.
Floor effect: floor effects occur when a measure’s lowest score is unable to assess a patient’s level of ability. The test is considered poor if the floor effect is >20%.
Ceiling effect: ceiling effects occur when a measure’s highest score is unable to assess a patient’s level of ability. This might be particularly common for measures used over multiple occasions. The test is considered poor if the ceiling effect is >20%.

46.3 Measures of Shoulder Function

There are many instruments that measure symptoms and function of the shoulder and some that evaluate both the glenohumeral joint and the whole upper limb. The most widespread and best tested is the disabilities of the arm, shoulder, and hand questionnaire (DASH). Also, the shoulder pain and disability index (SPADI), the Constant-Murley score (CMS), and the American shoulder and elbow surgeons (ASES) questionnaire, which are more specific for shoulder pathologies, are extensively employed. The simple shoulder test (SST), the shoulder disability questionnaire (SDQ), the Oxford shoulder score, and the Western Ontario shoulder instability index (WOSI) complete the panorama of the most common tools.
The basic psychometric characteristics, strengths, and weaknesses of the most common scales for shoulder function are described (Table 46.1).

46.3.1 Disabilities of the Arm, Shoulder, and Hand Questionnaire (DASH)

The DASH is a patient-completed scale which includes 30 items regarding symptoms, pain, physical function, and social function [58]. The 11-item QuickDASH short version is also available [5]. The DASH is the best tool for the comprehensive assessment of upper extremity conditions, since it is easy to apply, analyze, and interpret; moreover, it is good for research purposes in various upper extremity conditions and has a good correlation with SPADI, HAQ, CMS, ASES, and EQ-5D with Pearson’s or Spearman’s test. It is particularly useful when polyarticular conditions are to be evaluated or symptoms and function of the entire upper extremity are investigated. It is also useful in all elbow and hand conditions. However, the DASH is region specific and not joint specific; therefore, its specificity and responsiveness are lower than those of shoulder-specific tools [3].
Conditions: Any or multiple disorders of the upper extremity, in particular painful conditions including rheumatoid arthritis, multiple sclerosis, adhesive capsulitis, shoulder impingement and tendinitis, proximal humerus fracture, distal radius fractures, hand osteoarthritis or fractures, arthroscopic acromioplasty.

46.3.2 Shoulder Pain and Disability Index (SPADI)

The SPADI is a patient-completed scale that includes 13 items regarding symptoms and pain, scored on a VAS/NRS scale [94]. It is one of the most representative shoulder instruments and has been tested in numerous settings; moreover, it is easy to administer, understand, and complete. It has a good correlation with the DASH, ASES, and CMS. One possible weakness in construct validity could be that only one item assesses overhead work [3].
Conditions: Any disorder of the shoulder joint, particularly adhesive capsulitis, rotator cuff pathologies.

46.3.3 American Shoulder and Elbow Surgeons Society Shoulder Assessment Form (ASES)

The ASES is a patient self-evaluation scale of 11 items evaluating pain and function, which is integrated with a clinician-dependent part [92]. It has good reliability, construct validity, and responsiveness. However, it uses different types of scales (binary, Likert, VAS), and the clinician part could be time-consuming. It has been developed to be applied to all shoulder patients regardless of the diagnosis, since it also evaluates activities of
Table 46.1  Measures of shoulder function

Questionnaire | Items | Options | Administration | Range | Cutoffs | Completion time (min) | Scoring time (min) | Internal consistency | Test-retest | SRM | MDC | MCID
DASH^a | 30 | Likert (5) | Patient | 0 (best) to 100 (worst) | Yes | 4 | 10 | 0.92–0.98 | 0.93–0.98 | 0.43–1.2 | 7.9–14.8 | 10.2
SPADI^a | 13 | VAS/NRS | Patient | 0 (worst) to 10 (best) | No | 5 | 2 | 0.86–0.96 | 0.84–0.95 | 1.23–1.81 | 13.2–21.5 | 13.2–23.1
ASES^a | 11 | Mix | Patient | 0 (worst) to 100 (best) | No | 3 | >8 | 0.61–0.96 | 0.84–0.9 | 1.42–1.81 | 11.2 | 6.4–16.9
CMS^a | 8 | Likert (3–10) | Patient + clinician | 0 (worst) to 100 (best) | Yes | 5 | NA | 0.60 | 0.80–0.96 | 0.59–2.09 | NR | NR
SST^a | 12 | Dichotomous | Patient | 0 (worst) to 12 (best) | No | 2 | 5 | 0.85 | 0.97–0.99 | 0.63–1.94 | NR | 2.05–2.33
OSS^a | 12 | Likert (5) | Patient | 12 (best) to 60 (worst) | No | 3 | 5 | 0.94 | 0.98 | 1.10–1.14 | NR | NR
UCLA^b | 5 | Likert (2–6) | Patient + clinician | 0 (worst) to 35 (best) | Yes | 5 | 5 | 0.93–0.95 | NR | 0.15–0.90 | NR | NR
WOSI^a | 21 | VAS/NRS | Patient | 0 (best) to 100 (worst) | No | 3 | 6 | 0.88–0.96 | 0.87–0.98 | 0.93–1.40 | NR | NR
WOOS^c | 19 | VAS | Patient | 0 (worst) to 100 (best) | No | 10 | 5 | NR | 0.96 | 1.91 | NR | NR
WORC^d | 20 | VAS | Patient | 0 (worst) to 100 (best) | No | 10 | 5 | NR | 0.96 | NR | NR | 11.7
OSIQ^e | 12 | Likert (5) | Patient | 0 (worst) to 48 (best) | Yes | 5 | 5 | 0.88 | 0.87 | NR | 9 | NR

Note: DASH (disabilities of the arm, shoulder, and hand questionnaire), SPADI (shoulder pain and disability index), ASES (American shoulder and elbow surgeons society shoulder assessment form), CMS (Constant-Murley score), SST (simple shoulder test), OSS (Oxford shoulder score), UCLA (University of California at Los Angeles shoulder score), WOSI (Western Ontario shoulder instability index), WOOS (Western Ontario osteoarthritis of the shoulder index), WORC (Western Ontario rotator cuff index), OSIQ (Oxford shoulder instability questionnaire)
^a Angst F, Schwyzer HK, Aeschlimann A, Simmen BR, Goldhahn J. Measures of adult shoulder function. Arthritis Care Res (Hoboken). 2011 Nov;63 Suppl 11:S174–88
^b Amstutz HC, Sew Hoy AL, Clarke IC. UCLA anatomic total shoulder arthroplasty. Clin Orthop Relat Res. 1981 Mar-Apr;(155):7–20
^c Lo IK, Griffin S, Kirkley A. The development of a disease-specific quality of life measurement tool for osteoarthritis of the shoulder: The Western Ontario Osteoarthritis of the Shoulder (WOOS) index. Osteoarthritis Cartilage. 2001 Nov;9(8):771–8
^d Kirkley A, Griffin S, Dainty K. Scoring systems for the functional assessment of the shoulder. Arthroscopy. 2003 Dec;19(10):1109–20
^e van der Linde JA, van Kampen DA, van Beers LW, van Deurzen DF, Terwee CB, Willems WJ. The Oxford Shoulder Instability Score; validation in Dutch and first-time assessment of its smallest detectable change. J Orthop Surg Res. 2015 Sep 17;10:146
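The SRM, MDC, and MCID columns of Table 46.1 can be put to work on a set of change scores. The following Python sketch is illustrative only: the patient scores are invented, the function names are hypothetical, and the DASH thresholds (lower MDC bound 7.9, MCID 10.2) are simply read from the table. The SRM follows the definition given in Sect. 46.2 (mean change divided by the standard deviation of the change scores).

```python
from statistics import mean, stdev


def standardized_response_mean(pre, post):
    """SRM: mean change divided by the SD of the change scores."""
    changes = [after - before for before, after in zip(pre, post)]
    return mean(changes) / stdev(changes)


def exceeds_thresholds(change, mdc, mcid):
    """Return (beyond measurement error, clinically important) for one change score."""
    magnitude = abs(change)
    return magnitude > mdc, magnitude > mcid


# Invented DASH scores (0 = best, 100 = worst), so a negative change is an improvement.
pre_scores = [50, 62, 45, 70, 58]
post_scores = [38, 40, 41, 52, 49]

srm = standardized_response_mean(pre_scores, post_scores)
print(f"SRM = {srm:.2f}")

# Thresholds read from Table 46.1 for the DASH (assumed here: lower MDC bound and MCID).
DASH_MDC, DASH_MCID = 7.9, 10.2
for before, after in zip(pre_scores, post_scores):
    detectable, important = exceeds_thresholds(after - before, DASH_MDC, DASH_MCID)
    print(before, after, detectable, important)
```

Reporting the two flags separately is a deliberate choice: for some instruments in the table the reported MDC range extends above the MCID, so a change can be clinically meaningful to the patient and still lie within measurement error.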

daily living. It has a good correlation with both the SPADI and DASH questionnaires [3].
Conditions: Any disorder of the shoulder joint, particularly rotator cuff disease, shoulder impingement, shoulder arthritis, calcific tendonitis.

46.3.4 Constant-Murley Score (CMS)

The CMS is both a patient- and clinician-reported score which includes eight items regarding pain, ADLs, mobility, and strength. It is a method to record individual parameters, providing an overall clinical functional assessment, irrespective of diagnosis or radiographic abnormalities [22, 23]. Based on the difference between the normal and the abnormal side, the indexed shoulder could be graded as excellent (<11), good (11–20), fair (21–30), or poor (>30). Although the CMS is highly accepted throughout the clinical community, there are several limitations to its use due to the low inter-tester reliability, non-standardized measurement of strength, and the small number of items evaluating pain and ADL. It is useful for measurement protocols but does not provide an adequate self-assessment of patient pain and function. It has a good correlation with ASES, DASH, and SPADI [3].
Conditions: Mainly rotator cuff-related disorders, impingement, degenerative or inflammatory pathologies, instability, osteoarthritis.

46.3.5 Simple Shoulder Test (SST)

The SST is a patient-reported score which includes 12 dichotomous (yes/no) items regarding pain, strength, and range of motion [71]. It assesses the functional disability of the shoulder in a very simple and short manner; however, due to the binary response option, its use as a comprehensive measure of outcomes could be questioned. It has a good correlation with the SPADI, ASES, DASH, and CMS scores [3].
Conditions: General shoulder injuries and rotator cuff pathology.

46.3.6 Oxford Shoulder Score (OSS)

The OSS is a patient-reported scale of 12 items evaluating pain and daily function [32, 35]. It provides a self-assessment of shoulder pain and function. It is short and easy to complete but not frequently used in the current literature. Correlation with SPADI, DASH, and CMS is good [3].
Conditions: Degenerative and inflammatory shoulder conditions, subacromial impingement, rotator cuff disorders, osteoarthritis, and proximal humerus fractures.

46.3.7 UCLA Shoulder Score

The UCLA (University of California at Los Angeles) shoulder score is a five-item scale, both patient- and clinician-reported, which evaluates pain, function, ROM, strength, and patient’s satisfaction [2]. Despite being one of the earliest available shoulder outcome measures, it has not formally been validated. It is simple and fast but requires manual evaluation by a physician; for this reason, it could result in poor validity or responsiveness, which does not make it ideal for a research setting. The UCLA has a good correlation with the DASH, SPADI, and SF-36 and could be dichotomized as good/excellent (>27) or fair/poor (<27) [61].
Conditions: Common shoulder pathologies.

46.3.8 Western Ontario Shoulder Instability Index (WOSI)

The WOSI is a 21-item patient-reported scale that evaluates physical symptoms, pain, sport, work, lifestyle, and emotions related to shoulder instability [62, 63]. It has been developed to assess disease-specific quality of life in patients with symptomatic shoulder instability. It has the advantage of being specific for this condition, but due to lack of testing data, caution is necessary at the individual patient level. It has a good correlation with the VAS for function and the DASH score [3].
Conditions: Shoulder instability.
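The categorical cutoffs quoted in this section for the CMS (Sect. 46.3.4) and the UCLA score (Sect. 46.3.7) can be written down as small lookup functions. This is a minimal sketch under stated assumptions, not a validated scoring tool: the function names are hypothetical, the CMS "poor" category is read as a side-to-side difference above 30 points, whole-point scores are assumed, and a UCLA score of exactly 27 (which the quoted cutoffs leave unassigned) is placed in the fair/poor group.

```python
def grade_cms_difference(side_difference):
    """Grade the CMS difference between the normal and the affected shoulder (points)."""
    if side_difference < 11:
        return "excellent"
    if side_difference <= 20:
        return "good"
    if side_difference <= 30:
        return "fair"
    # Assumption: "poor" means a difference above 30 points.
    return "poor"


def dichotomize_ucla(score):
    """Dichotomize the UCLA shoulder score (0 worst to 35 best)."""
    # Assumption: a score of exactly 27 is counted as fair/poor.
    return "good/excellent" if score > 27 else "fair/poor"


for diff in (5, 15, 25, 40):
    print(diff, grade_cms_difference(diff))
for score in (27, 31):
    print(score, dichotomize_ucla(score))
```

Writing the boundaries out explicitly makes it obvious where a published grading leaves a value unassigned, which is worth settling in the study protocol before data collection.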
46.3.9 Western Ontario Osteoarthritis of the Shoulder Index (WOOS)

The WOOS is a 19-item patient-reported questionnaire that evaluates the areas of pain, physical symptoms, sports and work, lifestyle function, and emotional function [72]. Its form as 100-mm VAS makes it an easy, fast, and reliable questionnaire; however, it is specific for degenerative pathologies, especially osteoarthritis. Its multiple domains regarding both function and psychologic aspects make the WOOS a versatile and complete scale. In fact, it contains many items rarely investigated by other shoulder questionnaires. It has a moderate correlation with the Constant-Murley and UCLA scores [61].
Conditions: Osteoarthritis of the shoulder.

46.3.10 Western Ontario Rotator Cuff Index (WORC)

The WORC is a 20-item patient-reported score that evaluates symptoms, sport, work, emotion, and social function [60]. It is easy and rapid to administer, since it is composed of 100-mm VAS items, but it is disease-specific since it has been developed to evaluate rotator cuff-related quality of life.
It has a good correlation with the ASES and UCLA scores [61].
Conditions: Rotator cuff pathology treated surgically and conservatively.

46.3.11 Oxford Shoulder Instability Questionnaire (OSIQ)

The OSIQ is a 12-item patient-reported questionnaire that explores the impact of shoulder instability on work, sport, and social life, its psychological repercussions, quality of life, and pain [33, 35]. It is specifically designed for glenohumeral dislocation and shoulder instability. However, it has a good correlation with both DASH and WOSI. Based on the obtained value, function could be graded as excellent (40–48), good (30–39), fair (20–29), or poor (0–19) [108].
Conditions: Shoulder instability treated with surgery or physiotherapy.

46.4 Measures of Elbow, Wrist, and Hand Function

Elbow, wrist, and hand function represent a complex dimension to evaluate. Especially for the elbow, physical examination and objective evaluation of ROM and stiffness are important for assessing joint function, the patient’s satisfaction, and normal or pathologic conditions. Therefore, clinical scores often require clinician-reported items, which increase the precision of the evaluation but, on the other hand, reduce reliability and make the scores more time-consuming.
The basic psychometric characteristics, strengths, and weaknesses of the most common scales for elbow, wrist, and hand function are described (Table 46.2).

46.4.1 Mayo Elbow Performance Score (MEPS)

The MEPS is both a patient- and clinician-reported score. It includes four Likert-scale items evaluating mostly pain and motion, stability, and function [82]. It is correlated with other elbow measures for raw scores rather than categorical ranks and requires objective evaluation of the patient by a clinician, which could lengthen its application [16].
Conditions: General elbow disorders, rheumatoid arthritis, synovectomy.

46.4.2 Oxford Elbow Score (OES)

The OES is a patient-reported score that includes 12 Likert-scale items evaluating elbow function, pain, and psychological aspects [30]. It is easy and simple to administer to patients; however, it lacks objective evaluation of clinical outcomes. It has a good correlation with DASH, Mayo elbow score, and SF-36 [73].
Conditions: General elbow disorders.
Table 46.2  Measures of elbow, wrist, hand function

Questionnaire | Items | Options | Administration | Range | Cutoffs | Completion time (min) | Scoring time (min) | Internal consistency | Test-retest | SRM | MDC | MCID
MEPS^a | 4 | Likert (3–4) | Patient + clinician | 0 (worst) to 100 (best) | No | 5 | 3 | NR | 0.89 | NR | 11.3 | 15
OES^b | 12 | Likert (5) | Patient | 0 (worst) to 100 (best) | No | 3 | 3 | 0.8–0.90 | 0.87 | 0.46–0.69 | 27.6 | NR
ASES^c | 57 | Mix | Patient + clinician | 0 (worst) to 100 (best) | No | 15 | 10 | 0.68–0.82 | 0.95 | NR | NR | NR
PRTEE^d | 10 | NRS | Patient | 0 (best) to 100 (worst) | No | 3 | 3 | NR | 0.87 | 2.01 | NR | NR
MWS^e | 4 | Likert (3–4) | Patient + clinician | 0 (worst) to 100 (best) | No | 5 | 5 | NR | NR | NR | NR | NR
MHQ^f | 37 | Likert (5) | Patient | 0 (worst) to 100 (best) | No | 15 | 20 | 0.75–0.94 | 0.95 | 0.47–1.61 | NR | 3–13
FIHOA^f | 10 | Likert (4) | Patient | 0 (best) to 30 (worst) | No | 3 | 3 | 0.85–0.90 | 0.95 | 0.58–0.87 | NR | NR

Note: MEPS (Mayo elbow performance score), OES (Oxford elbow score), ASES (American shoulder and elbow surgeons society shoulder assessment form), PRTEE (patient-rated tennis elbow evaluation), MWS (Mayo wrist score), MHQ (Michigan hand outcome questionnaire), FIHOA (functional index for hand osteoarthritis)
^a Celik D. Psychometric properties of the Mayo Elbow Performance Score. Rheumatol Int. 2015 Jun;35(6):1015–20
^b Dawson J, Doll H, Boller I, Fitzpatrick R, Little C, Rees J, Jenkinson C, Carr AJ. The development and validation of a patient-reported questionnaire to assess outcomes of elbow surgery. J Bone Joint Surg Br. 2008 Apr;90(4):466–73
^c MacDermid JC. Outcome evaluation in patients with elbow pathology: issues in instrument development and evaluation. J Hand Ther. 2001 Apr-Jun;14(2):105–14
^d Rompe JD, Overend TJ, MacDermid JC. Validation of the Patient-rated Tennis Elbow Evaluation Questionnaire. J Hand Ther. 2007 Jan-Mar;20(1):3–10
^e Amadio PC, Berquist TH, Smith DK, Ilstrup DM, Cooney WP 3rd, Linscheid RL. Scaphoid malunion. J Hand Surg Am. 1989 Jul;14(4):679–87
^f Poole JL. Measures of hand function. Arthritis Care Res (Hoboken). 2011 Nov;63 Suppl
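The Range column of Table 46.2 shows that the scales do not all run in the same direction (for example, the FIHOA runs from 0 as best to 30 as worst, whereas the MEPS runs from 0 as worst to 100 as best). When scores are tabulated side by side, it can help to express them on a common 0 to 100 scale on which higher is better. The sketch below is one simple way to do this, assuming a linear rescaling is acceptable for descriptive purposes; the function name is hypothetical, and the rescaling does not make different instruments interchangeable.

```python
def to_common_scale(score, best, worst):
    """Linearly map a raw score to 0-100, where 100 corresponds to the best outcome."""
    if best == worst:
        raise ValueError("best and worst anchors must differ")
    low, high = sorted((best, worst))
    if not low <= score <= high:
        raise ValueError("score lies outside the range of the scale")
    return 100.0 * (score - worst) / (best - worst)


# Anchors taken from the Range column of Table 46.2; the raw scores are invented.
print(to_common_scale(6, best=0, worst=30))     # FIHOA
print(to_common_scale(25, best=0, worst=100))   # PRTEE
print(to_common_scale(85, best=100, worst=0))   # MEPS
```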

46.4.3 American Shoulder and Elbow Surgeons Society Shoulder Assessment Form (ASES)

The ASES elbow outcome score is both a patient- and clinician-reported questionnaire that evaluates elbow pain, function, and satisfaction through 19 items and motion, stability, strength, and physical findings through 38 items [92]. This score represents a complete evaluation but requires substantial time to complete, and pain has the highest influence on the overall score [80].
Conditions: General elbow disorders.

46.4.4 Patient-Rated Tennis Elbow Evaluation (PRTEE)

The PRTEE is a 15-item patient-reported score that evaluates forearm pain and disability in patients with lateral epicondylitis. It presents two subscales: pain and function. It is easy to complete and fast to administer and has a good correlation with the NRS for pain during wrist extension and with the DASH; however, it is specific to one condition [75, 97].
Conditions: Lateral epicondylitis of the elbow.

46.4.5 Mayo Wrist Score (MWS)

The MWS is both a patient- and clinician-reported score, which includes five Likert-scale items evaluating pain, function, ROM, and grip strength. It is basic but involves objective evaluation of wrist mobility and strength, which could limit its usability [25]. Moreover, its reliability and consistency characteristics have not been thoroughly investigated. It could be graded as excellent (90–100), good (80–90), satisfactory (60–80), and poor (<60) [28].
Conditions: Originally developed for scaphoid nonunion; could be used for wrist fractures and arthritis.

46.4.6 Michigan Hand Outcome Questionnaire (MHQ)

The MHQ is a 37-item patient-reported scale that evaluates hand function, appearance, pain, and satisfaction [19]. It appropriately measures hand function in various conditions; however, its application could be time-consuming [28].
Conditions: Hand and wrist injuries, including osteoarthritis.

46.4.7 Functional Index for Hand Osteoarthritis (FIHOA)

The FIHOA is a patient-reported scale composed of ten items including questions about using keys, cutting, lifting, buttoning, and writing, aimed at measuring hand function in patients with hand osteoarthritis. It has a good correlation with the MHQ [36].
Conditions: Hand osteoarthritis.

46.5 Measures of Hip Function

The assessment of outcomes in hip surgery is focused on patient satisfaction and the quality of life achieved, level of pain, range of motion (ROM), comorbidities, and the use of walking aids. A variety of quality of life evaluation tools have been developed that differ in their measurement techniques and in the number of domains they assess. These scores are useful not only for routine clinical evaluation in elderly patients and in congenital hip disease but also to assess the outcomes after joint-preserving surgery. The ideal hip outcome measure should be one that is specific for the hip joint, possesses a generic component, and is clear and concise. Previous outcome tools were modifications of preexisting tools that evaluate chronic conditions such as osteoarthritis. The outcome measures most frequently used in clinical practice are the Harris hip score, the hip disability and osteoarthritis outcome score, the Oxford hip score, and the Lequesne index of severity for osteoarthritis of the hip. More specific scores for sport-related hip injuries have been designed in recent years, such as the non-arthritic hip score and the international hip outcome tool-33.
The basic psychometric characteristics, strengths, and weaknesses of the most common scales for hip function are described (Table 46.3).
Table 46.3  Measures of hip function

Questionnaire | Items | Options | Administration | Range | Cutoffs | Completion time (min) | Scoring time (min) | Internal consistency | Test-retest | SRM | MDC | MCID
HHS^a | 10 | Likert (2–15) | Clinician | 0 (worst) to 100 (best) | Yes | 5 | 10 | NR | 0.93–0.98 | 2.52–2.73 | NR | NR
HOOS^b | 40 | Likert (5) | Patient | 0 (worst) to 100 (best) | No | 10–15 | 2–3 | 0.82–0.98 | 0.75–0.97 | 1.29–3.24 | 9.6–16.2 | NR
OHS^c | 12 | Likert (5) | Patient | 0 (worst) to 48 (best) | Yes | 2–8 | 5 | 0.84–0.93 | 0.84–0.93 | 1.12 | 6.11 | 3–5
LISOH^a | 11 | Likert (2–7) | Patient + clinician | 0 (best) to 24 (worst) | Yes | 2–5 | 5 | 0.83–0.84 | 0.94 | NR | NR | NR
NAHS^c | 20 | Likert (5) | Patient | 0 (worst) to 100 (best) | No | 8–10 | 5 | 0.69–0.92 | 0.87–0.95 | NR | 10.4–12.4 | NR
iHOT33^d,e | 33 | VAS | Patient | 0 (worst) to 100 (best) | No | 15 | 6 | 0.96–0.99 | 0.87–0.96 | 1.7 | 16.0 | 6.1
HAGOS^f | 37 | Likert (5) | Patient | 0 (worst) to 100 (best) | No | 10 | 10 | 0.37–0.73 | 0.82–0.92 | NR | 2–5 | NR

Note: HHS (Harris hip score), HOOS (hip disability and osteoarthritis outcome score), OHS (Oxford hip score), LISOH (Lequesne index of severity for osteoarthritis of the hip), NAHS (non-arthritic hip score), iHOT33 (international hip outcome tool-33), HAGOS (the Copenhagen hip and groin outcome score)
^a Nilsdotter A, Bremander A. Measures of Hip Function and Symptoms. Arthritis Care & Research Vol. 63, No. S11, November 2011, pp. S200–S208
^b Martinelli N, Longo UG, et al. Cross-cultural adaptation and validation with reliability, validity, and responsiveness of the Italian version of the Oxford Hip Score in patients with hip osteoarthritis. Qual Life Res (2011) 20:923–929
^c Ramisetty N, Kwon Y and Mohtadi N. Patient-reported outcome measures for hip preservation surgery. A systematic review of the literature. Journal of Hip Preservation Surgery Vol. 2, No. 1, pp. 15–27
^d Mohtadi NG, Griffin DR, Pedersen ME et al. The development and validation of a self-administered quality-of-life outcome measure for young, active patients with symptomatic hip disease: the international hip outcome tool (iHOT-33). Arthroscopy 2012; 28: 595–605
^e Kemp JL, Collins NJ, Roos EM et al. Psychometric properties of patient-reported outcome measures for hip arthroscopic surgery. Am J Sports Med 2013; 41: 2065–73
^f Thorborg K, Holmich P, Christensen A, Petersen F, Roos EM. The Copenhagen Hip and Groin Outcome Score (HAGOS): development and validation according to the COSMIN checklist. Br J Sports Med. 2011;45:478–491
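The "Internal consistency" column of Table 46.3 reports Cronbach's alpha, the statistic named in Sect. 46.2. As a rough illustration of how such a value is obtained from item-level data, the sketch below uses the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score), where k is the number of items. The responses are invented and the function name is hypothetical; this is a teaching sketch, not the procedure used in the cited validation studies.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("at least two items are required")
    # Transpose so that each tuple holds all respondents' scores for one item.
    item_columns = list(zip(*responses))
    item_variance_sum = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)


# Invented answers of five respondents to a three-item, five-level Likert questionnaire.
answers = [
    [1, 2, 2],
    [2, 3, 3],
    [3, 3, 4],
    [4, 5, 4],
    [5, 5, 5],
]
print(f"Cronbach's alpha = {cronbach_alpha(answers):.3f}")
```

Values close to 1 indicate that the items move together; the lower end of the range reported for some subscales in the table (for example 0.37 for the HAGOS) would suggest items that capture rather different aspects.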

46.5.1 Harris Hip Score (HHS) replacement. The HOOS is an extension of the
WOMAC and is suggested to be valuable for
The HHS is a clinician-based outcome score younger and more active people due to the added
which includes ten items that evaluates pain, subscales. It is suitable for use in research as a
function, absence of deformity, and range of disease-specific questionnaire. It has a good cor-
motion [44]. There are two versions of the score: relation with Oxford hip score, the Lequesne
the original one, published in 1969, and the mod- index, and the VAS for pain. Based on its score, it
ified HHS (MHHS). The latter only includes could be graded as excellent (>41), good (34–41),
pain and function components and has been fair (27–33), or poor (<27) [86].
widely used to evaluate outcomes in hip arthros- Conditions: Osteoarthritis, general hip
copy surgery. The HHS is widely used through- disorders.
out the world for evaluating outcome after THR,
and it has also been proven appropriate to mea-
sure outcome after surgical interventions for 46.5.3 Oxford Hip Score (OHS)
femoral neck fractures. It seems to be useful for
short-time follow-up; moreover there are unac- The OHS is a 12-item patient-reported outcome
ceptable ceiling effects that severely limit its score regarding pain and function of the hip in
validity. The HHS has been used in many differ- relation to daily activities such as walking, dress-
ent countries (Sweden, the Netherlands, ing, and sleeping. It is designed for the assess-
Denmark, etc.), but there are no validated ver- ment of joint replacement and has been used in
sions in other languages available. It has a good several countries in large registry studies [31,
correlation with WOMAC, NHP, NAHS, and 83]. The OHS was developed to supplement other
SF-36 for pain and function domain. Based on generic outcome measures in systematic studies
the obtained score, it could be graded as excel- of hip replacement surgery with long-time fol-
lent (90–100), good (80–90), fair (70–80), or low-up. It has also been validated and used in
poor (<70) [86]. revision hip replacement. Due to its shortness,
Conditions: Femoral neck fractures, osteoar- the OHS questionnaire is feasible for surveys by
thritis of hip, impingement. mail, and it yields a high response rate and is
therefore preferred for larger studies. High
correlation was found between OHS and the
­
46.5.2 Hip Disability HHS in THR patients [86].
and Osteoarthritis Outcome Conditions: Osteoarthritis of hip.
Score (HOOS)

The HOOS is a patient-reported score composed 46.5.4 Lequesne in Severity


of 40 items that evaluates pain, other symptoms, for Osteoarthritis of the Hip
function in activities of daily living, function in (LISOH)
sport and recreation, and hip-related quality of
life. It has been validated in two slightly different The LISOH is an interview-based or reported
versions, LK1.1 and LK2.0 [65, 85]. In 2008, a score which includes 11 items evaluating pain,
five-item measure of physical function, the maximum distance walked, and activities of daily
HOOS-PS, was published derived from the living [68–70]. The LISOH is available currently
HOOS questionnaire by item response theory to in several versions: interview based, self-admin-
elicit patients’ opinions about difficulties experi- istered, and in modified versions due to changed
enced due to hip problems. The HOOS has been scoring and wording. The LISOH was developed
used in subjects with hip disability with or with- to evaluate the severity of hip osteoarthritis in
out hip osteoarthritis and in patients with hip drug trials and the long-term treatment effects for
osteoarthritis pre- and postoperative total hip hip OA and as help in decision-making regarding
46  Common Scales and Checklists in Sports Medicine Research 447

the need for hip replacement. It has limited construct validity, and the convergent validity of the questionnaire has also been questioned. The recommendation is to use the LISOH only for group comparisons. Based on its score, the handicap derived from hip osteoarthritis can be graded as extremely severe (>14), very severe (11–13), severe (8–10), moderate (5–7), mild (1–4), or none (0) [86].

Conditions: Osteoarthritis, the effectiveness of pharmacologic interventions, and to help with indications for surgery like THA.

46.5.5 Non-arthritic Hip Score (NAHS)

The non-arthritic hip score (NAHS) consists of 20 items distributed in four domains: pain, mechanical symptoms, functional symptoms, and activity level. It was developed for young active patients with higher demands and expectations. This is a patient-based, self-administered questionnaire that was developed as a modification of the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC). Input from patients, surgeons, physical therapists, and epidemiologists was used in creating the NAHS scoring system. The NAHS has satisfactory internal consistency in each of its four domains, but there is no further evidence about internal consistency from head-to-head comparison studies with other outcome measures. Hence, the summation score for internal consistency for the NAHS is good. The summation score for test-retest reliability is excellent. Construct validity is satisfactory between the NAHS and the Harris hip score (HHS) and short form (SF)-12, respectively [18].

Conditions: All non-arthritic hip conditions.

46.5.6 International Hip Outcome Tool-33 (iHOT-33)

These 33 questions were formulated into a self-administered questionnaire using a visual analog scale response format from 0 to 100 (worst to best outcome) [81]. The iHOT-33 was developed with the cooperation of the Multicenter Arthroscopy of the Hip Outcomes Research Network (MAHORN). It has a short version, the iHOT-12, which includes 12 items instead of the original 33; it was designed to be more easily used in clinical settings and has been validated and tested for reliability. The appropriate population for this tool includes patients aged between 18 and 60 years who have a Tegner activity level of 4 or higher, meaning that they are engaged in recreational physical activities at least once a week or have an occupation involving moderately heavy labor. No floor or ceiling effects were noted for the iHOT-33 in the original paper. Construct validity was demonstrated with a correlation of 0.81 to the NAHS [57].

Conditions: Femoroacetabular impingement, hip arthroscopic surgery for intra-articular hip lesion.

46.5.7 The Copenhagen Hip and Groin Outcome Score (HAGOS)

The HAGOS is a patient-reported outcome questionnaire; it consists of 37 items distributed in six subscales: pain, symptoms, physical function in activities of daily living, physical function in sports and recreation, participation in physical activities, and hip- and/or groin-related quality of life [107].

The Copenhagen hip and groin outcome score was developed in 2011, and it was the first outcome measure developed with the COSMIN checklist guidelines. The goal of this instrument is to evaluate hip and/or groin disability related to impairment (body structure and function), activity (activity limitations), and participation (participation restrictions) according to the International Classification of Functioning, Disability, and Health (ICF) in young to middle-aged physically active patients with hip and/or groin pain. The HAGOS has excellent test-retest reliability properties; this was evident from ICC ranging from 0.82 to 0.92 for all its subscales in the original paper, and it has also excellent

internal consistency properties and good content validity. Floor or ceiling effects were noted in some subscales of the HAGOS, as described in the original paper, and while no floor effects were reported, ceiling effects were noted in the HAGOS ADL (32%) and physical activity (28%) subscales between 12 and 24 months after surgery. Construct validity is satisfactory between the HAGOS and the SF-36 [57, 91].

Conditions: Young to middle-aged patients with long-standing hip and/or groin pain.

46.6 Measures of Knee Function

The knee is one of the most investigated joints in Orthopaedics and Sports Medicine; therefore many outcome measures exist for clinical or research use. The most relevant are those evaluating pain, function, quality of life, and activity. However, the clinician-reported scales that have been described make it possible to record objective characteristics of the joint such as deformity, ROM, and stability.

The basic psychometric characteristics, strengths, and weaknesses of the most common scales for knee function are described (Table 46.4).

46.6.1 International Knee Documentation Committee Subjective Score (Subjective IKDC)

The subjective IKDC form is an 18-item patient-reported score that examines knee symptoms, sport participation, and daily activities [53]. It was developed in 1994 and revised to its current form in 2001. Its strength is the comprehensive assessment of the patient status and, above all, responsiveness to change following surgical interventions. Its limited administrative and respondent burden makes it ideal in both clinical and research settings. Moreover, it has a good correlation with the Cincinnati knee rating system, VAS for pain, WOMAC, Lysholm, and SF-36. On the other hand, it lacks psychometric testing, which makes it suboptimal for the evaluation of osteoarthritic patients [21].

Conditions: Knee ligament injury and surgery, meniscal tears, cartilage lesions, knee dislocation.

46.6.2 International Knee Documentation Committee Objective Score (Objective IKDC)

The objective IKDC form is a clinician-reported score that evaluates all aspects of the knee findings. It is composed of twenty-five items, grouped in seven subgroups evaluating effusion, passive motion deficit, ligament examination, crepitus, harvest site pathology, X-ray findings, and functional evaluation with the one-leg hop test. Every item is rated on a four-grade Likert scale from A to D, or normal to severely abnormal, with respect to the contralateral healthy knee. The overall score is determined by the lowest grade, considering only the first three subgroups (swelling, passive ROM, and ligament stability). However, all the items should be compiled even if they do not contribute to the overall score. This form is frequently used when evaluating ligamentous surgery and allows the comparison of different treatment groups in a reliable manner. However, to increase its precision, it requires instrument-assisted evaluation of knee stability. Moreover, in the case of bilateral pathology or previous contralateral injury, the score cannot be used since it implies the comparison to a healthy contralateral limb. Usually, grades C and D are considered as failure of the treatment [1].

Conditions: Especially knee ligament injury and surgery, but also other knee conditions investigated by the subjective IKDC form such as meniscal tears, cartilage lesions, knee dislocation.

46.6.3 Knee Injury and Osteoarthritis Outcome (KOOS)

The KOOS is a 42-item patient-reported scale, which includes five domains, each one scored separately: pain, symptoms, activity of daily life (ADL), sports and recreational activities, and
Table 46.4  Measures of knee function

| Questionnaire | Items | Options | Administration | Range | Cutoffs | Time compilation (min) | Time calculation (min) | Internal consistency | Test-retest | SRM | MDC | MCID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Subjective IKDC^a | 18 | Likert (5–10) | Patient | 0 (worst) to 100 (best) | No | 10 | 5 | 0.92–0.97 | 0.87–0.89 | 0.94–1.5 | 6.7 | 11.5 |
| Objective IKDC^a | 25 | Likert (4) | Clinician | A (best) to D (worst) | Yes | 15 | 3 | NR | NR | NR | NR | NR |
| KOOS^a | 25 | Likert (5) | Patient | 0 (worst) to 100 (best) | No | 10 | 5 | 0.25–0.90 | 0.0–0.97 | 0.61–2.12 | 5–21 | NR |
| Lysholm^a | 8 | Likert (3–6) | Patient | 0 (worst) to 100 (best) | Yes | 5 | 3 | 0.65–0.73 | 0.88–0.97 | 0.90–1.10 | 8.9–10.1 | NR |
| OKS^a | 12 | Likert (5) | Patient | 0 (best) to 48 (worst) | No | 5 | 5 | 0.97–0.93 | 0.91–0.94 | 0.7 | 6.1 | NR |
| CKRS^b | 13 | Mix | Patient + clinician | 0 (worst) to 100 (best) | Yes | 20 | 10 | NR | 0.75–0.98 | 0.72–2.48 | NR | NR |
| WOMAC^a | 24 | Likert (5) | Patient | 0 (best) to 20 pain, 8 stiff, 68 funct (worst) | No | 10 | 5 | 0.67–0.98 | 0.65–0.98 | 0.40–2.02 | 10.6–30.6 | 14–33 |
| WOMAC VAS^a | 24 | VAS/NRS | Patient | 0 (best) to 500 pain, 200 stiff, 1700 funct (worst) | No | 5 | 3 | NR | NR | NR | NR | NR |
| KSS^c | 7 | Likert (2–25) | Patient + clinician | 0 (worst) to 100 (best) | Yes | 10 | 5 | 0.74–0.94 | 0.65–0.88 | NR | NR | NR |
| HSS^d | 13 | Mix | Patient + clinician | 0 (worst) to 100 (best) | Yes | 10 | 5 | 0.70 | 0.98–0.99 | NR | NR | NR |
| AKPS^e | 13 | Likert (3–5) | Patient | 0 (worst) to 100 (best) | No | 5 | 5 | NR | 0.81 | NR | NR | 8–10 |
| VISA-P^f | 8 | Mix | Patient | 0 (worst) to 100 (best) | No | 5 | 5 | 0.71–0.73 | 0.74 | NR | NR | 14 |

Note: Subjective IKDC (international knee documentation committee subjective score), Objective IKDC (international knee documentation committee objective score), KOOS (knee injury and osteoarthritis outcome), OKS (Oxford knee score), CKRS (Cincinnati knee rating system), WOMAC (Western Ontario and McMaster Universities index), KSS (knee society score), HSS (hospital for special surgery score), AKPS (Kujala anterior knee pain scale), VISA-P (Victorian Institute of Sport Assessment – Patella)
a Collins NJ, Misra D, Felson DT, Crossley KM, Roos EM. Measures of knee function. Arthritis Care Res (Hoboken). 2011 Nov;63 Suppl 11:S208–28
b Barber-Westin SD, Noyes FR, McCloskey JW. Rigorous statistical reliability, validity, and responsiveness testing of the Cincinnati knee rating system in 350 subjects with uninjured, injured, or anterior cruciate ligament-reconstructed knees. Am J Sports Med. 1999 Jul-Aug;27(4):402–16
c Hamamoto Y, Ito H, Furu M, Ishikawa M, Azukizawa M, Kuriyama S, Nakamura S, Matsuda S. Cross-cultural adaptation and validation of the Japanese version of the new Knee Society Scoring System for osteoarthritic knee with total knee arthroplasty. J Orthop Sci. 2015 Sep;20(5):849–53
d Narin S, Unver B, Bakırhan S, Bozan O, Karatosun V. Cross-cultural adaptation, reliability and validity of the Turkish version of the Hospital for Special Surgery (HSS) Knee Score. Acta Orthop Traumatol Turc. 2014;48(3):241–8
e Crossley KM, Bennell KL, Cowan SM, Green S. Analysis of outcome measures for persons with patellofemoral pain: which are reliable and valid? Arch Phys Med Rehabil. 2004 May;85(5):815–22
f Hernandez-Sanchez S, Hidalgo MD, Gomez A. Responsiveness of the VISA-P scale for patellar tendinopathy in athletes. Br J Sports Med. 2014 Mar;48(6):453–7

knee-related quality of life (QoL). It is a complete questionnaire, since it explores all the possible domains of a multitude of possible knee pathologies [99]. However, for this reason, acceptability and reliability could differ based on the patient's age and condition, especially on the sport subscale. It has good correlation with the SF-36 score and the WOMAC. For these reasons and its relative simplicity, the KOOS is used extensively, especially in large-volume registries. Moreover, the individual score for each subscale, rather than an aggregate score, allows for clinical interpretation of different interventions in different dimensions. On the other hand, the KOOS has not been validated for telephone and interview administration, which could limit its applicability due to the need for direct patient involvement [21].

A short version of the KOOS, which includes only seven items from the ADL and sport subscales, is available as the KOOS-PS (physical function short form); it is shorter, faster, and easier to administer in the clinical setting [89].

Conditions: Young and middle-aged patients with post-traumatic OA (undergoing TKA), patients with chondral, ligamentous, or meniscal injuries.

46.6.4 Lysholm Score

The Lysholm score is an eight-item patient-reported scale that evaluates knee symptoms such as limp, locking, swelling, instability, pain, stair climbing, and squatting. Introduced in 1982, it is one of the most commonly used clinical scores for knee evaluation [74]. It is extremely popular and widely used in clinical and research settings. It has limited floor and ceiling effects, making it useful for tracking improvement after interventions or deterioration over time. Moreover, it has a good correlation with the subjective IKDC, the Cincinnati knee ligament score, and the WOMAC. However, its main limitation is that it is clinician derived with no patient input. There are concerns about limited reliability and the lack of a defined MCID. The Lysholm score is graded as excellent (95–100), good (84–94), fair (65–83), or poor (<65) [21].

Conditions: Ligament injuries and surgery, particularly knee conditions with symptoms of instability, but also meniscal tears, cartilage lesions, patellofemoral pain, and knee osteoarthritis.

46.6.5 Oxford Knee Score (OKS)

The OKS is a 12-item patient-reported score developed for patients undergoing TKR [34]. However, it can also be used to evaluate OA and early OA. It has a good correlation with the WOMAC, KOOS, and SF-36 scores. It is valid, reliable, and responsive to change, which makes it useful in research settings. However, its development based on knee OA limits its use [21].

Conditions: TKR, OA, rheumatoid arthritis.

46.6.6 Cincinnati Knee Rating System (CKRS)

The CKRS, which was proposed in 1983 and further modified over the years, is a both patient- and clinician-reported form composed of 13 scales assessing symptoms (pain, swelling, and giving way), perception of overall knee condition, daily life function (walking, stair climbing, and squatting), sport function (running, jumping, and pivoting), sport activity, and occupation. The evaluation is completed with physical examination, functional testing with one-legged hop tests, and radiographic measurement of joint narrowing. The overall score, rated from 0 to 100, is obtained by combining symptoms (20), functional activities (15), physical examination (25), stability (20), radiographic findings (10), and functional testing (10). Based on the results, it can be graded as excellent (>80), good (55–79), fair (30–54), or poor (<30). It is a comprehensive and rigorous scale, with good reliability and high responsiveness to detect changes. However, it can be quite time-consuming. Its use

is mostly limited to sports medicine ligamentous and meniscal knee conditions [4].

Conditions: Ligamentous injuries and surgery, meniscal allograft and repair.

46.6.7 Western Ontario and McMaster Universities Index (WOMAC)

The WOMAC is a 24-item patient-reported scale that evaluates three domains, each one with a dedicated subscale: pain, stiffness, and functional activities. It is available both as a five-point Likert scale and as a 100-mm VAS or NRS; therefore, based on the type of scoring, different ratings are obtained for the three subscales. However, the obtained values can be converted to a simple 0–100 scale. The WOMAC is one of the most common scales to evaluate patients with knee OA and is validated in numerous languages. Moreover, it has the advantage of being validated for use in person, over the telephone, or electronically. The individual scores for the three domains, rather than the aggregate value, enhance its interpretation for each domain. However, the presence of items related to uncommon tasks could result in missing data, while the lack of difficult tasks makes it less than optimal for more active patients. This scale is well suited to research purposes due to its reliability and ability to detect changes, especially after surgical and nonsurgical interventions for knee OA and chondral defects [6, 7, 21].

Conditions: Knee OA, cartilage lesions, and ACL injury.

46.6.8 Knee Society Score (KSS)

The KSS is a seven-item both patient- and clinician-based score that integrates subjective assessment of pain with objective features such as flexion or extension lag, ROM, alignment, and laxity. For this reason, it is limited by low reliability and inter- and intra-observer variations. It is used mostly for TKA evaluation, and it has a good correlation with the SF-36 and the OKS score. Based on the obtained values, it can be interpreted as excellent (80–100), good (70–79), fair (60–69), or poor (<60) [43, 102].

Conditions: Knee OA.

46.6.9 Hospital for Special Surgery Score (HSS)

The HSS is a 13-item scale, both patient- and clinician-reported. It evaluates pain, function, ROM, muscle strength, flexion deformity, instability, and alignment. It has similar features to the KSS score and can likewise be graded as excellent (85–100), good (70–84), fair (60–69), or poor (<60). It offers a precise evaluation of knee function but lacks a general quality of life assessment. Therefore, it should be used with other scores capable of depicting the patient's general condition [84].

Conditions: Knee OA.

46.6.10 Kujala Anterior Knee Pain Scale (AKPS)

The AKPS, also known as the "Kujala score," is a 13-item patient-reported scale that evaluates the subjective response to six activities regarded as triggers for anterior knee pain: walking, running, jumping, climbing stairs, squatting, and sitting [66]. Moreover, the evaluation is integrated with objective basic knee characteristics such as swelling, thigh atrophy, flexion contracture, and abnormal patellar movements. Therefore, it is specifically dedicated to painful conditions of the anterior knee and in particular to patellofemoral pathologies. It is a simple and fast questionnaire and has a good correlation with Lysholm, KOOS, and VAS pain; however, it does not distinguish between patients with one episode of patellar dislocation and recurrent instability [27].

Conditions: Anterior knee pain conditions, patellofemoral pathologies, especially instability.
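
Several of the knee scales described in Sects. 46.6.2 to 46.6.9 are interpreted with simple rules: a banded grade for the Lysholm, KSS, and HSS scores, the worst of the first three subgroup grades for the objective IKDC, and a rescaling of WOMAC subscales to 0–100. The following Python sketch illustrates these rules under stated assumptions. The function and variable names are hypothetical, the cutoffs are those quoted in the text, whole-number scores are assumed, and the WOMAC rescaling keeps the original direction (0 = best), which the text does not specify. It is an illustration, not a validated scoring implementation.

```python
# Illustrative sketch only: names are hypothetical, cutoffs are those quoted in the text.

# Lower bound of each band, checked from best to worst; anything below is "poor".
GRADE_BANDS = {
    "Lysholm": [(95, "excellent"), (84, "good"), (65, "fair")],
    "KSS": [(80, "excellent"), (70, "good"), (60, "fair")],
    "HSS": [(85, "excellent"), (70, "good"), (60, "fair")],
}

# Maximum raw subscale scores for the five-point Likert WOMAC (Table 46.4).
WOMAC_LIKERT_MAX = {"pain": 20, "stiffness": 8, "function": 68}


def grade_score(scale, score):
    """Map a 0-100 total score to the verbal grade quoted for that scale."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie between 0 and 100")
    for lower_bound, label in GRADE_BANDS[scale]:
        if score >= lower_bound:
            return label
    return "poor"


def ikdc_objective_overall(effusion, passive_motion, ligament_exam):
    """Overall objective IKDC grade: the worst of the first three subgroup grades."""
    grades = (effusion, passive_motion, ligament_exam)
    for grade in grades:
        if grade not in ("A", "B", "C", "D"):
            raise ValueError("grades must be A, B, C, or D")
    # "D" sorts after "A", so max() returns the worst grade.
    return max(grades)


def womac_to_percent(subscale, raw):
    """Rescale a raw Likert WOMAC subscale score to 0-100 (assumption: 0 stays best)."""
    maximum = WOMAC_LIKERT_MAX[subscale]
    if not 0 <= raw <= maximum:
        raise ValueError("raw score outside the subscale range")
    return 100.0 * raw / maximum


print(grade_score("Lysholm", 90))              # good
print(ikdc_objective_overall("A", "C", "B"))   # C
print(womac_to_percent("pain", 10))            # 50.0
```

The CKRS bands are left out on purpose: as quoted, "excellent (>80)" and "good (55–79)" leave a score of exactly 80 unassigned, so any mapping would require an additional assumption.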

46.6.11 Victorian Institute of Sport Assessment-Patella (VISA-P)

The VISA-P score is an eight-item patient-reported questionnaire composed of a VAS and a Likert portion, which assesses pain during activity or functional tests and sport participation. It has been specifically developed for the measurement of patellar tendon-related conditions. It has good reliability and repeatability and a good correlation with the VAS for pain. Moreover, since its MCID is available, the VISA-P represents one of the most utilized scores for the assessment of treatments for patellar tendinopathy [48].

Conditions: Patellar tendon disorders.

46.7 Measures of Foot and Ankle Function

An extremely wide range of outcome measures have been developed for the evaluation of the foot and ankle in clinical research; during a 10-year period, 139 different scales have been described, and more recently, between 2012 and 2016, as many as 89 measures have been used in the literature for this anatomical region. This incredible variety might be detrimental to evidence-based decision-making and to the comparison of clinical results [50, 51].

The basic psychometric characteristics, strengths, and weaknesses of the most common scales for foot and ankle function are described (Table 46.5).

46.7.1 American Orthopaedic Foot and Ankle Society Score (AOFAS Score)

First introduced in 1994, the AOFAS score is the outcome measure most used by clinicians. Four questionnaires exist for different parts of the foot: ankle/hind foot, mid-foot, hallux, and lesser toe; each one is composed of nine items divided into three domains (function, alignment, and pain) and rated on a scale from 0 to 100 [45, 64]. The AOFAS scores are not purely patient-reported, since they incorporate both subjective and objective data that require clinical assessment. Despite its popularity, the AOFAS score has limitations due to lack of validation, high inter-observer variability, and poor correlation with other generic PROMs. For these reasons the AOFAS itself recommended the use of more validated and standardized outcome scores [90].

Conditions: These region-specific questionnaires have been used to evaluate patients in a wide variety of foot and ankle pathologies such as arthritis, cartilage defects, soft tissue pathologies, and toe and finger deformities.

46.7.2 American Academy of Orthopaedic Surgeons: Foot and Ankle Module (AAOS-FAM)

The AAOS-FAM was released in 2004, and it is a patient-reported questionnaire composed of 25 items divided into 5 subscales: pain, function, stiffness and swelling, giving way, and shoe comfort. Each answer is measured on a scale of 1–5 or 6 and then calculated; the result is a percentage (0–100) where higher numbers represent better function. This scale is increasing in popularity among surgeons; it has good reliability and repeatability [56, 116].

Conditions: AAOS-FAM can be used to compare clinical outcomes in specific foot and ankle pathologies or surgical methods.

46.7.3 Foot Function Index (FFI)

The FFI was developed in 1991 for senior patients with foot-related pathologies; it was considered specific for foot- and ankle-related conditions secondary to rheumatoid arthritis, although there is no specific item for this condition in the questionnaire. It is composed of 23 patient-reported questions that assess foot function in three domains: pain, disability, and activity limitation. It has moderate to high correlation with SF-36; this
Table 46.5  Measures of foot and ankle function

| Questionnaire | Items | Options | Administration | Range | Cutoffs | Time compilation (min) | Time calculation (min) | Internal consistency | Test-retest | SRM | MDC | MCID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AOFAS^a | 9 | Likert (3–4) | Patient + clinician | 0 (worst) to 100 (best) | No | 10 | 3 | 0.585 (0.863 for function subscale) | 0.89–0.97 | 0.79 | 1.7 | NR |
| AAOS-FAM^b | 25 | Likert (5–6) | Patient | 0% (worst) to 100% (best) | No | 15 | 10 | 0.83–0.93 | 0.79–0.99 | NR | NR | NR |
| FFI^b | 23 | VAS | Patient | 0 (best) to 230 (worst) | No | 5 | 7 | 0.73–0.96 | 0.70–0.87 | NR | 7 | NR |
| FAOS^b | 42 | Likert (5) | Patient | 0 (worst) to 100 (best) | No | 10 | 10 | 0.88–0.94 | 0.70–0.87 | NR | NR | NR |
| FAAM^b | 29 | Mix | Patient | 0% (worst) to 100% (best) | No | 7 | 10 | 0.96–0.98 | 0.87–0.89 | NR | NR | 8 ADL/9 sport |
| FADI^b | 34 | Likert (5) | Patient | 0% (worst) to 100% (best) | No | 7 | 10 | 0.84–0.89 | NR | NR | NR | NR |
| ACFAS^c | 16 | Likert (5) | Patient + clinician | 0 (worst) to 100 (best) | No | 15 | 10 | NR | NR | NR | NR | NR |
| FHSQ^b | 13 | Likert (3–5) | Patient | 0 (worst) to 100 (best) | No | 6 | 10 | 0.89–0.95 | 0.74–0.92 | NR | NR | 7–14 |
| ROFPAQ^b | 39 | Likert (5) | Patient | 1 (best) to 5 (worst) | No | 10 | 6 | 0.81–0.89 | 0.82–0.93 | NR | NR | NR |
| QOL^b | 5 | Likert (5) | Patient | 0 (worst) to 20 (best) | No | 3 | 1 | 0.85–0.91 | NR | NR | NR | NR |
| OMAS^d | 9 | Likert (3–5) | Patient | 0 (worst) to 100 (best) | No | 3 | 3 | 0.76 | 0.95 | NR | NR | NR |
| VISA-A^e | 8 | Mix | Patient | 0 (worst) to 100 (best) | No | 5 | 7 | 0.73 | 0.79 | NR | NR | NR |

Note: AOFAS (American orthopaedic foot and ankle society score), AAOS-FAM (American Academy of Orthopaedic Surgeons: foot and ankle), FFI (foot function index), FAOS (foot and ankle outcomes score), FAAM (foot and ankle ability measure), FADI (foot and ankle disability index), ACFAS (American College of Foot and Ankle Surgeons), FHSQ (foot health status questionnaire), ROFPAQ (Rowan foot pain assessment), QOL (sport ankle QOL), OMAS (Olerud-Molander ankle score), VISA-A (Victorian Institute of sports assessment-Achilles)
a De Boer AS, Meuffels DE, et al. Validation of the American Orthopaedic Foot and Ankle Society Ankle-Hindfoot Scale Dutch language version in patients with hindfoot fractures. BMJ Open. 2017;7(11):e018314
b Martin RL, Irrgang JJ. A survey of self-reported outcome instruments for the foot and ankle. J Orthop Sports Phys Ther. 2007;37(2):72–84
c Cook JJ, Cook EA, et al. Validation of the American College of Foot and Ankle Surgeons Scoring Scales. J Foot Ankle Surg. 2011;50(4):420–9
d Nilsson et al. The Swedish version of OMAS is a reliable and valid outcome measure for patients with ankle fractures. BMC Musculoskeletal Disorders. 2013;14:109
e Iversen JV, Bartels, et al. Danish VISA-A questionnaire. Scand J Med Sci Sports. 2016;26:1423–1427

suggests that the FFI may be a good measure of both health status and patients' outcomes [13, 14, 104].

Conditions: Generally used in older patients, rheumatoid patients, and orthotics outcomes; poor reliability in professional athletes due to the reported ceiling effect.

46.7.4 Foot and Ankle Outcomes Score (FAOS)

The FAOS, released in 2001, is a 42-item patient-reported outcome measure that consists of five subscales (pain, symptoms, ADL, sport, and ankle-related quality of life). Each subscale is graded separately and scored from 0 to 100. The FAOS demonstrated good reliability and validity, but the length of this survey can create a significant burden for the patient [41, 98].

Conditions: It has been validated for a variety of foot and ankle pathologies such as adult flatfoot deformity, hallux valgus, hallux rigidus.

46.7.5 Foot and Ankle Ability Measure (FAAM)

The FAAM was developed in 2005; it is region-specific and composed of 29 patient-reported items divided into activity of daily living (ADL) and sport subscales. A recent study demonstrated that the FFI and FAAM are highly correlated for foot and ankle trauma patients [40, 77].

Conditions: It is valid for a range of foot and ankle conditions as well as for chronic ankle instability and diabetes mellitus-related conditions.

46.7.6 Foot and Ankle Disability Index (FADI)

The FADI was first released in 1999; it is the former version of the FAAM and includes four more items for pain assessment and one item for the ability to sleep (34 total items). The FADI and FAAM are appropriate to evaluate functional disabilities in athletes with chronic ankle instability [76].

Conditions: Sport-related foot and ankle pathologies and trauma evaluation.

46.7.7 American College of Foot and Ankle Surgeons (ACFAS) Universal Evaluation Scoring Scales

The American College of Foot and Ankle Surgeons developed these anatomically based scoring scales in 2005 as clinical instruments to evaluate objective and subjective parameters before and after surgery [106]. Four modules exist: the first metatarsal-phalangeal joint and first ray, the forefoot (excluding the first ray), the rear foot, and the ankle. Each questionnaire is completed by both patient and clinician and includes subjective (pain, appearance, and functional capacity) and objective (radiographic and functional) parameters, for a total of 100 points. This instrument has been validated and presents good reliability and sensitivity to change [24].

Conditions: Foot and ankle musculoskeletal-related pathologies requiring surgical intervention.

46.7.8 Foot Health Status Questionnaire (FHSQ)

The FHSQ was developed for individuals undergoing surgical treatment for common foot conditions. It consists of four subscales with a total of 13 items representing the following four domains: pain (four items), function (four items), footwear (three items), and general foot health (two items). Scores from each subscale range from 0 to 100, with a higher score representing better outcomes [8].

Conditions: Foot- and ankle-related disorders including those affecting skin and nail.

46.7.9 Rowan Foot Pain Assessment (ROFPAQ)

The ROFPAQ was developed as a disease-specific instrument for chronic foot pain. It contains

39 items in the following four subscales for pain assessment: sensory (16 items), affective (ten items), cognitive (ten items), and comprehension (three items). Each item is scored from 1 through 5, and the item responses are merged together to produce, independently for each subscale, a subscale score ranging from 1 to 5, with a higher score representing more pain [100].

Conditions: Chronic foot and ankle pain.

46.7.10 Sport Ankle QOL (Quality of Life)

The sport ankle rating system quality of life measure was developed as a region-specific measure including self-reported and clinician-completed outcome measures. The QOL measure, the clinical rating score, and a single numeric evaluation are the three outcome measures, which can be used together or independently. The QOL is a self-reported questionnaire designed to assess an athlete's quality of life after an ankle injury; it contains five items that evaluate symptoms, work and school activities, recreation and sports activities, activities of daily living, and lifestyle.

The clinical rating score is composed of 11 items, both patient and clinician based; finally, with the numeric VAS evaluation, the patient is asked to score his or her ankle function from 0 to 100 [113].

Conditions: Ankle injuries and specifically ankle sprains.

46.7.11 Olerud-Molander Ankle Score (OMAS)

The OMAS is a disease-specific outcome measure developed for patients with ankle fractures and has been frequently used to evaluate this group of subjects; furthermore, it has been reported to be a valid instrument for recording short-term changes after an acute ankle ligament injury. The OMAS is a self-administered patient questionnaire; the scale ranges from 0 points (totally impaired function) to 100 points (completely unimpaired function) and is based on nine different domains: pain, stiffness, swelling, stair climbing, running, jumping, squatting, supports, and work/activity level [87, 88].

Conditions: Ankle fracture, ligament ankle injury.

46.7.12 Victorian Institute of Sports Assessment-Achilles (VISA-A)

The VISA-A is a disease-specific instrument designed to evaluate the clinical severity for patients with chronic Achilles tendinopathy. It is an easily self-administered questionnaire that evaluates symptoms and their effect on physical activity. The questionnaire contains eight questions, covering three necessary domains: pain, functional status, and activity. The first six questions use a visual analog scale so that the patient may report the magnitude of a continuum of subjective symptoms; the final two questions use a categorical rating scale. The final results range from 0 to 100, with asymptomatic persons expected to score 100 points [95].

Conditions: Chronic Achilles tendinopathy.

46.8 Measures of Activity Level

Enabling patients to engage in unlimited physical activity is the main focus of clinicians; for this reason several scores have been created to assess outcomes in terms of return to sport/activity (RTS). When considering these instruments, it should be noted that athletes differ from the general population: they have a higher level of physical function and perceived health, they often do not perceive symptoms during daily activities, and common outcome measures may not detect problems that result only from high-intensity training and competition.

The basic psychometric characteristics, strengths, and weaknesses of the most common scales for activity level are described below (Table 46.6).
Table 46.6  Measures of activity level

Questionnaire | Items | Options | Administration | Range | Cutoffs | Time compilation (min) | Time calculation (min) | Internal consistency | Test-retest | SRM | MDC | MCID
TEGNER (a) | 1 | Likert (11) | Patient | 0 (worst) to 10 (best) | No | 1 | 1 | NR | 0.8 | 1.0 | 1 | 1
UCLA (b) | 1 | Likert (10) | Patient | 1 (worst) to 10 (best) | Yes | 2 | 2 | NR | NR | NR | 1 | 1
ARS (c) | 4 | Likert (5) | Patient | 0 (worst) to 16 (best) | No | 1 | 1 | 8.87 | 0.81 | NR | NR | NR
AAS (d) | 1 | Likert (11) | Patient | 0 (worst) to 10 (best) | No | 1 | 1 | 1.0 | NR | NR | 1 | 1
LEFS (e) | 20 | Likert (5) | Patient | 0 (worst) to 80 (best) | No | 5 | 3 | 0.96 | 0.86 | NR | 9 | 9
SAS (f) | 7 | Likert (4–5) | Patient | 0 (worst) to 20 (best)/A–D | No | 1 | 1 | NR | 0.92 | NR | NR | NR
HAS (g) | 44 | Likert (6) | Patient | 0 (worst) to 220 (best) | No | 120 | 30 | NR | 0.75 | NR | NR | NR
OSTRC (h) | 4 (each joint) | Likert (4–5) | Patient | 0 (best) to 100 (worst) | No | >5 (variable) | 3–7 | 0.86–0.91 | NR | NR | NR | NR
SQUASH (i) | 11 | Open | Patient | 1 min. to 9 max. (per item), no upper limit | No | 3–5 | 10 | NR | NR | NR | NR | NR
IPAQ-SF (i) | 7 | Open | Patient | 0 to no upper limit | No | 3–5 | 10 | <0.8 | NR | NR | NR | NR
HAP (i) | 94 | Likert (3) | Patient | 1 (worst) to 94 (best) | No | 10 | 5 | NR | 0.84 MAS; 0.79 AAS | NR | 6.5 MAS; 8.4 AAS | NR
Note: ARS (activity rating scale), AAS (ankle activity score), LEFS (lower extremity functional scale), SAS (shoulder activity scale), HAS (Heidelberg sport activity score), OSTRC (Oslo Sports Trauma Research Centre overuse injury questionnaire), SQUASH (short questionnaire to assess health-enhancing physical activity), IPAQ-SF (international physical activity questionnaire – short form), HAP (human activity profile)
a Briggs KK, Lysholm J, et al. The reliability, validity, and responsiveness of the Lysholm score and Tegner activity scale for anterior cruciate ligament injuries of the knee: 25 years later. Am J Sports Med. 2009;37(5):890–7
b Terwee CB, Bouwmeester W, et al. Instruments to assess physical activity in patients with osteoarthritis of the hip or knee: a systematic review of measurement properties. Osteoarthritis Cartilage. 2011;19(6):620–33
c Negahban H, Mostafaee N, Sohani SM, Mazaheri M, Goharpey S, Salavati M, Zahednejad S, Meshkati Z, Montazeri A. Reliability and validity of the Tegner and Marx activity rating scales in Iranian patients with anterior cruciate ligament injury. Disabil Rehabil. 2011;33(22–23):2305–10
d Halasi T, Kynsburg Á, et al. Development of a new activity score for the evaluation of ankle instability. Am J Sports Med. 2004;32:899–908
e Alcock GK, Stratford PW. Validation of the Lower Extremity Function Scale on athletic subjects with ankle sprains. Physiother Can. 2002;54:233–40
f Brophy RH, Beauvais RL, et al. Measurement of shoulder activity level. Clin Orthop Relat Res. 2005;439:101–8
g Seeger J, Weinmann S, et al. The Heidelberg Sports Activity Score - a new instrument to evaluate sports activity. Open Orthop J. 2013;7:25–32
h Clarsen B, Myklebust G, et al. Development and validation of a new method for the registration of overuse injuries in sports injury epidemiology: the Oslo Sports Trauma Research Centre (OSTRC) Overuse Injury Questionnaire. Br J Sports Med. 2013;47:495–502
i Terwee CB, Bouwmeester W, et al. Instruments to assess physical activity in patients with osteoarthritis of the hip or knee: a systematic review of measurement properties. Osteoarthritis Cartilage. 2011;19(6):620–33
A. Grassi et al.

46.8.1 Tegner Activity Score

First described in 1985 [105] for the prospective evaluation of knee ligament injuries, the Tegner activity scale provides an arbitrary ranking based on the level of sport and leisure time activities and competition at which the patient is currently participating. It is a simple scale in which the subject indicates his/her current activity ranging from 0 (no physical activity/disabled) to 10 (participation in competitive soccer or pivoting sports). It was created as a complement to the Lysholm score, but its use has also extended to other joints, including the hip and ankle [42, 78].
Conditions: Tegner score was developed and is mostly used for knee ligamentous injuries and reconstructions.

46.8.2 University of California at Los Angeles (UCLA) Activity Rating Scale

The UCLA activity rating scale is a simple scale ranging from 1 (no activity) to 10 (participation in impact sports); it was developed in 1998 to assess physical activity after joint replacements. As with the Tegner score, the patient is asked to rate his/her own most appropriate activity level. Four activity subgroups were defined: scores between 0 and 4 (low activity), 4.1 and 6 (moderately low activity), 6.1 and 8 (moderately high activity), and 8.1 and 10 (high activity) [114].
Conditions: The UCLA is mostly used and validated for hip and knee osteoarthritis and evaluation of joint replacement.

46.8.3 Activity Rating Scale (ARS) or Marx Scale

The ARS/Marx questionnaire quantifies the frequency of activities that challenge the dynamic stability of the knee; it consists of four questions about how frequently the patient performs activities such as running, cutting, decelerating, and pivoting. Each question is scored from 0 (<1 time/month) to 4 (>4 times/month), and the total score range is 0–16. ARS is based on the idea of measuring specific components of function/movement (that apply universally to the lower limb) to allow more accurate comparison among patients. This scale can be completed in a very short time frame [78].
Conditions: Sport activity involving complex articular motions of the knee and lower limb.

46.8.4 Ankle Activity Score (AAS)

The AAS is a joint-specific score that was published in 2004; it was based on the Tegner score. It contains 53 sports, three working activities, and four general activities; the patient is asked to select his/her most appropriate sport/activity and to indicate a level of participation (top level, lower competitive level, recreational level). The result, as with the Tegner score, is represented by a single number from 0 to 10 [42].
Conditions: Ankle injuries.

46.8.5 Lower Extremity Functional Scale (LEFS)

The LEFS was developed to be a broad region-specific measure for individuals with musculoskeletal disorders of the hip, knee, ankle, or foot. It consists of 20 items that specifically cover the domains of activity and participation. The scale uses a Likert response format, with a higher score representing a higher level of ability [10].
Conditions: LEFS has been validated for several pathologies of the lower limb; moreover, it has been translated into different languages.

46.8.6 Shoulder Activity Scale (SAS)

The Brophy-Marx SAS was developed in 2005 as an easy instrument to evaluate the patient’s overall shoulder activity level that could be generalized across different sports and completed in less than 1 min. It is composed of two parts: the first five items describe five common activities of the

shoulder and the relative frequency, during the patient’s previous year, for each item is scored from 0 to 4 (never or less than once a month, once a month, once a week, more than once a week, or daily). The total numerical activity score ranges from a minimum of 0 points to a maximum of 20 points. In the second part of the score, the patients are asked if they participate in contact sports and sports that involve repetitive overhead throwing. The answers to these two questions range from A (no) to D (yes, at professional level). SAS has shown good reliability and validity [11, 12].
Conditions: This score has been developed on patients with rotator cuff tears.

46.8.7 Heidelberg Sport Activity Score (HAS)

The HAS was published in 2013; this validated instrument divides sport activities into 11 categories: walking, swimming, cycling, running, cross-country skiing, alpine skiing, golfing, dancing, racket sports, ball sports, and miscellaneous. For each of these activities, the patient is asked to grade frequency, duration, level of importance, and impairment from the affected joint between 0 and 5. For each activity, a score from 0 to 20 is calculated with the formula: (frequency + duration) × (1 + impairment/10 + importance/10). The scores are then added to obtain a final score between 0 and 220. HAS has shown high validity, reliability, and sensitivity. It can be used for elite-level athletes and athletes who perform different sports and is valid for different joints; nevertheless, its disadvantage is an extremely long compilation time (120 min) [103].
Conditions: Evaluation of activity after trauma or surgery; it can be used in elite-level athletes.

46.8.8 Oslo Sports Trauma Research Center Overuse Injury Questionnaire (OSTRC)

The intention of the OSTRC was to create a questionnaire that could be applied to overuse injury problems in any area of the body. The instrument consists of four items for each affected joint; the final “severity score” ranges between 0 and 100 (25 points per item) for each overuse problem. In studies with multiple anatomical areas of interest, the four questions are repeated for each area. This questionnaire uses the term “problem” rather than “injury” since there is greater variation in interpretation of the term “injury” [20].
Conditions: OSTRC is mainly used for the evaluation of overuse problems in sports injury epidemiology.

46.8.9 Short Questionnaire to Assess Health-Enhancing Physical Activity (SQUASH)

The SQUASH was not designed to measure energy expenditure but to give an indication of the habitual activity level. It consists of 11 questions on commuting activities, leisure time and sports activities, household activities, and activities at work and school. The total activity score is calculated by taking the sum of the activity scores for separate questions [112].
Conditions: SQUASH is a short physical activity questionnaire with the general purpose of assessing habitual physical activity.

46.8.10 International Physical Activity Questionnaire-Short Form (IPAQ-SF)

The IPAQ-SF consists of seven questions about the frequency and duration of participation in strenuous, moderate, and walking activities in addition to the time spent sitting during the past week. The final score is expressed in metabolic equivalents (METs); one MET represents the oxygen consumption of an individual sitting at rest (3.5 mL/kg/min) [26, 29].
Conditions: IPAQ has been validated, and it presents reasonable measurement properties for monitoring population levels of physical activity in diverse settings.
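Two of the scoring rules described in this section, the ARS total (Sect. 46.8.3) and the HAS formula (Sect. 46.8.7), are simple enough to sketch in code. The following Python fragment is only an illustration under stated assumptions: the function names are hypothetical, the input checks are our own addition, and the HAS formula is applied exactly as quoted above without assuming which direction the 0 to 5 grades run.

```python
from typing import Iterable, Sequence, Tuple


def ars_total(item_scores: Sequence[int]) -> int:
    """Sum the four ARS/Marx items (each graded 0-4) into a 0-16 total."""
    if len(item_scores) != 4:
        raise ValueError("ARS expects exactly four item scores")
    if any(not 0 <= s <= 4 for s in item_scores):
        raise ValueError("each ARS item must be between 0 and 4")
    return sum(item_scores)


def has_activity_score(frequency: int, duration: int,
                       impairment: int, importance: int) -> float:
    """Per-activity HAS score, as quoted in the text:
    (frequency + duration) x (1 + impairment/10 + importance/10).
    """
    for grade in (frequency, duration, impairment, importance):
        if not 0 <= grade <= 5:
            raise ValueError("each HAS grade must be between 0 and 5")
    # Algebraically the same formula, written with a single division
    # so that integer grades do not pick up rounding noise.
    return (frequency + duration) * (10 + impairment + importance) / 10


def has_total(activities: Iterable[Tuple[int, int, int, int]]) -> float:
    """Add the per-activity scores (11 categories give a 0-220 range)."""
    return sum(has_activity_score(*grades) for grades in activities)


# Hypothetical example: one activity graded 3, 2, 1, 4
example = has_activity_score(frequency=3, duration=2, impairment=1, importance=4)
```

With all four grades at 5, a single activity reaches the maximum of 20, and 11 such activities reach the stated upper bound of 220.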

46.8.11 Human Activity Profile (HAP)

The HAP is a 94-item self-report measure of energy expenditure or physical fitness; it was developed as an outcome measure for medical rehabilitation for people with a wide spectrum of physical disorders. It consists of a list of activities for which patients should indicate if they are currently able to perform the activity, have stopped performing the activity, or have never performed the activity. Each of the selected activities has an estimated energy requirement between approximately 1 and 10 METs. Two scores are calculated: the maximum activity score (MAS) and the adjusted activity score (AAS) [38].
Conditions: Epidemiologic and population studies as well as rehabilitation medicine.

Fact Box 46.2
The athlete population differs in many ways from the general population; athletes have a much higher level of physical functioning and health status. For this reason, they may not perceive symptoms during daily activities, and choosing an adequate outcome measure is essential.

46.9 Measures of Global and Mental Health

Generic measures of health-related quality of life are frequently used to evaluate the impact of treatments and clinical results and to monitor population health. Often these scales are composed of various independent domains/dimensions that together represent the notion of health-related quality of life. The items are weighted to indicate the relative importance attributed to them by the respondents and then aggregated into a single number reflecting the quality or value of a health state. To obtain such values, several instruments have been developed.
The basic psychometric characteristics, strengths, and weaknesses of the most common scales for global and mental health are described (Table 46.7).

46.9.1 36-Item Short-Form Health Survey (SF-36) and Short-Form 12 (SF-12)

The SF-36 is a general health measure, introduced in 1992, that includes 36 items addressing eight domains of overall health status: physical functioning (PF), bodily pain (BP), role limitations due to physical health problems (RP), role limitations due to personal or emotional problems (RE), general mental health (MH), social functioning (SF), energy/fatigue or vitality (VIT), and general health perceptions (GH). Although this scale has been validated for orthopedic use, experts recommend pairing the SF-36 with an orthopedic-specific measure since it is a general health scale, and it could be difficult to isolate orthopedic outcomes from other unrelated health conditions.
The SF-12 is a shortened version of the SF-36, developed in 1996, with the aim of reducing redundancies and time burden on the patient. It shortens the survey to 12 items and reports two scores, in the physical and mental domains. It has been validated for orthopedic patients. The SF-12 was included as a recommended PRO measure for “general quality of life” by the AAOS.
Both SF-36 and SF-12 are well suited to outcomes assessment in research; both need to be administered in conjunction with orthopedic-specific measures [15, 52, 110, 111, 115].

46.9.2 EuroQol-5 Domains-3 Likert (EQ-5D-3L)

The EQ-5D health status and quality-of-life measure is composed of five items (mobility, self-care, usual activity, pain/discomfort, and anxiety/depression), with three possible response levels (no problems, some/moderate problems, extreme problems). The EQ-5D index is

Table 46.7  General and mental health measures


Questionnaire | Items | Options | Administration | Range | Cutoffs | Time compilation (min) | Time calculation (min) | Internal consistency | Test-retest | SRM | MDC | MCID
SF-36 (a) | 36 | Likert (2–6) | Patient | 0 (worst) to 100 (best) | No | 7 | 13 | ≥0.70 (0.90 for physical function) | <0.70 | 1.04 | 5 | NR
SF-12 (a) | 12 | Likert (2–6) | Patient | 0 (worst) to 100 (best) | No | 3 | 7 | ≥0.82 PCS; ≥0.75 MCS | 0.89 PCS; 0.79 MCS | <0.2 | NR | NR
EQ-5D-3L (b) | 6 | Mix | Patient | −0.594 (worst) to 1 (best)/0–100 | No | 5 | 10 | NR | NR | 0.7/0.04 | NR | NR
AQoL (c) | 12 | Likert (4) | Patient | −0.04 (worst) to 1.0 (best) | No | 5 | 7 | NR | NR | NR | NR | NR
NHP (a) | 45 | Dichotomous | Patient | 0 (best) to 100 (worst) | No | 5–10 | 10–15 | 0.83 (0.33 for mobility) | 0.77–0.85 | NR | NR | NR
PROMIS 10 (b) | 10 | Mix | Patient | t-score distribution | No | 5 | 2 | NR | NR | 0.72 | NR | NR
Note: SF-36 (36-Item Short Form Health Survey), SF-12 (12-Item Short Form Health Survey), EQ-5D-3L (EuroQol-5 Domains-3 Likert), AQoL (Assessment of Quality of Life), NHP (Nottingham Health Profile), PROMIS (Patient-Reported Outcomes Measurement Information System 10 Global health)
a Busija L, Pausenberger E, et al. Adult measures of general health and health-related quality of life: Medical Outcomes Study Short Form 36-Item (SF-36) and Short Form 12-Item (SF-12) Health Surveys, Nottingham Health Profile (NHP), Sickness Impact Profile (SIP), Medical Outcomes Study Short Form 6D (SF-6D), Health Utilities Index Mark 3 (HUI3), Quality of Well-Being Scale (QWB), and Assessment of Quality of Life (AQOL). Arthritis Care Res. 2011;63:S383–412
b Oak SR, Strnad GJ, et al. Responsiveness comparison of the EQ-5D, PROMIS Global Health, and VR-12 questionnaires in knee arthroscopy. Orthop J Sports Med. 2016;4(12):2325967116674714
c Whitfield K, Buchbinder R, et al. Parsimonious and efficient assessment of health-related quality of life in osteoarthritis research: validation of the Assessment of Quality of Life (AQoL) instrument. Health Qual Life Outcomes. 2006;4:19

calculated from the five dimensions, ranging from −0.594 (worst) to 1.0. In addition to the EQ-5D index, the EQ-5D includes a VAS for rating of overall health status from 0 (worst imaginable health) to 100 (best imaginable health). A common criticism of this measure is the lack of sensitivity to change since only three levels of responses are available within each construct. With the aim of addressing this issue, a version of the measure with five response levels has been developed, called the EQ-5D-5L [47].

46.9.3 Assessment of Quality of Life (AQoL)

The AQoL is a 12-item instrument which loads onto four dimensions: independent living, social relationships, physical senses, and psychological well-being. These subscales are weighted between 0.0 (death) and 1.0 (full health). With its emphasis upon psychosocial dimensions of health, it offers significant advantages for evaluation studies where these dimensions are important [93].

46.9.4 Nottingham Health Profile (NHP)

The NHP is a self-administered questionnaire. It was developed in English and consists of two parts: Part 1 contains 38 “yes/no” questions covering six dimensions: pain, physical mobility, emotional reactions, energy, social isolation, and sleep. Part 2 has seven “yes/no” questions concerning problems of daily activities. It has been shown to be internally consistent, valid, reproducible, and sensitive [55].

46.9.5 Patient-Reported Outcomes Measurement Information System 10 Global Health (PROMIS-10 Global Health)

The PROMIS was established in 2004 with funding from the National Institutes of Health; this initiative develops and evaluates standard measures for key patient-reported health indicators and symptoms. PROMIS measures are standardized, allowing for assessment of many patient-reported outcome domains such as pain, fatigue, emotional distress, physical functioning, and social role participation.
Computerized adaptive testing (CAT) software has been implemented; this allows tailoring the PRO assessment to the individual patient by selecting the most informative set of questions based on responses to previous questions [17].

46.10 Measures of Pain

Pain is a complex and subjective experience, which implies several measurement challenges. It is important for clinicians to utilize sensitive and accurate pain outcome measures, although currently we rely mainly on self-report measures. The cutoff value for clinical significance of pain reduction must be based on the minimal amount of change that is important to patients. A reduction of 10–20% of pain can be considered clinically significant [67].
The basic psychometric characteristics, strengths, and weaknesses of the most common scales for pain assessment are described (Table 46.8).

46.10.1 Visual Analog Scale for Pain (VAS for Pain)

The VAS for pain was introduced in 1976; it is a widely recognized and simple instrument that allows the patient to score his/her own pain level on a straight 100-mm line with zero indicating “no pain” and 100 “worst imaginable pain.” Its usefulness in orthopedic surgery has been recognized, and VAS has shown high validity and responsiveness; on the other hand, its low specificity has been shown, with a 1.1 cm decrease corresponding to the minimal clinically important difference for pain. The patient acceptable symptomatic state is considered to be a value less than 3 cm [37, 46, 59, 115].
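The two VAS thresholds quoted in Sect. 46.10.1 (a 1.1 cm decrease as the minimal clinically important difference, and a value below 3 cm as the patient acceptable symptomatic state) can be applied to a pair of measurements as in the following Python sketch. The function and constant names are hypothetical, and working in whole millimetres is our own assumption made to keep the comparison exact.

```python
MCID_MM = 11            # 1.1 cm decrease, as quoted in the text
PASS_THRESHOLD_MM = 30  # values below 3 cm count as acceptable state


def interpret_vas(baseline_mm: int, follow_up_mm: int) -> dict:
    """Compare two VAS pain readings taken on the 0-100 mm line."""
    for value in (baseline_mm, follow_up_mm):
        if not 0 <= value <= 100:
            raise ValueError("VAS readings must lie between 0 and 100 mm")
    decrease = baseline_mm - follow_up_mm
    return {
        "decrease_mm": decrease,
        # True when the pain reduction reaches the quoted MCID
        "reaches_mcid": decrease >= MCID_MM,
        # True when the follow-up value is below the quoted PASS value
        "within_pass": follow_up_mm < PASS_THRESHOLD_MM,
    }


# Hypothetical patient: 62 mm before treatment, 48 mm at follow-up
result = interpret_vas(62, 48)
```

In this hypothetical case the 14 mm decrease exceeds the quoted MCID, but the follow-up value of 48 mm is still above the acceptable symptomatic state, which illustrates why the two criteria are best reported separately.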

Table 46.8  Pain measures


Questionnaire | Items | Options | Administration | Range | Cutoffs | Time compilation (min) | Time calculation (min) | Internal consistency | Test-retest | SRM | MDC | MCID
VAS (a) | 1 | VAS | Patient | 0 mm (best) to 100 mm (worst) | Yes | <1 | <1 | NR | 0.71–0.94 | NR | 1 mm | 1.1 cm
NRS (a) | 1 | Likert (11) | Patient | 0 (best) to 10 (worst) | No | <1 | <1 | NR | 0.96 | NR | 1 | 2
VRS (a) | 1 | Likert (5) | Patient | 1 (best) to 5 (worst) | No | <1 | <1 | NR | NR | NR | 1 | 1
FPS-R (b) | 1 | Likert (6) | Patient | 0 (best) to 10 (worst) | No | <1 | <1 | NR | NR | NR | 2 | 2
SF-MPQ (a) | 17 | Mix | Patient | 0–45/0–5/0–10 (best) to (worst) | No | 2–5 | 1–2 | 0.73–0.89 | 0.79–0.93 | >0.80 | 5.2, 4.5, 2.8, 1.4, 1.4 cm | >5
Note: VAS (visual analog scale for pain), NRS (numerical rating scale for pain), VRS (verbal rating scale for pain), FPS-R (faces pain scale revised), SF-MPQ (short-form McGill pain questionnaire)
a Hawker GA, Mian S, et al. Measures of adult pain: Visual Analog Scale for Pain (VAS Pain), Numeric Rating Scale for Pain (NRS Pain), McGill Pain Questionnaire (MPQ), Short-Form McGill Pain Questionnaire (SF-MPQ), Chronic Pain Grade Scale (CPGS), Short Form-36 Bodily Pain Scale (SF-36 BPS), and Measure of Intermittent and Constant Osteoarthritis Pain (ICOAP). Arthritis Care Res. 2011;63:S240–52
b Ferreira-Valente MA, Pais-Ribeiro JL, et al. Validity of four pain intensity rating scales. Pain. 2011;152(10):2399–404

46.10.2 Numerical Rating Scale for Pain (NRS for Pain)

The NRS is an 11-point scale consisting of integers from 0 through 10: 0 representing “no pain” and 10 representing “worst imaginable pain.” Respondents select the single number that best represents their pain intensity. It is considered to be more comprehensive compared to the VAS for pain; however, it may not capture the complex nature of the pain experience [46].

46.10.3 Verbal Rating Scale for Pain (VRS for Pain)

The VRS is a single-domain five-point scale consisting of a list of phrases (no pain, mild pain, moderate pain, intense pain, maximum pain) describing increasing levels of pain severity. Respondents select the single phrase that best characterizes their pain intensity [46].

46.10.4 Faces Pain Scale-Revised (FPS-R)

The FPS-R is a six-point scale represented by six different faces showing increasing severity of pain. Patients are asked to select the facial expression that best resembles their pain intensity, from the left-most face (“no pain”) to the right-most face (“very much pain”). The FPS-R was originally developed for pediatric patients, but its simplicity makes it a reliable instrument for individuals with cognitive and communication impairments as well [9, 49, 101].

46.10.5 Short-Form McGill Pain Questionnaire (SF-MPQ)

The SF-MPQ is a multidimensional measure, with extensive clinical research use. Patients rate their pain in sensory terms (e.g., sharp or stabbing) and affective terms (e.g., sickening or fearful), with 15 total descriptors. Each item is rated on a four-point scale that ranges from none to severe. The SF-MPQ also has a single VAS item for pain intensity and a VRS for rating the overall pain experience. It is used particularly to measure the sensory and affective aspects of pain and pain intensity in adults with chronic pathologies [46, 79].

Fact Box 46.3
Since “pain” perception is perhaps the most relevant outcome in clinical practice, research protocols should include sensitive and accurate pain measures. Currently we rely mainly on self-report measures, where a reduction of 10–20% of pain can be considered as the minimal amount of change that is important to patients.

46.11 Measures of Sport-Related Psychological Aspects

It has been demonstrated that while most athletes reach a normal physical function, less than half of them return to the same level of sport activity. Possibly, psychological factors are involved in the rehabilitation processes and in the athlete’s self-perception of recovery. The following section describes some of the common scales for sport activity assessment, energy expenditure, and psychological factors after injuries.
The basic psychometric characteristics, strengths, and weaknesses of the most common scales for sport-related psychological aspects are described (Table 46.9).

46.11.1 Injury-Psychological Readiness to Return to Sport Scale (I-PRRS)

The I-PRRS is an easy-to-use tool developed to measure the athlete’s psychological readiness to return to sport after injury. It is a six-item scale; each item is scored from 0 (no confidence) to 100 (maximum confidence) with intervals of 10. The scores from the six items are summed and divided

Table 46.9  Psychological scores and measures


Questionnaire | Items | Options | Administration | Range | Cutoffs | Time compilation (min) | Time calculation (min) | Internal consistency | Test-retest | SRM | MDC | MCID
IPRRS (a) | 6 | Likert (11) | Patient | 0 (worst) to 60 (best) | Yes | 8–10 | 5–8 | 0.63 | 0.97 | NR | NR | NR
RIAI (b) | 28 | Likert (4) | Patient | 0 (best) to 45 (worst) | No | 8–15 | 8 | 0.96–0.98 | NR | NR | NR | NR
TSK (c) | 11 | Likert (4) | Patient | 11 (best) to 44 (worst) | No | 4–8 | 5 | 0.7–0.9 | >0.7 | NR | NR | NR
Note: IPRRS (injury-psychological readiness to return to sport scale), RIAI (re-injury anxiety inventory), TSK (Tampa scale for kinesiophobia)
a Naghdi S, Nakhostin Ansari N, et al. Cross-cultural adaptation and validation of the Injury-Psychological Readiness to Return to Sport scale to Persian language. Physiother Theory Pract. 2016;32(7):528–35
b Walker N, et al. A preliminary development of the Re-Injury Anxiety Inventory (RIAI). Phys Ther Sport. 11(1):23–9
c Roelofs J, van Breukelen G, et al. Norming of the Tampa Scale for Kinesiophobia across pain diagnoses and various countries. Pain. 2011;152:1090–5

by 10 to calculate the total score. The range of scores is between 0 and a maximum score of 60. A score of 60 indicates high confidence to return to sport; 40, moderate confidence; and 20, low confidence [39].
Conditions: Evaluation of psychological readiness to return to sport among athletes.

46.11.2 Re-injury Anxiety Inventory (RIAI)

The RIAI is a 28-item score designed to assess the athlete’s fear of experiencing a re-injury. It is composed of the rehabilitation anxiety (RIA-R, 13 items) and the reentry to competition anxiety (RIA-RE, 15 items) subscales. The instrument is based on a four-point (0–3) Likert-type response; the final score ranges from 0 (complete absence of anxiety) to 45 (extreme anxiety) [109].
Conditions: RIAI can be used in studies aiming to evaluate the athlete’s psychological readiness for RTS.

46.11.3 Tampa Scale for Kinesiophobia (TSK)

The TSK is a self-reported measure developed to assess “fear of movement-related pain” in patients with musculoskeletal disorders. The original test, developed in English, has been translated into ten languages. The TSK-11 is the most widely used; it contains 11 items from the original 17-item questionnaire. Each item is scored on a four-point Likert scale, ranging from 1 “strongly disagree” to 4 “strongly agree”; total scores vary between 11 and 44, with higher scores indicating higher levels of fear of movement-related pain [96].
Conditions: The TSK-11 is a reliable and valid measurement tool that provides therapists with valuable information on activity avoidance and pathological somatic focus in patients with musculoskeletal pain.

Clinical Vignette
An innovative and minimally invasive surgical technique for complex osteochondral knee lesions was developed in your institution. From the first outpatient follow-ups, you realize that the subjects treated with this new technique seem to be very happy about their health status and knee function. Finally, you are charged to design a study protocol to compare the result of the new technique with the classic one. Besides accurate imaging of the bone and cartilage and maybe a biochemical characterization of the fluids, which clinical outcome measures can be included in our protocol? Which ones are best suited to detect an effective improvement in patient conditions?
First, one or more knee-specific measures should be chosen. In this pathology, the KOOS score has demonstrated good psychometric properties, and the WOMAC score has excellent reliability and ability to detect changes.
Secondly, we want to assess the patient’s perceived pain level and health status. For this purpose, the SF-MPQ score for pain is accurate and easy to complete; moreover, it includes a VAS for general pain assessment, while the SF-12 score has a short compilation time and will give a precise overview of the patient’s general health.
Finally, since most of our patients used to be quite active before injury, our protocol should include at least one activity level measure. We choose the Tegner score, which was specifically designed for knee injuries and is extremely intuitive in its compilation, and the LEFS score, which can be used for several lower limb pathologies and shows good reliability.
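The I-PRRS and TSK-11 totals described in Sects. 46.11.1 and 46.11.3 follow directly from the item ranges given in the text, as the short Python sketch below shows. The function names are hypothetical and the input checks are our own assumption; the sketch only reproduces the arithmetic stated above and does not add any interpretation thresholds.

```python
from typing import Sequence


def iprrs_total(item_scores: Sequence[int]) -> int:
    """I-PRRS: six items, each 0-100 in steps of 10; sum divided by 10."""
    if len(item_scores) != 6:
        raise ValueError("I-PRRS expects exactly six item scores")
    if any(s not in range(0, 101, 10) for s in item_scores):
        raise ValueError("each item must be 0, 10, ..., 100")
    # Every item is a multiple of 10, so integer division is exact.
    return sum(item_scores) // 10


def tsk11_total(item_scores: Sequence[int]) -> int:
    """TSK-11: eleven items, each 1-4; the total ranges from 11 to 44."""
    if len(item_scores) != 11:
        raise ValueError("TSK-11 expects exactly eleven item scores")
    if any(not 1 <= s <= 4 for s in item_scores):
        raise ValueError("each TSK-11 item must be between 1 and 4")
    return sum(item_scores)


# Hypothetical athlete answering the six I-PRRS items
readiness = iprrs_total([70, 60, 80, 50, 70, 70])
```

For the hypothetical answers shown, the items sum to 400 and the total is 40, which the text describes as moderate confidence to return to sport.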

Take-Home Message

• The development, testing, and implementation of tools to aid in the measurement of phenomena in medicine are central to clinical practice and clinical research; therefore PROMs are a key component to orthopedics research and may also be so for clinical practice in orthopedic surgery.
• Health status measurement instruments must possess adequate measurement properties, and it is fundamental to remember that a complete picture of a condition or a treatment effect on a patient could be provided only with the combination of a “disease-specific measure” and a “generic measure.”

References

1. Aichroth PM, Cannon WD Jr. International Knee Documentation Committee: knee ligament injury and reconstruction 209 evaluation. Knee surgery current practice. New York: Raven Press; 1992. p. 759–60.
2. Amstutz HC, Sew Hoy AL, Clarke IC. UCLA anatomic total shoulder arthroplasty. Clin Orthop. 1981;155:7–20.
3. Angst F, Schwyzer HK, et al. Measures of adult shoulder function. Arthritis Care Res. 2011;63(Suppl 11):S174–88.
4. Barber-Westin SD, Noyes FR, et al. Rigorous statistical reliability, validity, and responsiveness testing of the Cincinnati knee rating system in 350 subjects with uninjured, injured, or anterior cruciate ligament-reconstructed knees. Am J Sports Med. 1999;27(4):402–16.
5. Beaton DE, Wright JG, Katz JN, The Upper Extremity Collaborative Group. Development of the QuickDASH: comparison of three item-reduction approaches. J Bone Joint Surg Am. 2005;87:1038–46.
6. Bellamy N. WOMAC Osteoarthritis Index user guide. London: University of Western Ontario; 1995. p. 79.
7. Bellamy N. WOMAC Osteoarthritis Index user guide. Version V. Brisbane: CONROD, The University of Queensland; 2002.
8. Bennett PJ, Patterson C, et al. Development and validation of a questionnaire designed to measure foot-health status. J Am Podiatr Med Assoc. 1998;88:419–28.
9. Bieri D, Reeve R, et al. The Faces Pain Scale for the self-assessment of the severity of pain experienced by children: development, initial validation and preliminary investigation for ratio scale properties. Pain. 1990;41:139–50.
10. Binkley JM, Stratford PW, et al. The Lower Extremity Functional Scale (LEFS): scale development, measurement properties, and clinical application. Phys Ther. 1999;79:371–83.
11. Brophy RH, Beauvais RL, et al. Measurement of shoulder activity level. Clin Orthop Relat Res. 2005;439:101–8.
12. Brophy RH, Levy B, et al. Shoulder activity level varies by diagnosis. Knee Surg Sports Traumatol Arthrosc. 2009;17:1516–21.
13. Budiman-Mak E, Conrad K, et al. Theoretical model and Rasch analysis to develop a revised Foot Function Index. Foot Ankle Int. 2006;27(7):519–27.
14. Budiman-Mak E, Conrad KJ, et al. The Foot Function Index: a measure of foot pain and disability. J Clin Epidemiol. 1991;44(6):561–70.
15. Busija L, Osborne RH, et al. Magnitude and meaningfulness of change in SF-36 scores in four types of orthopaedic surgery. Health Qual Life Outcomes. 2008;6:55.
16. Celik D. Psychometric properties of the Mayo Elbow Performance Score. Rheumatol Int. 2015;35(6):1015–20.
17. Cella D, Riley W, et al. Initial Adult Health Item Banks and First Wave Testing of the Patient-Reported Outcomes Measurement Information System (PROMIS™) network: 2005–2008. J Clin Epidemiol. 2010;63(11):1179–94.
18. Christensen CP, Althausen PL, Mittleman MA, et al. The nonarthritic hip score: reliable and validated. Clin Orthop Relat Res. 2003;406:75–83.
19. Chung KC, Pillsbury MS, Walters MR, Hayward RA. Reliability and validity testing of the Michigan Hand Outcomes Questionnaire. J Hand Surg Am. 1998;23(4):575–87.
20. Clarsen B, Myklebust G, et al. Development and validation of a new method for the registration of overuse injuries in sports injury epidemiology: the Oslo Sports Trauma Research Centre (OSTRC) Overuse Injury Questionnaire. Br J Sports Med. 2013;47:495–502.
21. Collins NJ, Misra D, Felson DT, Crossley KM, Roos EM. Measures of knee function: International Knee Documentation Committee (IKDC) Subjective Knee Evaluation Form, Knee Injury and Osteoarthritis Outcome Score (KOOS), Knee Injury and Osteoarthritis Outcome Score Physical Function Short Form (KOOS-PS), Knee Outcome Survey Activities of Daily Living Scale (KOS-ADL), Lysholm Knee Scoring Scale, Oxford Knee Score (OKS), Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), Activity Rating Scale (ARS), and Tegner Activity Score (TAS). Arthritis Care Res (Hoboken). 2011;63(Suppl 11):S208–28. https://doi.org/10.1002/acr.20632.
22. Constant CR, Gerber C, Emery RJ, Sojbjerg JO, Gohlke F, Boileau P. A review of the Constant score: modifications and guidelines for its use. J Shoulder Elb Surg. 2008;17:355–61.
23. Constant CR, Murley AH. A clinical method of functional assessment of the shoulder. Clin Orthop Relat Res. 1987;214:160–4.
24. Cook JJ, Cook EA, et al. Validation of the American College of Foot and Ankle Surgeons Scoring Scales. J Foot Ankle Surg. 2011;50(4):420–9.
25. Cooney WP, Bussey R, Dobyns JH, Linscheid RL. Difficult wrist fractures. Perilunate fracture-dislocations of the wrist. Clin Orthop Relat Res. 1987;(214):136–47.
26. Craig CL, Marshall AL, et al. International physical activity questionnaire: 12-country reliability and validity. Med Sci Sports Exerc. 2003;35(8):1381–95.
27. Crossley KM, Bennell KL, et al. Analysis of outcome measures for persons with patellofemoral pain: which are reliable and valid? Arch Phys Med Rehabil. 2004;85(5):815–22.
28. Dacombe PJ, Amirfeyz R, Davis T. Patient-reported outcome measures for hand and wrist trauma: is there sufficient evidence of reliability, validity, and responsiveness? Hand (New York, NY). 2016;11(1):11–21.
29. Daughton DM, Fix AJ, et al. Maximum oxygen consumption and the ADAPT quality-of-life scale. Arch Phys Med Rehabil. 1982;63:620–2.
30. Dawson J, Doll H, Boller I, et al. The development and validation of a patient-reported questionnaire to assess outcomes of elbow surgery. J Bone Joint Surg Br. 2008;90:466–73.
31. Dawson J, Fitzpatrick R, Carr A, Murray D. Questionnaire on the perceptions of patients about total hip replacement. J Bone Joint Surg Br. 1996;78:185–90.
32. Dawson J, Fitzpatrick R, Carr A. Questionnaire on the perceptions of patients about shoulder surgery. J Bone Joint Surg Br. 1996;78:593–600.
33. Dawson J, Fitzpatrick R, Carr A. The assessment of shoulder instability. The development and validation of a questionnaire. J Bone Joint Surg Br. 1999;81:420–6.
34. Dawson J, Fitzpatrick R, Murray D, Carr A. Questionnaire on the perceptions of patients about total knee replacement. J Bone Joint Surg Br. 1998;80:63–9.
35. Dawson J, Rogers K, Fitzpatrick R, Carr A. The Oxford Shoulder Score revisited. Arch Orthop Trauma Surg. 2009;129:119–23.
36. Dreiser R, Maheu E, Guillou GB, Caspard H, Grouin JM. Validation of an algofunctional index for osteoarthritis of the hand. Rev Rhum Engl Ed. 1995;62:43S–53S.
37. Ferreira-Valente MA, Pais-Ribeiro JL, et al. Validity of four pain intensity rating scales. Pain. 2011;152(10):2399–404.
38. Fix AJ, Daughton DM. Human activity profile professional manual. Odessa: Psychological Assessment Resources, Inc.; 1988.
39. Glazer DD. Development and preliminary validation of the Injury-Psychological Readiness to Return to Sport (I-PRRS) Scale. J Athl Train. 2009;44(2):185–9.
40. Goldstein CL, Schemitsch E, et al. Comparison of different outcome instruments following foot and ankle trauma. Foot Ankle Int. 2010;31(12):1075–80.
41. Golightly YM, Devellis RF, et al. Psychometric properties of the foot and ankle outcome score in a community-based study of adults with and without osteoarthritis. Arthritis Care Res. 2014;66(3):395–403.
42. Halasi T, Kynsburg A, et al. Development of a new activity score for the evaluation of ankle instability. Am J Sports Med. 2004;32:899–908.
43. Hamamoto Y, Ito H, et al. Cross-cultural adaptation and validation of the Japanese version of the new Knee Society Scoring System for osteoarthritic knee with total knee arthroplasty. J Orthop Sci. 2015;20(5):849–53.
44. Harris WH. Traumatic arthritis of the hip after dislocation and acetabular fractures: treatment by mold arthroplasty. An end-result study using a new method of result evaluation. J Bone Joint Surg Am. 1969;51:737–55.
45. Hasenstein T, Greene T, et al. A 5-year review of clinical outcome measures published in the Journal of the American Podiatric Medical Association and the Journal of Foot and Ankle Surgery. J Foot Ankle Surg. 2017;56(3):519–21.
46. Hawker GA, Mian S, et al. Measures of adult pain: Visual Analog Scale for Pain (VAS Pain), Numeric Rating Scale for Pain (NRS Pain), McGill Pain Questionnaire (MPQ), Short Form McGill Pain Questionnaire (SF MPQ), Chronic Pain Grade Scale (CPGS), Short Form 36 Bodily Pain Scale (SF 36 BPS), and Measure of Intermittent and Constant Osteoarthritis Pain (ICOAP). Arthritis Care Res. 2011;63:S240–52.
47. Herdman M, Gudex C, et al. Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L). Qual Life Res. 2011;20(10):1727–36.
48. Hernandez-Sanchez S, Hidalgo MD, et al. Responsiveness of the VISA-P scale for patellar tendinopathy in athletes. Br J Sports Med. 2014;48(6):453–7.
49. Hicks CL, von Baeyer CL, et al. The Faces Pain Scale – Revised: toward a common metric in pediatric pain measurement. Pain. 2001;93:173–83.
50. Hunt KJ, Hurwit D. Use of patient-reported outcome measures in foot and ankle research. J Bone Joint Surg Am. 2013;95(16):e118(1–9).
51. Hunt KJ, Lakey E. Patient-reported outcomes in foot and ankle surgery. Orthop Clin North Am. 2018;49(2):277–89.

52. American Academy of Orthopaedic Surgeons. Instruments for collection of orthopaedic quality data; 2016.
53. Irrgang JJ, Anderson AF, Boland AL, Harner CD, Kurosaka M, Neyret P, et al. Development and validation of the International Knee Documentation Committee subjective knee form. Am J Sports Med. 2001;29:600–13.
54. ISAKOS Scientific Committee, Audigé L, Ayeni OR, Bhandari M, Boyle BW, Briggs KK, Chan K, Chaney-Barclay K, Do HT FM, Fu FH, Goldhahn J, Goldhahn S, Hidaka C, Hoang-Kim A, Karlsson J, Krych AJ, RF LP, Levy BA, Lubowitz JH, Lyman S, Ma Y, Marx RG, Mohtadi N, Marcheggiani Muccioli GM, Nakamura N, Nguyen J, Poehling GG, Roberts LE, Rosenberg N, Shea KP, Sohani ZN, Soudry M, Voineskos S, Zaffagnini S, International Society of Arthroscopy, Knee Surgery and Orthopaedic Sports Medicine. A practical guide to research: design, execution, and publication. Arthroscopy. 2011;27(4 Suppl):S1–112.
55. Jenkinson C, Fitzpatrick R, et al. The Nottingham Health Profile: an analysis of its sensitivity in differentiating illness groups. Soc Sci Med. 1988;27:1411–4.
56. Johanson NA, Liang MH, et al. American Academy of Orthopaedic Surgeons lower limb outcomes assessment instruments. Reliability, validity, and sensitivity to change. J Bone Joint Surg Am. 2004;86-A(5):902–9.
57. Kemp JL, Collins NJ, Roos EM, Crossley KM. Psychometric properties of patient-reported outcome measures for hip arthroscopic surgery. Am J Sports Med. 2013;41:2065–73.
58. Kennedy CA, Beaton DE, Solway S, McConnell S, Bombardier C. The DASH outcome measure user’s manual. 3rd ed. Toronto: Institute for Work & Health; 2011.
59. Kersten P, White PJ, et al. Is the pain visual analogue scale linear and responsive to change? An exploration using Rasch analysis. PLoS One. 2014;9(6):e99485.
60. Kirkley A, Griffin S, Alvarez C. The development and evaluation of a disease-specific quality of life measurement tool for rotator cuff disease: The Western Ontario Rotator Cuff Index (WORC). Clin J Sport Med. 2003;13:84–92.
61. Kirkley A, Griffin S, et al. Scoring systems for the functional assessment of the shoulder. Arthroscopy. 2003;19(10):1109–20.
62. Kirkley A, Griffin S, McLintock H, Ng L. The development and evaluation of a disease-specific quality of life measurement tool for shoulder instability: the Western Ontario Shoulder Instability Index (WOSI). Am J Sports Med. 1998;26:764–72.
63. Kirkley A, Werstine R, Ratjek A, Griffin S. Prospective randomized clinical trial comparing the effectiveness of immediate arthroscopic stabilization versus immobilization and rehabilitation in first traumatic anterior dislocations of the shoulder: long-term evaluation. Arthroscopy. 2005;21:55–63.
64. Kitaoka HB, Alexander I, et al. Clinical rating systems for the ankle-hindfoot, midfoot, hallux, and lesser toes. Foot Ankle Int. 1994;15(7):349–53.
65. Klassbo M, Larsson E, Mannevik E. Hip Disability and Osteoarthritis Outcome Score: an extension of the Western Ontario and McMaster Universities Osteoarthritis Index. Scand J Rheumatol. 2003;32:46–51.
66. Kujala UM, Jaakola LH, Koskinen SK, Taimela S, Hurme M, Nelimarkka O. Scoring of patellofemoral disorders. Arthroscopy. 1993;9:159–63.
67. Lee JS, Hobden E, et al. Clinically important change in the visual analogue scale after adequate pain control. Acad Emerg Med. 2003;10(10):1128–30.
68. Lequesne M. Indices of severity and disease activity for osteoarthritis. Semin Arthritis Rheum. 1991;20(Suppl 2):48–54.
69. Lequesne MG, Mery C, Samson M, Gerard P. Indexes of severity for osteoarthritis of the hip and knee: validation - value in comparison with other assessment tests. Scand J Rheumatol Suppl. 1987;65:85–9.
70. Lequesne MG. The algofunctional indices for hip and knee osteoarthritis. J Rheumatol. 1997;24:779–81.
71. Lippitt SB, Harryman DT, Matsen FA. A practical tool for evaluation of function: the Simple Shoulder Test. In: Matsen FA, Fu FH, Hawkins RJ, editors. The shoulder: a balance of mobility and stability. Rosemont: American Academy of Orthopedic Surgeons; 1993. p. 545–59.
72. Lo IKY, Griffin S, Kirkley A. The development and evaluation of a disease-specific quality of life measurement tool for osteoarthritis of the shoulder: The Western Ontario Osteoarthritis of the Shoulder Index (WOOS). Osteoarthritis Cartilage. 2001;9:771–8.
73. Longo G, Franceschi F, Loppini M, Maffulli N, Denaro V. Rating systems for evaluation of the elbow. Br Med Bull. 2008;87(1):131–61.
74. Lysholm J, Gillquist J. Evaluation of knee ligament surgery results with special emphasis on use of a scoring scale. Am J Sports Med. 1982;10:150–4.
75. Macdermid J. Update: the Patient-rated Forearm Evaluation Questionnaire is now the Patient-rated Tennis Elbow Evaluation. J Hand Ther. 2005;18:407–10.
76. Martin R, Burdett R, et al. Development of the Foot and Ankle Disability Index (FADI). J Orthop Sports Phys Ther. 1999;29(1):A32–3.
77. Martin RL, Irrgang JJ, et al. Evidence of validity for the Foot and Ankle Ability Measure (FAAM). Foot Ankle Int. 2005;26(11):968–83.
78. Marx RG, Stump TJ, et al. Development and evaluation of an activity rating scale for disorders of the knee. Am J Sports Med. 2001;29(2):213–8.
79. Melzack R. The short-form McGill Pain Questionnaire. Pain. 1987;30:191–7.
80. Michener LA, McClure PW, Sennett BJ. American Shoulder and Elbow Surgeons Standardized Shoulder Assessment Form, patient self-report section: reliability, validity, and responsiveness. J Shoulder Elb Surg. 2002;11(6):587–94.
81. Mohtadi NGH, Griffin DR, Pedersen ME, et al. The development and validation of a self-administered quality-of-life outcome measure for young, active patients with symptomatic hip disease: the International Hip Outcome Tool (iHOT-33). Arthroscopy. 2012;28(5):595–610.e1.
82. Morrey BF, An KN. Functional evaluation of the elbow. In: Morrey BF, editor. The elbow and its disorders. Philadelphia: WB Saunders; 1993.
83. Murray DW, Fitzpatrick R, Rogers K, Pandit H, Beard DJ, Carr AJ, et al. The use of the Oxford Hip and Knee Scores. J Bone Joint Surg Br. 2007;89:1010–4.
84. Narin S, Unver B, et al. Cross-cultural adaptation, reliability and validity of the Turkish version of the Hospital for Special Surgery (HSS) Knee Score. Acta Orthop Traumatol Turc. 2014;48(3):241–8.
85. Nilsdotter AK, Lohmander LS, Klassbo M, Roos EM. Hip Disability and Osteoarthritis Outcome Score (HOOS): validity and responsiveness in total hip replacement. BMC Musculoskelet Disord. 2003;4:10.
86. Nilsdotter A, Bremander A. Measures of hip function and symptoms: Harris Hip Score (HHS), Hip Disability and Osteoarthritis Outcome Score (HOOS), Oxford Hip Score (OHS), Lequesne Index of Severity for Osteoarthritis of the Hip (LISOH), and American Academy of Orthopedic Surgeons (AAOS) Hip and Knee Questionnaire. Arthritis Care Res. 2011;63:S200–7.
87. Nilsson, et al. The Swedish version of OMAS is a reliable and valid outcome measure for patients with ankle fractures. BMC Musculoskelet Disord. 2013;14:109.
88. Olerud C, Molander H. A scoring scale for symptom evaluation after ankle fracture. Arch Orthop Trauma Surg. 1984;103:190–4.
89. Perruccio AV, Stefan Lohmander L, Canizares M, Tennant A, Hawker GA, Conaghan PG, et al. The development of a short measure of physical function for knee OA KOOS-Physical Function Shortform (KOOS-PS): an OARSI/OMERACT initiative. Osteoarthr Cartil. 2008;16:542–50.
90. Pinsker E, Daniels TR. AOFAS position statement regarding the future of the AOFAS clinical rating systems. Foot Ankle Int. 2011;32(9):841–2.
91. Ramisetty N, Kwon Y, et al. Patient-reported outcome measures for hip preservation surgery: a systematic review of the literature. J Hip Preserv Surg. 2015;2(1):15–27.
92. Richards RR, An KN, Bigliani LU, Friedman RJ, Gartsman GM, Gristina AG, et al. A standardized method for the assessment of shoulder function. J Shoulder Elb Surg. 1994;3:347–52.
93. Richardson J, Sinha K, et al. Modeling the utility of health states with the Assessment of Quality of Life (AQoL) 8D instrument: overview and utility scoring algorithm. Research paper, vol. 63. Melbourne: Centre for Health Economics, Monash University; 2011.
94. Roach KE, Budiman-Mak E, Songsiridej N, Lertratanakul Y. Development of a Shoulder Pain and Disability Index. Arthritis Care Res. 1991;4:143–9.
95. Robinson JM, Cook JL, et al. The VISA-A questionnaire: a valid and reliable index of the clinical severity of Achilles tendinopathy. Br J Sports Med. 2001;35(5):335–41.
96. Roelofs J, van Breukelen G, et al. Norming of the Tampa Scale for Kinesiophobia across pain diagnoses and various countries. Pain. 2011;152:1090–5.
97. Rompe JD, Overend TJ, et al. Validation of the Patient-rated Tennis Elbow Evaluation Questionnaire. J Hand Ther. 2007;20(1):3–10.
98. Roos EM, Brandsson S, et al. Validation of the foot and ankle outcome score for ankle ligament reconstruction. Foot Ankle Int. 2001;22(10):788–94.
99. Roos EM, Roos HP, Lohmander LS, Ekdahl C, Beynnon BD. Knee Injury and Osteoarthritis Outcome Score (KOOS): development of a self-administered outcome measure. J Orthop Sports Phys Ther. 1998;28:88–96.
100. Rowan K. The development and validation of a multi-dimensional measure of chronic foot pain: the Rowan Foot Pain Assessment Questionnaire (ROFPAQ). Foot Ankle Int. 2001;22:795–809.
101. Scott J, Huskisson EC. Graphic representation of pain. Pain. 1976;2(2):175–84.
102. Scuderi GR, Bourne RB, Noble PC, Benjamin JB, Lonner JH, Scott WN. The New Knee Society Knee Scoring System. Clin Orthop Relat Res. 2012;470(1):3–19.
103. Seeger J, Weinmann S, et al. The Heidelberg Sports Activity Score - a new instrument to evaluate sports activity. Open Orthopaed J. 2013;7:25–32.
104. SooHoo NF, Vyas R, et al. Responsiveness of the foot function index, AOFAS clinical rating systems, and SF-36 after foot and ankle surgery. Foot Ankle Int. 2006;27(11):930–4.
105. Tegner Y, Lysholm J. Rating systems in the evaluation of knee ligament injuries. Clin Orthop Relat Res. 1985;(198):43–9.
106. Thomas JL, Christensen JC, et al. ACFAS Scoring Scale user guide. J Foot Ankle Surg. 2005;44(5):316–35.
107. Thorborg K, Holmich P, Christensen A, Petersen F, Roos EM. The Copenhagen Hip and Groin Outcome Score (HAGOS): development and validation according to the COSMIN checklist. Br J Sports Med. 2011;45:478–91.
108. van der Linde JA, van Kampen DA, et al. The Oxford Shoulder Instability Score; validation in Dutch and first-time assessment of its smallest detectable change. J Orthop Surg Res. 2015;10:146.
109. Walker N, et al. A preliminary development of the Re-Injury Anxiety Inventory (RIAI). Phys Ther Sport. 2010;11(1):23–9.

110. Ware J Jr, Kosinski M, et al. A 12-item short-form health survey: construction of scales and preliminary tests of reliability and validity. Med Care. 1996;34(3):220–33.
111. Ware JE Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30(6):473–83.
112. Wendel-Vos GCW, Schuit AJ, et al. Reproducibility and relative validity of the short questionnaire to assess health-enhancing physical activity. J Clin Epidemiol. 2003;56:1163–9.
113. Williams GN, Molloy JM, et al. Evaluation of the Sports Ankle Rating System in young, athletic individuals with acute lateral ankle sprains. Foot Ankle Int. 2003;24:274–82.
114. Zahiri CA, Schmalzried TP, et al. Assessing activity in joint replacement patients. J Arthroplast. 1998;13(8):890–5.
115. Zampelis V, Ornstein E, et al. A simple visual analog scale for pain is as responsive as the WOMAC, the SF-36, and the EQ-5D in measuring outcomes of revision hip arthroplasty. Acta Orthop. 2014;85(2):128–32.
116. Zwiers R, Weel H, et al. Large variation in use of patient-reported outcome measures: a survey of 188 foot and ankle surgeons. Foot Ankle Surg. 2017. pii: S1268-7731(17)30050-4.

47  A Practical Guide to Writing (and Understanding) a Scientific Paper: Meta-Analyses

Alberto Grassi, Riccardo Compagnoni,
Kristian Samuelsson, Pietro Randelli, Corrado Bait,
and Stefano Zaffagnini

A. Grassi (*)
Dipartimento Scienze Biomediche e Neuromotorie DIBINEM, Università di Bologna, Bologna, Italy
IIa Clinica Ortopedica e Traumatologica, IRCCS Istituto Ortopedico Rizzoli, Bologna, Italy
SIGASCOT Arthroscopy Committee, Florence, Italy
e-mail: alberto.grassi3@studio.unibo.it

R. Compagnoni
Società Italiana del Ginocchio Artroscopia Sport Cartilagine Tecnologie Ortopediche (SIGASCOT) Arthroscopy Committee, Florence, Italy
1st Department, Azienda Socio Sanitaria Territoriale Centro Specialistico Ortopedico Traumatologico Gaetano Pini-CTO, Milan, Italy

K. Samuelsson
Department of Orthopaedics, Institute of Clinical Sciences, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
Department of Orthopaedics, Sahlgrenska University Hospital, Mölndal, Sweden

P. Randelli
1st Department, Azienda Socio Sanitaria Territoriale Centro Specialistico Ortopedico Traumatologico Gaetano Pini-CTO, Milan, Italy

C. Bait
Società Italiana del Ginocchio Artroscopia Sport Cartilagine Tecnologie Ortopediche (SIGASCOT) Arthroscopy Committee, Florence, Italy
Istituto Clinico Villa Aprica, Como, Italy

S. Zaffagnini
IIa Clinica Ortopedica e Traumatologica, IRCCS Istituto Ortopedico Rizzoli, Bologna, Italy
Laboratori di Biomeccanica e Innovazione Tecnologica, Istituto Ortopedico Rizzoli, Bologna, Italy

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_47

47.1 Introduction

In the era of evidence-based medicine, clinicians are continuously facing the massive production of clinical studies, often with discordant evidence. To facilitate the spread of knowledge, narrative reviews, systematic reviews and meta-analyses represent indispensable tools for summarising the evidence related to a specific topic.

In a narrative review, an “expert” summarises the important aspects relating to a topic. It is assumed that this expert will be objective in presenting the pertinent information. Since a specific research question and a systematic search of the evidence are lacking, bias (usually unintentional) is often a problem [19].

Conversely, a systematic review represents a higher and unbiased level of evidence, since it assimilates information about a topic or question with more rigour, sophistication and transparency. A systematic review is a formal process for gathering and evaluating literature to answer a specific question, beginning with the posing of the question, the definition of the inclusion and exclusion criteria of trials and extracting the necessary data from each one. Moreover, methodological quality evaluation is usually performed [19].

A meta-analysis represents a further step, as it statistically combines the data from clinical

trials, often randomised controlled trials (RCTs), obtained from a systematic review of the literature. The reason for combining the data is an attempt to increase the ability to see a difference between two groups, reducing the chance of type-II errors (missing the existence of a true difference), and to increase the precision of the estimated effect by increasing sample size. Meta-analyses are therefore a powerful tool for accumulating and summarising knowledge; however, their use is also controversial, as there are several critical conditions and methodological considerations that could produce misleading conclusions. For this reason, a meta-analysis requires personal judgement and expertise and the implementation of procedures and quality standards to reduce the risk of bias that may influence the results [11, 30] (Fact Box 47.1).

Fact Box 47.1
• Narrative review: when an “expert” summarises the important aspects relating to a topic without a systematic presentation of the evidence. It is usually a source of unintentional bias.
• Systematic review: formal process for gathering and evaluating literature to answer a specific question, beginning with the posing of the question, the definition of the inclusion and exclusion criteria of trials and extracting the necessary data from each one.
• Meta-analysis: process to statistically combine the data from clinical trials, often randomised controlled trials (RCTs), obtained from a systematic review of the literature.

The aim of this review is to provide the reader with the basic knowledge to write and understand a meta-analysis, describing all the methodological steps in its preparation and providing a useful reference for a detailed, in-depth understanding of this complex and controversial field.

This guide has been developed with four different aims:

1. To provide a guide to preparing a meta-analysis for those clinicians approaching the meta-analysis study design
2. To provide a summary and a source of references for those clinicians already familiar with the production of meta-analyses
3. To help the reader understand a meta-analysis and interpret its results
4. To help the journal reviewer identify the critical aspects of meta-analyses

For a more detailed guide, consulting the Cochrane Handbook for Systematic Reviews of Interventions [15] and the “Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)” [25] is strongly recommended.

47.2 How a Meta-Analysis Works: The Concept of Effect Size

To summarise, a classical meta-analysis of RCTs provides a single, overall measurement of the treatment effect, enhancing the clinical interpretation of findings across several studies [17]. This technique can prove especially useful when there are several similar clinical trials with or without consistent outcomes or when there are small- to medium-sized trials with inconclusive results. The statistical calculation behind the results of a meta-analysis is based on the concept of “effect size”. Effect size is defined as a quantitative measure of the magnitude of a phenomenon, and in statistics, it is a parameter that reflects the magnitude and direction of the treatment effect for each study. For example, if all the studies in the meta-analysis measure a continuous outcome, such as the Lysholm score after single- or double-bundle ACL reconstruction, the mean difference between the two groups can be used as the effect size and therefore to express the effects of the treatment. The overall effect size derived from the meta-analysis is calculated by combining the effect sizes of the included studies, typically as a weighted average in which more precise studies carry more weight.

The magnitude and direction of the effect size are reported together with its confidence interval (CI). CIs are reported at a stated confidence level (e.g. 95% confidence interval, 95% CI), providing a range

with an upper and lower boundary that indicates the precision of the treatment estimates of the effect size of the included studies. A wider CI, which may be caused by a small sample size or by imprecision in the measurement (e.g. wide standard deviations in the considered outcome), indicates less precise estimates and may therefore call into question the applicability of the results in clinical practice.

47.3 How to Start Your Meta-Analysis Properly

47.3.1 Define a Study Question

The first and most important step in preparing a meta-analysis is to identify an appropriate question to address [26]. According to the Journal of Bone and Joint Surgery guidelines (Fact Box 47.2) [36], a meta-analysis should address a question that has not been considered in the previous 5 years, unless the most recent literature has changed dramatically. Furthermore, the question at the centre of a meta-analysis should not have already been answered satisfactorily by the results of multiple well-conducted RCTs. Finally, the question should concern the comparison of the effects of two different treatments for the same clinical condition, to allow the inclusion of only randomised or quasi-randomised clinical trials. Typical topics of meta-analyses in orthopaedic surgery could be a comparison of clinical results and the failure rate after single- or double-bundle and hamstring or bone-patellar tendon-bone anterior cruciate ligament (ACL) reconstruction; the re-rupture rate and complications after the conservative or surgical treatment of Achilles tendon ruptures; the re-dislocation rate after a brace or repair after primary patellar dislocation; clinical outcomes and alignment after patient-specific instrumentation (PSI) or conventional total knee arthroplasty (TKA); and functionality and pain after hyaluronic acid or platelet-rich plasma (PRP) for knee osteoarthritis, to mention just a few.

Fact Box 47.2: Journal of Bone and Joint Surgery (JBJS) Requirements to Perform a Meta-analysis [36]
– JBJS will not accept meta-analyses or systematic reviews on the same topic published within 5 years unless the authors can demonstrate that the literature has dramatically changed.
– Meta-analyses or systematic reviews will not be accepted if the same (or largely the same) papers are used to arrive at similar conclusions.
– For meta-analyses in which the authors use statistical methods to combine and summarise results, only summaries of randomised trials will be accepted.
– Only studies with sufficient homogeneity of inclusion and exclusion criteria will be considered appropriate for meta-analysis.
– Authors should familiarise themselves with reporting checklists such as the PRISMA to improve the quality of reporting.

An unconventional use of a meta-analysis could be employed to pool together the data from single-arm case series studies. Continuous measurements such as the Lysholm score, or dichotomous variables, such as return to sport [9], failures, or good vs. poor outcomes [10, 28], could also be pooled, to obtain a mean value for a wider population composed of the sum of the populations of single studies. In certain circumstances, with wide and non-heterogeneous populations, statistical calculations could be used to compare the outcomes for different groups [7]; however, the results should be interpreted with extreme caution, since the data derive from single-arm case series and not from RCTs.

47.3.2 Perform an Appropriate Literature Search

The next practical step is to obtain the largest number of studies relevant to the study question [20]. The literature search should usually be performed by two independent investigators using at least

three databases, as the results from single databases overlap only partially. The three bibliographic databases generally regarded as the most important sources to search for reports of trials are CENTRAL, PubMed and EMBASE (Excerpta Medica Database). The Cochrane Central Register of Controlled Trials (CENTRAL) serves as the most comprehensive source of reports of controlled trials. CENTRAL is published as part of The Cochrane Library and its access is free of charge. Medical Literature Analysis and Retrieval System Online (MEDLINE) currently contains over 16 million references to journal articles from the 1950s onwards. PubMed provides access to a free version of MEDLINE that also includes up-to-date citations not yet indexed for MEDLINE. Lastly, EMBASE currently contains over 12 million records from 1974 onwards. EMBASE.com is Elsevier’s own version of EMBASE which, in addition to the 12 million EMBASE records from 1974 onwards, also includes over seven million unique records from MEDLINE from 1966 to the present day. Access to EMBASE is only available by subscription.

In addition to these three main databases, other sources should be searched, such as national and regional databases, tables of contents of relevant journals (for the orthopaedic field, the Journal of Bone and Joint Surgery, Arthroscopy, the American Journal of Sports Medicine, Knee Surgery Sports Traumatology Arthroscopy, and Clinical Orthopaedics and Related Research), grey literature of unpublished trials and a manual search of the reference lists of the papers included in the meta-analysis. As clinical trials with positive results have more chance of being published, there could be a publication bias. For this reason, many journals now require every published RCT to have been registered at the clinicaltrials.gov website before it was carried out, so that each RCT can be tracked regardless of whether its results are positive or negative (Fact Box 47.3). This source should also be searched to minimise the possibility of missing relevant results.

Fact Box 47.3: What Is ClinicalTrials.gov?
ClinicalTrials.gov was created as a result of the Food and Drug Administration Modernization Act of 1997. It is a Web-based resource that provides patients, their family members, healthcare professionals, researchers and the public with easy access to information on publicly and privately supported clinical studies on a wide range of diseases and conditions. Each ClinicalTrials.gov record presents summary information about a study protocol, including disease, intervention, study design and contact information for the study locations and, for some records, also includes information on the results of the study.

In terms of the search strategy, the choice of the key terms should aim to produce the most extensive search possible. However, it is necessary to strike a balance between striving for comprehensiveness and maintaining relevance, as increasing the comprehensiveness of a search will reduce its precision and retrieve more non-relevant articles. Usually, three sets of terms, developed for the healthcare condition, intervention(s) and study design, should be combined using the Boolean “AND” operator. To be as comprehensive as possible, it is necessary to include a wide range of free-text terms for each of the selected concepts (e.g. “injury”, “rupture” or “lesion” related to the ACL or Achilles tendon), combined using the Boolean “OR” operator. Truncation (e.g. Menisc* for Meniscus or Meniscal) can be used to maximise the results. As this is one of the most important phases in the development of a meta-analysis, the help of a librarian specialising in database searches could be useful.

47.4 How to Obtain the Appropriate Data

47.4.1 Define Precise Inclusion and Exclusion Criteria

The most common reference manager software enables the pooling together of all the results obtained from the systematic search in the
selected databases and the discarding of all the duplicates. To prepare a precise flow chart of study selection, the numbers of studies found, removed and excluded should all be noted and justified.

The crucial step in the phase of study selection is a clear and precise definition of inclusion and exclusion criteria [26]. Their goal is to create a homogeneous study population for the meta-analysis. The rationale for their choice should be stated, as it may not be apparent to the reader. Inclusion criteria may be based on study design, sample size and characteristics of the subjects, type of treatment and follow-up. Examples of exclusion criteria include studies not published in English or as full-length manuscripts, drop-out rate (usually >20%), doctoral theses and studies published in non-peer-reviewed journals. Moreover, the outcomes that should be analysed and the way they are presented could be a matter of inclusion and exclusion criteria (e.g. radiographic evaluation of osteoarthritis according only to the Kellgren-Lawrence scale). This search and data extraction (presented below) should be performed by two independent investigators; the results should then be compared and any disagreement should be resolved by a third independent investigator. Usually, the first screening of all the results is based on title and abstract evaluation. The full text of the potentially eligible studies should then be obtained and carefully evaluated.

47.4.2 Extract All the Relevant Information

Two or more authors of a meta-analysis should abstract information from studies independently. “Data” is defined as any information about (or deriving from) a study, including details of methods, participants, setting, context, interventions, outcomes, results, publications and investigators [13]. Data should only be collected from separate sets of patients, and the authors should be careful to avoid studies that publish the same subjects or overlapping groups of subjects that appeared in different studies in duplicate publications. In this case, the report with the larger population or the most recent report can be chosen. The main items considered in data collection are presented in Table 47.1.

Table 47.1  Summary of the main items that should be extracted from each study included in a meta-analysis
Main items considered in data collection for a meta-analysis
Methods          Study design and duration, sequence generation, allocation sequence concealment, blinding, other concerns regarding bias, inclusion and exclusion criteria
Participants     Total number, setting, diagnostic criteria, age, gender, co-morbidities
Interventions    Number of intervention groups, specific interventions, intervention details
Outcomes         Outcomes and time points, definition, unit, scale
Results          Number of participants in each intervention group and for each outcome, summary data for each intervention group, any subgroup analysis
Miscellaneous    Funding source, key conclusion from the study authors

If all the relevant information cannot be obtained from the full text, the study authors could be contacted to request the missing information. This step could be crucial if several parameters are missing in the outcome report, since these data are fundamental for the statistical calculation of the meta-analysis. For dichotomous outcomes, only the number of patients that experience the outcome and the total number of patients are needed. Moreover, categorical values (e.g. Kellgren-Lawrence scale for knee osteoarthritis) should be analysed as dichotomous variables, after defining a clinically meaningful cut-off value to determine two groups. On the other hand, for continuous data, the mean values and especially the SDs are needed as well. Since the SD is mandatory for effect-size calculation, if it is not reported in the original study, there are several methods to approximate it from the known information, such as standard error (SE), the range of values or the p-value [14] (Table 47.2). The last option is to impute the SD, borrowing the SD from a similar study included in the meta-analysis,
Table 47.2  Methods to derive the standard deviation from the data usually provided in a scientific paper
Obtain standard deviation from the available data
Obtaining standard deviation from the standard error
 Standard deviation = standard error × √(sample size)
Obtaining standard deviation from the range
 Standard deviation ≈ (upper range − lower range)/4
Obtaining standard deviation from the p-value
 Step 1: from p-value to t-value
   Excel formula: =TINV(p-value, degrees of freedom)
   Degrees of freedom = number of patients in the experimental group + number of patients in the control group − 2
 Step 2: from t-value to standard error (of the difference in means)
   Standard error = (mean of group 1 − mean of group 2)/t-value
 Step 3: from standard error to standard deviation
   Standard deviation = standard error / √(1/number of patients in group 1 + 1/number of patients in group 2)
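The conversions in Table 47.2 can be sketched in Python as follows. This is a minimal illustration rather than a validated tool: the function names and the example numbers are hypothetical, the range rule is only a rough approximation, and the t-value of Step 1 is assumed to have been obtained beforehand (for example with the Excel formula given in the table).

```python
import math


def sd_from_se(se, n):
    """SD of a single group from its standard error and sample size."""
    return se * math.sqrt(n)


def sd_from_range(lower, upper):
    """Rough SD approximation: one quarter of the range."""
    return (upper - lower) / 4


def sd_from_t(mean_1, mean_2, t_value, n_1, n_2):
    """Average SD of two groups from the t-value of their comparison.

    Step 2: SE of the difference in means = |difference| / |t|.
    Step 3: SD = SE / sqrt(1/n_1 + 1/n_2).
    """
    se_diff = abs(mean_1 - mean_2) / abs(t_value)
    return se_diff / math.sqrt(1 / n_1 + 1 / n_2)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
print(round(sd_from_se(2.0, 25), 2))                  # SE of 2.0 with 25 patients
print(round(sd_from_range(40, 100), 2))               # scores between 40 and 100
print(round(sd_from_t(80.0, 70.0, 2.5, 50, 50), 2))   # t-value taken as given
```

With these made-up inputs the three calls give 10.0, 15.0 and 20.0. Any SD derived in this way remains an approximation and is a natural candidate for a sensitivity analysis.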

using the highest or a “reasonably high” SD, or using the average SD. All imputation techniques involve making assumptions about unknown statistics, which introduces a source of bias, and it is best to avoid using them wherever possible. If most of the studies in a meta-analysis have missing standard deviations, these values should not be imputed, and the meta-analysis cannot be performed. A narrative presentation of the results is then preferable. On the other hand, when SDs are derived, a sensitivity analysis (see Sect. 47.5.4) is recommended to establish whether the imprecision of the derived values could affect the result of the meta-analysis and the effect of a treatment.

47.5 How to Analyse the Obtained Data

47.5.1 Choose Appropriate Statistical Software

The tabulation of data and the calculation of simple medians, means and SDs are possible with Microsoft Excel or the equivalent. However, to perform an appropriate statistical analysis of a meta-analysis, dedicated software is necessary. One of the most frequently used is Review Manager (RevMan, Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2014), the official software of the Cochrane Community, which is available free of charge and which has been designed to facilitate preparation of protocols and full reviews, including text, characteristics of studies, comparison tables and study data. Moreover, it can perform meta-analysis of the data entered and present the results graphically. Another valid free alternative is OpenMetaAnalyst, which is a simple open-source software with an Excel-like interface for performing meta-analyses of binary, continuous or diagnostic data, using a variety of fixed- and random-effect methods, including Bayesian and maximum likelihood analysis. It also makes it possible to perform meta-regression analysis, meta-analysis of proportions and continuous variables from single-arm studies and cumulative, leave-one-out or subgroup analyses. Another simple software package is MedCalc (MedCalc Software, Ostend, Belgium), which also offers the possibility to perform a multitude of statistical tests and analyses, with a variety of graphic representations. However, its options for meta-analysis purposes are limited, and moreover it is only available free of charge as a sample version.

47.5.2 Define Correct Effect-Size Measurement

As previously stated, the outcomes could be presented in two ways: dichotomous or continuous. Based on this, the effect size of the treatment
should be presented properly, using one or more of the following parameters [5, 17].

The risk ratio (RR): also defined as the relative risk, is the ratio of the risk of an event in the two groups. It is used for dichotomous outcomes and ranges from 0 to infinity. The RR describes the multiplication of the risk that occurs using the experimental intervention. For example, an RR of 5 for a treatment implies that events with treatment are five times more likely than events with the control treatment. Alternatively, we can say that the experimental treatment increases the risk of events by 100 × (RR − 1)% = 400%. Conversely, an RR of 0.20 is interpreted as the probability of an event with experimental treatment being a fifth of that with control treatment. Alternatively, this reduction could be expressed using the relative risk reduction (RRR) (see below). The interpretation of the clinical importance of a given RR should be made considering the typical risk of events with the control treatment, since an RR of 5 could correspond to a clinically important increase in events from 10 to 50%, or a small, less clinically important increase from 1 to 5%.

The relative risk reduction (RRR): is an alternative way of re-expressing the RR as a percentage of the reduction of the relative risk after the experimental treatment compared with the control treatment. For example, an RR of 0.20 is interpreted as the probability of an event with experimental treatment being a fifth of that with control treatment. Since the RRR is calculated as 100 × (1 − RR)%, we can therefore say that the experimental treatment reduces the risk of events by 80%.

The odds ratio (OR): considering “odds” as the ratio between the probability that a particular event will occur and the probability that it will not occur, the OR is the ratio of the odds of an event in the two groups. It is used for dichotomous outcomes and ranges from 0 to infinity. The OR is more difficult to interpret, as it describes the multiplication of the odds of the outcome that occurs using the experimental intervention. To understand what an OR means in terms of changes in the numbers of events, it is simplest to convert it into an RR and then interpret the risk ratio in the context of a typical control group risk. Attention should be paid to not misinterpreting RR and OR.

The risk difference (RD): is defined as the difference between the observed risks (proportions of individuals with the outcome of interest) in the experimental and control treatment groups. It is used for dichotomous outcomes and lies between −1 and +1 (which means that it can easily be converted to per cent by multiplying it by 100). Like the RR, the clinical importance of an RD may depend on the underlying risk of events in the control treatment, since an RD of 0.05 (or 5%) may represent a small, clinically insignificant change from a risk of 55 to 60% or a proportionally much larger and potentially important change from 1 to 6%.

The number needed to treat (NNT): is defined as the expected number of people who need to receive the experimental treatment to obtain one additional person either incurring or avoiding the considered event. It is used for dichotomous outcomes and is always a positive number, usually rounded up to the nearest whole number. It is derived from the absolute value of the RD and is calculated as 1/|RD|. An RD of 0.23 therefore corresponds to an NNT of 4.35, which is rounded up to 5, and it means that “it is expected that one additional (or one fewer) patient will incur the event for every five participants receiving the experimental intervention rather than control”. Since the NNT gives an “expected value”, it does not imply that one additional event will necessarily occur in every group of “n” patients treated with the experimental procedure.

The mean difference (MD): is defined as the absolute difference between the mean value of the experimental and control group. It is used for continuous outcomes measured with the same scale and estimates the amount by which the experimental intervention changes the outcome on average compared with the control.

The standardised mean difference (SMD): is defined as the size of the intervention effect in each study relative to the variability observed in the study. Since it is used when the same outcome is measured with different scales (e.g. knee function
measured with Lysholm, subjective IKDC or KOOS scores), the results should be statistically standardised to a uniform scale before being combined. Care should be taken when the scales have a different “direction” (e.g. one scale increases while the others decrease with disease severity); in this case, the values of the affected scales should be multiplied by −1, to ensure that all the scales point in the same direction.

47.5.3 Identify and Measure Heterogeneity

Heterogeneity is a term used to describe the variability of studies. Variability in the studied participants (e.g. males or females, old adult or adolescent patients), interventions (open or minimally invasive procedures, different grafts for reconstructions) and outcomes (objective or subjective) may be described as clinical heterogeneity, variability in study design and risk of bias may be described as methodological heterogeneity, while the variability in the treatment effects in the different studies is known as statistical heterogeneity [17, 21]. The last one is usually a consequence of clinical or methodological diversity, or both, between the studies. Studies with methodological flaws and small studies may overestimate treatment effects and can contribute to statistical heterogeneity. The statistical heterogeneity should therefore be examined and quantified using statistical tests, to implement measures to reduce the risk of bias [4]. The chi-square test (χ2) assesses whether observed differences in results are compatible with chance alone: a low p-value provides evidence of heterogeneity in intervention effects. Since this test does not provide the “amount” of heterogeneity, quantification according to the Higgins I2 statistic should be performed. This statistic produces a 0–100% value that represents the percentage of total variation across studies due to heterogeneity. According to the Cochrane Guidelines, this value is interpreted as follows: 0–40% heterogeneity might not be important, 30–60% may represent moderate heterogeneity, 50–90% may represent substantial heterogeneity and 75–100% considerable heterogeneity. However, there are no empirically developed cut-off points to determine when there is too much heterogeneity to perform a meta-analysis, and it is left to the author’s discretion to determine whether a meta-analysis is appropriate, based on the results of the test of heterogeneity and clinical judgement.

47.5.4 Apply Strategies to Address High Heterogeneity: The Random Effect and Others

Since high heterogeneity implies dissimilarity in the studies, a meta-analysis should be conducted with caution, and several strategies should be implemented to consider this situation. These should be applied after checking whether data are correct and no errors have been made in data extraction [4, 5].

Perform random-effect meta-analysis: the two most frequently used models to conduct a meta-analysis are the fixed- (Mantel-Haenszel, inverse variance or Peto methods) [24] and random-effect (DerSimonian and Laird method) [6] models. The fixed-effect model investigates the question “What is the best estimate of the population effect size?”, assuming a common treatment effect and that the differences between studies are due to chance. This model should be used with low heterogeneity, as it gives greater weight to larger studies. On the other hand, the random-effect model investigates the question “What is the average treatment effect?”, assuming the distribution of the treatment effect along a range of values. This model should be used with high heterogeneity that cannot be explained, as it assumes that the treatment effect in the different studies is not identical due to clinical and methodological heterogeneity, and this model therefore gives relatively less weight to larger studies and relatively more weight to smaller ones. If the heterogeneity is not extreme, the two models frequently lead to similar results. On the other hand, they can produce different conclusions, and the use of fixed- or random-effect models should therefore be carefully considered based on the amount of heterogeneity, although no guidelines exist on this point (random effect is usually used with I2 > 50%) (Fig. 47.1).
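The difference between the two models can be illustrated with a short Python sketch. It is not the chapter's own computation: it assumes inverse-variance weighting of log risk ratios for the fixed-effect estimate (rather than the Mantel-Haenszel weighting shown in Fig. 47.1) and the DerSimonian and Laird estimate of the between-study variance for the random-effect model, and the function names are illustrative. The event counts are those of the fictitious studies in Fig. 47.1.

```python
import math

# (events, total) in the surgical and conservative arms, from Fig. 47.1
STUDIES = [
    (86, 800, 117, 800), (11, 20, 5, 20), (2, 15, 6, 15),
    (5, 33, 6, 33), (7, 45, 3, 45), (14, 51, 8, 51),
    (4, 23, 6, 23), (21, 250, 45, 250), (2, 11, 3, 11),
]


def log_risk_ratio(e1, n1, e2, n2):
    """Log risk ratio of one study and its approximate variance."""
    return math.log((e1 / n1) / (e2 / n2)), 1 / e1 - 1 / n1 + 1 / e2 - 1 / n2


def pool(effects, variances, tau2=0.0):
    """Inverse-variance pooling; tau2 = 0 gives the fixed-effect model."""
    weights = [1 / (v + tau2) for v in variances]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1 / total), [w / total for w in weights]


def heterogeneity(effects, variances):
    """Cochran Q, Higgins I2 (%) and DerSimonian-Laird tau2."""
    weights = [1 / v for v in variances]
    fixed, _, _ = pool(effects, variances)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2, max(0.0, (q - df) / c)


effects, variances = zip(*(log_risk_ratio(*s) for s in STUDIES))
q, i2, tau2 = heterogeneity(effects, variances)
print(f"Q = {q:.2f}, I2 = {i2:.0f}%, tau2 = {tau2:.2f}")
for label, t2 in (("fixed", 0.0), ("random", tau2)):
    est, se, rel_w = pool(effects, variances, t2)
    low, high = math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)
    print(f"{label}: RR {math.exp(est):.2f} [{low:.2f}, {high:.2f}], "
          f"largest study weight {rel_w[0]:.1%}")
```

Under these assumptions the random-effect weights are spread more evenly across the studies and the confidence interval widens, which is the pattern the text describes. Because the fixed-effect weighting differs from Mantel-Haenszel, the fixed-effect numbers need not match the figure exactly.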
(a) Fixed-effect model

Study or Subgroup      Surgical         Conservative     Weight    Risk Ratio
                       Events  Total    Events  Total              M-H, Fixed, 95% CI
Ampollini et al.           86    800       117    800     58.8%    0.74 [0.57, 0.95]
Bait et al.                11     20         5     20      2.5%    2.20 [0.93, 5.18]
Carulli et al.              2     15         6     15      3.0%    0.33 [0.08, 1.39]
Compagnoni et al.           5     33         6     33      3.0%    0.83 [0.28, 2.46]
Ferrua et al.               7     45         3     45      1.5%    2.33 [0.64, 8.46]
Fravisini et al.           14     51         8     51      4.0%    1.75 [0.80, 3.81]
Grassi et al.               4     23         6     23      3.0%    0.67 [0.22, 2.05]
Mazzitelli et al.          21    250        45    250     22.6%    0.47 [0.29, 0.76]
Simonetta et al.            2     11         3     11      1.5%    0.67 [0.14, 3.24]
Total (95% CI)                  1248             1248    100.0%    0.76 [0.63, 0.93]
Total events              152              199
Heterogeneity: Chi2 = 18.53, df = 8 (P = 0.02), I2 = 57%
Test for overall effect: Z = 2.70 (P = 0.007)
Forest plot axis: risk ratio from 0.02 to 50; left side Favours [Surgical], right side Favours [Conservative]

(b) Random-effect model

Study or Subgroup      Surgical         Conservative     Weight    Risk Ratio
                       Events  Total    Events  Total              M-H, Random, 95% CI
Ampollini et al.           86    800       117    800     22.0%    0.74 [0.57, 0.95]
Bait et al.                11     20         5     20     11.6%    2.20 [0.93, 5.18]
Carulli et al.              2     15         6     15      6.0%    0.33 [0.08, 1.39]
Compagnoni et al.           5     33         6     33      8.9%    0.83 [0.28, 2.46]
Ferrua et al.               7     45         3     45      7.0%    2.33 [0.64, 8.46]
Fravisini et al.           14     51         8     51     12.8%    1.75 [0.80, 3.81]
Grassi et al.               4     23         6     23      8.5%    0.67 [0.22, 2.05]
Mazzitelli et al.          21    250        45    250     17.9%    0.47 [0.29, 0.76]
Simonetta et al.            2     11         3     11      5.2%    0.67 [0.14, 3.24]
Total (95% CI)                  1248             1248    100.0%    0.89 [0.59, 1.33]
Total events              152              199
Heterogeneity: Tau2 = 0.18; Chi2 = 18.53, df = 8 (P = 0.02), I2 = 57%
Test for overall effect: Z = 0.57 (P = 0.57)
Forest plot axis: risk ratio from 0.02 to 50; left side Favours [Surgical], right side Favours [Conservative]

Fig. 47.1  Example of forest plots evaluating the risk ratio of treatment failure between the conservative and surgical treatment for a fictitious pathology. If a fixed-effect method is used, larger studies are represented with the largest squares (red circle), proportional to their weight (red line); similarly, the smallest studies have small squares (purple circle) and weight (purple line). In this specific case, the diamond of the overall effect (blue circle) does not cross the central line and its confidence intervals do not contain the null value (RR = 1); it could therefore be assumed that there is a significant reduction in failure risk after surgical treatment (P = 0.007). However, the results should be interpreted with extreme caution due to the high and significant heterogeneity (green circle) (a). If a random-effect method is used to account for the high heterogeneity, the final result changes dramatically: the largest studies and smallest studies are given less weight (red dotted line and circle) and more weight (purple dotted line and circle), respectively, and this produces an enlargement of confidence intervals for the overall effect, which crosses the central line containing the null value (RR = 1). So, when a random-effect method is used, we can affirm that there is no evidence of a significant effect of surgical treatment compared with conservative treatment in reducing failures (P = 0.57) (b)
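Each row of a forest plot such as Fig. 47.1 is a single 2 × 2 comparison, so the dichotomous effect-size measures of Sect. 47.5.2 can be recomputed from its event counts. The following Python sketch is illustrative only: the function names are hypothetical, and the confidence interval uses the usual large-sample approximation on the log scale, which is assumed here rather than stated in the chapter.

```python
import math


def risk_ratio(e_exp, n_exp, e_ctl, n_ctl, z=1.96):
    """Risk ratio with an approximate 95% CI computed on the log scale."""
    rr = (e_exp / n_exp) / (e_ctl / n_ctl)
    se = math.sqrt(1 / e_exp - 1 / n_exp + 1 / e_ctl - 1 / n_ctl)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


def odds_ratio(e_exp, n_exp, e_ctl, n_ctl):
    """Ratio of the odds of the event in the two groups."""
    return (e_exp / (n_exp - e_exp)) / (e_ctl / (n_ctl - e_ctl))


def risk_difference(e_exp, n_exp, e_ctl, n_ctl):
    """Difference between the observed risks, between -1 and +1."""
    return e_exp / n_exp - e_ctl / n_ctl


def number_needed_to_treat(rd):
    """1/|RD|, rounded up to the next whole number."""
    return math.ceil(1 / abs(rd))


# Ampollini et al. row of Fig. 47.1: 86/800 surgical vs 117/800 conservative
row = (86, 800, 117, 800)
rr, low, high = risk_ratio(*row)
print(f"RR  {rr:.2f} [{low:.2f}, {high:.2f}]")
print(f"RRR {100 * (1 - rr):.1f}%")
print(f"OR  {odds_ratio(*row):.2f}")
print(f"RD  {risk_difference(*row):.3f}")
print(f"NNT {number_needed_to_treat(risk_difference(*row))}")
print(number_needed_to_treat(0.23))  # the RD example from the text
```

The first line reproduces the value reported for that study in the figure, RR 0.74 [0.57, 0.95], and the last call returns 5, as in the NNT example. The odds ratio for the same row is further from 1 than the risk ratio, which is why the two should not be read interchangeably.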
(a) Main analysis, fixed-effect model

Study or Subgroup      Surgical         Conservative     Weight    Risk Ratio
                       Events  Total    Events  Total              M-H, Fixed, 95% CI
Ampollini et al.           22     50         8     50     13.3%    2.75 [1.35, 5.58]
Bait et al.                19     75        12     75     20.0%    1.58 [0.83, 3.03]
Carulli et al.              8     20         5     20      8.3%    1.60 [0.63, 4.05]
Compagnoni et al.           5     15         4     15      6.7%    1.25 [0.41, 3.77]
Ferrua et al.              10     30         5     30      8.3%    2.00 [0.78, 5.15]
Fravisini et al.            9     50         8     50     13.3%    1.13 [0.47, 2.68]
Grassi et al.               6     40         6     40     10.0%    1.00 [0.35, 2.84]
Mazzitelli et al.           6     45         7     45     11.7%    0.86 [0.31, 2.35]
Simonetta et al.            6     18         5     18      8.3%    1.20 [0.45, 3.23]
Total (95% CI)                   343              343    100.0%    1.52 [1.14, 2.02]
Total events               91               60
Heterogeneity: Chi2 = 5.70, df = 8 (P = 0.68); I2 = 0%
Test for overall effect: Z = 2.83 (P = 0.005)
Forest plot axis: risk ratio from 0.1 to 10; left side Favours [Surgical], right side Favours [Conservative]

(b) Subgroup analysis, fixed-effect model

Study or Subgroup      Surgical         Conservative     Weight    Risk Ratio
                       Events  Total    Events  Total              M-H, Fixed, 95% CI
1.1.1 Open
Ampollini et al.           22     50         8     50     13.3%    2.75 [1.35, 5.58]
Bait et al.                19     75        12     75     20.0%    1.58 [0.83, 3.03]
Carulli et al.              8     20         5     20      8.3%    1.60 [0.63, 4.05]
Compagnoni et al.           5     15         4     15      6.7%    1.25 [0.41, 3.77]
Ferrua et al.              10     30         5     30      8.3%    2.00 [0.78, 5.15]
Subtotal (95% CI)                190              190     56.7%    1.88 [1.31, 2.71]
Total events               64               34
Heterogeneity: Chi2 = 2.04, df = 4 (P = 0.73); I2 = 0%
Test for overall effect: Z = 3.41 (P = 0.0007)

1.1.2 Minimally-Invasive
Fravisini et al.            9    250         8    250     13.3%    1.13 [0.44, 2.87]
Grassi et al.               6     40         6     40     10.0%    1.00 [0.35, 2.84]
Mazzitelli et al.           6     45         7     45     11.7%    0.86 [0.31, 2.35]
Simonetta et al.            6     18         5     18      8.3%    1.20 [0.45, 3.23]
Subtotal (95% CI)                353              353     43.3%    1.04 [0.63, 1.71]
Total events               27               26
Heterogeneity: Chi2 = 0.25, df = 3 (P = 0.97); I2 = 0%
Test for overall effect: Z = 0.16 (P = 0.88)

Fig. 47.2  Example of forest plot evaluating the risk ratio of complications after the conservative or surgical treatment for a fictitious pathology (a). In this case, it appears to be correct to use a fixed-effect model since the heterogeneity is null (green circle); this is confirmed by the concordance of the effect size of most of the studies, which lies on the right side of the forest plot. Since the confidence intervals do not contain the null value (RR = 1) (red circle and line), it could be suggested that there is evidence of an increased risk of complications after surgical treatment for the fictitious pathology. However, if a subgroup analysis is performed separating open and minimally invasive surgery, the results differ from the main analysis (b). In the case of open surgery, the final overall effect is similar to the main analysis (blue line and circle); in the case of minimally invasive surgery, the confidence intervals of the overall effect (dotted blue line and circle) contain the null value (RR = 1), suggesting that there is no evidence of a difference in complications after minimally invasive surgery or conservative treatment for the fictitious pathology
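The pooled risk ratios of Fig. 47.2 can be approximated with a Mantel-Haenszel fixed-effect estimate computed separately for each subgroup. The Python sketch below is an illustration under stated assumptions: the confidence interval uses the Greenland-Robins variance formula, which the chapter does not describe, the function name is hypothetical, and the event counts are those listed in panel (a) of the figure.

```python
import math

# Surgical events/total and conservative events/total, from Fig. 47.2a
OPEN = [(22, 50, 8, 50), (19, 75, 12, 75), (8, 20, 5, 20),
        (5, 15, 4, 15), (10, 30, 5, 30)]
MINIMALLY_INVASIVE = [(9, 50, 8, 50), (6, 40, 6, 40),
                      (6, 45, 7, 45), (6, 18, 5, 18)]


def mantel_haenszel_rr(studies, z=1.96):
    """Mantel-Haenszel pooled risk ratio with an approximate 95% CI."""
    r = s = p = 0.0
    for a, n1, c, n2 in studies:
        total = n1 + n2
        r += a * n2 / total
        s += c * n1 / total
        p += (n1 * n2 * (a + c) - a * c * total) / total ** 2
    rr = r / s
    se = math.sqrt(p / (r * s))  # Greenland-Robins standard error of log RR
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


for label, group in (("All studies", OPEN + MINIMALLY_INVASIVE),
                     ("Open", OPEN),
                     ("Minimally invasive", MINIMALLY_INVASIVE)):
    rr, low, high = mantel_haenszel_rr(group)
    print(f"{label}: RR {rr:.2f} [{low:.2f}, {high:.2f}]")
```

The point estimates agree with those in the figure (1.52, 1.88 and 1.04). The interval for the minimally invasive subgroup may differ slightly from panel (b), because that panel lists different group totals for one study than panel (a). As in the figure, only the open-surgery interval excludes RR = 1.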

Perform a subgroup analysis: subgroup analyses may be conducted as a means of investigating heterogeneous results or to answer specific questions about patient groups, types of intervention or types of study (Fig. 47.2). They can be performed for a subset of participants or a subset of studies, and they consider the meta-analysis results from each group separately. The non-overlap of the CI usually indicates statistical significance and a different effect of the treatment within the subgroups, thereby explaining the heterogeneity to some extent. However, the subgroup analysis should be limited only to restricted cases, since it could increase the risk of type-II error due to the reduction of patient cohort size.

Perform a meta-regression: meta-regression is an extension of subgroup analyses that allows the investigation of the effect of continuous or categorical characteristics simultaneously. Its role is like that of simple regression, where the outcome variable is the effect estimate (RR, OR, MD), and the explanatory variables, or covariates, are the study characteristics that might influence the size of the intervention effect. The regression coefficient obtained from a meta-regression will describe how the outcome variable changes with a unit increase in the covariate, while its statistical significance describes whether there is a linear relationship between them. It should be underlined that the characteristics to investigate (covariates) should be justified by biological and clinical hypotheses and should be as few as possible. One limitation of the meta-regression is that more than ten studies in the meta-analysis are generally required for its use.

Perform a sensitivity analysis: while the aim of the subgroup analysis is to estimate a treatment effect for a particular subgroup, the aim of the sensitivity analysis is to investigate whether the meta-analysis findings change based on different arbitrary or unclear decisions related to the meta-analysis process. The main decisions that can generate the need for a sensitivity analysis could be related to the eligibility criteria of the studies (e.g. study design or methodological issues), the data analysed (e.g. imputation of missing SD) or analysis methods (fixed or random effect, choice of effect-size measurement). In practical terms, the sensitivity analysis consists of the repetition of the meta-analysis, excluding the studies burdened by unclear or arbitrary decisions, and of the informal comparison of the different ways the same thing is estimated. After the sensitivity analysis, when the overall conclusions are not affected by the different decisions made during the review process, more certainty can be assumed. On the other hand, if decisions are identified as influencing the findings, the results must be interpreted with an appropriate degree of caution if it is not possible to improve the process (Fig. 47.3). Unlike the subgroup analysis, the sensitivity analysis is not designed to estimate the effect of the intervention in the group excluded from the analysis, so its report should be produced with a summary table.

Change the measurement of effect: the choice of the measurements of effect size may affect the degree of heterogeneity; however, it is unclear whether the heterogeneity of intervention effect alone is a suitable criterion for choosing between the different measurements.

Exclude studies: since heterogeneity could be due to the presence of one or two outliers, studies with conflicting results compared with the rest could be excluded, as their exclusion could address the problem of heterogeneity. However, it is not appropriate to exclude a study based on its result since it may introduce a bias. It can therefore only be removed with confidence if there are obvious reasons. Unfortunately, there are no tests to determine the extent of clinical heterogeneity, and researchers must decide whether the studies contributing to a meta-analysis are similar enough clinically to make meta-analysis feasible. Refining inclusion criteria and excluding studies, even if this reduces heterogeneity, also decreases the total number of articles included on a topic. A sensitivity analysis is suggested to check whether the excluded study/studies could alter the meta-analysis results.

Do not perform a meta-analysis: if high heterogeneity cannot be addressed using the presented strategies, the investigator should consider whether the amount of heterogeneity is so large that the results of the meta-analysis are problematic. In this case, especially when there is inconsistency in the direction of the treatment effects that could make the use of an average value misleading, meta-analysis should be abandoned, and the evidence should be fairly expressed in a systematic review. Another and frequent reason to avoid meta-analytic pooling of data is when too few studies with no new findings are obtained after the systematic search.
(a) Random-effect model, all studies

Study or Subgroup      Surgical         Conservative     Weight    Risk Ratio
                       Events  Total    Events  Total              M-H, Random, 95% CI
Ampollini et al.           86    800       117    800     22.0%    0.74 [0.57, 0.95]
Bait et al.                11     20         5     20     11.6%    2.20 [0.93, 5.18]
Carulli et al.              2     15         6     15      6.0%    0.33 [0.08, 1.39]
Compagnoni et al.           5     33         6     33      8.9%    0.83 [0.28, 2.46]
Ferrua et al.               7     45         3     45      7.0%    2.33 [0.64, 8.46]
Fravisini et al.           14     51         8     51     12.8%    1.75 [0.80, 3.81]
Grassi et al.               4     23         6     23      8.5%    0.67 [0.22, 2.05]
Mazzitelli et al.          21    250        45    250     17.9%    0.47 [0.29, 0.76]
Simonetta et al.            2     11         3     11      5.2%    0.67 [0.14, 3.24]
Total (95% CI)                  1248             1248    100.0%    0.89 [0.59, 1.33]
Total events              152              199
Heterogeneity: Tau2 = 0.18; Chi2 = 18.53, df = 8 (P = 0.02), I2 = 57%
Test for overall effect: Z = 0.57 (P = 0.57)
Forest plot axis: risk ratio from 0.02 to 50; left side Favours [Surgical], right side Favours [Conservative]

(b) Sensitivity analysis (overall result only)

Total (95% CI)                  1177             1177    100.0%    0.67 [0.50, 0.90]
Total events              127              186
Heterogeneity: Tau2 = 0.03; Chi2 = 7.24, df = 6 (P = 0.30), I2 = 17%
Test for overall effect: Z = 2.64 (P = 0.008)
Forest plot axis: risk ratio from 0.02 to 50; left side Favours [Surgical], right side Favours [Conservative]

Fig. 47.3  Example of forest plots evaluating the risk ratio of treatment failure between conservative and surgical treatment for a fictitious pathology (a). After a random-effect meta-analysis due to the high heterogeneity (green circle), the final result is that there is no evidence of effect of surgical treatment in reducing the risk of failures, since the confidence intervals of the overall effect contain the null value (RR = 1). However, after risk-of-bias assessment, two studies with poor methodology and a high risk of bias have been found (Bait et al. and Fravisini et al.) and excluded through a sensitivity analysis (b). The results of the sensitivity analysis show a reduction in heterogeneity (green dotted circle) and the narrowing of the confidence intervals of the overall effect (blue dotted line and circle) that no longer contain the null value (RR = 1). In this case, the results of the meta-analysis should be interpreted with extreme caution since methodological issues and biases were able to affect the entity of the overall effect of treatment
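The numbers reported in Fig. 47.3 can be approximated with a short calculation. The following is only a minimal sketch, assuming an inverse-variance DerSimonian-Laird random-effects model on the log risk ratio; the software used for the figure may differ in detail, and the function name `pool_risk_ratio` is hypothetical. The event counts are transcribed from panel (a) of the fictitious data set.

```python
import math

# (study, events_surgical, total_surgical, events_conservative, total_conservative)
# Transcribed from panel (a) of Fig. 47.3 (fictitious data).
STUDIES = [
    ("Ampollini", 86, 800, 117, 800),
    ("Bait", 11, 20, 5, 20),
    ("Carulli", 2, 15, 6, 15),
    ("Compagnoni", 5, 33, 6, 33),
    ("Ferrua", 7, 45, 3, 45),
    ("Fravisini", 14, 51, 8, 51),
    ("Grassi", 4, 23, 6, 23),
    ("Mazzitelli", 21, 250, 45, 250),
    ("Simonetta", 2, 11, 3, 11),
]


def pool_risk_ratio(studies):
    """Assumed model: DerSimonian-Laird random-effects pooling of log risk ratios."""
    y = [math.log((a / n1) / (c / n2)) for _, a, n1, c, n2 in studies]
    v = [1 / a - 1 / n1 + 1 / c - 1 / n2 for _, a, n1, c, n2 in studies]
    w = [1 / vi for vi in v]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(studies) - 1
    tau2 = max(0.0, (q - df) / (sum(w) - sum(wi ** 2 for wi in w) / sum(w)))
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    w_re = [1 / (vi + tau2) for vi in v]  # random-effects weights
    mean = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "rr": math.exp(mean),
        "ci": (math.exp(mean - 1.96 * se), math.exp(mean + 1.96 * se)),
        "tau2": tau2,
        "i2": i2,
    }


# Panel (a): all nine studies. Panel (b): sensitivity analysis without the
# two studies judged to be at high risk of bias.
all_studies = pool_risk_ratio(STUDIES)
sensitivity = pool_risk_ratio([s for s in STUDIES if s[0] not in ("Bait", "Fravisini")])
print(all_studies)
print(sensitivity)
```

Under these assumptions, the pooled estimates should land close to the values shown in the figure: a CI that contains 1 for all nine studies, and a narrower CI below 1 with lower heterogeneity once the two studies are excluded.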
A. Grassi et al.
47  A Practical Guide to Writing (and Understanding) a Scientific Paper: Meta-Analyses 483

47.6 How to Present and Evaluate the Results

47.6.1 Prepare the Forest-Plot Graphic for the Main Outcomes

The results section of a meta-analysis should summarise the findings in a clear, logical order, explicitly addressing the objective of the review [11, 30]. The characteristics of methods, participants, intervention and outcomes should be reported in a narrative manner or with reference to tables [31]. On the other hand, the data analysis is better presented through the so-called “forest-plot” graphic; this is a simple, immediate and visually friendly method for describing the raw data, estimate and CI of the chosen effect measurement, the choice between fixed- or random-effect meta-analysis, the heterogeneity, the weight of each study and a test for the overall effect (Fig. 47.1). Forest plots should not be used when an outcome has only been investigated in a single study.

The measurement of the effect of each study included in the meta-analysis and in the forest plot is represented by a square, with the dimension proportional to its weight (based on sample size and the choice of a fixed- or random-effect model) and a horizontal line corresponding to its CI. Since the CI describes a range of values within which we can be reasonably sure that the true effect lies, a narrow CI indicates an effect size that is known precisely, while a very wide CI indicates that we have little knowledge of the effect. The CI width for an individual study depends on the sample size, SD (for continuous outcomes) and risk of the event (dichotomous outcomes). When the CI crosses the central line (indicating an MD or SMD of 0 and an OR or RR of 1), it is possible that the experimental and control treatments have the same effect on the evaluated outcome. If the effect size of most of the included studies lies on the same side of the graphic, thus indicating a similar effect of the treatment, the overall heterogeneity is usually low.

The overall effect size of the meta-analysis is represented by a “diamond”. Its position indicates the value of the effect size, while its width indicates the CI. This width depends on the precision of the individual study estimates, the number of studies combined and the heterogeneity (in random-effect models, precision will decline with increasing heterogeneity). When the 95% CI for the effect of the meta-analysis does not cross the central line, it excludes the null value (MD or SMD of 0 and OR or RR of 1), and the p-value of the overall meta-analysis will therefore be <0.05. In this case, we can affirm that the observed effect is very unlikely to have arisen purely by chance and, as a result, there are differences in the effect of experimental and control interventions.

The forest plot, in certain study designs, could present the effect size of continuous or dichotomous outcomes from single-arm case series; in this case, we only have the estimation of pooled outcomes, without the comparison between two treatments (Fig. 47.4).

Studies               Estimate (95% C.I.)

Ampollini et al.      90.000 (87.189, 92.811)
Bait et al.           86.000 (84.131, 87.869)
Carulli et al.        88.000 (85.370, 90.630)
Compagnoni et al.     85.000 (77.301, 92.699)
Ferrua et al.         89.000 (86.093, 91.907)
Fravisini et al.      93.000 (90.977, 95.023)
Grassi et al.         91.000 (88.470, 93.530)
Mazzitelli et al.     90.000 (88.891, 91.109)
Simonetta et al.      95.000 (85.382, 104.618)

Overall (I^2 = 74.24%, P < 0.001)   89.547 (87.872, 91.222)

Horizontal axis: score from 80 to 100

Fig. 47.4  Example of a forest plot of a continuous outcome from single-arm case series. In this case, no comparison between treatments is made, since a single treatment is evaluated in the included studies; the main result is therefore the mean of the considered score or outcome, based on the weight of each study
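The pooled value in Fig. 47.4 can be approximated from the study estimates and their confidence limits alone. The sketch below is a hedged illustration, assuming that each study variance can be recovered from the half-width of its 95% CI and that the studies are combined with a DerSimonian-Laird random-effects model; `pool_means` is a hypothetical helper, not the method used by the plotting software.

```python
import math

# (study, estimate, lower 95% limit, upper 95% limit), transcribed from Fig. 47.4
SERIES = [
    ("Ampollini", 90.0, 87.189, 92.811),
    ("Bait", 86.0, 84.131, 87.869),
    ("Carulli", 88.0, 85.370, 90.630),
    ("Compagnoni", 85.0, 77.301, 92.699),
    ("Ferrua", 89.0, 86.093, 91.907),
    ("Fravisini", 93.0, 90.977, 95.023),
    ("Grassi", 91.0, 88.470, 93.530),
    ("Mazzitelli", 90.0, 88.891, 91.109),
    ("Simonetta", 95.0, 85.382, 104.618),
]


def pool_means(series, z=1.96):
    """Assumed model: random-effects pooling of single-arm means."""
    means = [m for _, m, _, _ in series]
    # Recover each variance from the half-width of the reported 95% CI.
    variances = [((hi - lo) / (2 * z)) ** 2 for _, _, lo, hi in series]
    w = [1 / v for v in variances]
    fixed = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    q = sum(wi * (m - fixed) ** 2 for wi, m in zip(w, means))
    df = len(series) - 1
    tau2 = max(0.0, (q - df) / (sum(w) - sum(wi ** 2 for wi in w) / sum(w)))
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * m for wi, m in zip(w_re, means)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, (pooled - z * se, pooled + z * se), i2


pooled, ci, i2 = pool_means(SERIES)
print(f"{pooled:.3f} ({ci[0]:.3f}, {ci[1]:.3f}), I2 = {i2:.1f}%")
```

Note how the two imprecise studies (Compagnoni and Simonetta, with the widest intervals) contribute little to the pooled mean, which is what "based on the weight of each study" means in practice.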

47.6.2 Perform a Methodological Assessment and Bias Evaluation

The fundamental measurement to guarantee credibility in a meta-analysis is the evaluation of the methodology and bias of the included studies: this is a necessary step that should not be missed, because omitting it could lead to a misinterpretation of the results [12]. First, the level of evidence (Table 47.3) should be immediately and clearly reported. For single-arm case series, many methodological questionnaires are available, and one of the most frequently used is the Coleman Score [2] or its modifications [22] (Table 47.4). For non-randomised controlled studies, authors usually refer to the Newcastle-Ottawa Scale (NOS) [34] or its modifications (Table 47.5). For RCTs, there is also a vast choice of scores and checklists [27], with those obtained from the Consolidated Standards of Reporting Trials (CONSORT) guidelines regarded as some of the most authoritative [16] (Table 47.6). However, the indispensable action to ensure scientific rigour is the risk-of-bias evaluation, which is performed using the “Cochrane Risk of Bias Tool” [12].

Table 47.3  List of the five levels of evidence
Level of evidence for therapeutic studies
Level I     Randomised controlled trial (RCT)
Level II    Prospective cohort studies (non-randomised comparative study)
Level III   Retrospective cohort study (non-randomised comparative study); case-control study
Level IV    Case series
Level V     Mechanism-based reasoning

A bias is defined as a systematic error (or a deviation from the truth) in results, and it can lead to an underestimation or overestimation of the true intervention effect. The types of bias considered in the Cochrane Risk of Bias Tool are:

• Selection bias: the systematic difference between the baseline characteristics of the groups
• Performance bias: the systematic difference between groups in the care that is provided
• Attrition bias: the systematic difference between groups in withdrawals from a study
• Detection bias: the systematic difference between groups in how outcomes are determined
• Reporting bias: the systematic difference between reported and unreported findings

According to these types of bias, seven domains are evaluated and rated as having a low, unclear or high risk of bias.

• Random sequence generation (selection bias): describe the method used to generate the allocation sequence in sufficient detail to allow an assessment of whether it should produce comparable groups.
• Allocation concealment (selection bias): describe the method used to conceal the allocation sequence in sufficient detail to determine whether intervention allocations could have been foreseen in advance of, or during, enrolment.
• Blinding of participants and personnel (performance bias): describe all the measures used, if any, to blind study participants and personnel to knowledge of the intervention a participant received. Provide any information relating to whether the intended blinding was effective. An evaluation should be made for each main outcome.
• Blinding of outcome assessment (detection bias): describe all the measures used, if any, to blind outcome assessors to knowledge of the intervention a participant received. Provide any information relating to whether the intended blinding was effective. An evaluation should be made for each main outcome.
• Incomplete outcome data (attrition bias): describe the completeness of outcome data for each main outcome, including attrition and exclusions from the analysis. State whether attrition and exclusions were reported, the numbers in each intervention group (compared with total randomised participants), reasons for attrition/exclusions where reported

Table 47.4  Modified Coleman Score used for orthopaedic case series (adapted from: Magnussen RA, Carey JL, Spindler KP. Does autograft choice determine intermediate-term outcome of ACL reconstruction? Knee Surg Sports Traumatol Arthrosc. 2011;19(3):462–472)

Modified Coleman Methodology Score

Outcome                                           Option                                                                    Points

Part A: Only one score to be given for each section (total = 60)
1. Study size: number of patients                 >120                                                                      10
                                                  81–120                                                                    7
                                                  40–80                                                                     4
                                                  <40 or not stated                                                         0
2. Mean follow-up (years)                         >6 years                                                                  5
                                                  3–6 years                                                                 3
                                                  <3 years, not stated, unclear                                             0
3. Percentage of patients with follow-up          >90%                                                                      5
                                                  80–90%                                                                    3
                                                  <80%                                                                      0
4. No. of interventions per group or separate     One procedure                                                             10
   outcomes should be reported                    More than one procedure but consistent among all patients in each group   5
                                                  Unclear or multiple interventions among patients in the same group        0
5. Type of study                                  Randomised controlled trial                                               15
                                                  Prospective cohort study                                                  10
                                                  Retrospective cohort study                                                5
6. Diagnostic certainty                           In all                                                                    5
                                                  In >80%                                                                   3
                                                  In <80%, not stated, unclear                                              0
7. Description of surgical technique              Technique stated with details                                             5
                                                  Technique named without elaboration                                       3
                                                  Not stated, unclear                                                       0
8. Description of postoperative rehabilitation    Well described                                                            5
                                                  Described without complete detail                                         3
                                                  Protocol not reported                                                     0

Part B: Scores could be given for each option in each of the three sections (total = 40)
1. Outcome criteria                               Outcome measurements clearly defined                                      2
                                                  Timing of outcome assessment clear                                        2
                                                  Use of outcome with good reliability                                      3
                                                  Use of outcome with good sensitivity                                      3
2. Procedure for assessing outcomes               Subjects recruited                                                        5
                                                  Independent investigators                                                 4
                                                  Written assessment                                                        3
                                                  Patient-centred data collected                                            3
3. Description of subject selection process       Selection criteria reported and unbiased                                  5
                                                  Recruitment rate reported >80%                                            5
                                                  Eligible subjects not included in the study satisfactorily accounted for  5

The total score (max = 100) of the evaluated study is calculated as the sum of Parts A and B
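As a worked illustration of how the score in Table 47.4 is assembled, the sketch below encodes the three numeric items of Part A and the final sum. It is a minimal example, not a validated scoring tool: the function names are hypothetical, treating a missing follow-up percentage as 0 points is an assumption (the table gives no "not stated" option for that item), and the example study is invented.

```python
def study_size_points(n_patients):
    """Part A, item 1. None means 'not stated'."""
    if n_patients is None or n_patients < 40:
        return 0
    if n_patients <= 80:
        return 4
    if n_patients <= 120:
        return 7
    return 10


def follow_up_points(mean_years):
    """Part A, item 2. None means 'not stated' or 'unclear'."""
    if mean_years is None or mean_years < 3:
        return 0
    if mean_years <= 6:
        return 3
    return 5


def follow_up_rate_points(percent):
    """Part A, item 3. Scoring None as 0 is an assumption of this sketch."""
    if percent is None or percent < 80:
        return 0
    if percent <= 90:
        return 3
    return 5


def coleman_total(part_a_points, part_b_points):
    """Sum of Part A (max 60) and Part B (max 40)."""
    part_a, part_b = sum(part_a_points), sum(part_b_points)
    if not (0 <= part_a <= 60 and 0 <= part_b <= 40):
        raise ValueError("points exceed the maximum allowed for Part A or Part B")
    return part_a + part_b


# Invented example: 95 patients, mean follow-up of 4 years, 92% followed up.
numeric_items = [
    study_size_points(95),
    follow_up_points(4),
    follow_up_rate_points(92),
]
# Remaining Part A items (4 to 8) and Part B options, scored by hand for the
# same invented study.
other_part_a = [10, 5, 5, 3, 3]
part_b = [2, 2, 3, 0, 5, 0, 3, 3, 5, 0, 5]
total = coleman_total(numeric_items + other_part_a, part_b)
print(numeric_items, total)
```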

Table 47.5  Modified Newcastle-Ottawa Scale for non-randomised studies (adapted from: http://www.uphs.upenn.edu/cep/methods/Modified%20Newcastle-Ottawa.pdf)
Question Y\N
Study population:
 1. All study groups derived from similar source/reference populations?
 2. Attrition not significantly different across study groups?
Study validity:
 3. The measurement of exposure is valid?
 4. The measurement of outcome is valid?
 5. Investigators blinded to end-point assessment?
Confounders:
 6. Potential confounders identified (e.g. co-morbidities)
 7. Statistical adjustment for potential confounders made?
 8. Funding source(s) disclosed and no obvious conflict of interests?
The scale does not require a calculation of a numeric score but can be used to map the study characteristics visually

Table 47.6  Reporting quality scale for randomised controlled trials based on the Consolidated Standards of Reporting Trials (CONSORT) guidelines (adapted from: Huwiler-Müntener K, Jüni P, Junker C, Egger M. Quality of reporting of randomized trials as a measure of methodologic quality. JAMA. 2002 Jun 5;287(21):2801–4)
Reporting quality scale based on 1996 CONSORT statement
Question Y\N
1. Does the title identify the study as a randomised controlled trial?
2. Is the abstract presented in structured format?
3. Are objectives stated?
4. Is hypothesis stated?
5. Is the study population described?
6. Are the inclusion and exclusion criteria described?
7. Are the interventions described?
8. Are the outcome measurements described?
9. Is a primary outcome specified?
10. Is a minimum (clinically?) important difference for the primary outcome reported?
11. Are power calculations described?
12. Is the rationale for the statistical analyses explained?
13. Are the methods for statistical analyses described?
14. Are stopping rules described?
15. Is the unit of randomisation described?
16. Is the method used to generate the allocation schedule described?
17. Is the method of allocation concealment described?
18. Is the timing of assignment described?
19. Is the method for separating those generating the allocation sequence from those assigning participants
to groups described?
20. Are the mechanisms of blinding described?
21. Is the number of eligible patients reported?
22. Is the number of randomised patients reported for each comparison group?
23. Are prognostic variables by treatment and control group described?
24. Is the number of patients receiving an intervention as allocated reported for each group?
25. Is the number of patients analysed reported for each comparison group?
26. Are withdrawals and drop-outs described for each comparison group?
27. Are protocol deviations described for each comparison group?
28. Is the estimated effect of intervention on primary and secondary outcomes stated, including
measurements of precision?
29. Are the results stated in absolute numbers?
30. Are summary data and inferential statistics presented in sufficient detail to permit alternative analyses
and replication?
The total score (max = 30) of the evaluated study is the number of questions answered “Yes”

Table 47.7  Strategies to identify high, low or unclear risk of bias for the main domains of the “Cochrane Risk of Bias Tool” (adapted from: Table 8.5c of Higgins JPT, Altman DG (editors). Chapter 8: Assessing risk of bias in included studies. In: Higgins JPT, Green S (editors). Cochrane Handbook for Systematic Reviews of Interventions. Chichester (UK): John Wiley & Sons, 2008)

Risk-of-bias assessment according to the “Cochrane Risk of Bias Tool”

Selection
  Random sequence generation: method used to generate the allocation sequence in sufficient detail to allow an assessment of whether it should produce comparable groups
    Low risk       Randomisation with random number table, coin toss, computer generator, dice…
    High risk      Randomisation with date of birth, admission day, clinic record number, judgement of clinician…
    Unclear risk   Insufficient information on randomisation process
  Allocation concealment: method used to conceal the allocation sequence to avoid intervention allocations being foreseen in advance or during enrolment
    Low risk       Participants could not foresee assignation due to central allocation, identical drug containers, opaque sealed envelopes
    High risk      Participants could foresee assignation due to open allocation schedule, alternation, unsealed envelopes
    Unclear risk   Insufficient information or concealment not described

Performance
  Blinding of participants and personnel: measures used to blind study participants and personnel from knowing the intervention received
    Low risk       Blinding or use of outcomes not influenced by lack of blinding
    High risk      No or incomplete blinding that could influence outcomes
    Unclear risk   Insufficient information to permit judgement

Detection
  Blinding of outcome assessment: measures used to blind outcome assessors from knowledge of the intervention a participant received
    Low risk       Blinding or use of outcomes not influenced by lack of blinding
    High risk      No or incomplete blinding that could influence outcomes
    Unclear risk   Insufficient information to permit judgement

Attrition
  Incomplete outcome data: completeness of outcome data for each main outcome, including attrition and exclusion from the analysis
    Low risk       No missing outcome data, or missing data balanced between groups and not related to outcomes
    High risk      Reason for missing data related to outcomes, or imbalance between intervention group
    Unclear risk   Insufficient reporting of exclusions to permit judgement

Reporting
  Selective reporting: possibility of selective outcome reporting
    Low risk       All the prespecified outcomes of interest have been reported
    High risk      Not all prespecified outcomes have been reported, or outcomes reported incompletely, or lack of key outcomes
    Unclear risk   Insufficient information to permit judgement

Other
  Other bias: any important concerns about bias not addressed in the other domains in the tool
    Low risk       The study appears to be free from other sources of bias
    High risk      Bias related to specific study design, or fraudulent, or other problems
    Unclear risk   Insufficient information to assess whether other biases exist

and any re-inclusions in analyses performed by the review authors.
• Selective reporting (reporting bias): state how the possibility of selective outcome reporting was examined by the review authors and what was found.
• Other (other bias): state any important concerns about bias not addressed in the other domains in the tool.

According to the quality and methodology of the study, each domain should be rated (Table 47.7). The overall risk of bias should

therefore be determined based on a low risk of bias for all key domains (low risk), a high risk of bias for one or more key domain (high risk) or an unclear risk for one or more key domain (unclear risk). Finally, the risk across studies could be defined as low if most information comes from studies with a low risk of bias, high if the proportion of information from studies with a high risk of bias is sufficient to affect the interpretation of the results and unclear if most information is from studies with a low or unclear risk of bias. To help the presentation of this information, a risk-of-bias summary (Fig. 47.5) and risk-of-bias graphs (Fig. 47.6) are extremely useful.

Fact Box 47.4: Bias Included and Evaluated in the “Cochrane Risk of Bias Tool”
• Selection bias: the systematic difference between the baseline characteristics of the groups
• Performance bias: the systematic difference between groups in the care that is provided
• Attrition bias: the systematic difference between groups in withdrawals from a study
• Detection bias: the systematic difference between groups in how outcomes are determined
• Reporting bias: the systematic difference between reported and unreported findings

[Figure: risk-of-bias summary. Rows: Ampollini et al., Bait et al., Carulli et al., Compagnoni et al., Ferrua et al., Fravisini et al., Grassi et al., Mazzitelli et al., Simonetta et al. Columns: random sequence generation (selection bias), allocation concealment (selection bias), blinding of participants and personnel (performance bias), blinding of outcome assessment (detection bias), incomplete outcome data (attrition bias), selective reporting (reporting bias), other bias.]

Fig. 47.5  In this risk-of-bias summary, it is possible to have a visual presentation of the risk of each bias for all the included studies. The risk could be low (green plus), high (red minus), or unclear (no sign). In this specific table, it is possible to observe that the study by Carulli et al. is the most biased and the study by Grassi et al. is the one with the lowest risk of bias
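The rule for deriving a study's overall risk of bias from its domain ratings, and the per-domain percentages shown in a risk-of-bias graph, can be sketched as follows. This is an illustration under stated assumptions: a "high" rating is taken to override an "unclear" one when both occur in the same study (the text does not state the precedence explicitly), all seven domains are treated as key domains by default, and the four studies and their ratings are invented.

```python
from collections import Counter

DOMAINS = (
    "random sequence generation",
    "allocation concealment",
    "blinding of participants and personnel",
    "blinding of outcome assessment",
    "incomplete outcome data",
    "selective reporting",
    "other bias",
)


def overall_risk(judgements, key_domains=DOMAINS):
    """Overall rating of one study from its per-domain ratings.

    Assumption of this sketch: 'high' takes precedence over 'unclear'.
    """
    ratings = [judgements[d] for d in key_domains]
    if "high" in ratings:
        return "high"
    if "unclear" in ratings:
        return "unclear"
    return "low"


def domain_percentages(studies):
    """Share of low/unclear/high ratings per domain, as in a risk-of-bias graph."""
    n = len(studies)
    shares = {}
    for domain in DOMAINS:
        counts = Counter(j[domain] for j in studies.values())
        shares[domain] = {
            level: 100 * counts[level] / n for level in ("low", "unclear", "high")
        }
    return shares


# Invented ratings for four hypothetical studies.
all_low = dict.fromkeys(DOMAINS, "low")
studies = {
    "Study A": all_low,
    "Study B": {**all_low, "allocation concealment": "unclear"},
    "Study C": {**all_low, "blinding of outcome assessment": "high", "other bias": "unclear"},
    "Study D": {**all_low, "blinding of outcome assessment": "high"},
}
overall = {name: overall_risk(j) for name, j in studies.items()}
shares = domain_percentages(studies)
print(overall)
print(shares["blinding of outcome assessment"])
```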
Other types of bias arise from imbalances in the dissemination of research findings that depend on the nature and direction of the results. They are known as reporting biases [35] and can be:

• Publication bias: when the publication or non-publication of research findings depends on the nature and direction of the results; as an example, studies with negative results are often not published
• Time-lag bias: when the rapid or delayed publication of research findings depends on the nature and direction of the results
• Multiple publication bias (duplicate): when the multiple or singular publication of research findings depends on the nature and direction of the results
• Location bias: when the publication of research findings in journals with different ease of access or levels of indexing in standard databases depends on the nature and direction of results
• Citation bias: when the citation or non-citation of research findings depends on the nature and direction of the results

[Figure: risk-of-bias graph. One horizontal bar (0% to 100%) for each domain: random sequence generation (selection bias), allocation concealment (selection bias), blinding of participants and personnel (performance bias), blinding of outcome assessment (detection bias), incomplete outcome data (attrition bias), selective reporting (reporting bias), other bias. Legend: low risk of bias, unclear risk of bias, high risk of bias.]

Fig. 47.6  In these risk-of-bias graphs, it is possible to see a visual presentation of the recurrent bias based on each domain. In this specific case, the “reporting bias” appears to be the bias with a lower risk, while the “performance bias” and the “detection bias” appear to be those with a higher risk

[Figure: two funnel plots, panels a and b. Horizontal axis: RD, from –1 to 1. Vertical axis: SE(RD), from 0 at the top to 0.2 at the bottom.]

Fig. 47.7  In a funnel plot where the risk of bias is low, the symmetrical shape of an inverted funnel is seen (a). When the publication bias tends to exclude the publication of small studies without statistical significance, the funnel plot results are asymmetrical, with a gap in the bottom corner (bottom left corner, in this case) (b)

• Language bias: when research findings published in a language other than English are regarded as of secondary importance, while studies with positive results might also be more likely to be published in English
• Outcome reporting bias: when the selective reporting of some outcomes but not others depends on the type of results found, if they are positive or negative or if they introduce new or repetitive findings.

One practical way to detect reporting bias is the use of the funnel plot graph [35]. This is a simple scatter plot of the intervention effect estimates from individual studies against some measurement of each study’s size or precision; the effect estimates are plotted on the horizontal scale and the measurement of study size on the vertical axis (Fig. 47.7). As effect estimates from small studies scatter more widely at the bottom of the graph and those of larger studies are scattered more narrowly at the top of the graph, the plot should assume the shape of a symmetrical inverted funnel. If, due to publication bias, smaller studies without statistical significance remain unpublished, the plot will be asymmetrical with a gap in a bottom corner. In this case, the

meta-analysis will tend to overestimate the intervention effect. Apart from publication bias, asymmetry of the funnel plot could also be due to poor methodological quality, true heterogeneity, artefacts or chance. Tests for funnel plot asymmetry should only be used if at least ten studies are included in the meta-analysis and the studies are not all of similar size.

47.6.3 Correctly Approach and Evaluate Non-randomised Studies

Finally, a few words should be devoted to the meta-analysis of non-randomised controlled studies. Pooling together the results of non-randomised studies could be appropriate when they have a large effect; however, combining RCTs and non-randomised studies is not recommended, as their results should be expected to differ systematically, resulting in increased heterogeneity [29].

Meta-analyses of non-randomised studies have greater potential bias, and their results should therefore be interpreted with caution. In fact, serious concerns could be related especially to the differences between people in different intervention groups (selection bias), caused by the lack of randomisation.

If both RCTs and non-randomised studies of an intervention are available and the author also wants to include non-randomised studies due to the small number of RCTs, they should be presented separately, or the findings of the non-randomised studies should be discussed in the final discussion with the meta-analysis findings.

47.7 How to Interpret Your Findings Critically

47.7.1 Summarise Your Main Findings

After the results have been correctly and clearly reported and methodology and bias adequately evaluated, the main findings of the meta-analysis can be critically interpreted.

For this purpose, “summary of findings” tables could be useful, since they present the main findings in a simple format, providing key information on the quality of evidence, the magnitude of the effect of the interventions and the sum of available data on the main outcomes [31]. Six elements should be reported: a list of all important outcomes, a measurement of the typical burden of these outcomes, the absolute and relative magnitude of effect, numbers of participants and studies addressing these outcomes, a rating of the overall quality of evidence for each outcome and a space for comments. Special mention should be made of the quality of evidence, which is assessed through the Grading of Recommendations Assessment, Development and Evaluation (GRADE) tool [1]. It describes the body of evidence as “High”, “Moderate”, “Low” or “Very Low”, based on the methodological quality, directness of evidence, heterogeneity, precision of effect estimates and risk of publication bias.

47.7.2 Pay Attention: What Is “Statistically Significant” Is Also “Clinically Significant”

When numerical results are going to be interpreted, attention should be paid to the 95% CI, because, if it is narrow, the effect size is known precisely, while, if it is wider, the uncertainty is greater. The CI and the p-value of the meta-analysis are strictly linked, as a p-value of <0.05 will exclude the null value (OR, RR of 1 or MD, SMD of 0) from the interval between the confidence limits, thus suggesting that the experimental treatment has an effect compared with the control treatment.

However, even if the findings are statistically significant, the clinical meaning of the benefit of the experimental treatment should be carefully weighed. When the treatment effect is measured with RR or RD, an interpretation of the clinical importance cannot be made without knowledge of the typical risk of events without treatment. In fact, a risk ratio of 0.75, for example, could correspond to a clinically important reduction in events from 80 to 60% or a small, less clinically important reduction from 4 to 3%. Conversely,

when dealing with continuous scales and mean differences, the proper minimum clinically important difference (MCID) of the outcome in question should be considered. Since the MCID represents the smallest change in a treatment outcome that a patient would identify as important, it is possible that a mean difference, despite being statistically significant, could be irrelevant from a clinical point of view (e.g. an MD of 4 points in the subjective IKDC, where the MCID is 11.5 points) [3, 8] (Table 47.8). Moreover, the minimum detectable change (MDC), which is the minimum amount of change in a patient’s score that ensures the change is not the result of measurement error, should be considered [3]. Recently, in a JBJS commentary, the superiority of double-bundle ACL reconstruction compared with single-bundle demonstrated in a Level I RCT has been questioned, since a difference of less than 1 mm in an arthrometric evaluation and around two points in a subjective IKDC were not considered clinically meaningful, despite being statistically significant [23].

Furthermore, conclusions should not be drawn too quickly without performing an accurate evaluation of heterogeneity through subgroup or sensitivity analysis. For example, Soroceanu et al. [33] reported a relative risk of re-rupture of 0.4 in favour of surgical repair compared with conservative treatment in the case of Achilles tendon rupture. However, since the authors found a non-negligible heterogeneity of 35%, they identified the item of “functional rehabilitation” as a cause of heterogeneity through meta-regression. So,

Table 47.8  The minimum detectable change (MDC) and minimum clinically important difference (MCID) of the main clinical scores used in knee surgery (adapted from: Collins NJ, Misra D, Felson DT, Crossley KM, Roos EM. Measures of knee function: International Knee Documentation Committee (IKDC) Subjective Knee Evaluation Form, Knee Injury and Osteoarthritis Outcome Score (KOOS), Knee Injury and Osteoarthritis Outcome Score Physical Function Short Form (KOOS-PS), Knee Outcome Survey Activities of Daily Living Scale (KOS-ADL), Lysholm Knee Scoring Scale, Oxford Knee Score (OKS), Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), Activity Rating Scale (ARS) and Tegner Activity Score (TAS). Arthritis Care Res (Hoboken). 2011 Nov;63 Suppl 11:S208–28. doi: https://doi.org/10.1002/acr.20632)

The clinical scores most used for knee evaluation

Score                   Condition           MDC         MCID
Subjective IKDC         Injuries            8.8–15.6    6.3 (6 m) to 16.7 (12 m)
Subjective IKDC         Mixed pathologies   6.7         11.5 (sensitive) to 20.5 (specific)
KOOS pain               Injuries            6–6.1       –
KOOS symptoms           Injuries            5–8.5       –
KOOS ADL                Injuries            7–8         –
KOOS sport/rec          Injuries            5.8–12      –
KOOS Qol                Injuries            7–7.2       –
KOOS pain               OA                  13.4        –
KOOS symptoms           OA                  15.5        –
KOOS ADL                OA                  15.4        –
KOOS sport/rec          OA                  19.6        –
KOOS Qol                OA                  21.1        –
Lysholm                 Injuries            8.9–10.1    –
Lysholm                 Mixed pathologies   –           –
Oxford Knee Scale       OA                  6.1         –
WOMAC pain              OA                  14.4–16.2   22.87 (TKR 6 m) to 27.98 (TKR 24 m)
WOMAC symptoms          OA                  22.9–30.6   14.43 (TKR 6 m) to 21.35 (TKR 24 m)
WOMAC function          OA                  10.6–15     19.01 (TKR 6 m) to 20.84 (TKR 24 m)
Tegner Activity Scale   Injuries            1           –
Tegner Activity Scale   OA                  –           –

MDC minimum detectable change, MCID minimum clinically important difference
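The distinction between statistical and clinical significance drawn in this section can be illustrated with the chapter's own numbers: a risk ratio of 0.75 applied to baseline risks of 80% and 4%, and a mean difference of 4 points on the subjective IKDC compared with the MCID of 11.5 points from Table 47.8. The sketch below is illustrative only; the function names are hypothetical, and the confidence limits used for the mean difference (1.0 to 7.0) are invented, since the text does not report them.

```python
def absolute_risk_reduction(baseline_risk, risk_ratio):
    """Absolute drop in event risk when a risk ratio is applied to a baseline risk."""
    return baseline_risk - baseline_risk * risk_ratio


def interpret_mean_difference(md, ci_lower, ci_upper, mcid):
    """Return (statistically significant, clinically relevant) for a mean difference."""
    statistically_significant = not (ci_lower <= 0 <= ci_upper)
    clinically_relevant = abs(md) >= mcid
    return statistically_significant, clinically_relevant


# The same risk ratio of 0.75 at two different baseline risks.
high_baseline = absolute_risk_reduction(0.80, 0.75)  # 80% falls to 60%
low_baseline = absolute_risk_reduction(0.04, 0.75)   # 4% falls to 3%
print(f"ARR at 80% baseline: {high_baseline:.2f}")
print(f"ARR at 4% baseline:  {low_baseline:.2f}")

# MD of 4 IKDC points against an MCID of 11.5; the CI limits are invented.
significant, relevant = interpret_mean_difference(4.0, 1.0, 7.0, mcid=11.5)
print(f"statistically significant: {significant}, clinically relevant: {relevant}")
```

The identical relative effect therefore translates into a 20-point absolute reduction in one setting and a 1-point reduction in the other, while the IKDC example is significant yet below the threshold a patient would notice.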

after performing a subgroup analysis separating patients undergoing functional or conventional rehabilitation, they found no difference in the re-rupture rate between surgical treatment and conservative treatment with functional rehabilitation. On the other hand, as Foster et al. [7] intended to include as many data as possible in their meta-analysis of irradiated vs. nonirradiated allografts for ACL reconstruction, they performed a sensitivity analysis to evaluate whether imputing SD would have influenced the final results. After performing an analysis of only the studies reporting SD, they repeated the analysis also adding those studies in which the SD was imputed as the mean value, reporting no substantial differences in the results. In their final evaluation, they therefore disclosed this issue and presented the data relative to all the studies, independently of the source of the SD.

47.7.3 Translate Your Findings into Clinics

The final difficulty in interpreting the meta-analysis results lies in applying the results to clinical practice [32]. It is important to correctly disclose whether the individual studies pooled in the meta-analysis can be generalised to a specific clinical scenario. This includes ensuring similar patient populations, interventions and outcomes of interest. For example, Jiang et al. [18] reported no differences in the rates of return to sport between patients undergoing surgical repair or non-surgical treatment after Achilles tendon rupture. However, since the RCTs included in the meta-analysis evaluated patients with a mean age of around 40 years, this recommendation should be applied with extreme caution in young, professional athletes.
Finally, attention should be paid when interpreting inconclusive or counter-intuitive results, as misinterpreting them is one of the most common errors in scientific manuscripts. When there is inconclusive evidence, it is not appropriate to state that “there is evidence of no effect”. It is instead more appropriate to state that “there is no evidence of an effect”.
When results are instead counter-intuitive, clinical judgement based on experience, education and current practices will be needed to decipher the unexpected results. The decision whether to accept the findings or question the statistical technique could be taken after looking back at the original articles, reassessing their inclusion and evaluating whether assumptions about the original research question are not lost when the studies are combined.

47.8 Conclusion: How to Prepare the Manuscript

The very last step in meta-analysis is to prepare a manuscript that is complete, essential, clear to the reader and suitable for publication in a peer-reviewed journal. First, the guidelines of the target journal should be consulted to “tailor” the manuscript accordingly. Then, the PRISMA guidelines (Table 47.9) should be followed to fulfil the highest quality standard [25].
Title: should be concise and focused on the topic, identifying the paper as a meta-analysis.
Abstract: should be structured, including all the sections of the paper.
Introduction: should be short and focused on the topic, expressing the rationale of the meta-analysis, the purpose and the hypothesis.
Methods: should mention all the information regarding the databases used, the timing of the search, the keywords, the eligibility criteria, the methods of data extraction and the items evaluated. The statistical method used to combine the results, the bias evaluation and any sensitivity or subgroup analysis should be mentioned as well.
Results: should be clear and easy to understand. All included and excluded studies should be described in a flow diagram. It is recommended to present the data through forest plots and summary tables. The results of sensitivity or subgroup analysis and of bias evaluation should be provided in this section.
Discussion: should not be too long and should preferably focus on the main findings of the meta-analysis. The evidence should be sum-

Table 47.9  Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist as guideline for the final production of a meta-analysis (adapted from: Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ. 2009 Jul 21;339:b2700)

| Section | Item | Description | Page |
|---|---|---|---|
| Title | 1 Title | Identify the report as a systematic review, meta-analysis, or both | |
| Abstract | 2 Structured summary | Include, as applicable: background; objectives; data sources; eligibility criteria, participants and interventions; synthesis methods; results; limitations; conclusions | |
| Introduction | 3 Rationale | Describe the rationale for the review in the context of what is already known | |
| | 4 Objectives | Provide an explicit statement of questions being addressed with reference to participants, interventions, comparisons, outcomes and study design (PICOS) | |
| Methods | 5 Protocol and registration | Indicate if a review protocol exists and, if available, provide registration information including registration number | |
| | 6 Eligibility criteria | Specify study characteristics and report characteristics used as criteria for eligibility, giving rationale | |
| | 7 Information sources | Describe all information sources and date last searched | |
| | 8 Search | Present full electronic search strategy for at least one database, including any limits used, such that it could be repeated | |
| | 9 Study selection | State the process for selecting studies | |
| | 10 Data collection process | Describe the method of data extraction from reports | |
| | 11 Data items | List and define all variables for the data that were sought and any assumptions and simplifications made | |
| | 12 Risk of bias in individual studies | Describe the methods used for assessing risk of bias of individual studies and how this information is to be used in any data synthesis | |
| | 13 Summary measurements | State the principal summary measurements (e.g. risk ratio, difference in means) | |
| | 14 Synthesis of results | Describe the methods for handling data and combining results of studies | |
| | 15 Risk of bias across studies | Specify any assessment of risk of bias that may affect the cumulative evidence (e.g. publication bias) | |
| | 16 Additional analyses | Describe methods of additional analyses (e.g. sensitivity or subgroup analyses, meta-regression) | |
| Results | 17 Study selection | Give numbers of studies screened, assessed for eligibility and included in the review, with reasons for exclusion at each stage (flow diagram) | |
| | 18 Study characteristics | For each study, present characteristics of the data that were extracted | |
| | 19 Risk of bias within studies | Present data on risk of bias of each study | |
| | 20 Results of individual studies | For all outcomes considered, present for each study simple summary data for each intervention group and effect estimates with CI (forest plot) | |
| | 21 Synthesis of results | Present results of each meta-analysis conducted, including confidence intervals and measurements of consistency | |
| | 22 Risk of bias across studies | Present results of any assessment of risk of bias across studies | |
| | 23 Additional analysis | Give results of additional analyses, if performed (e.g. sensitivity or subgroup analyses, meta-regression) | |
| Discussion | 24 Summary of evidence | Summarise the main findings including the strength of evidence for each main outcome | |
| | 25 Limitations | Discuss limitations at study and outcome level and at review level | |
| | 26 Conclusions | Provide a general interpretation of the results in the context of other evidence and implications for future research | |
| Funding | 27 Funding | Describe sources of funding for the systematic review and other support; role of funders for the systematic review | |

marised and discussed, together with the main limitations. The conclusions, which must be based exclusively on the findings without speculation, should delineate clinical and research implications.
Figures and tables: forest plots and funnel plots are very useful for result presentation, as are summary tables.
References: should be updated and formatted according to journal guidelines.
Funding: authors should always disclose any conflict of interest and any funding.

Clinical Vignette
After attending an international conference, you discover that the attention of sports medicine surgeons is on a new device to treat a specific type of ankle fracture, which was introduced a few years ago. You are aware of a couple of pilot RCTs and, after performing a quick PubMed search, you find at least three new RCTs published in the last year and a completed trial on clinicaltrials.gov, which is held by some of your overseas colleagues. Therefore, you plan to perform a systematic search to run a meta-analysis comparing this new device with the standard of care. With the help of your librarian and an orthopaedic resident in your hospital, you define a broad and appropriate search strategy, analysing three databases and the website clinicaltrials.gov. Using as the inclusion criterion only RCTs comparing the new device with the classic approach, you find nine studies, which are pooled in a formal meta-analysis with the help of the biostatistician of your university. Owing to the several differences in surgical procedure and patient inclusion criteria, you opt to perform a more conservative statistical analysis using a random-effects model, also considering the high degree of statistical heterogeneity revealed by the I² test. Analysing the relative risk of complications and the mean differences in the main disease-specific scales, you find a significant superiority of the new device. However, when you perform the bias evaluation, the lack of blinding of patients and clinicians raises some concerns owing to the high risk of detection and performance bias. Overall, after the critical evaluation of results and bias, you agree with the chief of your clinic that the implementation of this new device in your clinical practice, with specific indications, could improve the quality of your treatments.

Take-Home Message
• Meta-analysis can be a powerful tool to combine results from studies with similar design and patient populations that are too small or underpowered individually.
• However, there are many potential threats that can limit the internal validity and real clinical impact of conclusions reported in a meta-analysis.
• An appropriate study question and design, the proper management of heterogeneity and the methodological evaluation of included studies with bias assessment are necessary to ensure the highest quality.
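
The pooling step described in the clinical vignette (a random-effects model, with I² as the measure of statistical heterogeneity) can be illustrated with a short Python sketch. It assumes the DerSimonian and Laird estimator [6], which is one common choice and not necessarily what a given statistical package uses by default. The function name and the nine effect sizes below are hypothetical and serve only as an illustration; they are not data from any study cited in this chapter.

```python
import math


def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling (illustrative sketch).

    effects:   per-study effect estimates (e.g. log relative risks)
    variances: the corresponding within-study variances
    Returns (pooled effect, standard error, tau^2, I^2 in percent).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

    # Cochran's Q and the I^2 statistic derived from it
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Between-study variance (tau^2), truncated at zero
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2, i2


if __name__ == "__main__":
    # Hypothetical log relative risks and variances for nine RCTs
    log_rr = [-0.40, -0.10, -0.65, 0.05, -0.30, -0.55, -0.20, -0.80, 0.10]
    var = [0.04, 0.09, 0.06, 0.12, 0.05, 0.08, 0.10, 0.07, 0.15]
    pooled, se, tau2, i2 = random_effects_pool(log_rr, var)
    low, high = pooled - 1.96 * se, pooled + 1.96 * se
    print(f"Pooled RR {math.exp(pooled):.2f} "
          f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f}), "
          f"tau^2 = {tau2:.3f}, I^2 = {i2:.0f}%")
```

When tau² is estimated as zero, the random-effects weights reduce to the fixed-effect weights, so the two models give the same pooled estimate. The larger the heterogeneity, the more evenly the weights are spread across studies and the wider the confidence interval becomes, which is why the random-effects model is regarded as the more conservative choice in the vignette.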

Appendix: Internet Links and Websites Useful for the Various Steps in Preparing a Meta-Analysis

| Category | Resource | Link |
|---|---|---|
| Guides to meta-analyses | Cochrane Handbook for Systematic Reviews of Interventions | http://handbook.cochrane.org/ |
| | PRISMA Guidelines for Systematic Reviews and Meta-Analyses | http://www.prisma-statement.org/ |
| | GRADE Handbook | http://gdt.guidelinedevelopment.org/app/handbook/handbook.html |
| Databases | Cochrane Library | http://www.thecochranelibrary.com |
| | PubMed | https://www.ncbi.nlm.nih.gov/pubmed/ |
| | Embase | http://store.elsevier.com/embase |
| | Clinical Trials Database | https://clinicaltrials.gov/ |
| Statistical software | Cochrane RevMan | http://tech.cochrane.org/revman/download |
| | OpenMetaAnalyst | http://www.cebm.brown.edu/openmeta/download.html |
| | MedCalc | https://www.medcalc.org/download.php |
| Methodological evaluation | Oxford Level of Evidence | http://www.cebm.net/oxford-centre-evidence-based-medicine-levels-evidence-march-2009/ |
| | JBJS Level of Evidence | http://jbjs.org/level-of-evidence |
| | Newcastle-Ottawa Scale for Non-randomised Studies | http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp |
| | Modified Newcastle-Ottawa Scale | http://www.uphs.upenn.edu/cep/methods/Modified%20Newcastle-Ottawa.pdf |
| | CONSORT checklist for randomised controlled trials (RCT) | http://www.consort-statement.org/ |
| | PEDro scale for randomised controlled trials (RCT) | https://www.pedro.org.au/english/downloads/pedro-scale/ |
| | AMSTAR score for systematic reviews | https://amstar.ca/Amstar_Checklist.php |
| | COSMIN guidelines for studies of measurement instruments | http://www.cosmin.nl/downloads.html |

References

1. Brozek JL, Akl EA, Alonso-Coello P, Lang D, Jaeschke R, Williams JW, Phillips B, Lelgemann M, Lethaby A, Bousquet J, Guyatt GH, Schünemann HJ, GRADE Working Group. Grading quality of evidence and strength of recommendations in clinical practice guidelines. Part 1 of 3. An overview of the GRADE approach and grading quality of evidence about interventions. Allergy. 2009;64(5):669–77. Review. https://doi.org/10.1111/j.1398-9995.2009.01973.x.
2. Coleman BD, Khan KM, Maffulli N, Cook JL, Wark JD. Studies of surgical outcome after patellar tendinopathy: clinical significance of methodological deficiencies and guidelines for future studies. Victorian Institute of Sport Tendon Study Group. Scand J Med Sci Sports. 2000;10:2–11.
3. Collins NJ, Misra D, Felson DT, Crossley KM, Roos EM. Measures of knee function: International Knee Documentation Committee (IKDC) Subjective Knee Evaluation Form, Knee Injury and Osteoarthritis Outcome Score (KOOS), Knee Injury and Osteoarthritis Outcome Score Physical Function Short Form (KOOS-PS), Knee Outcome Survey Activities of Daily Living Scale (KOS-ADL), Lysholm Knee Scoring Scale, Oxford Knee Score (OKS), Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), Activity Rating Scale (ARS), and Tegner Activity Score (TAS). Arthritis Care Res (Hoboken). 2011;63(Suppl 11):S208–28. https://doi.org/10.1002/acr.20632.
4. Deeks JJ, Altman DG, Bradburn MJ. Statistical methods for examining heterogeneity and combining results from several studies in meta-analysis. In: Egger M, Davey Smith G, Altman DG, editors. Systematic reviews in health care: meta-analysis in context. 2nd ed. London: BMJ Publication Group; 2001.
5. Deeks JJ, Higgins JPT, Altman DG. Chapter 9: Analysing data and undertaking meta-analyses. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
6. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7:177–88.
7. Foster TE, Wolfe BL, Ryan S, Silvestri L, Kaye EK. Does the graft source really matter in the outcome of patients undergoing anterior cruciate ligament reconstruction? An evaluation of autograft versus allograft reconstruction results: a systematic review. Am J Sports Med. 2010;38(1):189–99. https://doi.org/10.1177/0363546509356530.
8. Grassi A, Ardern CL, Marcheggiani Muccioli GM, Neri MP, Marcacci M, Zaffagnini S. Does revision ACL reconstruction measure up to primary surgery? A meta-analysis comparing patient-reported and clinician-reported outcomes, and radiographic results. Br J Sports Med. 2016;50(12):716–24. https://doi.org/10.1136/bjsports-2015-094948.
9. Grassi A, Zaffagnini S, Marcheggiani Muccioli GM, Neri MP, Della Villa S, Marcacci M. After revision anterior cruciate ligament reconstruction, who returns to sport? A systematic review and meta-analysis. Br J Sports Med. 2015;49(20):1295–304. https://doi.org/10.1136/bjsports-2014-094089.
10. Grassi A, Zaffagnini S, Marcheggiani Muccioli GM, Roberti Di Sarsina T, Urrizola Barrientos F, Marcacci M. Revision anterior cruciate ligament reconstruction does not prevent progression in one out of five patients of osteoarthritis: a meta-analysis of prevalence and progression of osteoarthritis. J ISAKOS. 2016;1(1):16–24. https://doi.org/10.1136/jisakos-2015-000029.
11. Greco T, Zangrillo A, Biondi-Zoccai G, Landoni G. Meta-analysis: pitfalls and hints. Heart Lung Vessel. 2013;5(4):219–25. Review.
12. Higgins JPT, Altman DG. Chapter 8: Assessing risk of bias in included studies. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
13. Higgins JPT, Deeks JJ. Chapter 7: Selecting studies and collecting data. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
14. Higgins JPT, Deeks JJ, Altman DG. Chapter 16: Special topics in statistics. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
15. Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
16. Huwiler-Müntener K, Jüni P, Junker C, Egger M. Quality of reporting of randomized trials as a measure of methodologic quality. JAMA. 2002;287(21):2801–4.
17. Israel H, Richter RR. A guide to understanding meta-analysis. J Orthop Sports Phys Ther. 2011;41(7):496–504. https://doi.org/10.2519/jospt.2011.3333.
18. Jiang N, Wang B, Chen A, Dong F, Yu B. Operative versus nonoperative treatment for acute Achilles tendon rupture: a meta-analysis based on current evidence. Int Orthop. 2012;36(4):765–73. https://doi.org/10.1007/s00264-011-1431-3.
19. Koretz RL, Lipman TO. Understanding systematic reviews and meta-analyses. JPEN J Parenter Enteral Nutr. 2016. pii: 0148607116661841.
20. Lefebvre C, Manheimer E, Glanville J. Chapter 6: Searching for studies. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
21. Lefaivre KA, Slobogean GP. Understanding systematic reviews and meta-analyses in orthopaedics. J Am Acad Orthop Surg. 2013;21(4):245–55. Review. https://doi.org/10.5435/JAAOS-21-04-245.
22. Magnussen RA, Carey JL, Spindler KP. Does autograft choice determine intermediate-term outcome of ACL reconstruction? Knee Surg Sports Traumatol Arthrosc. 2011;19(3):462–72.
23. Marx RG. Anatomic double-bundle anterior cruciate ligament reconstruction was superior to conventional single-bundle reconstruction. J Bone Joint Surg Am. 2013;95(4):365. https://doi.org/10.2106/JBJS.9504.ebo804.
24. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959;22:719–48.
25. Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535. https://doi.org/10.1136/bmj.b2535.
26. O’Connor D, Green S, Higgins JPT. Chapter 5: Defining the review question and developing criteria for including studies. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
27. Olivo SA, Macedo LG, Gadotti IC, Fuentes J, Stanton T, Magee DJ. Scales to assess the quality of randomized controlled trials: a systematic review. Phys Ther. 2008;88(2):156–75.
28. Parlamas G, Hannon CP, Murawski CD, Smyth NA, Ma Y, Kerkhoffs GM, van Dijk CN, Karlsson J, Kennedy JG. Treatment of chronic syndesmotic injury: a systematic review and meta-analysis. Knee Surg Sports Traumatol Arthrosc. 2013;21(8):1931–9. Review. https://doi.org/10.1007/s00167-013-2515-y.
29. Reeves BC, Deeks JJ, Higgins JPT, Wells GA. Chapter 13: Including non-randomized studies. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
30. Russo MW. How to review a meta-analysis. Gastroenterol Hepatol (N Y). 2007;3(8):637–42.
31. Schunemann HJ, Oxman AD, Higgins JPT, Vist GE, Glasziou P, Guyatt GH. Chapter 11: Presenting results and ‘summary of findings’ tables. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
32. Schunemann HJ, Oxman AD, Vist GE, Higgins JPT, Deeks JJ, Glasziou P, Guyatt GH. Chapter 12: Interpreting results and drawing conclusions. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
33. Soroceanu A, Sidhwa F, Aarabi S, Kaufman A, Glazebrook M. Surgical versus nonsurgical treatment of acute Achilles tendon rupture: a meta-analysis of randomized trials. J Bone Joint Surg Am. 2012;94(23):2136–43. https://doi.org/10.2106/JBJS.K.00917.
34. Stang A. Critical evaluation of the Newcastle-Ottawa scale for the assessment of the quality of nonrandomized studies in meta-analyses. Eur J Epidemiol. 2010;25(9):603–5. https://doi.org/10.1007/s10654-010-9491-z.
35. Sterne JAC, Egger M, Moher D. Chapter 10: Addressing reporting biases. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley; 2008.
36. Wright JG, Swiontkowski MF, Tolo VT. Meta-analyses and systematic reviews: new guidelines for JBJS. J Bone Joint Surg Am. 2012;94(17):1537.
48 A Practical Guide to Writing (and Understanding) a Scientific Paper: Clinical Studies

Riccardo Compagnoni, Alberto Grassi, Stefano Zaffagnini, Corrado Bait, Kristian Samuelsson, Alessandra Menon, and Pietro Randelli

R. Compagnoni (*) · A. Menon
1° Clinica Ortopedica, ASST Centro Specialistico Ortopedico Traumatologico Gaetano Pini-CTO, Milan, Italy

A. Grassi · S. Zaffagnini
Dipartimento Scienze Biomediche e Neuromotorie, Università di Bologna, Bologna, Italy
IIa Clinica Ortopedica e Traumatologica, Istituto Ortopedico Rizzoli, Bologna, Italy

C. Bait
Istituto Clinico Villa Aprica, Como, Italy

K. Samuelsson
Department of Orthopaedics, Institute of Clinical Sciences, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden

P. Randelli
1° Clinica Ortopedica, ASST Centro Specialistico Ortopedico Traumatologico Gaetano Pini-CTO, Milan, Italy
Laboratorio di Biomeccanica ed Innovazione Tecnologica, Istituto Ortopedico Rizzoli, Bologna, Italy

Fact Box 48.1
To become a medical writer, it is necessary to understand medical concepts and terminology, be familiar with relevant guidelines and the structure and content of specific documents, and, finally, have a good set of writing skills. A topic of interest, supported by current literature, should be identified and the work planned in an accurate way before starting the enrollment of the patients.

48.1 Introduction

Writing a scientific paper is a relevant part of the activities of medical doctors. Publishing and reviewing are becoming increasingly important for the progress of medical knowledge, offering the opportunity to share the results of the work that has been done or to study the work of other researchers. This is critical for the evolution of modern science, considering that the work of one scientist is based upon the results of others. Clinical outcomes can only be improved through research, education, and patient care. All these experiences are shared with the global community, primarily through peer-reviewed research papers and review articles [11, 15].
Two aspects are the most challenging: first, the long time needed to obtain a paper of good quality and, second, the style of writing, generally regarded as less attractive when compared with surgical procedures. Some studies have attempted to analyze the most challenging topic at the beginning, identifying the cognitive burden, group support, and mentoring, the difficulty involved in distinguishing between content and structure, and the backward design of manuscripts as the most relevant [18]. When a medical doctor starts to write a paper, the motivation is crucial; however, the time that is needed to write without taking too much time from everyday clinical activities requires commitment.
To become a medical writer, it is necessary to understand medical concepts and terminology, be

© ISAKOS 2019 499


V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_48

familiar with relevant guidelines and the structure and content of specific documents, and, finally, have a good set of writing skills [19].
The aim of this paper is to provide tips acquired from authors attempting to help to write good-quality clinical papers without wasting their time and how to publish them in the appropriate journals. We have selected specific topics, which are analyzed in dedicated paragraphs. The different selections are how to choose the topic, find the current literature, analyze the data, structure the paper, write the paper, and handle references and some final tips on how to manage the submission process.

48.1.1 How to Choose the Topic of Your Work

Before beginning to enroll patients, to collect data, or to research the literature, the first step is to choose a topic of interest for your work [17, 23]. To do this, the best way is to ask your leader what is of interest, considering the trends in surgical procedures or the introduction of innovative techniques [21]. Once a topic has been identified, an accurate search of the literature, using the most common databases, can help the researcher to identify the actual “hot” topic in that specific field of research. It is necessary to know the timelines to follow, the data that should be collected, and the review/approval process to be followed.

48.1.2 Find Current Literature

Medical science is based on data available in the literature. The widespread availability of information due to the use of the Internet has given clinicians an opportunity to access the literature well. The most used database in which life sciences and biomedical papers are collected is MEDLINE (Medical Literature Analysis and Retrieval System Online, MEDLARS Online). PubMed is a free search engine primarily accessing the MEDLINE database of references and abstracts on life sciences and biomedical topics [8].
It is important to remember that PubMed should not be confused with PubMed Central, a free digital archive of articles, accessible to anyone from anywhere via a basic web browser. The full text of all PubMed Central articles is free to read, with varying provisions for reuse.
To obtain more effective results, there are some tips to take into consideration in terms of using PubMed. The home page has a link to some tutorials, which will help the researcher to use the search engine correctly. MeSH (Medical Subject Headings) is the NLM (National Library of Medicine)-controlled vocabulary thesaurus used for indexing articles for PubMed. This vocabulary helps to identify the correct words that should be used in medical research and helps the authors to find the correct subject headings. It is very important to remember that, in MeSH, the subject headings are arranged in a hierarchy and a search for a descriptor will include all the descriptors in the hierarchy below the given one [8].
Regional health and medical databases have been compiled by the WHO (World Health Organization) to complement the internationally known bibliographic indices such as MEDLINE. The regional medical indexes, published by or under the auspices of WHO Regional Offices, provide access to bibliographic information about the health material published locally. They thus add a further dimension to the retrieval of information from developed country-oriented databases [10].
The Cochrane Library is a collection of databases in medicine and other health-care specialties provided by Cochrane and other organizations. At its core is the collection of Cochrane Reviews, a database of systematic reviews and meta-analyses that summarize and interpret the results of medical research. The Cochrane Library aims to make the results of well-conducted controlled trials readily available and is a key resource in evidence-based medicine [6, 9].

48.1.3 How to Decide on the Kind of Clinical Paper

There are different types of scientific article, some of which require original research (primary literature) and some that are based on other published work (secondary literature). It is important to have

a clear idea about the different types of article that you can publish in a specific journal. Original research comprises studies based on clinical activities and is classified as primary literature. This group includes original treatment studies or observational studies. Treatment studies are mainly randomized, controlled, clinical trials (RCT) or adaptive controlled trials. Observational studies include prospective and retrospective cohort studies, case-control studies, cross-sectional studies and, finally, case reports. A review article surveys and summarizes previously published studies, rather than reporting new facts or analysis, and is for this reason called secondary literature.
One important step is to determine the level of evidence, i.e., a ranking system used to describe the strength of the results measured in a clinical trial or research study. The levels of evidence are an important component of evidence-based medicine (EBM), and understanding the levels and why they are assigned to publications and abstracts helps the reader to prioritize the available information. Many different ways of grading the level of evidence are described, and many journals assign a level to papers that they publish. Randomized, controlled trials are generally defined as level 1 of evidence, but some aspects have to be considered to assess the quality of the paper, including randomization, blinding, a description of the randomization and blinding process, and a description of the number of subjects who withdraw or drop out of the study; the confidence intervals relating to study estimates; and a description of the sample size calculation [5].
Although the goal is to improve the overall level of evidence in medical practice, this does not mean that all lower-level evidence should be discarded. Case series and case reports are important for hypothesis generation and can form the basis of more relevant studies [5, 13].

Fact Box 48.2
Clinical studies have been written following the IMRAD structure (Introduction, Methods, Results, and Discussion) since the 1980s. The abstract section should emphasize new and important aspects of the study or observations, without over-interpreting findings. The introduction should describe the background of the study and should finish with a statement of the clear aim of the study. Methods should explain the structure of the study, the way in which the results were obtained, and all the steps relating to patient management, from enrollment to final evaluation.

48.2 How to Write the Paper

Many techniques for writing a scientific paper in an effective way have been described [12, 16]. The formal Introduction, Methods, Results, and Discussion (IMRAD) structure of scientific papers was adopted in the 1980s. Nowadays, IMRAD is the format encouraged for the text of observational and experimental studies by the “Uniform Requirements for Manuscripts Submitted to Biomedical Journals,” which has become the most important, widely accepted guide to writing, publishing, and editing in international biomedical publications [4]. The Uniform Requirements are released by the International Committee of Medical Journal Editors (ICMJE). Some types of articles, such as meta-analyses, may require different formats, while case reports, narrative reviews, and editorials may have less structured or even unstructured formats [15]. Detailed suggestions on how to write the specific parts of the paper are available online at the ICMJE website. The UK National Knowledge Service provided funding to start the EQUATOR (Enhancing the Quality and Transparency of Health Research) project. This initiative seeks to improve the reliability of medical publications by promoting the transparent, accurate reporting of health research [1]. In the present article, the techniques most used by the authors are reported. If writers have a specific journal in mind in which to publish their work, it is crucial to read the instructions to authors and remember that many journals have open access instructions for reviewers. This is very useful, because it lets the researchers know what the reviewers are going to check in their work. The language must be

correct and, if researchers do not write fluent English, it is suggested that they should use online services or native-speaker colleagues/copy editors. There are limits to word count, usually 3000–4000 words (the limit differs between journals), and many papers are too long and may be rejected for that reason alone.

The following section will provide some tips for the correct writing of the different sections of a manuscript.

48.2.1 Abstract

The usual sections defined in a structured abstract are background/purpose, methods, results, and conclusions. The abstract should provide the background to the study and should state the aim, the basic procedures/methods (selection of study participants, settings, measurements, and analytical methods), the main findings (giving specific effect sizes and their statistical and clinical significance, without repeating which statistical methods were used), and principal conclusions. The results section is the most important part of the abstract, and nothing should compromise its range and quality. It should emphasize new and important aspects of the study or observations, without overinterpreting findings. Most journals require abstracts that conform to a formal structure within a word count of 200–250 words. Even though the abstract is the first part of any article and the only section freely available in the most used search engines, it should be written after the paper is finished.

The ICMJE recommends that journals publish the clinical trial registration number at the end of the abstract [3, 7]. Level of Evidence is also frequently added at the end of the abstract.

48.2.2 Introduction

The purpose of the introduction is to show what is already known on the subject of the paper and, more importantly, to introduce what is unknown, in order to justify the structure of the study and what is intended to be examined.

Many reviewers regard the introduction as a section of little interest, so general (and lengthy) considerations about the topic or common knowledge should be avoided.

The introduction should finish with a clear statement of the aim of the study, with primary and secondary outcomes considered, and the hypothesis of the investigation, at least for clinical studies.

48.2.3 Methods

The methods section is an exact description of the work done by researchers involved in the study and should explain the structure of the study, the way in which the results were obtained, and all the steps relating to patient management, from enrollment to final evaluation.

The first statements have to describe the research design, the clinical diagnosis of the patients recruited, and in most cases the setting of the study.

If the study compares the results of the treatment in different groups, these groups have to be described carefully. Randomization is of central importance in clinical trials because it reduces bias and represents a basis for ensuring the validity of data analysis using statistical testing. The generation of an unpredictable allocation sequence represents the first crucial element of randomization in a randomized, controlled trial [20]. The two fundamental characteristics of randomization are that researchers must be unable to predict the group to which a patient will be assigned until the patient is unambiguously registered in the study and that researchers are unable to change a patient’s allocation once he/she has been randomized. Remember that randomization with a low-quality allocation sequence can result in a biased estimation of the treatment effect [14, 22].

The treatment given to patients has to be described in a detailed manner. The length of the study from enrollment to conclusion has to be reported. Outcome measurements using instruments or scoring systems should be described and explained. Care should be taken to select the appropriate outcomes for the specific disease or
48  A Practical Guide to Writing (and Understanding) a Scientific Paper: Clinical Studies 503

intervention; the outcome should have good reliability, validity, and sensitivity. When patient-reported questionnaires are used, it is important to report whether patients were blinded to the treatment received in the event of a randomized trial, in order to minimize bias. It is also important to report whether the investigators responsible for objective evaluations (e.g., clinical findings, radiographic measurements) were blinded to patient treatment. The primary and secondary outcomes must be clearly specified.

Fact Box 48.3
If there is a large volume of data, it is suggested that results should be reported in tables, while repetition in the text section must be avoided. The aim of the discussion section is to interpret and describe the significance of the study findings in the light of what was already known about the research problem being investigated and to explain the impact of the results on that specific topic.
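
The statistical part of the methods section, described next, pairs each kind of comparison with a test according to whether the data are normally distributed. As a compact orientation, that pairing can be written as a lookup table. The following Python sketch is illustrative only: the function names and the comparison keys (`choose_test`, `summarize`, `"two_independent_groups"` and so on) are assumptions made for this example, and it encodes nothing beyond the rules stated in the text.

```python
from statistics import mean, median, quantiles, stdev

# Test named in the text for each comparison, keyed by
# (comparison type, data normally distributed?). Key names are hypothetical.
TESTS = {
    ("two_independent_groups", True): "independent sample t-test",
    ("two_independent_groups", False): "Mann-Whitney test",
    ("same_group_paired", True): "dependent sample t-test",
    ("same_group_paired", False): "Wilcoxon test",
    ("three_or_more_groups", True): "ANOVA",
    ("three_or_more_groups", False): "Kruskal-Wallis test",
    ("correlation", True): "Pearson",
    ("correlation", False): "Spearman",
}


def choose_test(comparison: str, normal: bool = True) -> str:
    """Return the test named in the text for a given comparison."""
    if comparison == "categorical":
        # Categorical variables (e.g., IKDC grade): normality does not apply.
        return "chi-square test"
    return TESTS[(comparison, normal)]


def summarize(values: list, normal: bool) -> str:
    """Report mean ± SD for normal data, median (interquartile range) otherwise."""
    if normal:
        return f"{mean(values):.1f} ± {stdev(values):.1f}"
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    return f"{median(values):.1f} ({q1:.1f}-{q3:.1f})"


print(choose_test("two_independent_groups", normal=False))
print(summarize([1, 2, 3, 4, 5], normal=True))
```

The normality decision itself (e.g., a Kolmogorov-Smirnov test) is deliberately left outside the sketch; it is passed in as the `normal` flag.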
A crucial and often neglected section in a scientific manuscript is the statistical section. The software used for the statistical computation should be reported. Before performing the statistical analysis, the data should be tested for normality (e.g., using the Kolmogorov-Smirnov test), since the normal or non-normal distribution influences both the way measurements are reported (mean ± standard deviation for normal distributions, median and interquartile range for non-normal distributions) and the choice of statistical tests. When comparing two independent groups, the “independent sample t-test” and the “Mann-Whitney test” should be used for normally and non-normally distributed continuous measurements, respectively. When similar comparisons relate to the same group (e.g., pre- vs. posttreatment), the “dependent sample t-test” or the “Wilcoxon test” should be used analogously. If categorical variables (e.g., IKDC grade) are compared, the “chi-square test” should be used. If three or more groups are being compared, the “ANOVA” or “Kruskal-Wallis test” should be used in the event of normal or non-normal distribution, respectively. For correlation analysis, the Pearson or Spearman test should be used when the distribution of the variables is normal or non-normal, respectively.

In the case of RCTs, a sample size calculation is mandatory, since there is a great risk that an underpowered study will suffer from a so-called beta error (type II error), thereby possibly missing the chance to show a statistically significant finding. The sample size calculation must be performed prior to the start of the study; there are commercially available programs that can easily be used for this purpose.

48.2.4 Results

The results section is the most important and is often the only section that is of interest to a stressed reader. All data must be reported carefully, such as the number of patients that completed the study, the dropout rates in the different groups, and, where applicable, the dropout rate associated with the specific treatment. It must be remembered that most quality score questionnaires regard a dropout rate exceeding 20% as an indicator of a low-quality study. The primary outcome results have to be expressed with statistical data. Any negative findings or unexpected side effects have to be reported. Depending on whether the distribution is normal or non-normal, the standard deviation or the interquartile range should always accompany the mean or median value, respectively. If the number of patients included in the study is limited, individual patient data can be reported in a table.

If there is a large volume of data, it is suggested that they should be reported in tables, while repetition in the text section must be avoided. However, a summary of the main and most relevant findings is useful to the reader in order to focus on the results. All further details can be given in the tables.

48.2.5 Discussion

The purpose of the discussion section is to explain what your results mean and what contribution your paper makes to the field of study [22].

The aim of the discussion section is to interpret and describe the significance of the study findings in the light of what was already known about the research problem being investigated and to explain the impact of the results on that specific topic. Each of the main results of the study should be analyzed and discussed, preferably in the same order as they are reported in the results section. The discussion section is based on interpretation, which is a subjective exercise. For this reason, the author must avoid overinterpreting the results of the study, one of the most frequent errors made. On the other hand, the discussion section is not a repetition of the introduction or methods sections. The discussion section has to acknowledge any limitations of the study, especially in terms of the methodology, and any alternative explanations of the findings [2]. A concluding take-home message can restate the answer one last time and/or indicate the importance of the work by stating implications, applications, or recommendations [24].

48.2.6 References

Authors should provide direct references to original research sources whenever possible. Secondary references should be avoided. Science is based on the results produced by the scientific community, and the method the community uses to share results is publication in scientific journals. This database of information has become larger in recent years, thanks to the use of the Internet. This is an important opportunity for researchers, who can access a great deal of information about a specific topic. Every published paper cites previous articles supporting the statements and background of the research, and these articles are cited at the end of the paper. References should not be used by authors, editors, or peer reviewers to promote self-interest. One suggestion for researchers is to use only references useful for the article, supporting specific purposes. Considering that science is always evolving and the impact of journals is different, references must be updated and, if possible, they should attempt to cite articles in the most relevant journals [15]. It is not superfluous to mention that references should be formatted and ordered according to the specific journal guidelines. All too often this is not adhered to.

48.3 How to Manage the Submission Process

The submission process is the last step in the preparation of the article. This step is often complex, due to the different submission manager platforms at different journals. One suggestion to make it easier is to prepare all the materials before starting the submission, including information on ethics committees and clinical trial registration. More and more journals nowadays require a disclosure of financial conflicts of interest, which should be collected from every author, preferably before submission, especially in the case of multicenter studies. Sometimes, minor mistakes such as line numbering or line spacing lead to a request for revision and a loss of time.

One useful tip is to check the PDF generated by the submission system for errors.

48.4 Conclusion

Becoming a good scientific writer requires a long learning curve. The authors of this short guide feel that, to obtain good results in writing, a passion for clinical practice and an interest in analyzing the results of the clinical work are crucial. Writing and sharing the results of clinical work with others means taking part in something greater, whose aim is to obtain better results in the treatment of the patients.

Good luck!

References

1. Altman DG, Simera I, Hoey J, Moher D, Schulz K. EQUATOR: reporting guidelines for health research. Lancet. 2008;371(9619):1149–50.
2. Annesley TM. The discussion section: your closing argument. Clin Chem. 2010;56(11):1671–4. https://doi.org/10.1373/clinchem.2010.155358.

3. Andrade C. How to write a good abstract for a scientific paper or conference presentation. Indian J Psychiatry. 2011;53(2):172–5. https://doi.org/10.4103/0019-5545.82558.
4. Barron JP. The uniform requirements for manuscripts submitted to biomedical journals recommended by the International Committee of Medical Journal Editors. Chest. 2006;129(4):1098–9.
5. Burns PB, Rohrich RJ, Chung KC. The levels of evidence and their role in evidence-based medicine. Plast Reconstr Surg. 2011;128(1):305–10. https://doi.org/10.1097/PRS.0b013e318219c171.
6. http://www.cochranelibrary.com.
7. http://www.icmje.org/recommendations/browse/manuscript-preparation/preparing-for-submission.html.
8. https://www.ncbi.nlm.nih.gov/pmc/.
9. https://en.wikipedia.org/wiki/Cochrane_Library.
10. http://www.who.int/library/country/regional/en/.
11. Kallestinova ED. How to write your first research paper. Yale J Biol Med. 2011;84(3):181–90.
12. Liumbruno GM, Velati C, Pasqualetti P, Franchini M. How to write a scientific manuscript for publication. Blood Transfus. 2013;11(2):217–26.
13. Levin KA. Study design IV. Cohort studies. Evid Based Dent. 2006;7(2):51–2.
14. Moher D, Pham B, Jones A, et al. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet. 1998;352(9128):609–13.
15. Nahata MC. Tips for writing and publishing an article. Ann Pharmacother. 2008;42(2):273–7. https://doi.org/10.1345/aph.1K616.
16. O’Connor TR, Holmquist GP. Algorithm for writing a scientific manuscript. Biochem Mol Biol Educ. 2009;37(6):344–8.
17. Randelli PS, et al. Needs and wishes from the arthroscopy community. In: Karahan M, Kerkhoffs G, Randelli P, Tuijthof G, editors. Effective training of arthroscopic skills. Berlin: Springer; 2015.
18. Shah J, Shah A, Pietrobon R. Scientific writing of novice researchers: what difficulties and encouragements do they encounter? Acad Med. 2009;84(4):511–6. https://doi.org/10.1097/ACM.0b013e31819a8c3c.
19. Sharma S. How to become a competent medical writer? Perspect Clin Res. 2010;1(1):33–7. PubMed PMID: 21829780.
20. Schulz KF, Grimes DA. Generation of allocation sequences in randomised trials: chance, not choice. Lancet. 2002;359:515–9.
21. Tuijthof G, Cabitza F, Ragone V, Compagnoni R, Dutch Arthrocopy Society Teaching Committee, Randelli P. What arthroscopic skills need to be trained before continuing safe training in the operating room? J Knee Surg. 2017;30(7):718–24.
22. Vickers AJ. How to randomize. J Soc Integr Oncol. 2006;4(4):194–8.
23. Whitlock EP, Lopez SA, Chang S, et al. Identifying, selecting, and refining topics. In: Methods guide for effectiveness and comparative effectiveness reviews [Internet]. Rockville: Agency for Healthcare Research and Quality (US); 2008.
24. Zeiger M. Essentials of writing biomedical research papers. New York: McGraw-Hill; 2000. p. 176–219.
49 Reporting Complications in Orthopaedic Trials

S. Goldhahn, Norimasa Nakamura, and J. Goldhahn

S. Goldhahn (*)
Goldhahn GmbH, Baden, Switzerland
e-mail: goldhahn@goldhahn.swiss

N. Nakamura
Department of Rehabilitation Science, Osaka Health Science University, Osaka, Japan
e-mail: norimasa.nakamura@ohsu.ac.jp

J. Goldhahn
Institute for Translational Medicine, ETH Zurich, Zurich, Switzerland
e-mail: jgoldhahn@ethz.ch

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_49

49.1 Background

Nobody likes to report complications in a clinical study. However, complications in orthopaedic trials are an essential source of information. They may terminate unsuccessful treatment strategies, help to identify potential for development and form the basis for shared decision-making with patients.

Nevertheless, complications mean different things to different parties. For surgeons, they seem to cause trouble in the first instance. In addition, they impair any success rate, may need re-intervention, often require extensive communication with patients, sometimes lead to legal problems and are often associated with more problems and high costs. Given their perception as failure, it is not surprising that some surgeons tend to neglect them, especially in reporting. Other surgeons are more critical and document and report more complications. So far, what one regards as a complication has been left to the surgeon’s understanding and awareness. The great variability of reported complications for specific indications illustrates this fact. A survey among orthopaedic surgeons supports this observation by demonstrating different awareness levels of complications [5]. A standardized approach for the documentation, assessment and reporting of complications in orthopaedic trials is suggested.

49.2 Different Perspectives on Complications

For legal authorities, complications are the so-called adverse events that must be reported according to the guidelines of good clinical practice (GCP). They are interested in whether the complication, for instance, leads to death or another stay in hospital (serious adverse event) or whether it is device related [6]. Reported complications may lead to a stop of a study, implant withdrawal from the market or legal consequences.

For patients, complications mean a decrease in quality of life in the first instance. A treatment may take longer than usual, may cause more pain than expected, may result in an inferior result and may lead to long-term sequelae. It could also result in a re-intervention to correct these conditions or to prevent long-term consequences. Primarily, patients are neither interested in the surgeons’ perspective nor in the legal perspective. They simply want to get function and quality of life re-established, and they regard everything as a complication that deviates from the normal course of healing and rehabilitation. In addition, they should get unbiased information about expected complication risk as a base for shared decision-making [1].

The outlined consequences demonstrate that it seems almost impossible to satisfy all perspectives at the same time. Therefore, a pragmatic approach is required that should acknowledge relevance. A severe complication may lead to a decrease in surgical reputation and/or a withdrawal of an implant with some financial consequences for the manufacturer. However, a patient may suffer from the consequences of a complication for the rest of his/her life or even die. Therefore, complications have the highest relevance for the person experiencing them. Consequently, definitions of complications should be patient-centred.

This leads to a hierarchical approach to complications (see Fig. 49.1). Whereas the surgical perspective is based on experience and always includes reasoning and causality, the patient perspective serves as a filter. Any event without any harm or consequences to the patients might not be considered as a complication.

On top of the hierarchy, the legal perspective determines the relation to any tested implant or treatment and classifies the severity according to established guidelines. It is mostly a subset of the complications that may matter to the patients as described above.

Fig. 49.1  Hierarchy of complications. The pyramid illustrates the hierarchy from causal factor to patient harm until legal adverse event classification. This corresponds to the different perspectives at the right side of the diagram (Source: J. Karlsson et al. A practical guide to research: design, execution, and publication. Arthroscopy. 2011 April 27, 4 Suppl:S92)
Pyramid levels, from top to bottom:
• Legal implication (adverse event): severity, relation to implant or/and treatment. Legal perspective
• Consequence (harm to patient): re-intervention, prolonged treatment, severe pain, limited function, long-term sequelae. Patient perspective
• Causal factor (surgical/treatment complication): surgical technique, implant/treatment related, patient/tissue related. Surgical perspective

In accordance with the guidelines of the International Conference on Harmonization (ICH) [E2A and E6(R2) Integrated Addendum to GCP guideline] and the ISO 14155:2011(E), a serious adverse event (SAE) is defined as any untoward medical occurrence that:

• Results in death
• Is life-threatening (note: the term life-threatening in the definition of “serious” refers to an event in which the patient was at risk of death at the time of the event; it does not refer to an event which hypothetically might have caused death if it were more severe)
• Requires inpatient hospitalization or prolongation of existing hospitalization
• Results in persistent or significant disability/incapacity
• Necessitates medical or surgical intervention to prevent permanent impairment to a body structure or a body function
• Leads to foetal distress, foetal death or congenital abnormality or birth defect

Case Vignette
In the treatment of an unstable trochanteric fracture using a dynamic hip screw, the screw was misplaced very close to the articular surface. The patient reports severe pain during weight bearing.

• Surgical perspective: The complication is defined as a screw cut-out. Possible causes can be initial misplacement (surgical technique) and/or poor bone quality (patient/tissue related).
• Patient perspective: The patient experiences severe pain and reduced function, may have long-term consequences if untreated or will face a re-intervention to prevent them.
• Legal perspective: The severity classification depends on the possible re-intervention. Possible relation to implant depends on the judgement of the surgeon, whether the malpositioning was related to poor surgical technique and/or device.

The case example demonstrates different issues: (1) The patient suffers under all circumstances regardless of the causal factor or the legal classification. (2) The surgeon can influence the classification of adverse events, e.g. by accepting poor functional outcome or neglecting re-intervention.

49.3 The Normal Course of Healing

If the patient perception of a complication is any deviation from the normal course of healing and rehabilitation, then a definition of “normal” is required. Healing of any tissue like bone, cartilage or tendon has a broad range depending on patient characteristics as well as on the specific intervention. For instance, time to fracture union is not clearly defined and depends on many confounding variables and on the assessment method [2, 3, 8]. Therefore, thresholds are required that distinguish a normal from a pathological course of healing. The same is valid for pain and return to function.

Whereas a certain amount of pain caused by wound and tissue healing after a surgical intervention is associated with the normal course of healing, prolonged pain has another cause in most cases. The same is valid for return to function and activities of daily living. A certain improvement of function with a wide range is expected at given time points after intervention. However, complete loss of function or significantly lower function than expected and subsequently impaired activities of daily living should be considered as a complication.

Thus, for both pain and return to function, thresholds should be determined for the normal expected course of healing. Everything outside should be considered as a complication or the consequence of a complication. Pain and low function are often only the symptom of an underlying, often anatomical problem (e.g. articular step, valgus deformity). If patients report severe pain and/or limitation of function, it is necessary to search for the underlying problem.

49.4 Essentials of Complication Reporting

For each study, the normal course of healing and rehabilitation including an evidence-based range should be defined. This includes pain and functional status at each follow-up and healing of any investigated tissue such as cartilage or bone.

Anticipated complications/adverse events should be listed in all study protocols with clear and objective definitions along with appropriate scientific references.
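
One way to make the “normal course” operational is to store, for each follow-up visit, the expected range of every outcome and to flag each observation that falls outside it. The Python sketch below assumes hypothetical placeholder ranges for pain (0 to 10 scale) and for a 0 to 100 function score; they are not clinical recommendations, and real thresholds would have to come from the evidence base of the specific study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedRange:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Hypothetical placeholder thresholds, keyed by follow-up week.
NORMAL_COURSE = {
    6: {"pain": ExpectedRange(0, 6), "function": ExpectedRange(40, 100)},
    12: {"pain": ExpectedRange(0, 4), "function": ExpectedRange(60, 100)},
}


def deviations(week: int, observed: dict) -> list:
    """Return the outcomes that lie outside the expected range at this visit."""
    expected = NORMAL_COURSE[week]
    return [name for name, value in observed.items()
            if not expected[name].contains(value)]


# Severe pain and poor function at 12 weeks are both flagged for work-up.
print(deviations(12, {"pain": 8, "function": 35}))  # ['pain', 'function']
print(deviations(6, {"pain": 3, "function": 70}))   # []
```

A flagged outcome is only a prompt to search for the underlying problem, in line with the text; the sketch does not decide whether a complication is present.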

• For each complication, a minimum set of information should be documented due to the regulations to allow clinically meaningful evaluation and reporting.
• In clinical research, these variables should be presented as a standard adverse event/complication case report form (CRF) that is adapted for each study.

It is of importance to quantify the standard complication rate known from the clinical literature, the common salvage procedures and the outcome that can be expected.

In Table 49.1, minimal requirements for record keeping of complications in clinical studies are listed. Investigators are asked to fill in one form for each complication; however more than one event may be recorded on the same form if they occurred simultaneously and were unambiguously causally related (e.g. an implant failure simultaneously with a loss of reduction).

Table 49.1  For each complication, a minimum set of information should be documented due to the regulations to allow clinically meaningful evaluation and reporting

Domain          Variables
Identification  1. Investigator’s name and phone number
                2. Study name
                3. Patient identification (trial number, initials, age, gender)
Treatment       4. The treatment number (if applicable, such as in a randomized clinical trial)
                5. The name of the suspect medical product and date of treatment
                6. Product serial number (in case of SADE)
Complication    7. Complication type
                8. Date of occurrence or onset
                9. Short description (open text field)
Action(s)       10. Subsequent action taken (e.g. operative)
Outcome(s)      11. Outcome of the complication at the time of reporting (or end of the study)
Assessment      12. Seriousness of the event
                13. Most likely causative factor, e.g. relation to the surgical intervention or the implant used. We recommend using the four categories presented in this chapter

Note: This is the minimum information to be collected by means of an adverse event form/complication case report form (CRF) to be adapted for each study. Investigators are asked to fill in one form for each complication; however more than one event may be recorded on the same form if they occurred simultaneously and were unambiguously causally related
SADE serious adverse device effect

Because complications occur as part of a complex chain of events, a clear distinction should be made between:

• The complication/adverse events themselves
• Their most likely causal trigger factors
• Their treatment (which could be no action)
• Their consequences or outcomes as illustrated in Fig. 49.2

Fig. 49.2  Clear distinctions should be made between the complication and adverse event themselves, their most likely causal factors, their treatment (which could actually be no action) and their consequences or outcomes (Source: J. Karlsson et al. A practical guide to research: design, execution, and publication. Arthroscopy. 2011 April 27, 4 Suppl:S94)
Chain shown in the figure, with its example:
• Causative factor(s): insufficiency in the operative technique (“surgical standard”)
• Complication/adverse event: malalignment (valgus/varus, etc.)
• Action/treatment: changes of planned treatment (e.g. new surgery) and unplanned new surgical intervention
• Outcome: functional deficit, death
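
The minimum data set of Table 49.1, which mirrors the chain of Fig. 49.2, maps naturally onto a single record type with one instance per complication form. The following Python sketch is only an illustration: the field names, types and the enum are assumptions made here, patient identification is collapsed into one hypothetical `patient_id` field, the class codes 1a to 2b are those of Table 49.2, and a real CRF would follow the study protocol and the applicable regulations.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class CausativeFactor(Enum):
    """Most likely causative factor, coded as in Table 49.2."""
    SURGICAL_TECHNIQUE = "1a"
    DEVICE_OR_TREATMENT = "1b"
    LOCAL_TISSUE_CONDITION = "2a"
    OVERALL_PATIENT_CONDITION = "2b"


@dataclass
class ComplicationRecord:
    # Identification (items 1-3)
    investigator: str
    investigator_phone: str
    study_name: str
    patient_id: str
    # Treatment (items 5; items 4 and 6 are optional, see below)
    suspect_product: str
    treatment_date: date
    # Complication (items 7-9)
    complication_type: str
    onset_date: date
    description: str
    # Action and outcome (items 10-11)
    action_taken: str
    outcome_at_reporting: str
    # Assessment (items 12-13)
    serious: bool
    causative_factor: CausativeFactor
    # Items that apply only in some studies
    treatment_number: Optional[str] = None       # item 4, e.g. in an RCT
    product_serial_number: Optional[str] = None  # item 6, in case of SADE


# Invented example values, loosely following the case vignette of this chapter.
record = ComplicationRecord(
    investigator="Dr. Example", investigator_phone="000-0000",
    study_name="Example hip fracture study", patient_id="P-001",
    suspect_product="Dynamic hip screw", treatment_date=date(2019, 1, 10),
    complication_type="Screw cut-out", onset_date=date(2019, 2, 21),
    description="Severe pain during weight bearing",
    action_taken="Revision surgery", outcome_at_reporting="Recovering",
    serious=True, causative_factor=CausativeFactor.SURGICAL_TECHNIQUE,
)
print(record.causative_factor.value)  # 1a
```

Storing the causative factor as a closed set of codes, rather than free text, is what later allows complications to be counted per class without manual recoding.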

49.5 Classification of Complications

We propose the two main categories and subsequent two classes outlined in Table 49.2 for a classification of complications based on their most likely causative factor.

Table 49.2  Proposed classification of complications based on their most likely causative factor

Category           Class                                                Number  Example
Treatment related  Related to the surgical technique                    1a      Malpositioning of screws, wrong procedure
                   Related to the device/treatment                      1b      Loosening of polyethylene glenoid due to wear
Patient related    Related to local tissue condition                    2a      Cut-out of correctly placed screw due to poor bone quality
                   Related to overall patient condition, e.g. systemic  2b      Myocardial infarction

Of course, many cases remain where the causal relationship is a topic of debate. For instance, it is still not clear whether an avascular head necrosis is the result of the surgical treatment of a humeral head fracture or would correspond to the normal course of disease.

However, careful planning combined with prospective definition of complications and their causal relationship increases the study quality. This planning phase may lead to an extensive list of anticipated complications [4] but helps to categorize complications prior to the study and will result in an unbiased complication analysis at the end of the study.

49.6 Follow-Up

• If an original complication record states that the complication was resolved or that the recovery process is completed (with or without damages), no further data are required.
• Alternatively, it is necessary to follow up the complication until it is resolved, in terms of its treatment, outcome and assessment, and all new information must be documented.

In clinical studies, a follow-up adverse event/complication CRF should be distributed to investigators to capture this information until complications are resolved or finally evaluated at the end of the studies.

49.7 Quality Control

Active monitoring and quality control are essential to avoid or limit under-reporting and misleading complication results. To favour completeness and correctness of documentation of complications, the following measures should be implemented in orthopaedic trials:

1. Source data verification during monitoring visits.
2. Active reporting: implement systematic assessment of any complication at each examination visit (e.g. using standard CRF or asking if another physician was visited other than for routine checks).
3. Incentive to report: facilitate simple recording process, and ensure anonymous reporting of complication statistics outside the involved clinics so that results cannot be traced back to the individual treating surgeon.
4. If necessary, additional information on putative events may be obtained from the patient’s family doctor.
5. Evaluation of reported complications by the study principal investigator, an independent experienced clinician or any specifically established Complication Review Board (CRB).
systemic

Table 49.3  Example of presentation of complication risks

Type of complication                             n   Risk (%)  95% CI
Post-operative local implant/bone complications  18  10.2      (6.1–15.6)
Implant
  Blade migration                                1   0.6       (0.01–3.1)
  Implant breakage                               3   1.7       (0.35–4.9)
  Cut-out                                        2   1.1       (0.14–4.0)
  Other implant complications                    2   1.1       (0.14–4.0)
Bone/fracture
  Loss of reduction                              1   0.6       (0.01–3.1)
  Neck shortening                                8   4.5       (2.0–8.7)
  Other bone complications                       6   3.4       (1.3–7.2)
Number of patients N = 177

n: number of patients with at least one complication (it means the patient can have more than one complication, but for the risk calculation, the number of patients experiencing complication(s) is used)
Risk: number of patients having a specific complication divided by the number of patients being enrolled in the study
95% CI: 95% binomial exact confidence interval

The final complication review should be conducted based on complication/AE forms, as well as additional diagnostics to complete the case. Complication data is reviewed for its clinical pertinence, classification, severity as well as relation to the investigated treatment or medical device. All changes and data corrections should be thoroughly justified and documented.

49.8 Analysis

A minimum set of complication analyses should be conducted in any study. However, it should be noted that if regulatory requirements oblige investigators to document all complications occurring during a study, only a specific clinically relevant subset may be analysed to answer a study objective. It is critical to clearly define which complications are included in such a subset and specify the period of observation (e.g.
intraoperative, post-operative, follow-up periods) to allow appropriate interpretation of the results. In the context of prospective clinical investigations, the timing of observation for each patient starts with the initiation of treatment or primary surgery and ends at the end of the study. For the report of complications, complication risks can be calculated and presented as shown in Table 49.3.

Complication risks should be presented based on the number of patients experiencing complications and not on the total number of documented complications.

According to our experience, for many surgeons an event that is unrelated to the treatment may not be considered as a complication and therefore need not be documented. In addition, clinicians may sometimes feel they do not need to document events that have no or limited consequences for the patients to avoid documentation overload. Nevertheless, harmonized standards for the conduct of clinical trials define complication as “any untoward medical occurrence” not necessarily related to potential causal factors or severity [7]; even mild anticipated complications in the framework of clinical research require official reporting to authorities if their rate of occurrence is higher than what can reasonably be expected in any study. While all complications must be documented from a regulatory viewpoint, the primary analysis can be focused on the patient-relevant complications.

Conducting an independent review of complications is important for the credibility of safety data. Complication rates in the literature are most often elusive [5]. In addition, they are likely underestimated when documented by the inventor(s) of any surgical techniques. Despite all efforts at standardization, the assessment and reporting of complications will always require clinical judgement and therefore remain partly subjective. A Complication Review Board (CRB) can address such limitations and can also be established for single-centre studies. A CRB can, for instance, consist of two to four orthopaedic surgeons (at least one of them should not be involved in the study), a radiologist and a methodologist. It is to be distinguished from any Data Monitoring Committee (DMC) [9] established as part of large multicentre studies; while the
CRB is set to control the relevance and integrity of the complication records, the DMC is set to review the occurrence of complications (i.e. assess the validated data and decide on the continuation of a study). The primary role of the CRB as proposed is to perform a quality control and consolidate complication data before their analyses.

At the end of a clinical study, complications/adverse events should be assessed and discussed by a Complication Review Board in a complication assessment meeting. The corresponding complication case report forms, additional documentary material and all images of the patients should be available for such a meeting.

Take-Home Messages

• For each orthopaedic study, the normal course of healing including an evidence-based range should be defined.
• Anticipated complications/adverse events should be defined prior to study start together with a minimum set of documentation.
• It is necessary to follow up a complication until it is resolved.
• An independent Complication Review Board should review and analyse all information about potential orthopaedic complications.

References

1. Carlesso LC, MacDermid JC, Santaguida LP. Standardization of adverse event terminology and reporting in orthopaedic physical therapy: application to the cervical spine. J Orthop Sports Phys Ther. 2010;40(8):455–63.
2. Corrales LA, Morshed S, Bhandari M, Miclau T III. Variability in the assessment of fracture-healing in orthopaedic trauma studies. J Bone Joint Surg Am. 2008;90(9):1862–8.
3. Davis BJ, Roberts PJ, Moorcroft CI, Brown MF, Thomas PB, Wade RH. Reliability of radiographs in defining union of internally fixed fractures. Injury. 2004;35(6):557–61.
4. Goldhahn S, Kralinger F, Rikli D, Marent M, Goldhahn J. Does osteoporosis increase complication risk in surgical fracture treatment? A protocol combining new endpoints for two prospective multicentre open cohort studies. BMC Musculoskelet Disord. 2010;11:256.
5. Goldhahn S, Sawaguchi T, Audige L, Mundi R, Hanson B, Bhandari M, Goldhahn J. Complication reporting in orthopaedic trials. A systematic review of randomized controlled trials. J Bone Joint Surg Am. 2009;91(8):1847–53.
6. Hutchinson D, editor. The Trial Investigator’s GCP Handbook: a practical guide to ICH requirements. Richmond: Brookwood Medical Publications; 1997.
7. ISO. ISO 14155:2011(E). Clinical investigation of medical devices for human subjects - Good clinical practice. International Organization for Standardization; 2011. p. 1–66.
8. Morshed S, Corrales L, Genant H, Miclau T III. Outcome assessment in clinical trials of fracture-healing. J Bone Joint Surg Am. 2008;90(Suppl 1):62–7.
9. USFDA. Guidance for clinical trial sponsors. Establishment and operation of clinical trial data monitoring committees. CBER; 2006. p. 1–34.
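
Worked example for Table 49.3. The risk and the 95% binomial exact confidence interval reported in Table 49.3 can be reproduced with a few lines of code. The following is a minimal sketch, assuming that the "binomial exact" interval is the Clopper-Pearson interval; the function names (binom_cdf, exact_ci) are illustrative and not taken from the chapter, and the brute-force summation is only intended for study sizes of a few hundred patients.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Clopper-Pearson interval for x affected patients among n enrolled."""

    def bisect(f, target: float, increasing: bool) -> float:
        # Find p in (0, 1) with f(p) == target for a monotone function f.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if (f(mid) < target) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else bisect(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, True)
    # Upper limit: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else bisect(
        lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


# x counts patients with at least one such complication, not events.
n_patients = 177
for label, x in [("Blade migration", 1), ("Neck shortening", 8)]:
    low, high = exact_ci(x, n_patients)
    print(f"{label}: risk {100 * x / n_patients:.1f}% "
          f"(95% CI {100 * low:.2f} to {100 * high:.1f})")
```

For the blade migration row (1 of 177 patients) this gives a risk of about 0.6% with limits of about 0.01% and 3.1%, in line with the table. The numerator is the number of patients with at least one complication of the given type, so a patient with several events is counted once.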

50 Understanding and Addressing Regulatory Concerns in Research

Jason L. Koh, Denise Gottfried, Daniel R. Lee, and Sandra Navarrete

J. L. Koh (*)
Department of Orthopaedic Surgery, NorthShore University HealthSystem, Evanston, IL, USA
University of Chicago, Chicago, IL, USA

D. Gottfried
TekTeam, LLC, Palo Alto, CA, USA

D. R. Lee
Scaffold Biologics, Inc., San Antonio, TX, USA

S. Navarrete
Independent Consultant, Austin, TX, USA

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_50

50.1 Introduction

Addressing regulatory concerns is important when any clinical trial is performed and especially if a new drug or device is being tested. Typically, these regulations are intended to protect the health and safety of human or animal subjects and patients but also can be related to ethical concerns. These help determine the appropriate conduct of a trial. Regulatory requirements typically depend on the known levels of safety and efficacy of a drug or device and also on the subjects being tested. Extensive requirements need to be met for testing on human or animal subjects. Significant training and an approved protocol are necessary before performing research on humans or animals.

With device testing, the regulatory burden depends on the classification of the device. An entirely novel device will typically have to go through a government-regulated multistage premarket approval process. The development of the appropriate protocols can be quite expensive and laborious, involving repeated meetings with regulatory officials. The trial must be conducted in strict compliance with the protocol and can be subject to audit.

Other devices that are largely similar to previously approved or predicate devices can be assigned a different classification, and the regulatory requirements may be much less.

New drugs or materials also face different regulatory burdens depending on their classification. An entirely new drug typically has to go through a multiphase approval process, including demonstration of safety and efficacy. Even an existing drug that is seeking additional indications for use will have to go through a formal approval process.

Extensive regulation also guides the use of cell therapies and tissue. In the United States and in many other countries, the use of more than minimally manipulated cell therapies requires approval through an extensive formal regulatory process. This has contributed to the relatively lower number of trials in such areas as stem cell research in the United States when compared to some other nations.

Ethical reasons may also drive regulation. Embryonic stem cell research in the United States has been significantly curtailed by policy that prohibited federal funding of such research due to ethical reasons related to the embryo source of the cells. Many countries ban aspects of human cloning, typically citing “human dignity” as the reason.

The unregulated use of medical technologies can result in significant harm to patients as well as poor research. Failure by clinicians or researchers to comply with regulatory requirements can result in significant penalties, including being banned from performing certain research or clinical activities, civil liability, or criminal punishment including fines or even imprisonment. Therefore it is critical to be aware of the different regulatory issues regarding research and to be meticulously compliant.

The burden of meeting regulations may be quite high and can limit the speed of progress in research. There is a balance between the goals of protecting the public from inadequately tested drugs or devices and permitting the expeditious approval of new therapies that are of benefit to patients. The kind and type of regulation can vary with politics, societal needs, or different weightings of ethical considerations.

50.2 Good Clinical Practice

New researchers must be aware of their moral and ethical obligation to conduct research in a manner that assures the public that the rights, safety, and well-being of human research participants (i.e., subjects) are protected.

Good Clinical Practice (GCP) is defined by the International Conference on Harmonisation (ICH) as an international ethical and scientific quality standard for designing, conducting, recording, and reporting trials that involve the participation of human subjects. Compliance with this standard provides public assurance that the rights, safety, and well-being of trial subjects are protected, consistent with the principles that have their origin in the Declaration of Helsinki, and that the clinical trial data are credible. The objective of the ICH GCP Guideline is to provide a unified standard for the European Union (EU), Japan, and the United States to facilitate the mutual acceptance of clinical data by the regulatory authorities in these jurisdictions. The guideline was developed with consideration of the current good clinical practices of the European Union, Japan, and the United States, as well as those of Australia, Canada, the Nordic countries, and the World Health Organization (WHO) [1].

The ICH E6 (R1) [2, 3] guidelines were developed in 1996 to harmonize the requirements for registering medicines in Europe, Japan, and the United States. Since they are internationally recognized, they permit all clinical trial evidence from one country to be accepted by another country. The general principles of ICH expand on the Declaration of Helsinki, which is a set of ethical principles applicable to human research developed by the World Medical Association (WMA).

The ICH GCP E6 (R2) [4, 5] was finalized in 2016 and addresses the increased complexity of clinical trials and electronic data recording and reporting. It is the biggest revision of the guideline in over 20 years.

A new clinical researcher should be aware of the terms sponsor and investigator to understand his/her role and responsibilities. Clinical trials may be initiated by a sponsor, investigator, or someone serving as both. Clinical trials involve several teams responsible for specific roles. The investigator may delegate certain roles to his/her team, but is responsible for the conduct of the clinical trial.

Fact Box 50.1: ICH Guidance Is Divided into Four Categories, and ICH Topic Codes Are Assigned According to These Categories [1]
Q: quality. Chemical and pharmaceutical quality of a drug (stability, validation, impurity testing)
S: safety. Safety of the medicinal product (toxicology, carcinogenicity, genotoxicity)
E: efficacy. Clinical studies in human subjects (dose response, GCP, trial design, conduct, analysis, AE reporting)
M: multidisciplinary. Issues that do not fall in the other categories (MedDRA: standardized medical coding for adverse events)

Fact Box 50.2: The 13 Principles of ICH GCP [2]
1. Clinical trials should be conducted in accordance with the ethical principles that have their origin in the
Declaration of Helsinki, and that are consistent with GCP and the applicable regulatory requirement(s)
2. Before a trial is initiated, foreseeable risks and inconveniences should be weighed against the anticipated
benefit for the individual trial subject and society. A trial should be initiated and continued only if the
anticipated benefits justify the risks
3. The rights, safety, and well-being of the trial subjects are the most important considerations and should
prevail over interests of science and society
4. The available nonclinical and clinical information on an investigational product should be adequate to
support the proposed clinical trial
5. Clinical trials should be scientifically sound, and described in a clear, detailed protocol
6. A trial should be conducted in compliance with the protocol that has received prior institutional review
board (IRB)/independent ethics committee (IEC) approval/favourable opinion
7. The medical care given to, and medical decisions made on behalf of, subjects should always be the
responsibility of a qualified physician or, when appropriate, of a qualified dentist
8. Each individual involved in conducting a trial should be qualified by education, training, and experience to
perform his or her respective task(s)
9. Freely given informed consent should be obtained from every subject prior to clinical trial participation
10. All clinical trial information should be recorded, handled, and stored in a way that allows its accurate
reporting, interpretation and verification
11. The confidentiality of records that could identify subjects should be protected, respecting the privacy and
confidentiality rules in accordance with the applicable regulatory requirement(s)
12. Investigational products should be manufactured, handled, and stored in accordance with applicable good
manufacturing practice (GMP). They should be used in accordance with the approved protocol
13. Systems with procedures that assure the quality of every aspect of the trial should be implemented

Fact Box 50.3: The ICH GCP E6 (R2) Addendum Introduced 26 New Items Covering 3 Main Areas of Clinical Research: Data Management and Sponsor and Investigator Responsibilities [6, 7]
Each entry gives the name of the amended section, the number of amended items, and the number and name of each amendment.

Introduction (1 amended item)
  NA
Glossary (4 amended items)
  • Certified Copy 1.11.1
  • Monitoring Plan 1.38.1
  • Monitoring Report 1.39
  • Validation of computerized systems 1.60.1
The principles of ICH GCP (1 amended item)
  • The principles of ICH GCP 2.10
Investigator (3 amended items)
  • Adequate Resources 4.2.5, 4.2.6
  • Records and Reports 4.9.0
Sponsor (16 amended items)
  • Quality Management 5.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.0.5, 5.0.6, 5.0.7
  • Contract Research Organization (CRO) 5.2.1, 5.2.2
  • Trial Management, Data Handling, and Record Keeping 5.5.3(b), 5.5.3(h)
  • Monitoring 5.18.3, 5.18.6(e), 5.18.7
  • Noncompliance 5.20.1
Essential documents for the conduct of a clinical trial (1 amended item)
  • Essential documents for the conduct of Clinical Trial 8.1

ICH GCP defines an investigator as a person responsible for the conduct of the clinical trial at a trial site. If a trial is conducted by a team of individuals at a trial site, the investigator is the responsible leader of the team and may be called the principal investigator [4, 5].

ICH GCP defines sub-investigator as any individual member of the clinical trial team designated and supervised by the investigator at a trial site to perform critical trial-related procedures and/or to make important trial-related decisions (e.g., associates, residents, research fellows) [4, 5].

ICH GCP defines sponsor as an individual, company, institution, or organization which takes responsibility for the initiation, management, and/or financing of a clinical trial [4, 5].

ICH GCP defines sponsor-investigator as an individual who both initiates and conducts, alone or with others, a clinical trial and under whose immediate direction the investigational product is administered to, dispensed to, or used by a subject. The term does not include any person other than an individual (e.g., it does not include a corporation or an agency). The obligations of a sponsor-investigator include both those of a sponsor and those of an investigator [4, 5].

It is important to note that corporate sponsors may ask investigators to complete GCP training prior to enrolling clinical research subjects to ensure that they understand their obligations during the study. Typically these training courses can be taken online, but they do have a cost associated with them.

Fact Box 50.4: Examples of GCP Training Resources
• CITI Program online training course [8, 9]
• Barnett International Good Clinical Practice: A Question & Answer Reference Guide 2018 [10, 11]
• Clinical Device Group: Workshops on CD [12, 13]

It is worth mentioning that the ICH GCP Guideline clearly outlines essential documents for the conduct of a clinical trial. Essential documents are those documents which individually and collectively permit evaluation of the conduct of a trial and the quality of the data produced. These documents serve to demonstrate the compliance of the investigator, sponsor, and monitor with the standards of Good Clinical Practice and with all applicable regulatory requirements [4, 5].

The Trial Master File (commonly referred to as the Regulatory File at the investigational site) houses the essential documents of the clinical trial and must be maintained with both the sponsor and investigator. Examples of essential documents that must be maintained by the investigator include the protocol and amendments, financial aspects of the trial, institutional review board (IRB) or independent ethics committee (IEC) approvals, decoding procedures for blinded trials, curriculum vitae for investigators and sub-investigators, and signature sheets, to name a few. Since these files are subject to audit or inspection at any time, they must not be destroyed, and the duration they must be kept should be outlined at the start of each clinical investigation.

Compliance with GCP helps ensure scientifically sound data when conducting trials that involve human participation because it provides the framework for ethical conduct and ensures that study documentation is complete. Regardless of whether the trial is single-center or multicenter, has a small or large patient population, or is sponsor initiated or investigator initiated, GCP should be followed by all parties involved in the clinical trial. Individual countries or institutions may have additional or similar GCP frameworks, so the investigator should refer to the appropriate legislation of the jurisdiction in which the research is being conducted. Following ICH GCP facilitates regulatory approvals across countries, and in turn new treatment options become available to patients.

50.3 ICMJE: Clinical Trial Registration and Data Sharing

The International Committee of Medical Journal Editors (ICMJE) requires that clinical research studies that began enrolling subjects on or after July 1,
2005, be entered in a public registry at the beginning of enrollment in order to be considered for publication. The ICMJE did not specify a registry, but provided criteria for a qualifying registry.

The ICMJE defines a clinical trial as any research project that prospectively assigns people or a group of people to an intervention, with or without concurrent comparison or control groups, to study the relationship between a health-related intervention and a health outcome. Health-related interventions are those used to modify a biomedical or health-related outcome; examples include drugs, surgical procedures, devices, behavioral treatments, educational programs, dietary interventions, quality improvement interventions, and process-of-care changes. Health outcomes are any biomedical or health-related measures obtained in patients or participants, including pharmacokinetic measures and adverse events. The ICMJE does not define the timing of first participant enrollment, but best practice dictates registration by the time of first participant consent [14, 15].

ClinicalTrials.gov [16] and the International Standard Randomized Controlled Trial Number (ISRCTN) [17] meet those criteria and have become the most widely used registries to meet the ICMJE requirement.

Fact Box 50.5: Clinical Trial Registry Resources
www.clinicaltrials.gov [16]: ClinicalTrials.gov is a resource provided by the US National Library of Medicine.
http://www.isrctn.com [17]: The ISRCTN registry is a primary clinical trial registry recognized by WHO and ICMJE that accepts all clinical research studies (whether proposed, ongoing, or completed), providing content validation and curation and the unique identification number necessary for publication.
http://www.who.int/ictrp/network/primary/en/ [18]: Primary Registries in the WHO Registry Network meet specific criteria for content, quality and validity, accessibility, unique identification, technical capacity, and administration. Primary Registries meet the requirements of the ICMJE.

These registries contain information about the clinical studies such as the investigational product under research, study objectives, study design, participating investigators and institutions, and funding.

The purpose of registering a clinical trial is to prevent bias and selective reporting of outcomes, to prevent unnecessary duplication of research, to keep the public informed about planned or ongoing trials, and to give ethics boards a platform to review work similar to that which they may be considering.

The ICMJE now requires that authors include a plan for data sharing as a component of clinical trial registration. This plan must include where the researchers will house the data and, if not in a public repository, the mechanism by which they will provide others access to the data, as well as other data-sharing plan elements outlined in the 2015 Institute of Medicine report (e.g., whether data will be freely available to anyone upon request or only after application to and approval by a learned intermediary, whether a data use agreement will be required) [19, 20]. ClinicalTrials.gov provides a field for recording data-sharing plans.

Fact Box 50.6: Important Dates Related to ICMJE Data Sharing [14, 15]
• As of July 1, 2018, manuscripts submitted to ICMJE journals that report the results of clinical trials must contain a data-sharing statement.
• Clinical trials that begin enrolling participants on or after January 1, 2019, must include a data-sharing plan in the trial’s registration. If the data-sharing plan changes after registration, this should be reflected in the statement submitted and published with the manuscript and updated in the registry record.

Data-sharing statements must indicate the following: whether individual de-identified participant data (including data dictionaries) will be
shared; what data in particular will be shared; whether additional, related documents will be available (e.g., study protocol, statistical analysis plan, etc.); when the data will become available and for how long; and by what access criteria data will be shared (including with whom, for what types of analyses, and by what mechanism).

As the clinical research industry moves toward increased transparency, the new researcher should be aware of the considerations when registering clinical trial results and the parameters surrounding manuscript publication.

50.4 Basic Principles, Considerations, and Essential Requirements of Any Clinical/Human Research Project

Prior to the start of any human clinical research, it is important to define the research and hypotheses, understand the experimental vs. investigational nature of the research, understand and be able to articulate risks, and have a plan for risk mitigation to minimize risks prior to and during the study. Additionally, there are local, regional, and federal requirements for the conduct of research involving human subjects that one must be aware of. Most institutions have criteria and other requirements for the evaluation, approval, and implementation of human research done on-site or with affiliates, so these are also important to understand prior to embarking on the human research project. Key responsibilities of the investigator, the institution, and those involved in supporting the study from a sponsorship perspective that are applicable to most human research projects are discussed below.

50.4.1 Responsibilities of the Investigator, Institution, and Sponsor

One of the most important responsibilities of each party involved is to ensure that the study is conducted in accordance with ethical principles and related institutional and government requirements for the protection of the human subjects involved in the research.

The ethics of clinical research have developed from historical lessons learned when few rules and regulations protected the clinical subject. Some ethical guides that have influenced current ethics principles include the Nuremberg Code (1947), Declaration of Helsinki (2000), and the Belmont Report (1979). The National Institutes of Health [21, 22] defines seven main principles as a guide to conducting clinical research. We review these below, along with examples of questions that the researcher should consider in the development of an ethical and robust clinical program:

1. Social and clinical value: what is the potential value of this research to the subjects involved, to the community, and to the future of medicine and improvement in medical care?
2. Scientific validity: what study design features will reduce bias and support appropriate hypothesis testing, so that the results are interpretable and can be applied to future research and/or medical practice?
3. Fair subject selection: what a priori criteria should be in place to avoid inclusion of subjects that are outside the specific population for research and to avoid exclusion of subjects due to bias? What screening practices and documentation of the rationale for inclusion and exclusion of subjects need to be in place to reduce potential for bias and to allow for appropriate subject selection into the study?
4. Favorable risk-benefit rationale: what clinical and protocol risks are evident? How are risks mitigated and managed prior to the study and during the study? How will subjects be informed of the risks prior to enrollment? What action will be taken if new risks are identified?
5. Independent review: what processes are in place to support the requirement for independent review of the protocol and informed consent prior to starting the study and during the study? Will a separate data review committee be provided to support safety and medical monitoring? Will the study be audited by an
independent party or parties? Will FDA review be required prior to initiating the clinical research?
6. Informed consent (written and signed): what do subjects need to know about the research, the materials used in the research (e.g., drugs, biologics, devices, measures and tools applied, surgery or other interventions), the time commitment involved, their rights as a human subject, how their data will be used and what security will be applied to ensure privacy, and what to do if they have a question or an issue during the study, for example?
7. Respect for potential and enrolled subjects: what effort will be made to ensure that subjects are properly treated with respect and dignity throughout the study? How will access to subject information be controlled, limited, and maintained? What training will study staff be given to ensure respect, privacy, and best research practices?

Two of the most important aspects of clinical research include (1) independent review committees to assess and review the protocol prior to approving clinical research and (2) the informed consent documentation, process, and assignment/training of personnel involved in administration of informed consent. Independent review committees (often referred to as an IRB or ethics committee) are composed of a group of individuals who will assess the clinical trial and will review the protocol and any materials that are intended for the subject such as the informed consent, recruitment phone scripts, and advertisements for supporting community awareness of the study. The clinical trial should not begin until written approval of the protocol and of the submitted documents is received. Only the approved documents may be used for the study, and if any changes to the approved documents are desired, the committee must review and approve the changes prior to the use of those new materials. This oversight during human research supports proper consideration and independent perspective and reduces bias and the potential for gaps in human subject protection.

Each country has a federal authority, such as the Food and Drug Administration in the United States, which oversees and allows the research to occur in accordance with federal law and regulation. In some cases, the federal authority may require review and approval of the research prior to initiation, in addition to the local/regional IRB or ethics committee review and approval. Most often the deciding factor as to pre-approval of research by a federal health authority is the risk to human subjects associated with the protocol and the risk associated with the materials or products involved. See Sect. 50.3, below, for additional details regarding federal health authority considerations and investigational medicinal products, therapies, and/or medical devices. The investigator should check with each country-specific regulatory authority for application requirements and verify the level of federal authority involvement with the IRB or ethics committee involved in research review.

50.4.2 Purpose of the Research

The purpose of the research and intent for data use are important to define early on. An investigator may be conducting clinical research for the purposes of publication, institutional quality improvement, addressing questions regarding the safety or efficacy of a new procedure, and/or assessing the safety and effectiveness of an investigational product. The level of oversight and documentation required will be driven by the type of research involved, the purpose of the research, and what parties are interested.

Additionally, a clinical trial proposal should be supported by publications in a literature review that will serve as the foundation for manuscript references and will also support the scientific validity and study design of the proposed research. Be descriptive of the approach, discuss what was changed since previous works were conducted, and propose the new research to fully develop hypotheses. Pilot studies may be used to support hypothesis(es) and to demonstrate that subsequent studies are warranted.
522 J. L. Koh et al.

50.4.3 Trial Operations Risk Management and Documentation to Support a Well-Controlled Trial

Several clinical trial documents, including the essential documents [23, 24] listed in Fact Box 50.7, should be in place prior to enrolling subjects in a clinical trial. Documents include, but are not limited to, the investigational plan, standard operating procedures, accountability of the investigational product, monitoring plan, data management plan, and statistical analysis plan. Deficiencies found during regulatory audit include inadequate monitoring, investigator noncompliance, informed consent deficiencies, and lapses in investigational product accountability. It is important to conduct risk assessment and provide a structured approach to risk mitigation. Evaluating the study sites, reviewing the monitoring frequency, and establishing a clinical events committee and/or data safety monitoring board are key to addressing operational deficiencies. For issues found in the protocol design, proactive measures should be taken so that they are not repeated in future studies. Also, swift correction of issues is important, along with documentation of the correction and follow-up to ensure that the corrective action taken was effective.

Fact Box 50.7: ICH GCP E6 Essential Documents [23, 24]
Essential Documents are those documents which are used to evaluate the quality of a clinical trial and the quality of the data produced. The various documents (abbreviated list) are grouped into three sections according to the stage of the trial during which they will normally be generated:

1. Before the clinical phase of the trial commences
   (a) Investigator’s Brochure (IB)
   (b) Protocol
   (c) Informed consent
   (d) Case report forms (CRF)
   (e) Agreements with investigator/institution, sponsor, and CRO
   (f) IRB/EC/CA approvals
   (g) Lab/test procedures
2. During the clinical conduct of the trial
   (a) Updates/revisions to IB, protocol
   (b) Approvals by EC/CA
   (c) ICs and CRFs
   (d) Adverse events
   (e) Subject enrollment
3. After completion or termination of the trial
   (a) Investigational product accountability
   (b) Subject list
   (c) Trial closeout
   (d) Final clinical study reports

50.4.4 Subject Safety

Ethical conduct of a clinical trial includes protecting the rights, interests, and welfare of subjects throughout the duration of a clinical trial. Many study team members, such as the ethics committee, monitoring boards, and sponsor and investigator, play a role in subject safety. It is the investigator’s responsibility to ensure that ethics committee study approval(s) and continuing review are in place, that the protocol and GCP are adhered to, that the informed consent process is followed and the informed consent forms are signed and dated accordingly, that adverse events are properly reported, that the investigational product is controlled, and that adequate time is allowed for monitoring the study. A full list of investigator responsibilities can be found in the ICH Harmonized Guideline, Integrated Addendum to ICH E6 (R1): Guideline for Good Clinical Practice E6 (R2) [25, 26].

50.4.5 Detailed Protocol

A clinical trial protocol provides the background for the research and the design of the study and is the instructional tool for the investigational site(s) to conduct the study. ICH GCP [25, 26] provides guidance for the contents of a clinical protocol

and protocol amendments. Sections of a protocol should include:

• General Information
• Background Information
• Trial Objectives and Purpose
• Trial Design
• Selection and Withdrawal of Subjects
• Treatment of Subjects
• Assessment of Efficacy
• Assessment of Safety
• Statistics
• Direct Access to Source Data/Documents
• Quality Control and Quality Assurance
• Ethics
• Data Handling and Record Keeping
• Financing and Insurance
• Publication Policy
• Supplements

50.4.6 Qualified Medical Personnel and Trained Team

The principal investigator is responsible for the clinical trial conducted at his/her site. He/she may delegate qualified personnel to conduct certain aspects of the clinical trial; however, the principal investigator ultimately remains responsible for their actions. A Delegation of Authority log is provided to each site, in which each member of the clinical trial team outlines the duties they will perform. ICH GCP [25, 26] outlines the investigator’s qualifications and agreements: the investigator should be qualified by education, training, and experience; should be familiar with the investigational product; should comply with GCP; should, together with the institution, permit monitoring and auditing; and should maintain a list of qualified persons to whom he/she has delegated trial-related duties.

50.4.7 Informed Consent Process

Properly conducting the informed consent process with potential subjects is extremely important to ensure subject protection. The process of informed consent goes beyond a signature. The informed consent should allow for dialogue about the potential subject’s participation. The potential subject should be given adequate time to review the information and consider all options, and it should be confirmed that the potential subject understands what is being asked and voluntarily agrees to participate in the study. ICH GCP [25, 26] outlines informed consent of trial subjects in detail.

50.4.8 Data Collection and Documentation

Data can be collected on paper or via electronic data capture (EDC), and each method has its pros and cons. Whichever method is used in a clinical trial, the same supporting framework should be established to ensure minimal data errors. When developing case report forms (CRFs), ensure you are collecting pertinent data to be analyzed. Extra data points that are not meaningful to the research cause burden to the investigational sites. A sponsor or investigational site may use source document worksheets to capture data that they may not typically capture in private practice. Check with your regulatory agency to ensure worksheets are acceptable. Any type of source where the data is first documented is auditable, such as electronic medical records (EMR), radiology, and even the miscellaneous paper used to record data such as height, weight, and vital signs.

Ensure a data management plan (DMP) is in place prior to the study. Proper data quality management will help to ensure minimal errors and quality data that will support the study and hypothesis.

50.4.9 Safety Reports

The clinical protocol will define what constitutes an adverse event and serious adverse event (additional and differing definitions should be provided based on whether the investigational

product is a drug or device). Regulations for reporting safety events vary from country to country, but they have similar procedures for classifying adverse events.

Fact Box 50.8: ICH GCP Safety Reporting [27, 28]
All serious adverse events (SAEs) should be reported immediately to the sponsor except for those SAEs that the protocol or other documents (e.g., Investigator’s Brochure) identify as not needing immediate reporting. The immediate reports should be followed promptly by detailed, written reports. The immediate and follow-up reports should identify subjects by unique code numbers assigned to the trial subjects rather than by the subjects’ names, personal identification numbers, and/or addresses. The investigator should also comply with the applicable regulatory requirement(s) related to the reporting of unexpected serious adverse drug reactions to the regulatory authority(ies) and the IRB/IEC.
Adverse events and/or laboratory abnormalities identified in the protocol as critical to safety evaluations should be reported to the sponsor according to the reporting requirements and within the time periods specified by the sponsor in the protocol.
For reported deaths, the investigator should supply the sponsor and the IRB/IEC with any additional requested information (e.g., autopsy reports and terminal medical reports).

50.4.10 Intellectual Property and Inventions

The Clinical Trial Agreement (CTA) is the legally binding document between the investigator and the sponsor. Sections include, but are not limited to, publication, data ownership, intellectual property, indemnification and insurance, adverse event reimbursement, and the study budget.

The definition of intellectual property varies with each country’s patent laws. The scope of an invention and ownership of the data will be defined and agreed to in the CTA.

50.5 Regulatory Considerations: Competent Authorities and Research Leading Toward a Commercial Product

The goals of clinical research are to protect the patient or subject and to generate knowledge that can be of use to patients, clinicians, or other researchers. Regulatory requirements for clinical research studies vary depending on the type of investigation. If the new researcher is involved in any research which requires human clinical subjects or could lead to a commercial product, they need to be aware of some additional considerations so that the research effort can be utilized to its fullest potential. The new researcher should begin with an understanding of the regulatory considerations and differences of the commercial market(s) they may target for a regulatory submission. The new researcher should familiarize themselves with the differences between device, drug, and biologic regulations as they pertain to the product and to the jurisdiction where regulatory approval may be sought. For many new drugs and biologics, clinical studies will be required to demonstrate safety and effectiveness. For devices in many markets, the landscape can become even more complex as they can be assigned to one of three regulatory classes (Class I, Class II, and Class III) based on the level of risk or control necessary to assure the safety and effectiveness of the device. This device classification then defines the regulatory requirements for approval of a new device; regulatory control and requirements increase from Class I to Class II to Class III. Depending on the device classification, clinical studies may or may not be required. Each country has its own regulatory authority with its own regulations for approving clinical study protocols and also for conducting clinical studies when testing and approving a new device, drug, or biologic. The following table lists the

major global regulatory or competent authorities (Tables 50.1, 50.2, and 50.3).

The websites can provide a source of information regarding regulatory requirements, processes, or expectations with regard to all aspects of the product. The competent authorities may also offer guidance documents which more definitively describe information and data requirements for the development, clinical studies, and regulatory process for defined indications. The guidance documents may also outline the expectations for each phase of clinical development (Phases I–III).

The competent authorities, singular or multiple in some cases, oversee clinical studies in their respective countries. Initially their responsibilities include reviewing and approving clinical trial protocols and then ensuring that clinical trials comply with national regulations and international guidelines. After the clinical studies, development, and subsequent approval, they also have quality assurance authority to oversee the production, distribution, labeling, and safety monitoring of new or existing devices, drugs, and biologics.

Table 50.1  National competent authorities in major global markets: European Medicines Agency

Country         Competent authority                                         Website
Austria         Austrian Agency for Health and Food Safety                  http://www.ages.at/
Belgium         Federal Agency for Medicines and Health Products            www.fagg-afmps.be/
Croatia         Agency for Medicinal Products and Medical Devices (HALMED)  http://www.halmed.hr/en/
Denmark         Danish Health and Medicines Authority                       www.laegemiddelstyrelsen.dk
Germany         Federal Office of Consumer Protection and Food Safety       www.bvl.bund.de
Greece          National Organization for Medicines                         www.eof.gr
Italy           Ministry of Health                                          http://www.salute.gov.it/
Netherlands     Healthcare Inspectorate                                     www.igz.nl
Poland          Office for Registration of Medicinal Products, Medical      www.bip.urpl.gov.pl
                Devices, and Biocidal Products
Spain           Spanish Agency for Medicines and Health Products            www.aemps.gob.es
United Kingdom  Medicines and Healthcare products Regulatory Agency         https://www.gov.uk/government/organisations/medicines-and-healthcare-products-regulatory-agency

Table 50.2  Health authorities in Latin America

Country         Health authority     Website
Brazil          Ministério da Saúde  http://portalms.saude.gov.br/
Mexico          Secretaría de Salud  https://www.gob.mx/salud
Argentina       Ministerio de Salud  https://www.argentina.gob.ar/salud

Table 50.3  Health authorities in Asia-Pacific and North America

Country         Health authority                             Website
Health authorities in Asia-Pacific
India           Central Drugs Standard Control Organization  https://cdscoonline.gov.in/CDSCO/homepage
China           China Food and Drug Administration           http://eng.sfda.gov.cn/WS03/CL0755/
Japan           Ministry of Health, Labour and Welfare       http://www.mhlw.go.jp/english/index.html
Australia       Therapeutic Goods Administration             www.tga.gov.au
Health authorities in North America
Canada          Health Canada                                https://www.canada.ca/en/health-canada.html
United States   Food and Drug Administration                 www.fda.gov

50.5.1 Role of Sponsors and Clinical Research Organizations (CRO)

If the research involves a sponsor, many times the investigator has limited interaction with the competent authority. A clinical trial sponsor can be an individual, an investigator, company, institution, or organization that takes responsibility for the initiation, management, and financing of a clinical trial. A sponsor can be a device, pharmaceutical or biotech company, a non-profit organization such as a research fund, a government organization or the institution where the trial is to be conducted, or the individual investigator. The sponsor will assume responsibilities such as protocol development and financing the trial and will seek permission for trial initiation from the competent authority or authorities. The competent authority will interact with the sponsor and approve the trial protocol that is provided to the investigator. The sponsor can have the following responsibilities:

• Submitting the plan for the clinical trial to the competent authority for approval
• Informing clinical investigators about the test article, its safety and instructions for use, as well as training the staff and facility regarding its use and handling
• Making sure there are an appropriate number of test articles for the investigation
• Ensuring the trial protocol is properly reviewed by an experienced EC
• Monitoring the trial to ensure the protocol is being followed, data collection is accurate, adverse events are reviewed and reported, and all regulations are complied with

If the research involves a clinical research organization (CRO), many of the responsibilities of the investigator are delegated to the CRO. CROs are independent companies that provide research services for the device, drug, and biologic industry and take on outsourced tasks related to clinical trials. Such outsourcing services can be related to project management, trial monitoring, data collection, and medical statistics work. When a CRO is contracted by a sponsor, the CRO and its designated clinical trial monitor (CTM) or clinical research associate (CRA) take on many and sometimes all of the sponsor’s trial responsibilities. In some cases, central laboratory services are an important ingredient of clinical trials, conducting work such as processing blood samples and reading radiographic images. Sponsors and sometimes competent authorities may require that a single source process the samples so that the process is standardized, results are reliable and reproducible, and data is collected and stored in a centralized facility. The central laboratory services can be considered a specialized CRO.

Of note, investigators must be able to report all results of the clinical trial, regardless of outcome. In the United States, there is a requirement that results of trials of FDA-approved products be posted within 12 months of trial completion on www.clinicaltrials.gov.

50.6 Elements of the Technical Document

At the conclusion of the clinical research on a device, drug, or biologic, the next steps may involve seeking regulatory approval for commercial marketing of the investigational product. The compilation of the technical file is a critical step in the regulatory approval process and includes detailed information about the design, function, composition, use, claims, and clinical evaluation of your drug, device, or biologic. The Common Technical Document (CTD) was designed to provide a common format between Europe, the United States, and Japan for the technical documentation included in an application for the registration of a human device, drug, or biologic product. The agreement to harmonize and assemble all the quality, safety, and efficacy information in a common format was a major advance in the global regulatory review process and enabled implementation of good review practices. For device, drug, or biologic companies, it eliminated the need to reformat the information for submission to the different ICH competent authorities. A CTD is a comprehensive description of your product and demonstrates compliance with the applicable regulatory requirements. For devices in the European Union, the directives that need to be met include Medical Devices

Directive 93/42/EEC (MDD) (transitioning to Medical Device Regulation 2017/745 (MDR)), In Vitro Diagnostic Medical Devices Directive 98/79/EC (IVDD), and Active Implantable Medical Devices Directive 90/385/EEC (AIMDD).

50.7 Essential Requirements for Medical Devices: MDD Annex I, 93/42/EEC

ER#   Scope
Part I: general requirements
1     Risk reduction and acceptable risk/benefit
2     Safety and risk controls
3     Intended performances
4     Lifetime of the devices
5     Transportation and storage
6     Side-effects must constitute acceptable risk
6a    Clinical evaluation
Part II: design and construction requirements
7     Chemical, physical and biological properties
8     Infection and microbial contamination
9     Construction and environment properties
10    Device with a measuring function
11    Protection against radiation
12    Devices with an energy source
13    Information supplied by the manufacturer

The CTD is organized into five modules [29, 30] (Fig. 50.1). Module 1 is region specific, and Modules 2, 3, 4, and 5 are intended to be common for all regions. Please refer to the ICH guidances for industry: M4Q The CTD, Quality [31, 32]; M4S The CTD, Safety [33, 34]; and M4E The CTD, Efficacy [35] for additional information [36].

[Figure: the CTD triangle. Module 1 (regional administrative information, marked as not part of the CTD) sits at the apex. Module 2 holds the quality overall summary, the non-clinical overview and summary, and the clinical overview and summary. Module 3 (quality), Module 4 (non-clinical study reports), and Module 5 (clinical study reports) form the base.]

Fig. 50.1  The Common Technical Document triangle has five modules. Module 1 is region specific; Modules 2–5 are common to all regions [30]
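
As a rough illustration of the layout in Fig. 50.1, the five modules can be written down as a small lookup table. The sketch below is only one possible encoding: the names `CtdModule`, `CTD_MODULES`, `is_region_specific`, `common_modules`, and `summarized_in_module_2` are hypothetical and introduced for this example, and the module titles follow the wording of this chapter rather than any official ICH data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CtdModule:
    """One CTD module as described in this chapter (illustrative only)."""
    number: int
    title: str
    region_specific: bool


# Hypothetical lookup table; titles are taken from the chapter text.
CTD_MODULES = {
    1: CtdModule(1, "Administrative Information and Prescribing Information", True),
    2: CtdModule(2, "Overviews and Summaries of Modules 3-5", False),
    3: CtdModule(3, "Quality", False),
    4: CtdModule(4, "Nonclinical Reports", False),
    5: CtdModule(5, "Clinical Study Reports", False),
}


def is_region_specific(number):
    """Return True when the module content differs by region (Module 1)."""
    return CTD_MODULES[number].region_specific


def common_modules():
    """Return the module numbers intended to be common to all regions."""
    return sorted(n for n, m in CTD_MODULES.items() if not m.region_specific)


def summarized_in_module_2(number):
    """Modules 3-5 hold the full reports that Module 2 summarizes."""
    return number in (3, 4, 5)
```

With this table, `common_modules()` returns `[2, 3, 4, 5]` and `is_region_specific(1)` is `True`, which mirrors the statement that only Module 1 varies by region. A flat dictionary is enough here because the chapter describes a fixed set of five modules.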

Module 1: Administrative Information and Prescribing Information
According to the FDA guidance document, “This module should contain documents specific to each region; for example, application forms or the proposed label for use in the region. The content and format of this module can be specified by the relevant regulatory authorities.” The content of this module is specific to each market or competent authority and may include the following information:

• Introductory letter
• Application form
• Product information
• Labeling and Information for use
• Product information for products already approved
• Information about the experts
• Country-specific requirements
• Environmental risks
• Others

Module 2: Overviews and Summaries of Modules 3–5
This module summarizes and outlines the information that will be presented in Modules 3–5. It should begin with a general introduction to the device, drug, or biologic, including its class, mode of action, and proposed clinical use. In general, this introduction should be a summary and limited in overall length:

• Table of Contents
• Introduction
• Quality Overall Summary (QOS)
• Nonclinical Overview
• Clinical Overview
• Nonclinical Written and Tabulated Summaries, including pharmacology, pharmacokinetics, toxicology, biocompatibility, viral inactivation
• Clinical Summary, including clinical pharmacology studies, clinical safety and efficacy, literature references, and synopses of individual studies

Module 3: Quality (Device, Drug, or Biologic Documentation)
This module provides information on the quality of the product and should be presented in the structured format described in the ICH M4Q guidance. This module contains the detailed chemical, pharmaceutical, and biological data relevant to the product. This module will consist of items such as:

• Table of Contents
• Body of Data
  –– Product: information and its properties
  –– Manufacturing: details, process, control of materials, validation
  –– Characterization
  –– Controls
  –– Reference standards
  –– Packaging
  –– Stability
  –– Appendices: facilities, regional information requirements
• References

Module 4: Nonclinical Reports (Pharmacology/Toxicology/Biocompatibility)
This module presents the integrated and critical information on the pharmacologic, pharmacokinetic, and toxicological evaluation of the drug, device, or biologic. This module consists of the reports which were summarized in Module 2, so all the complete and detailed reports will be provided in this module. The nonclinical study reports should be presented in the order described in the ICH M4S guidance. This module will consist of items such as:

• Table of Contents
• Pharmacology
• Pharmacokinetics
• Toxicology
• Literature references

Module 5: Clinical Study Reports (Clinical Trials)
This section is intended to provide the complete and detailed reports from all of the available clinical information. This includes clinical study reports, information obtained from any meta-analyses or other cross-study analyses, and post-marketing data for products that have been marketed in other regions. This module consists of the reports which were summarized in Module 2. The human study reports and related information should be presented in the order described in the ICH M4E guidance. This module will consist of items such as:

• Table of Contents
• Tabular Listings of All Clinical Studies
• Clinical Study Reports: safety, efficacy, dosing, post-marketing, case reports
• Literature references

Clinical Vignettes
One area that has been the subject of extensive regulation is the use of cell therapies to treat musculoskeletal injuries. In the United States for over 20 years, the only cell therapy treatment approved for use for articular cartilage injuries was the first-generation autologous chondrocyte implantation technique. The regulatory environment in the United States limited the use and evaluation of other cell-based therapies and contributed to a relative lack of progress in the United States regarding cell therapy treatment compared to other countries. In this situation, addressing the regulations around approval of new therapeutic cell therapies has taken an extensive amount of time. Protocol design and development involved multiple meetings with the FDA over a period of years. Regulatory guidance from the FDA also changed during this time, resulting in further changes to the protocol. Ultimately, several protocols for more advanced cell-scaffold-based autologous chondrocyte implantation techniques were approved and are currently in various stages of the trial process, with one technique recently approved for clinical use.

50.8 Summary

Regulation is an inescapable aspect of any research activity that involves human or animal subjects. The reasons behind this regulation are complex and can involve a number of issues including human and animal subject safety, patient protection, and social, ethical, and political considerations. Regulation can result in increased complexity in study design and execution. It is critical for investigators to be familiar with the regulatory issues involved in research and to be careful to comply with required regulation. Failure to comply with regulation can result in subject risk or harm and, for the investigator, in possible civil liability or criminal charges.

Take-Home Messages
• Regulatory issues govern the conduct of research and are typically based in protecting the health and safety of subjects, but can also be related to ethical issues, such as the ban on aspects of human cloning due to “human dignity” concerns.
• The regulatory burden often depends on the perceived risks; new drugs and devices typically undergo an expensive and time-consuming multistage approval process designed to decrease risks, whereas drugs or devices similar to predicates may be seen as posing lower risk and often receive expedited approval.
• Local regulations often exist, but there is increasing harmonization internationally

around Good Clinical Practice (quality, safety, efficacy, and multidisciplinary) guidance, including documentation.
• Research should be performed in accordance with ethical principles.
• Failure to comply with regulation can result in subject harm and can result in civil liability or criminal charges.

References

1. ICH GCP. http://ichgcp.net/. Accessed 18 July 2018.
2. http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E6/E6_R1_Guideline.pdf.
3. International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH). Guideline for good clinical practice E6(R1); 1996. p. 59.
4. http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500002874.pdf.
5. Committee for Human Medicinal Products. Guideline for good clinical practice E6(R2). London: European Medicines Agency; 2017.
6. http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E6/E6_R2__Step_4_Presentation_06Feb2017.pdf.
7. ICH E6(R2) Expert Working Group. Integrated addendum to ICH E6(R1): guideline for good clinical practice E6(R2); 2017.
8. https://about.citiprogram.org/en/series/good-clinical-practice-gcp/.
9. Collaborative Institutional Training Initiative (CITI Program). Good clinical practice (GCP): CITI Program. https://about.citiprogram.org/en/series/good-clinical-practice-gcp/. Accessed 18 July 2018.
10. http://www.barnettinternational.com/Publications/Good-Clinical-Practice%2D%2DA-Question%2D%2D-Answer-Reference-Guide-2017/?gclid=EAIaIQobChMI783lydi_2QIVyLbACh0ORwRVEAAYASAAEgL82vD_BwE.
11. Good clinical practice: a question & answer reference guide 2018 (electronic). Needham: Barnett International Publication; 2018. https://www.barnettinternational.com/good-clinical-practice-a-question-amp-answer-reference-guide-2018-electronic-0.
12. http://www.clinicaldevice.com/mall/Workshops.aspx.
13. Medical Device Workshops. http://www.clinicaldevice.com/mall/Workshops.aspx. Accessed 18 July 2018.
14. http://icmje.org/recommendations/browse/publishing-and-editorial-issues/clinical-trial-registration.html.
15. International Committee of Medical Journal Editors. ICMJE | recommendations | clinical trials. http://icmje.org/recommendations/browse/publishing-and-editorial-issues/clinical-trial-registration.html. Accessed 18 July 2018.
16. National Library of Medicine. https://www.clinicaltrials.gov/. Accessed 18 July 2018.
17. ISRCTN Registry. http://www.isrctn.com/. Accessed 18 July 2018.
18. World Health Organization. International Clinical Trials Registry Platform (ICTRP). http://www.who.int/ictrp/network/primary/en/. Accessed 18 July 2018.
19. http://www.nejm.org/doi/full/10.1056/NEJMe1515172.
20. Taichman DB, Backus J, Baethge C, Bauchner H, de Leeuw PW, Drazen JM, et al. Sharing clinical trial data: a proposal from the International Committee of Medical Journal Editors. N Engl J Med. 2016;374:384–6. https://doi.org/10.1056/NEJMe1515172.
21. https://clinicalcenter.nih.gov/recruit/ethics.html.
22. National Institutes of Health. NIH Clinical Center: Ethics in Clinical Research. https://www.ncbi.nlm.nih.gov/pubmed/. Accessed 18 July 2018.
23. http://ichgcp.net/8-essential-documents-for-the-conduct-of-a-clinical-trial.
24. Essential documents for the conduct of a clinical trial. Documents. http://ichgcp.net/8-essential-documents-for-the-conduct-of-a-clinical-trial. Accessed 18 July 2018.
25. http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E6/E6_R2__Step_4_2016_1109.pdf.
26. International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH). Integrated addendum to ICH E6(R1): guideline for good clinical practice E6(R2). ICH; 2016.
27. http://ichgcp.net/411-safety-reporting.
28. Safety Reporting. http://ichgcp.net/411-safety-reporting. Accessed 18 July 2018.
29. http://www.ich.org/products/ctd.html.
30. International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH). CTD. http://www.ich.org/products/ctd.html. Accessed 18 July 2018.
31. https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM073280.pdf.
32. US Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research. Guidance for industry M4Q: the CTD - quality. Rockville: U.S. Department of Health and Human Services; 2001.
33. https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM073299.pdf.
34. US Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research. Guidance for industry M4S: the CTD - safety. Rockville: U.S. Department of Health and Human Services; 2001.

35. US Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research. Guidance for industry M4E: the CTD - efficacy. Rockville: U.S. Department of Health and Human Services; 2001.
36. https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM073290.pdf.
51  What Is Needed to Make Collaboration Work?

Richard E. Debski and Gerald A. Ferrer

R. E. Debski (*) · G. A. Ferrer
Orthopaedic Robotics Laboratory, Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, USA
e-mail: genesis1@pitt.edu

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_51

51.1 Introduction: Importance of Collaborations

Due to the complexity of many healthcare problems, finding the appropriate solutions may not be feasible without the right partners. Collaborations between basic scientists and clinicians are critical to advancing scientific knowledge by performing high-quality, high-impact, translational research [1–4]. Through collaboration, much more can be achieved. Collaboration offers numerous benefits to both parties in addition to completing research studies. On an individual level, collaboration offers the opportunity to expand one’s knowledge and skill set while fostering a professional network that could develop into lifelong relationships. Professionally, collaboration allows for access to new resources and increased prestige by publishing in more impactful journals, which increases the likelihood of having one’s work implemented in clinical practice. Achieving a successful collaboration between basic scientists and clinicians is not a simple task. This chapter will discuss key components needed to make collaborations between basic scientists and clinicians successful.

51.2 Leadership

Effective leadership is essential to the success of any collaboration. Effective leaders need to possess strong communication and management skills. Leaders must be able to communicate the overall vision of the research group, as well as the roles and responsibilities of each team member to accomplish the group’s goals. Without a vision, team members may be working without a clear sense of what they are doing and why. One good way to assess if each team member understands the vision of the research group is to have them give a “lab tour.” Lab tours offer the opportunity to explain different projects conducted in the lab and their contribution to fulfilling the research mission of the lab.

Many research teams have a diverse group of individuals with different levels of experience and expertise. From undergraduate students to young investigators or international research fellows, every team member should know his/her responsibilities and be held accountable by the leaders of the group. While leaders should keep every team member accountable, effective leaders are able to accomplish this without promoting an aggressive environment. One important factor that may lead to an unhealthy work environment is not giving team members the proper credit or recognition. For example, the leaders need to communicate the criteria for being an author on abstracts and publications, authorship order, and
534 R. E. Debski and G. A. Ferrer

who is responsible for presenting at conferences. At our lab, every author must fulfill at least three of the following five requirements:

1. Writing and providing feedback
2. Developing and performing experimental methods
3. Data processing
4. Project idea generation
5. Acquisition of project funding

Receiving appropriate credit and recognition is particularly important for aspiring young team members who are trying to promote their careers. By fostering a positive work environment and defining the roles of the team members, achieving goals that work toward fulfilling the mission of the research team becomes much easier. Ultimately, the leaders of the collaboration should focus on developing a supportive and fair environment where the group can conduct high-quality research.

Fact Box 51.1
• Communication and management skills are keys toward effective leadership.
• Have a shared vision of scientific goals.
• Give appropriate credit and recognition.

51.3 Building Your Team

When pursuing a potential collaboration, it is important to be selective and get to know the person with whom you want to collaborate. One must assess whether this person is the “right” collaborator: someone one trusts and with whom one envisions a lasting relationship to accomplish scientific goals at a high level of research quality and productivity. Finding the “right” collaborator may take time and require a period of trial and error. Potential collaborations can often be found from mutual acquaintances or one-on-one discussions at scientific conferences. While basic scientists focus on performing research, the clinicians’ primary focus may be on their patients’ care and other clinical responsibilities. Thus, it is imperative for the basic scientist to consider and assess the clinician’s commitment and dedicated time for research. Some clinicians may be very involved throughout the entire process of a research project, from idea generation and developing methods to data analysis and writing. There are clinicians who may provide occasional support during the research process, and others may be interested in generating project ideas and reviewing the final abstract or manuscript. Given the significant responsibilities of clinicians, it is important to gauge how much time and effort each party is willing to contribute to the collaboration. For example, clinicians may only be able to meet on the weekends or very early in the morning or late at night. Additionally, meetings may be most appropriate in the lab or office space of the scientist. Both parties need to be willing to compromise and make sacrifices for a collaboration to work and be long lasting.

Fig. 51.1  Key building blocks needed to develop a positive and successful collaboration (figure labels: Communication, Trust, Leadership, Institutional Support; Collaborating with anybody, Aggressive work environment)

Another important factor to consider when building an effective research team is trust (Fig. 51.1). Trust entails understanding the other team member’s ambitions, desires, and values. Knowledge of what drives each team member may help recruit the right personnel for what your group may want to accomplish. Furthermore,

trust within the team is predicated on reliability. To be an effective team, everyone needs to be relied upon to do their job, such as meeting deadlines and fulfilling expectations. In addition, team members must trust each other’s abilities and expertise. There needs to be a balance of keeping each other accountable for their work without wasting time and resources. As a team member, it is important to trust one another and be trustworthy yourself, in terms of your character and work. Without trust, the productivity of the research team and quality of work may be compromised.

Fact Box 51.2
• Choose the “right” collaborator, not just anyone.
• Long-lasting relationship
• Develop trust within team.

51.4 Communication

Communicating with people within one’s own discipline can be difficult. Interdisciplinary communication is even more challenging, yet essential for the success of collaboration between basic scientists and clinicians. Ineffective communication often stems from not understanding each other, generally because of poor terminology and overuse of technical jargon. When communicating between disciplines, it is important to use terminology that is consistent with what is accepted by the scientific community and to limit technical jargon. Thus, effective communication can be achieved within your own research group, as well as the scientific community at large. Furthermore, effective communication can improve research efforts to produce high-quality studies. The key is to promote a diversity of opinions between team members. Diverse opinions may promote disagreement, but they can push your collaboration to become better (e.g., by strengthening relationships and trust and by generating new ideas and solutions to problems). However, too much disagreement can lead to major conflicts that drive collaborations apart. A formal mechanism for conflict resolution is important to establish. At our lab, we find our weekly lab meetings a perfect opportunity for diverse opinions. During the lab meetings, we critically evaluate each project being conducted within our research group and discuss lab philosophy. The lab meetings present an opportunity to not only critically evaluate the projects but also provide the presenter the opportunity to practice effective communication of their project to a diverse audience (e.g., undergraduate students, Ph.D. students, medical students, residents, fellows, clinicians, engineers, and professors).

Fact Box 51.3
• Consistent terminology and limited technical jargon
• Promote disagreement while containing conflicts

51.5 Institutional Support

The importance of support from your institution for research collaboration efforts cannot be overstated. Institutional support that provides adequate time, funding, facilities, equipment, and personnel puts your collaboration efforts in a position to succeed. Moreover, institutional support can help maintain long-lasting collaborations and allow performance of research at a high level. For example, our lab was developed through the support of the Departments of Orthopaedic Surgery and Bioengineering. Their combined support provides an ideal environment for training and completion of research projects in a multidisciplinary environment.

51.6 Have Fun!

At the end of the day, what makes collaborations work is truly enjoying what you are doing and who you are doing it with. Working with people who share the same amount of enthusiasm and passion as you about what you are doing is fun. Answering complicated healthcare questions and pioneering scientific breakthroughs are very

rewarding. As such, it is important to celebrate accomplishments of not only the research group but of individuals as well.

Take-Home Message
• Making collaborations between basic scientists and clinicians work is not an easy task and cannot be accomplished alone.
• First and foremost, it is important to find the right collaborator, one with whom you believe you can have a lasting relationship and who shares a common vision of research aspirations.
• To fulfill your vision, build a team that exemplifies three main points discussed throughout this chapter: (1) leadership, (2) trust, and (3) communication.
• In addition, institutional support helps maximize productivity and longevity of collaborations.

References

1. Bennett LM, Gadlin H. Collaboration and team science: from theory to practice. J Investig Med. 2012;60:768–75.
2. National Research Council. Facilitating interdisciplinary research. Washington, DC: The National Academies Press; 2004.
3. Green BN, Johnson CD. Interprofessional collaboration in research, education, and clinical practice: working together for a better future. J Chiropr Educ. 2015;29:1–10.
4. Mattessich PW, Monsey BR. Collaboration: what makes it work. A review of research literature on factors influencing successful collaboration. St. Paul: ERIC, Amherst H. Wilder Foundation; 1992.
52 A Clinical Practice Guideline
Aleksei Dingel, Jayson Murray, James Carey, Deborah Cummins, and Kevin Shea

A. Dingel · K. Shea (*)
Department of Orthopaedic Surgery, Stanford University Medical Center, Stanford, CA, USA
e-mail: adingel@stanford.edu; kgshea@stanford.edu

J. Murray · D. Cummins
American Academy of Orthopaedic Surgeons (AAOS), Norridge, IL, USA
e-mail: jmurray@aaos.org; cummins@aaos.org

J. Carey
Pennsylvania Hospital, Philadelphia, PA, USA
e-mail: james.carey@uphs.upenn.edu

V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_52

52.1 Introduction

52.1.1  Why Clinical Practice Guidelines?

Within the past two decades, there has been a push toward evidence-based clinical practice guidelines (CPGs). These guidelines, unlike their opinion-based predecessors, would be designed to streamline health-care efficiency, improve health-care outcomes, and decrease practice variation [1]. In 2008, the US Congress mandated that the Institute of Medicine (IOM) develop standards for the evidence-based guidelines. In response, the IOM produced a rubric for well-organized and reproducible guideline development and evidence-based systematic review.

52.1.2  A New System to Translate Best Evidence into Best Practice

Historically, clinical practice guidelines were largely based upon the consensus of physician expert opinion, specialist group recommendation, governments, payers, etc. [2]. Unfortunately, these unregulated recommendations frequently contradicted each other. The lack of a consistent and reproducible recommendation development process led to variations in patient care and questions about the validity and reliability of the guideline process. Other questions about the CPG process included concerns about the management of conflicts of interest (COI), as well as the ranking of relevant evidence [2]. Gaps in evidence, poor quality reviews, and biased recommendations based on lower levels of research were all concerns relating to the guideline development [1]. Consensus/expert opinion-based guidelines left much to be desired by patients and caregivers alike, and these concerns were sufficient to warrant a call to change from consensus-based to evidence-based guidelines.

Evidence-based clinical practice guidelines have been designed to replace the consensus-based guideline to increase health-care efficiency and patient care success.

In 2001, the IOM Committee on Quality of Health Care in America completed an extensive analysis of the health-care system and concluded that there were four key quality problem areas:

• The growing complexity of science and technology.
• The increase in chronic conditions.
• A poorly organized delivery system.
• Constraints on exploiting the revolution in information technology.

These quality problem areas, along with questions raised regarding trustworthy and appropriate development of consensus-/recommendation-based guidelines, led the IOM, along with other health-care agencies, to call for the increased use of evidence-based clinical practice guidelines (CPGs) in order to improve quality of care, decrease inefficiencies, and reduce practice variation within the health-care system in America [2, 3].

Although financial benefits are not the main focus of an evidence-based practice guideline, improved guidelines may also reduce costs [4].

52.1.3  Quality Problem Areas

52.1.3.1 Science and Technology
The rapid advancement of science and technology in health care creates challenges for improving the safe, effective, and efficient delivery of health care [5]. In addition, boundless medical research databases have led to challenges for physicians, patients, and payers, who desire access to timely, concise, relevant information to guide care. The volume of relevant scientific information provided by the extensive literature databases is overwhelming, and a process to select the highest-quality, lowest-bias research from these databases is critical to evidence-based practice.

Over the past 30 years, the number of randomized clinical trials alone has increased from just over 100 to nearly 10,000 annually. The last 5 years alone account for nearly 50% of published articles in the medical literature, and there is no evidence the rate of publication is slowing [6].

There is no doubt that the system for organizing and filtering this information is in desperate need of a remodel. Rather than sorting through endless clinical trials to determine the best course of action, providers and patients should be able to turn to trustworthy evidence-based guidelines to efficiently determine the appropriate route of care [7].

52.1.3.2 Chronic Conditions
Chronic conditions, defined by the Centers for Disease Control and Prevention (CDC) as any illness lasting longer than 3 months and not self-limiting, were the leading cause of illness, disability, and death in America in 1996 [8]. According to a 2008 survey conducted by the CDC/National Center for Health Statistics, 85.6% of individuals 65 and older have at least one of the following chronic conditions: arthritis, asthma, cancer, cardiovascular disease, chronic obstructive pulmonary disease, and diabetes. In 2030, when the large baby boom cohort has entered old age, one in five persons is expected to be in this senior age group. With modern medicine and technological advances adding years to the average American life expectancy, now over 76 years of age, the incidence and prevalence of chronic conditions will only increase [3, 9]. According to the CDC, in 2012, almost half of all Americans (117 million people) were living with one or more chronic conditions [10], and in 2014, seven of the top ten leading causes of death were chronic diseases [10]. The treatment of chronic conditions accounted for 62% of health-care

spending in 2008 [10, 11], and in 2012, that number grew to 83%. Moreover, those with five or more chronic conditions had an average of almost 15 physician visits and filled over 50 prescriptions in a year. Osteoarthritis, a degenerative joint disease, affected 54 million Americans in 2014. According to the CDC, that number will rise to 67 million in the year 2025 [12].

The demographic transformations that are projected to occur over the following years have important implications for the organization of the health-care delivery system. Self-management, family support services, committing to the treatment plan, and sustained follow-up visits are just as vital to patient recovery as initial diagnosis. Collaboration between the health-care provider, health-care provider team, patients, and patient’s family adds an additional layer of complexity that must be considered when developing clinical guidelines [3]. This is yet another reason for universally applied, clear, concise, and streamlined medical guidelines.

52.1.3.3 Poorly Organized System
The current health-care delivery system is a labyrinth, a seemingly endless web of non-answers. Patients and families have described it as a “nightmare to navigate” [13]. Clinicians have reflected that it is an acute waste of time. The complex series of hand-offs between doctors, specialists, hospitals, insurance agencies, third parties, and other providers decreases the efficiency of patient care. While multiple hand-offs from specialty clinician to specialty clinician are vital when treating persons with multiple chronic conditions, the current mechanism of coordination is lacking and needs reconfiguring to increase efficiency and ensure safety and proper treatment. The ultimate goal is to help, not hinder, a patient. It appears obvious that coordination should be as smooth as possible, with as few hand-offs as possible, to minimize time delay in health-care delivery.

52.1.3.4 Constraints on Information Technology
Information technology poses serious concerns for many health-care providers, the main concern being patients’ misunderstanding of proper medical treatment as they turn to web-based self-diagnosis and treatment rather than taking the time to see a trained medical professional. This may lead to serious illnesses being left poorly or inadequately treated.

However, when appropriately applied, information technology is also a great tool for patients. E-mail allows for efficient communication between provider and patient. The web allows patients to self-educate and take more responsibility for and control of their recovery process. Online forums have been beneficial, especially for those struggling with rare diseases who may not have a community near them on which they can lean. Information technology also has the potential to increase the quality of health care through improving physician communication and removing communication barriers to health-care delivery.

These problems outlined by the IOM are simply additional reasons to update the health system and guideline development process.

52.2 What Is an Evidence-Based Clinical Practice Guideline?

CPGs, as defined by the CPG Development Manual, are statements that include recommendations intended to optimize patient care that are informed by a systematic review of evidence and an assessment of the benefits and harms of alternative care options [14].

These evidence-based recommendations are developed using a minimally biased, transparent,

and reproducible evaluation of published medical literature. Evidence-based CPGs are designed to withstand the type of scrutiny and review that their “expert group/consensus-based” predecessors could not. They serve as an effective synthesis of an enormous literature database, providing a complete yet concise summary of available knowledge and a detailed treatment plan for a specific topic or condition. These “evidence-based guidelines” undergo a rigorous protocol to deliver the optimum care route for the patient [2]. Such a guideline should streamline patient care while ensuring patient safety and increasing outcome success. As with all information and technologies, CPGs are subject to regular updates as new research and clinical studies are published [3]. CPGs are beneficial in that they provide an efficient source of information for the best course of treatment while allowing for flexibility in a treatment pathway [3].

52.2.1  Trustworthy CPGs

The need for trustworthy guidelines is one of the main driving factors for the new guidelines. Guidelines must be developed by a qualified and diverse group of individuals. The development process critically analyzes the data, the source of the data, and those who conducted the study to ensure limited guideline bias. Bias and COIs can influence the efficacy of published research findings and the impact they have on a community [15–17]. Bias must be minimized in order to provide the public with the most trustworthy guideline. When bias and COIs are allowed to traverse the boundary line between good research and bad, the effect can be detrimental.

For instance, therapeutic drug research often is run more like a promotional campaign for pharmaceutical companies than like a clinical research study, intended to increase sales rather than improve drug performance [18]. In 2008 the New England Journal of Medicine published a study [19] which reviewed the selective publication process of antidepressant drugs and the effect those selected publications have on drug efficacy. The study found that of the 74 clinical trials conducted on one specific antidepressant, 38 produced positive results and 36 found the drug to have “questionable or no efficacy” [18]. However, only 8% of the “questionable or no efficacy” studies were published, while 94% of the positive studies were published. Moreover, 15% of the 8% “questionable or no efficacy” studies were published in such a way as to spin the results in a positive light [19]. As drug companies can cherry-pick which data they wish to present, it is easy for physicians and medical providers to inadvertently develop a biased opinion about the drug. This can influence clinical practice and prescribing habits. Unsurprisingly, additional studies have found that industry-sponsored studies are significantly more likely to report favorable results and less likely to report unfavorable outcomes than their federally funded counterparts [15, 17]. This is troubling as many drugs are associated with serious adverse effects.

As such, it is vital that developers of CPGs look closely at the research evidence and develop CPGs in a trustworthy, reproducible, and transparent manner. Developers must consider not only the findings but also who sponsored the study. They must rigorously scrutinize whether the results are reported truthfully. Only then can guideline development occur with minimal bias and maintain reliability.

52.3 Development of CPGs

In response to Congress and to develop trustworthy guidelines, the IOM has established eight standards for developing CPGs [1]:

1. Transparency.
2. Management of conflicts of interest (COI).
3. Development group diversity.
4. Systematic review.
5. Evidence and recommendation strength.
6. Articulation of recommendations.
7. External review.
8. Updating.

Each of these standards is intended to create the most well-researched, trustworthy, and clinically relevant guideline possible. These standards, or “guidelines for the guidelines,” are imperative in ensuring the reproducibility and clarity of the guideline development process, a factor the previous recommendation development process lacked.

52.3.1  Transparency

A transparent guideline serves two main purposes:

The first is to ensure unambiguous and reproducible guidelines. Transparency ensures the guideline is clear and easy to follow. The treatment pathway should be well articulated.

A transparent guideline also fully discloses author information, conflicts of interest (COIs), and guideline funding.

Transparency allows physicians and patients to evaluate for themselves the reliability of and potential biases within a guideline. Ideally, this transparent nature of the guideline development process will deter biases from crossing into the development process, further cementing the trustworthy quality of the guideline.

52.3.2  Management of Conflicts of Interest

Any association between a developer and the guideline in question serves as a potential for conflicting interests. These associations may include academic interests, professional gains, personal gains, or financial advancements.

Biases may cause the guideline to be developed unduly, intentionally or not. Therefore, COIs must be limited to maintain guideline credibility.

To manage COIs, each individual participating in guideline development must disclose interests and medical and financial associations relating to the guideline. Ideally, disclosures work to minimize the biases that may seep into the development process, ensuring that the guidelines were not developed to suit certain interests while harming others.

52.3.3  Development Group Diversity

A trustworthy CPG depends greatly on its team of developers. A diverse team, one that includes a member from every discipline or party associated with its implementation or consumption, can provide a well-rounded guideline with the interest of all parties protected [4]. This team includes primary care physicians, specialists, nurses, other providers, and any other party who may utilize the guideline. Patients, or other proxies who may advocate for patients, must also be present. Patients and other proxies need no prior medical experience with the guideline topic as their role is to provide a voice for patients.

Such a diverse group ensures that patients’ needs are protected and concerns are respected.

52.3.4  Systematic Review

The systematic review (SR) process determines the inclusion and exclusion criteria for the literature search. During the SR process, articles are gathered, analyzed, and interpreted, and relevant data is summarized. The SR process begins with an all-inclusive search of the medical literature and ends with a preliminary draft of the guideline.

52.3.5  Evidence and Recommendation Strength

A ranking of the quality of evidence and the strength of research is conducted to apply appropriate weight to each guideline recommendation [20]. CPG developers focus on high-quality evidence to build their recommendation. Steering clear of overdependence on expert opinion is important as expert opinion may not be based upon well-rounded experience or complete information [4]. Basing guidelines on research with weak design or flawed methodology will result in biased or faulty guidelines. To ensure quality evidence, a quality assessment is performed for all research included in the guideline development.
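As a rough illustration of how such a quality assessment can feed into a recommendation grade, the following Python sketch applies the AAOS flaw-count hierarchy and strength rules described in this section. It is a minimal sketch under stated assumptions, not the AAOS procedure itself: the function names `study_quality` and `recommendation_strength` are hypothetical, a single low-quality study is treated as limited evidence (following Table 52.1, whereas the running text says two or more), and a body of evidence made up only of very-low-quality studies is assumed to count as insufficient and therefore limited.

```python
from collections import Counter


def study_quality(design_flaws):
    """Grade one study by its number of design flaws (thresholds from the chapter)."""
    if design_flaws < 0:
        raise ValueError("design_flaws must be non-negative")
    if design_flaws < 2:
        return "high"
    if design_flaws < 4:
        return "moderate"
    if design_flaws < 6:
        return "low"
    return "very low"


def recommendation_strength(flaw_counts, conflicting=False):
    """Map the studies behind one recommendation to a strength label.

    flaw_counts: one design-flaw count per included study.
    conflicting: True if the studies disagree with each other.
    """
    flaw_counts = list(flaw_counts)
    if not flaw_counts:
        # No supporting evidence: only a consensus statement is possible.
        return "consensus"
    if conflicting:
        # Conflicting evidence is ranked as limited.
        return "limited"
    counts = Counter(study_quality(n) for n in flaw_counts)
    if counts["high"] >= 2:
        return "strong"
    if counts["high"] == 1 or counts["moderate"] >= 2:
        return "moderate"
    # One moderate study or any low-quality studies yield "limited".
    # Assumption: evidence from very-low-quality studies alone is
    # treated as insufficient, which this sketch also grades as limited.
    return "limited"


if __name__ == "__main__":
    print(recommendation_strength([0, 1]))   # two high-quality studies
    print(recommendation_strength([2, 3]))   # two moderate-quality studies
    print(recommendation_strength([]))       # no evidence at all
```

For example, `recommendation_strength([0, 1])` grades two high-quality studies as "strong", while `recommendation_strength([])` falls back to "consensus".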

Another factor the guideline team must consider while making recommendations is the fact that a statistically significant finding may not be clinically relevant [21]. To resolve this discrepancy, the American Academy of Orthopaedic Surgeons (AAOS) has applied the minimal clinically important improvement (MCII) method for determining clinical significance in research. This is similar to the minimally important difference (MID) or the smallest amount of change a patient may distinguish. Identifying clinical significance is important because a research finding that may be statistically significant to a researcher may not be relevant to patient treatment. Thus, certain research findings may not actually bear enough clinical weight to warrant a change in clinical treatment.

For example, a pharmaceutical company has two drugs, Drug A and Drug B, which both decrease anxiety. Drug A has a success rate of 95%, whereas Drug B has a success rate of 89%. Drug A costs five times as much as Drug B and has much more severe side effects than Drug B. Statistically, Drug A has a significantly greater success rate; however, clinically, Drug B is far more appealing in the eyes of the clinician and the patient. It saves the patient money, spares the patient potentially harmful side effects, and yields nearly the same treatment outcome. In this case, the statistically significant finding is not applicable in the clinical setting as the patient would not be able to distinguish a difference between the successes of both drugs.

Medical literature is analyzed and ranked by its quality of study design: the highest-quality evidence corresponds to the lowest risk of bias.
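The Drug A and Drug B example can be made concrete with a short calculation. The Python sketch below checks whether a difference in success rates is statistically significant (using a two-sided two-proportion z-test) and, separately, whether it reaches a clinically important threshold. The sample size of 1000 patients per drug and the threshold of 10 percentage points are invented for illustration; they are not values from this chapter or from the AAOS, and the function names are hypothetical.

```python
import math


def two_proportion_p_value(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def assess(successes_a, n_a, successes_b, n_b, mcii, alpha=0.05):
    """Report statistical and clinical significance side by side."""
    p_value = two_proportion_p_value(successes_a, n_a, successes_b, n_b)
    difference = abs(successes_a / n_a - successes_b / n_b)
    return {
        "statistically_significant": p_value < alpha,
        "clinically_important": difference >= mcii,
    }


if __name__ == "__main__":
    # Hypothetical trial: 95% vs 89% success, 1000 patients per drug,
    # and an assumed clinically important threshold of 10 percentage points.
    print(assess(950, 1000, 890, 1000, mcii=0.10))
```

Under these assumed numbers the 6-point difference is statistically significant yet falls short of the clinically important threshold, which is the situation the example describes.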
A statistically significant finding may not be clinically relevant [21].

The AAOS has developed a reliable “Clinical Practice Guideline Strength of Recommendation” rubric (Table 52.1) [3] that has been proven to generate strong CPGs. The strength of a recommended treatment pathway in a CPG is based on the quality of its supporting evidence (Table 52.2) [3]. Evidence quality is based on the following hierarchy of study design [3]:

• High quality: <2 study design flaws.
• Moderate quality: ≥2 and <4 study design flaws.
• Low quality: ≥4 and <6 study design flaws.
• Very low quality: ≥6 study design flaws.

Table 52.1  AAOS strength of recommendation description table [23]
Strength | Overall strength of evidence | Description of evidence strength
Strong | Strong | Evidence from two or more “high”-strength studies with consistent findings for recommending for or against the intervention
Moderate | Moderate | Evidence from two or more “moderate”-strength studies with consistent findings or evidence from a single “high”-quality study for recommending for or against the intervention
Limited | Low-strength evidence or conflicting evidence | Evidence from one or more “low”-strength studies with consistent findings or evidence from a single moderate-strength study for recommending for or against the intervention or diagnostic test or the evidence is insufficient or conflicting and does not allow a recommendation for or against the intervention
Consensus | No evidence | There is no supporting evidence. In the absence of reliable evidence, the work group is making a recommendation based on their clinical opinion. Consensus recommendations can only be created when not establishing a recommendation could have catastrophic consequences

Two or more high-quality studies yield a strong recommendation. One high-quality study or two or more moderate-quality studies yield a moderate-strength recommendation. One moderate-quality study and/or two or more low-quality studies yield a limited-strength recommendation. If there is conflicting evidence, the recommendation is ranked as limited [3, 22]. If there is no evidence to support the recommendation and the development team produces a recommendation, it is labelled “consensus” and is published in a separate companion consensus statement document to ensure separation between evidence-based and consensus-based recommendations. A “consensus” recommendation is equivalent to the historic expert group-based recommendation.

The terms “strong,” “moderate,” “limited,” and “consensus” are used to express the strength of recommendation within the guideline.

Table 52.2  AAOS recommendation language table
Guideline language | Strength of recommendation
Strong evidence supports that the practitioner should/should not do X, because… | Strong
Moderate evidence supports that the practitioner could/could not do X, because… | Moderate
Limited evidence supports that the practitioner might/might not do X, because… | Limited
In the absence of reliable evidence, it is the opinion of this work group that… | Consensus (a)
(a) Consensus-based recommendations are made according to specific criteria. These criteria can be found in Appendix VII

After evidence analysis and recommendation ranking have occurred, the guideline is drafted.

In many cases, more than one recommendation may be presented. In instances such as these, care maps or flow diagrams may be the best way to convey information as they are concise and easy to read and understand even when multiple variables are present. Multiple recommendations are included in the guideline to account for the variances that may arise. No two clinical cases are the same; as such the guideline provides a myriad of recommendations, so the physician may alter treatment as needed.

52.3.6  Articulation of Recommendations

Recommendations must be clearly written. They must be presented in the standardized format that includes a detailed treatment pathway as well as circumstances in which each recommendation should be used. Particular language is used to properly express the strength of the recommendation as well as the level of confidence the development team has in the recommendation. This information is vital as it allows the reader to evaluate how closely the guideline should be followed.

52.3.7  External Review

After a guideline is developed by the work group, but before it is released, an external review is conducted by an independent peer review group [4]. The external review serves as an independent, non-biased evaluation of the guideline. The group consists of medical professionals in related areas, persons from medical societies, and persons from the community. Just as the development team members are required to disclose COIs, so are the external review group members.

Reviewers are asked to review the evidence and comment on the wording of the recommendations. The peer review group is responsible for ensuring three main qualities of the guideline: validity, reliability, and feasibility.

1. Valid guidelines clearly state the scientific evidence supporting their recommendation. Justifications are present where group consensus and expert opinion were needed to support a recommendation.
544 A. Dingel et al.

2. Reliable guidelines are reproducible. They are guidelines in which a peer reviewer comes to the same conclusion as the focus group.
3. Feasible guidelines are easily understood by both patients and physicians and allow for both routine use and case-by-case modifications when necessary.

The review team’s written comments are collected into a single response form which is then reviewed and responded to by the chair of the guideline development team. Guideline development team members vote on all suggested revisions to recommendation language, which are accepted with a majority vote [24]. The revision process is documented and reported in the guideline document until final guideline approval [24].

52.3.8  Updating

CPGs are subject to routine updates and amendments as new information presents itself or as time passes. Certain medical societies, such as the American Academy of Orthopaedic Surgeons and the American Academy of Otolaryngology-Head and Neck Surgery (AAO-HNS), update their guidelines at a minimum of 5 years after publication [21, 25]. Situations that warrant guideline updating may include but are not limited to [4, 26]:

• Changes/advancements in available treatment or intervention methods.
• New evidence that impacts current treatment.
• Changes in health-care availability, affordability, or access.

In addition to updates, CPGs may undergo amendments. There are three types of amendments:

1. Reaffirmations: This simply consists of a brief statement of the organization’s agreement with the current guideline. This occurs when the guideline requires no significant alterations such as when time passes but treatment methods and research findings have not changed.
2. Minor revisions: A minor revision includes any alteration to the guideline that doesn’t change the overarching treatment plan but changes minor steps. This may be due to new research findings.
3. Major revisions: A major revision is any revision that significantly alters the treatment plan, course of action, or main conclusion of the guideline.

Updates and amendments undergo an independent review and majority vote and are then published and distributed with an alert that the guideline has been revised.
Each guideline is accompanied by its “profile,” a short statement that discloses the entire decision-making process, including the development team’s values, the evidence quality and harm-benefit assessments, and the level of confidence the team has in the evidence. Limitations of the guideline are also expressed, such as intentional vagueness the team may have included [4]. CPGs will not always be correct, as they must be revised with new information as research is published, nor will they be entirely bias-free. The CPG process outlined by the IOM aims to limit the amount of bias that seeps into the recommendation development process, increase consistency within patient care, and streamline the health-care delivery process.
Often a brief disclaimer is added to the beginning of the guideline abstract, such as this one from the American Academy of Otolaryngology-Head and Neck Surgery Foundation (AAO-HNSF) [4]:

This clinical practice guideline is not intended as a sole source of guidance in managing [topic specified here]. Rather, it is designed to assist clinicians by providing an evidence-based framework for decision-making strategies. The guideline is not intended to replace clinical judgment or establish a protocol for all individuals with this condition, and may not provide the only appropriate approach to diagnosing and managing the problem.

AAOS includes similar language in the introduction to the guidelines:

This guideline should not be construed as including all proper methods of care or excluding methods of care reasonably directed to obtaining the same results. The ultimate judgment regarding any specific procedure or treatment must be made considering all circumstances presented by the patient and the needs and resources particular to the locality or institution.
52  A Clinical Practice Guideline 545

52.3.9  Implementation

Guidelines are only as effective as those who implement them. The National Guideline Clearinghouse (NGC) is responsible for the announcement, promotion, and distribution of CPGs [14]. Once the guideline is ready for implementation, it is crucial that physicians and all health-care providers use the guideline to deliver the highest quality of care to the patient.

52.3.10  Outcomes Assessment

Outcomes assessments are important measures to determine whether or not treatment has been successful [14]. A CPG, like any treatment, undergoes an “outcome assessment.” However, the organizations typically involved in outcome assessments, such as the National Quality Measures Clearinghouse (NQMC) and others, are not involved in the outcome assessment of CPGs. This is because the evaluation of outcomes is built into the CPG development process and expressed by the ranking of the recommendations. As such, the IOM Committee on Quality Health Care in America resolves that there is no need for the rating of quality measure of CPGs as it would be redundant. Moreover, an additional rating could create conflicts of interest as some CPG developers also develop related outcome assessment rubrics [14].

Clinical Vignette
Management of Anterior Cruciate Ligament Injuries:
As of 2017, the AAOS has completed 18 CPGs. In 2015, the AAOS published a guideline for the “Management of Anterior Cruciate Ligament Injuries” providing physicians with a detailed, outlined plan to aid in the prompt and accurate treatment of ACL injuries [27].
The topic was chosen for guideline development as some controversy existed over best treatment and management options, and this condition impacts a significant number of patients in the USA. The extensive literature had not yet been concisely accumulated to distinguish best treatment options for appropriate cases. A systematic review of the literature was conducted in adherence with the aforementioned guideline criteria. In the ranking of their guideline recommendations, the guidelines were assigned a star grade to easily distinguish strength of recommendation (4 stars for a strong recommendation, 3 stars for a moderate-strength recommendation, etc. in accordance with Table 52.1). The stars aligned with the IOM’s “strong,” “moderate,” “limited,” and “consensus” rubric language terms.
Of the 20 recommendations put forth by this ACL injury management CPG, 5 had strong supporting evidence (4 stars), 6 had moderate supporting evidence (3 stars), 7 had limited supporting evidence (2 stars), and only 2 were consensus-based. This CPG shows not only great advancement in the strength of the orthopedic research being conducted but also great improvement in the guidelines themselves. Only 6 years prior, in 2009, AAOS published its first CPG on diaphyseal femur fractures in pediatrics. This guideline, although much stronger than its consensus-based predecessor, only had 1 recommendation out of its 14 that would have received a 4-star ranking and only 2 that would have received 3 stars. The 2015 CPG on ACL injury management shows great advances from both an orthopedic research perspective and a clinical practice guideline and care management perspective.
The following are a few examples of the strongest recommendations from the ACL injury management CPG [27]:

1. “Strong evidence supports that the practitioner should obtain a relevant history and perform a musculoskeletal examination of the lower extremities, because these are effective diagnostic tools for ACL injury” (4-star/strong evidence recommendation).
2. “Strong evidence supports that the MRI can provide confirmation of ACL injury and assist in identifying concomitant knee pathology such as other ligament, meniscal, or articular cartilage injury” (4-star/strong evidence recommendation).
3. “When ACL reconstruction is indicated, moderate evidence supports reconstruction within five months of injury to protect the articular cartilage and menisci” (3-star/moderate evidence recommendation).

As the AAOS has gained more experience with the guideline process, and the overall quality of the orthopedic literature has improved, more recent guidelines include questions that follow patients through the path of care. The language for recommendations reflects the quality of evidence in research publications. The recent guidelines for management of elderly hip fractures and ACL injury are examples of high-level guideline recommendations.

52.4  Turning CPGs into a Care Map

CPGs provide clinical guidance for a wide variety of topics and help manage specific conditions. CPGs may be used to support care maps, which can help local medical groups and health systems outline care pathways. Simple, easy-to-follow flow diagrams help providers quickly determine the best route of care for each specific clinical case.
The AAOS 2014 CPG for developmental dysplasia of the hip (DDH) was used to develop a DDH Care Map for the St. Luke’s Health System in Idaho [28]. The CPG summarized recent research and clinical treatment options for the evaluation of DDH, creating evidence-based treatment options for different clinical presentations of DDH. This guideline was turned into an easy-to-follow care map, providing practitioners with step-by-step instructions on how to best treat each specific case. The map includes treatment methods varying with patient age and degree of hip dysplasia. It also includes modifications to treatment for those clinics that may not have access to the ideal imaging machines. Moreover, it is easily accessed via smartphone, tablet, iPad, or other portable screens, providing ease of access for providers and families.
The DDH Care Map (Fig. 52.1) continues to be used in clinical practice in the St. Luke’s Health System, and this care map is continually reviewed and updated. Feedback from clinicians, as well as recent publications, will lead to changes in the care map.
The AAOS CPG program led to the development of other care maps for other health systems and practices, including those for carpal tunnel syndrome and for management of elderly hip fractures.

52.5  Limitations of CPGs

Many areas of medicine, like orthopedics, have massive medical literature databases. Sorting through the extensive databases for appropriate articles and ranking levels of evidence to develop CPGs requires considerable effort and expertise [3]. CPGs are beneficial for many reasons; however, some disadvantages include:

1. The process is time-consuming and expensive.
2. Patient feedback is ideal, but often not available from the published literature.

[Figure 52.1: flowchart titled “Diagnosis Screening and Referral Pathway.” It starts at “Routine Well Baby Exam for Hip Stability” and ends at either “Continue Routine DDH Well Baby Exams” or “Refer to Peds Ortho Specialist.” Node labels recoverable from the figure: Normal Exam? (Yes/No); Risk Factors? (Yes/No); Dislocatable; Barlow+; Ortolani+; Dislocated?; Age? (<4 wks, 4-16 wks, >16 wks); Repeat Exam at 4 wks; DDH Trained Staff Accessible? (Yes/No); Wait Until Baby is 16 wks; Ultrasound; AP Pelvis X-ray; Normal? (Yes/No).]

Ultrasound is the preferred imaging study until 6 months of age. Radiographs are indicated thereafter. If ultrasound is unavailable, radiographs can be used as early as 3 months.

Fig. 52.1  Care map for DDH [29]

3. Guidelines are subject to misinterpretation.
4. Guidelines must be continuously updated, to reflect changes in the published literature.
5. Adequate literature is required to develop CPGs, so areas lacking in literature won’t qualify for CPG development.
6. Guidelines are only as effective as those who implement and abide by them.
7. For less common conditions, adequate research literature may not support the development of a CPG.

For those guideline patient care questions that lack relevant research or have an insufficient database, members of the clinical practice guideline work groups are allowed to create a companion consensus statement [14]. These are statements based on expert opinion and are published separately from the CPGs to ensure separation of expert-opinion-based recommendations and evidence-based recommendations.

52.6  Future Studies

Although the guidelines have limitations, the guidelines are beneficial for many reasons. The guideline process identifies medical areas which lack higher levels of research and highlights the direction for future research. A lack of evidence to develop strong evidence-based CPGs is common in many medical specialties. A review article evaluating the strength of over 2700 recommendations put forward by the American College of Cardiology and the American Heart Association found that only 11% of those 2700 recommendations were based on Grade A, or “strong,” evidence [3]. CPGs may not always be feasible to produce as certain topics lack sufficient data, but they do provide a service: highlighting important gaps in research and important clinical questions that must be answered to provide optimal patient care.

Take-Home Message
• The IOM and others have called for a revision of the development of the highest standards of care in the American health system.
• Evidence-based clinical practice guidelines have been designed to replace the consensus-based guideline to increase health-care efficiency and patient care success.
• Guidelines have limitations, but they can have a positive impact on patient care.
• They are designed to streamline patient care, potentially aiding in the treatment of the anticipated increase of chronic conditions.
• As CPGs serve as a summary of scientific evidence available, those areas which lack adequate clinical research may become a research priority.
• Through the extensive and deliberate analysis of high-quality medical literature, CPGs provide evidence-supported health-care plans for physicians and patients alike.
• Ultimately, these guidelines may reduce practice variation, improve quality of care, decrease inefficiencies, and withstand the scrutiny the previous guideline process could not.

52.7  Useful Websites

https://www.aaos.org/guidelines/?ssopc=1
http://www.orthoguidelines.org/topic?id=1018

References

1. Graham R, Mancher M, Miller Wolman D, Greenfield S, Steinberg E, eds. Clinical Practice Guidelines We Can Trust. Washington, DC; 2011. https://doi.org/10.17226/13058.
2. Shea KG, Sink EL, Jacobs JC Jr. Clinical practice guidelines and guideline development. J Pediatr Orthop. 2012;32(Suppl 2):S95–100.
3. Crossing the Quality Chasm: A New Health System for the 21st Century. Washington, DC; 2001. https://doi.org/10.17226/10027.
4. Rosenfeld RM, Shiffman RN, Robertson P, Department of Otolaryngology State University of New York. Clinical practice guideline development manual, third edition: a quality-driven approach for translating evidence into action. Otolaryngol Head Neck Surg. 2013;148:S1–55.
5. Foundation TRWJ. Chronic care in America: the 21st century challenge. 1996. http://www.rwjf.org/library/chrcare/. Accessed 19 Sept 2000.
6. Chassin MR, Galvin RW. The urgent need to improve health care quality. Institute of Medicine National Roundtable on Health Care Quality. JAMA. 1998;280:1000–5.
7. Cooke CR, Gould MK. Advancing clinical practice and policy through guidelines: the role of the American Thoracic Society. Am J Respir Crit Care Med. 2013;187:910–4.
8. Hoffman C, Rice D, Sung HY. Persons with chronic conditions. Their prevalence and costs. JAMA. 1996;276:1473–9.
9. Gerteis J. Multiple chronic conditions Chartbook: 2010 medical expenditure panel survey data. 2014. www.ahrq.gov/sites/default/files/wysiwyg/professionals/prevention-chronic-care/desicion/mcc/mccchartbook.pdf.
10. Hempstead K. The real killer is still out there: update on health care spending. 2017. http://www.rwjf.org/en/library/research/2017/08/the-real-killer-is-still-out-there-update-on-health-care-spending.html.
11. Roehrig C. The health spending slowdown for 2008–2013: implications for sustainability. 2017. http://altarum.org/sites/default/files/uploaded-related-files/11_Roehrig%20Symposium%20July%2018%202017%20FINAL.pdf.
12. Centers for Disease Control. Arthritis – National Statistics. 2017. Located at: CDC.
13. Association PIaAH. Quality and patient safety. 2006. http://aha.org/advocacy-issues/quality/background/shtml.
14. AAOS. Clinical Practice Guidelines. Am Acad Orthop Surg. 2017.
15. Als-Nielsen B, Chen W, Gluud C, Kjaergard LL. Association of funding and conclusions in randomized drug trials: a reflection of treatment effect or adverse events? JAMA. 2003;290:921–8.
16. Bekelman JE, Li Y, Gross CP. Scope and impact of financial conflicts of interest in biomedical research: a systematic review. JAMA. 2003;289:454–65.

17. Riaz H, Raza S, Khan MS, Riaz IB, Krasuski RA. Impact of funding source on clinical trial results including cardiovascular outcome trials. Am J Cardiol. 2015;116:1944–7.
18. Gagnon A. Corporate influence over clinical research: considering the alternatives. Prescrire Int. 2012;21:191–4.
19. Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med. 2008;358:252–60.
20. Shea KGMC, Quinn R, Beckmann JT. Evidence based quality and outcomes assessment in Pediatric Orthopedics. Rosemont, IL: AAOS; 2016.
21. Jevsevar D, Shea K, Cummins D, Murray J, Sanders J. Recent changes in the AAOS evidence-based clinical practice guidelines process. J Bone Joint Surg Am. 2014;96:1740–1.
22. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008;336:924–6.
23. American Academy of Orthopaedic Surgeons. AAOS Introductory Packet for Clinical Practice Guidelines (CPG)/Systematic Review (SR) Work Group Members. 2015:31.
24. AAOS. Guideline peer review and public responses. https://www.aaos.org/guidelinepeerreview/?ssopc=1. Accessed 20 Nov 2017.
25. Waters E. Evidence for public health decision-making: towards reliable synthesis. Bull World Health Organ. 2009;87:164.
26. Shekelle P, Eccles MP, Grimshaw JM, Woolf SH. When should clinical guidelines be updated? BMJ. 2001;323:155–7.
27. American Academy of Orthopaedic Surgeons. Management of anterior cruciate ligament injuries. Located at: Clinical Practice Guidelines. 2015.
28. Kevin G Shea CTP. Turning a CPG into a care map. 2016. https://www.aaos.org/AAOSNow/2016/Oct/Research/research03/. Accessed 14 Nov 2017.
29. AAOS. DDH Care Map-Diagnoses and Referral Pathway. 2016. http://hipdysplasia.org/diagnosis-and-referral-pathway/.

53  How to Navigate a Scientific Meeting and Make It Worthwhile? A Guide for Young Orthopedic Surgeons

Darren de SA, Jayson Lian, Conor I. Murphy, Ravi Vaswani, and Volker Musahl

D. de SA · C. I. Murphy · R. Vaswani · V. Musahl (*)
Department of Orthopaedic Surgery, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
e-mail: musahlv@upmc.edu

J. Lian
Albert Einstein College of Medicine, Bronx, NY, USA

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research,
https://doi.org/10.1007/978-3-662-58254-1_53

53.1  Introduction

“When minds meet, they don’t just exchange facts; they transform them, reshape them, draw different implications from them, and engage in new trains of thought.
Conversation doesn’t just reshuffle the cards, it creates new cards”
Theodore Zeldin (b.1933) Historian & Author

Scientific meetings present immense potential to enhance one’s overall well-being in both personal and professional realms. Often a welcomed break and change in tempo from the hectic clinical schedule, these meetings present opportunities to not only reconnect with peers and/or establish new contacts but also to focus on evidence-based practice. Given that the quantity of scientific literature continues to expand at an exponential rate, and considering recent evidence suggesting the poor short-term publication rate of posters and podiums presented at various meetings [7–9], attendance and more so active participation at these meetings enable one to remain “ahead of the curve” in their chosen field. Though at the very least, these are opportunities to explore new geographic regions and cultures, they provide ample opportunity to refresh, reignite our passions, avoid burnout, and ultimately, deliver high-quality care. To fully take advantage of a scientific meeting, thorough preparation is warranted, with success akin to a rigorous “preoperative plan” that has been visualized, rehearsed, and ready to adapt to sudden changes. This chapter, modeled after the “preoperative plan,” will outline some key elements of a scientific meeting and present a helpful guide for pre-, during, and post-meeting actions to maximize the personal and professional impact.

53.2  Clinical History

Well before one sets foot in a conference venue, significant preparation, often at a minimum of 2–3 months in advance, is required. Although not always obvious, the key first step is to identify a meeting of professional relevance among the numerous listed to maximize time-value efficiency. It is wise to often consult colleagues and mentors for recommendations, to examine previous conference programs, and to avoid common pitfalls of attending a meeting simply to keep the status quo (i.e., “everyone is attending”), or because of its sheer size (i.e., “bigger is not always better”). Though this holds true of any meeting, it is of importance especially if a first-time attendee to prioritize meetings based on an
ability to achieve personal goals (i.e., education, networking, visiting/trialing new vendor products, marketing, etc.). Know the conference scientific program and its intended audience, and from this, establish SMART goals (see Fact Box 53.1) for both the immediate conference and short-term period thereafter, that is, objectives that are specific, measurable, achievable, results-focused, and time-bound [3]. Remember, depending on the size of the meeting, its individual components can range quite broadly and include, but are not limited to, didactic lectures, small-group discussions/workshops, multiple breakout sessions of smaller focus, various scientific and vendor exhibits, and numerous social activities. It is not necessary, or often not feasible, to do it all.

Fact Box 53.1
SMART goals are specific, measurable, achievable, results-focused, and time-bound. They are a recognized method of improving efficiency and thoroughness of task completion [3]. If used when planning one’s approach to a scientific meeting, they can help the attendee make the most of the experience.

Upon selecting a meeting of interest, know the important dates (i.e., “early bird” registration, calls for abstracts, award application deadlines, accommodation booking dates, deadlines for registration of pre-courses or supplemental activities, etc.). Whenever possible, consider submitting content, be it abstracts, instructional course lectures (ICLs), video demonstrations, or the like, to optimize chances of being an active participant, engaged in the meeting. This will best ensure that information is not only obtained but further retained with higher likelihood of being practically applied. At the very least, add your name to the conference mailing list, to keep abreast of updates/latest information.
The importance of being fiscally responsible has not escaped notice. To this end, search for institutional and conference-specific funding sources to support travel and participation, and be cognizant of the cost-savings from “early” registration and accommodation booking. Though often influenced at an individual institutional level, junior learners are often provided with limited access to funding, and so early arrangements are particularly beneficial for this group. Regardless, it is highly recommended to consult with travel agencies in collaboration with a meeting, to ensure cost-effective and effortless arrangements are made. This is particularly true if bringing family, as it is recommended to not only consult the conference program but also the visitor center for the conference location to develop an itinerary of attractions and restaurants to keep your companions entertained.

53.3  Physical Exam

Know the meeting, well in advance. This is an opportunity to see if the individual components of the meeting align with one’s understanding of its intended aims and audience. Review, in-depth, the scientific program; download meeting-specific smartphone applications; and note dates of interest (i.e., Welcome Reception, Opening Ceremony, Guest/Keynote Speaker, meeting highlights). Knowledge of the different events is important, facilitating one to pack comfortable, yet appropriate, professional clothing for any setting: delivering a podium presentation, serving as part of a panel discussion, attending a “black tie” fundraiser or “cocktail reception,” etc. Moreover, as meetings are shifting toward encompassing more active audience participation, often mediated through smartphone applications, obtaining and testing the smartphone application ahead of time will enable one to be savvy with navigating throughout the meeting and participating in a timely fashion. Though it may appear daunting, consider reading the accepted abstracts for podiums and posters of sections of personal interest, note the authors, review the faculty profiles, and compose, ahead of time and where applicable, one or two questions of significance to ask should the opportunity arise. Preparation is key, and this advanced preparation will undoubtedly enhance understanding of conference content and facilitate networking as well. To this end, armed with this intimate knowledge of conference content, it would be wise to contact and establish meetings with people of interest at this juncture, well in advance of the meeting date, as it can be nearly impossible to exchange itineraries at the venue itself. In-depth review of the conference outline and content also will facilitate advanced preparation for technical workshops and maximize educational yield. It will enable one to contribute to as well as gain from interactions with others in these more focused sessions.

53.4  Imaging

Though especially important for the large society annual meetings, reviewing the conference and accommodation venue layouts is paramount. Obtain the “big picture” of where large group (didactic) sessions are located versus such other elements as symposia, podium presentations, poster/e-poster presentations, surgical demonstrations, instructional course lectures, panel discussions/debates, technical/vendor exhibits, hands-on workshops, career development/practice management booths, etc. so that such knowledge of both the content and layout will enable thoughtful planning of an individualized schedule. Know when scheduled breaks are and take them. Do not plan on filling every moment of the day with conference-related activity, and make a conscious effort to sample all the different types of settings for delivering content, all while balancing scheduled personal time to focus on fatigue management [22] and sleep hygiene [4].

53.5  The Approach

Arrive early, well rested, and free from outside distractions. A “pearl,” if possible, would be to not bring unfinished work (i.e., dictations, manuscripts to review, etc.) to the meeting. While in attendance, one should be wholeheartedly focused on being mentally and physically present. If necessary, perform a pre-meeting “time-out” to again review one’s intended and individualized goals, the timeline, and plan for execution to maintain focus. However, understand that the pre-meeting plan is not “written in stone” [5], and those who truly benefit from meeting attendance often spend time to reflect at the end of each conference day on the value obtained from the pre-plan, maintaining flexibility to modify the remainder of the schedule to maximize experiences and follow up on new connections made. Again, thorough knowledge of all opportunities a given meeting presents ahead of time eases this transition.
Typically, during the check-in process, participants receive materials including the full program, an identification badge, and a tote bag. It is important at this juncture to identify key locations including the resource center, exhibit hall, lecture hall, and food and restroom locations and obtain Wi-Fi access at the conference location.

53.5.1  Education

Fact Box 53.2
Flipped sessions involve exposing learners to content before the classroom session, so that learners are more prepared and the classroom sessions can focus on interactive activities and questions the learners may have [13]. This model works well for scientific meeting sessions as learners have varying levels of experience, so the learners can tailor the sessions to fit their needs. The moderators of the sessions can then act as facilitators rather than lecturers [13].

While visiting symposia, podium presentations, and poster exhibitions, among other conference events, it is critical to be engaged. However, being engaged oftentimes does not lead to retention of learned information at the conference [17]. In fact, 3 and 90 days following a conference, the mean retention rate was reported to be only 14.9% and 11.3%, respectively [17]. One solution for this may be purposeful self-selection and attendance of poster presentations or podium presentations. Secondly, having access to conference

materials in advance and preparing for meaningful dialogue can promote and enhance a “flipped session” format, which is a contemporary model of learning in which lecturer and viewers engage one another with question and answer, an approach shown to aid learning and retention [13, 21]. As more conferences adopt this format [13, 21], preparation for conferences can greatly augment the conference experience. That being said, approach the conference as a learner, and set aside personal agenda and/or ego.

Conferences are significant in that current and up-to-date concepts are discussed among experts in the field. During meetings, listening to “buzzwords” or “themes” may help flavor future research directions as they provide a “snapshot” of the field. Likewise, learning about ongoing, impactful research projects generates new research ideas and innovation. Often, new ideas are invented during conversation among orthopedic surgeons at these conferences.

As the concept of a scientific meeting evolves, so too does technology, and as such, the Internet has drastically changed the way in which conference participants interact. An increasing number of conferences are utilizing social media, such as Twitter® or smartphone meeting/conference applications, to engage conference participants as well [14]. For example, there are times when traditional poster presentations fail to elicit engaging questions and enthusiastic viewers. Traditional oral presentations can largely be lecture style and less interactive. Some conferences have therefore utilized Twitter® and other social media outlets to encourage conversation among conference participants and interested outsiders [14]. In a study among the urological community, for example, conference attendees using Twitter® found it beneficial for networking (97%), spreading information (96%), research (75%), advocacy (74%), and career development (62%) [1]. As the social media presence of orthopedic surgeons grows [18], the importance of utilizing Internet platforms to bolster conversation and dialogue cannot be ignored.

The Internet, however, can be a “double-edged sword,” and though it offers tremendous advantages to a conference participant, it is equally critical that it not be used to answer work-related emails or provide means to other distractions. This only detracts from the purpose and opportunity of the conference. Furthermore, compartmentalizing work-related materials to the end of the day, and focusing on them with full attention, is a strategy shown to provide improved efficiency and results [11].

53.5.2 If Presenting

Preparation is of the utmost importance for executing a successful presentation at a scientific meeting. Initially it is important to research effective presenting strategies and techniques in anticipation of the presentation. Constructing slides that are readable and concise in a format that is easily consumed by the audience is difficult and must be consistently at the forefront of the presenter’s mind during composition. Formatting and color schemes need to be consistent with other presentations within the department or laboratory where the work was performed. Furthermore, formatting should adhere to any listed guidelines for the meeting. Standard content includes a brief background, hypothesis and aims, methodology, results, conclusions, and discussion. Each slide should be purposeful, with a “take-home message” that is easy to convey. Use of images can be particularly helpful. Lastly, the slides and material must be tailored to the targeted audience. Attendees may comprise scientists, clinicians, vendors, mathematicians, statisticians, engineers, residents and fellows, or any combination thereof. The emphasis and message of the presentation may change depending on the target audience.

After the slides have been prepared and edited by the presenter and advisor, live practice to hone timing and delivery and anticipate potential audience questions is necessary. There should be no difficulty complying with time limits established under the presentation guidelines. Failure to do so demonstrates lack of preparation and will distract from the presentation and “take-home messages.”

Scripting the presentation so as to avoid unnecessary pauses or lapses is helpful. Lab meetings or educational conferences are advantageous venues to practice in front of an audience that will be similar to that of the scientific meeting. Monitor the timing, elicit feedback, and make final edits as needed to clarify the message of the presentation. Remember, you represent not only yourself but also your institution, and this must not be overlooked.

At the scientific meeting, the first order of business is to locate the “Speaker Ready” room to upload presentation materials. Most meetings will specify a location with technical assistants, ready to upload the appropriate materials. Follow the listed instructions to avoid any difficulties. Maintain a backup of the presentation on a portable memory device and/or in an e-mail inbox, should the upload encounter problems.

Finally, visit the venue in advance of the presentation in order to become familiar with the room and the technical tools available. Ensure that any pointers, clickers, or other presentation tools are in working order. It can be helpful to attend another presentation in the same venue earlier, to observe other technical shortcomings or mistakes so as to avoid repeating them. It can also help a great deal with any degree of public speaking anxiety.

53.5.2.1 How to Make the Most of Roundtable Discussions?

A roundtable provides a small group of participants (typically 10–12) the opportunity to, over a short 60–90 minute time period, partake in a group discussion across specific items of a topic of interest. Given its academic nature, a roundtable encompasses a rapid exchange of high-level information among peers, with its success resting on input from all members in attendance. Often, it is an opportunity to clarify approaches to current problems and provide an avenue to present and weigh all viewpoints. Therefore, it is paramount that you first attend a roundtable that not only is of particular interest to you but also is one for which you have a sufficient knowledge base and that is being attended by peers, ideally those you are familiar with, so that contributions can, at the very least, be delivered with vigor and glean equal insights. Although often led by a moderator to keep to a particular task at hand, it is important to remember the group nature of a roundtable, and as such, one must ensure that when participating, all members present are being addressed and afforded opportunities to engage as well. Questions should be posed with appropriate tone, so as to convey a sense of open-mindedness and to be mindful of the process. To that end, not all action items of the roundtable may be addressed during the session, and emphasis should remain on thoroughly addressing a particular item, as opposed to rushing transitions through topics or pressing for a consensus resolution.

53.5.2.2 How to Make the Most of Panel Discussions?

In contrast, a panel discussion involves members of the audience listening to the perspectives of experts who have been pre-selected for the panel. Each individual panel’s focus may be different, and as an audience member, it is important to realize that members of the panel have different roles, suited to the particular meeting. Not all panel members may agree with each other. Panel members may be pre-selected to present a viewpoint in keeping with their individual practice, to present a viewpoint contrary to their practice, to debate other panel members, etc. Thus, to maximize the yield from attending panel discussions, one should keep this in mind and anticipate a passive learning experience for the most part. It is helpful to not only prepare for the topic and anticipated viewpoints ahead of time but to also research background information on each panel member, to familiarize yourself with their prerogative in advance. This will further enable one to identify some key areas of controversy and possible areas that will be addressed. Though opportunities do exist to pose questions to the panel, it is advised to use this platform wisely, be mindful of the others in attendance, be respectful of everyone’s time, and not use this opportunity as an avenue to express your own perspectives, to debate the panelists, or to engage in lengthy one-on-one discussions.

53.5.2.3 How to Make the Most of Breakout Sessions/Hands-on Workshops?

Breakout sessions, particularly those with a hands-on focus, provide a unique opportunity to integrate learning, challenge oneself, and learn from expert facilitators leading the workshop. Preparation is the key to maximizing return from these opportunities. Review the anatomy, practice with the equipment, review the procedural sequence, etc. ahead of time. One must possess strong self-reflective abilities and select workshops that are mindful of one’s ability and level of training. Unless the session is specified as introductory, or focused on a novel technique, one should thoroughly prepare for it by preemptively obtaining the necessary experiences that can subsequently be built upon during the workshop. Hands-on workshops provide a great opportunity to refine foundational skills, learn new approaches to similar problems, exchange tips/tricks, and, in a safe environment, experiment with new techniques outside of one’s comfort zone. Practice makes permanent [2]; and so, while these are often artificial environments, it is important to visualize the clinical setting in which the skill is being applied, to simulate the real-world environment as much as possible. Poor preparation will prevent any meaningful skill acquisition and will be pointless for both yourself and the instructor(s).

53.5.2.4 How to Make the Most of Online Tools

The increasing utilization of technology within orthopedic surgery has also expanded into the realm of scientific meetings. Online tools continue to become more available to help attendees navigate the meeting. First, the scientific meeting website operates as the primary tool for registration, description, updates, and information. Programs, relevant academic materials, speakers, maps, exhibitions, lectures, and other special events will all be listed here. It is important to visit the website well in advance of the meeting to register and pay any fees. This will accelerate the on-site registration and possibly avoid any late increases in price of registration as the meeting gets closer. The website will also contain information for specific populations of registrants who may be eligible for scholarships or price reductions to attend the meeting. There will possibly be price reductions at local hotels, restaurants, or other local establishments for registrants as well.

Some meetings may have interactive agendas listing the most up-to-date schedules of events. In order to maximize time and efficiency at the meeting, it is important to study the upcoming daily agenda and plan out the day ahead of time in order to avoid missing lectures or exhibitions of personal interest and relevance. At larger meetings with several simultaneous events, the website and online agenda can be the best tool to help plan out a personal daily itinerary. It is often easy to sort events by topic or speaker in order to isolate specific interests.

Many scientific meetings publish a dedicated app for mobile devices. Depending on the meeting, this app may function as the primary tool for event feedback, enrollment in daily activities, voting in polls, or announcing last-minute changes in venue or speaker. Associated video lectures, learning modules, abstracts, posters, and referenced manuscripts can be available through mobile apps as well. Briefly reviewing these materials prior to attending the event greatly supplements presentations and augments retention. Prior to the meeting, check the website for announcements regarding available apps, and download them to your mobile device.

53.5.3 Networking

While networking strategies have been critically analyzed and developed in other specialties, it seems there is a paucity of literature in the medical community when approaching this topic. The reason for this is multifactorial; however, it is important to recognize that networking is an important skill all physicians should acquire. Like all skills, networking has both instinctive and learned components and can be improved over time, but it also can be prepared and trained. As medicine is largely founded on collaboration and teamwork, this should be paramount in

physician development and training. In this respect, conferences are key opportunities to meet and connect with other physicians with shared interests.

Nowadays, business cards are less important, and there are countless methods to network online, such as through LinkedIn®, ResearchGate®, Facebook®, Twitter®, and Instagram®. For example, among nearly 1000 Pediatric Orthopaedic Society of North America members, 95% of members had a professional webpage, 36.8% a LinkedIn® page, 33% at least one YouTube® video, 25.8% a ResearchGate® page, 14.8% a professional Facebook® page, and 2.2% a professional Twitter® page [10]. Among members, private-practice physicians had double the utilization of social media [20]. Growing an Internet presence prior to the conference may additionally improve networking.

As alluded to earlier, further preparation for networking can include identifying presenters that one may be interested in meeting, or professional societies one may be interested in joining. Performing the necessary research on their areas of interest, and finding a niche for potential collaboration, can increase the likelihood of successful networking. A three-sentence “Tell me about yourself” pitch should also be prepared, rehearsed, and delivered confidently. Recent trends suggest the content of these “elevator pitches” has shifted toward why a business exists and that the average adult attention span is currently just 8 s, perhaps due to the advent of smartphones [16]. Seizing the brief windows of opportunity that arise during a conference, with esteemed professionals in the field, can dramatically affect career path. As such, the importance of attending the social events, alumni and society meetings, and approaching people has not gone unnoticed. Besides addressing one’s nutritional needs, breakfasts and lunches can be invaluable networking opportunities, so it is worth noting their schedules, and it is key that one sits at an active table, engages in discussion with others, and removes any distractions (e.g., concomitant laptop or smartphone use). Try to balance efforts on meeting new people with time spent with members from one’s own institution or previous colleagues at other institutions as well, as the latter provides an opportunity to reconnect, reinforce relationships, and share experiences in a less judgmental and/or less anxiety-provoking environment.

For junior learners going to conferences, networking can come in a variety of forms: among peers from other institutions, with faculty, and with potential mentors. Mentors are invaluable resources for learners and can provide intangible benefits such as letters of recommendation, job positions, and career-long advice and guidance. Junior learners can use the meeting as a chance to spend time outside of the work environment with their mentors and build a more personal relationship. Furthermore, by following a senior faculty member, learners can also have the opportunity to meet and network with other senior members of the community. For mid-career and senior leaders, it is mutually beneficial to develop their mentorship capabilities. While it can be daunting for learners to network, it is simultaneously imperative that junior learners seize the opportunity to learn from more experienced physicians and that they get sufficient face time with leaders in the field. This is especially important given the “Internet age,” as the younger generation is more likely to engage online rather than face to face [15]. Junior learners should also take notice of how more senior conference attendees act in order to learn and seek advice regarding the “unspoken rules of proper conference etiquette” [19]. Further, similar to resident and attending physicians, junior learners can build their reputation in the field by their behavior at a conference. Lastly, finding a genuine connection to junior learner peers can go a long way, since these fellow learners will most likely continue to become peers and leaders in the field in the future.

53.5.4 Equipment/Vendors/Suppliers

Sponsors, vendors, equipment suppliers, and many medical companies occupy a large and central role in scientific meetings. Many attendees avoid these areas and interactions with representatives to avoid the stigmatized label of being an

“industry-sponsored physician.” The relationship between industry and physicians has been scrutinized and more regulated recently due to past unscrupulous practices by some physicians and industry leaders [6]. However, attendees should not be apprehensive about going to these areas of the scientific meeting as industry representatives play vital roles in the hospital to provide patient care and assist physicians. Establishing relationships and developing them over time has benefits for both sharpening surgical technique and providing enhanced patient care. Mastering the plethora of tools in the surgical armamentarium and recognizing the capabilities and deficiencies of each is an arduous process of experience. Vendors and representatives from equipment suppliers are a wealth of technical product knowledge and can be used to help expedite this learning curve. They can share “tips and tricks” from other surgeons and provide discounted access to educational opportunities such as cadaver labs, sawbones sessions, and research opportunities to compare products. Lastly, these connections may increase surgeon exposure to other techniques and equipment serving similar existing purposes to those at home institutions but at a decreased cost. As the landscape of medicine transitions to a value-based healthcare approach with an evolving financial payout methodology [12], reduced equipment costs with similar function and outcome can play a pivotal role.

53.6 Closure

At the conclusion of the scientific meeting, if done right, one should be refreshed, be reinvigorated in one’s chosen field, and possess a sense of accomplishment. The initial moments represent a vital period for both reflection and following up on connections established during the meeting. Though not mandatory, any notes made are worth revisiting to reinforce key ideas, return to further reading materials/resources, or identify the springboard for the next set of personal activities. Be critical of your experience, evaluating what content and session format was most enjoyable/productive and what formats resulted in disengagement. This is important to help inform future conference planning, though a fair degree of tolerance should be exercised as poor experiences in one format may be one-off experiences due to a multitude of factors. Often meetings have post-conference access to the program, video presentations, copies of presentations, etc., and one should obtain copies of these for future reference.

53.7 Follow-Up

It is important to establish a platform to share the knowledge gained from the meeting with others, as this allows the reach of the particular conference to grow exponentially and further integrates the key concepts. Preferably, these summary meetings are established pre-departure. Lab members can collaborate by assigning certain topics beforehand to each member who is attending. Then, in the summary meeting, each member will summarize what he or she has learned for the rest of the group. This way, all members can expand their knowledge from the meeting even if they did not attend.

The importance of following up with key connections made during the meeting cannot be overstated. Do not hesitate to use social media, e-mail, etc. to send a short message to new connections. Not only does this reinforce to them the value of their personal connection, but it may also lead to future collaborations and networking opportunities. Finally, if not already done, this remains a last opportunity to not only join the society hosting the meeting, if of interest, but to also provide meeting feedback and ensure continuing medical education (CME) certificates are received/statuses updated.

53.8 Conclusion

In conclusion, scientific meetings are excellent avenues for pursuing continuing medical education. To take advantage of these meetings, a

comprehensive researched strategy prior to arrival enhances the experience and allows the attendee to make the most of the time. The extracurricular exhibits such as the vendor displays, technical demonstrations, and social engagements supplement the experience by providing opportunities for networking and establishing professional relationships. Multiple online tools such as websites and apps exist to streamline the dissemination of meeting information in a portable and easy-to-use fashion. Post-meeting analysis sessions help promulgate the energy and information shared at the scientific meeting into the communities and practices of the participants.

Fact Box 53.3
The online resources often include supplemental information and meeting schedules. These include MyAcademy App, aaos.org/annualmeeting, ORS app, isakos.com/meetings, My POSNA App, and ota.org/education/meetings-and-courses.

Take-Home Message
• Navigation of scientific meetings requires research and a seamless execution of a well-formed “preoperative plan” prior to attendance.
• Understanding the components of the meeting and aligning specific areas of interest ahead of time help the attendee use time and energy efficiently.
• Beyond the multitude of educational events, there are many opportunities for networking, equipment demonstrations, and social endeavors.
• It is not necessary, and often not feasible, to do it all.
• At the conclusion of the meeting, it is crucial to dedicate time to synthesize the experience and knowledge gained in order to develop future academic pursuits and collaborations.
• The connections developed at these meetings foster new ideas and relationships between orthopedic surgeons that ultimately improve delivery of patient care and expand research horizons.
• Finally, junior learners are exposed to the specialty outside of traditional lectures and start to build their own network.

Glossary of Meeting-Specific Terms

Scientific meeting  Academic gathering of scientific professionals to discuss research and specialty-specific topics.
Networking  Forming connections between professionals that are typically used for future collaborations.
Junior learner  A young student or physician who attends meetings to expand his/her knowledge and form new relationships.
Roundtable  Conversation among experts typically regarding a topic.
Breakout session  Short session where a group of attendees discusses a central topic among each other.
Panel discussions  A small cohort of experts discussing a topic of interest in front of a large audience.
Symposium  A formal discussion among experts in the field regarding a specific topic.
Podium presentation  Formal exhibition of a noteworthy research study, typically given in front of an audience.
Instructional course lecture (ICL)  An up-to-date educational series, given by a panel of experts, on a specific topic and/or procedure.
Vendor  Medical equipment supplier companies present at medical conferences to advertise their product and negotiate partnerships with physicians and medical researchers.

References

1. Borgmann H, DeWitt S, Tsaur I, Haferkamp A, Loeb S. Novel survey disseminated through Twitter supports its utility for networking, disseminating research, advocacy, clinical practice and other professional goals. Can Urol Assoc J. 2015;9(9-10):E713–7. https://doi.org/10.5489/cuaj.3014.
2. Bostic JQ. Practice makes permanent (not necessarily perfect). J Am Acad Child Adolesc Psychiatry. 2016;55(9):749–50. https://doi.org/10.1016/j.jaac.2016.06.007.

3. Bowman J, Mogensen L, Marsland E, Lannin N. The development, content validity and inter-rater reliability of the SMART-goal evaluation method: a standardised method for evaluating clinical goals. Aust Occup Ther J. 2015;62(6):420–7. https://doi.org/10.1111/1440-1630.12218.
4. Brick CA, Seely DL, Palermo TM. Association between sleep hygiene and sleep quality in medical students. Behav Sleep Med. 2010;8(2):113–21. https://doi.org/10.1080/15402001003622925.
5. Devlin R. Value of networking. Emerg Nurse. 2016;24(7):17. https://doi.org/10.7748/en.24.7.17.s23.
6. Flacco ME, Manzoli L, Boccia S, Capasso L, Aleksovska K, Rosso A, Scaioli G, De Vito C, Siliquini R, Villari P, Ioannidis JP. Head-to-head randomized trials are mostly industry sponsored and almost always favor the industry sponsor. J Clin Epidemiol. 2015;68(7):811–20. https://doi.org/10.1016/j.jclinepi.2014.12.016.
7. Kay J, Memon M, de Sa D, Duong A, Simunovic N, Athwal GS, Ayeni OR. Five-year publication rate of clinical presentations at the open and closed American shoulder and elbow surgeons annual meeting from 2005-2010. J Exp Orthop. 2016;3(1):21. https://doi.org/10.1186/s40634-016-0059-z.
8. Kay J, Memon M, de Sa D, Duong A, Simunovic N, Ayeni OR. Does the level of evidence of paper presentations at the Arthroscopy Association of North America annual meetings from 2006-2010 correlate with the 5-year publication rate or the impact factor of the publishing journal? Arthroscopy. 2017;33(1):12–8. https://doi.org/10.1016/j.arthro.2016.05.032.
9. Kay J, Memon M, Rogozinsky J, de Sa D, Simunovic N, Seil R, Karlsson J, Ayeni OR. The rate of publication of free papers at the 2008 and 2010 European Society of Sports Traumatology Knee Surgery and Arthroscopy Congresses. J Exp Orthop. 2017;4(1):15. https://doi.org/10.1186/s40634-017-0090-8.
10. Lander ST, Sanders JO, Cook PC, O’Malley NT. Social media in pediatric orthopaedics. J Pediatr Orthop. 2017;37(7):e436–9. https://doi.org/10.1097/BPO.0000000000001032.
11. Maloney S, Tunnecliff J, Morgan P, Gaida J, Keating J, Clearihan L, Sadasivan S, Ganesh S, Mohanty P, Weiner J, Rivers G, Ilic D. Continuing professional development via social media or conference attendance: a cost analysis. JMIR Med Educ. 2017;3(1):e5. https://doi.org/10.2196/mededu.6357.
12. Porter ME, Lee TH. From volume to value in health care: the work begins. JAMA. 2016;316(10):1047–8. https://doi.org/10.1001/jama.2016.11698.
13. Ramnanan CJ, Pound LD. Advances in medical education and practice: student perceptions of the flipped classroom. Adv Med Educ Pract. 2017;8:63–73. https://doi.org/10.2147/AMEP.S109037.
14. Randviir EP, Illingworth SM, Baker MJ, Cude M, Banks CE. Twittering about research: a case study of the World’s first twitter poster competition. F1000Res. 2015;4:798. https://doi.org/10.12688/f1000research.6992.3.
15. Ritchie A. How to network with fellow physicians to land the job you want. Med Econ. 2013. http://www.medicaleconomics.com/modern-medicine-feature-articles/how-network-fellow-physicians-land-jobyou-want.
16. Robinson R. The art of the elevator pitch: 4 tips for making an impression. Forbes. 2017. https://www.forbes.com/sites/ryanrobinson/2017/09/05/elevator-pitch-tips-making-impression/.
17. Saperstein AK, Lennon RP, Olsen C, Womble L, Saguil A. Information retention among attendees at a traditional poster presentation session. Acta Med Acad. 2016;45(2):180–1. https://doi.org/10.5644/ama2006-124.178.
18. Sculco PK, McLawhorn AS, Fehring KA, De Martino I. The future of social media in orthopedic surgery. Curr Rev Musculoskelet Med. 2017;10(2):278–9. https://doi.org/10.1007/s12178-017-9412-9.
19. Sohn E. Networking: Hello, stranger. Nature. 2015;526(7575):729–31.
20. Sumrein BO, Huttunen TT, Launonen AP, Berg HE, Felländer-Tsai L, Mattila VM. Proximal humeral fractures in Sweden-a registry-based study. Osteoporos Int. 2017;28(3):901–7. https://doi.org/10.1007/s00198-016-3808-z.
21. Torre D, Manca A, Durning S, Janczukowicz J, Taylor D, Cleland J. Learning at large conferences: from the ‘sage on the stage’ to contemporary models of learning. Perspect Med Educ. 2017;6(3):205–8. https://doi.org/10.1007/s40037-017-0351-3.
22. Wong LR, Flynn-Evans E, Ruskin KJ. Fatigue risk management: the impact of Anesthesiology Residents’ work schedules on job performance and a review of potential countermeasures. Anesth Analg. 2018;126(4):1340–8. https://doi.org/10.1213/ANE.0000000000002548.

54 How to Write a Scientific Article
Lukas B. Moser and Michael T. Hirschmann

54.1 Introduction

The first step into unknown territory is always the hardest but the most important one. There are numerous steps in orthopedic residency, where a resident must master unknown areas using unknown methods for unknown problems. It is a steady pursuit of improvement. One has to understand that even the most exciting and original results may not be accepted for publication in a peer-reviewed journal if the presentation and illustration are of mediocre quality. It is of utmost importance to acquire good scientific writing skills for the researchers’ armamentarium.

Writing the first scientific paper appears to be one of these difficult steps to master. No one is a born master. In fact, scientific writing has to do more with endurance and discipline than talent. The good news is that scientific writing can be learned as it follows a well-defined structure and writing process.

Writing a scientific article is not like writing a novel. When writing a good novel, the author ideally is in a creative process and tells a comprehensive story using numerous metaphors. A novel starts with a central conflict, which is not entirely resolved during the story. It leaves the reader with some loose ends, which creates the necessary tension for the reader. This ambiguity and lack of clarity is one essential part of a novel. In addition, a novel does not have to be linear. It could jump from present to past and back again. Novels are typically written in active voice, in either first or third person.

In contrast to novels, accuracy and clarity are the most important pillars of good scientific writing. Accurate, clear, and unambiguous expressions of your findings are crucial. Do not use expressions such as “To the best of our knowledge,” “up to,” or “approximately.” Examples of commonly used phrases and their possible impression on the reader clarify the importance of an accurate writing style.

Furthermore, scientific papers are typically written in passive voice and past tense.

Keep in mind:
No one is a born master! Scientific writing can be learned. Endurance and discipline are crucial to improve your skills!

Scientific article ≠ novel

L. B. Moser · M. T. Hirschmann (*)
Department of Orthopaedic Surgery and Traumatology, Kantonsspital Baselland (Bruderholz, Liestal, Laufen), Bruderholz, Switzerland
University of Basel, Basel, Switzerland
e-mail: michael.hirschmann@ksbl.ch, michael.hirschmann@unibas.ch

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_54
562 L. B. Moser and M. T. Hirschmann

will need to refine and revise your manuscript several times. This takes a considerable amount of your endurance.

Accuracy and clarity are keys for scientific writing!

A scientific article is typically structured as follows:

1. A clear and brief title summarizing the major content and findings.
2. A concise, structured, or unstructured abstract serving as a comprehensive summary of your study.
3. The necessary background information to understand the purpose and hypothesis of the study.
4. A concise but comprehensive description of how the study was conducted.
5. The exact and complete but condensed results obtained.
6. A proper yet concise discussion putting the findings into the context of current literature and future research.
7. A brief but comprehensive conclusion giving the clinical relevance of your results.
8. Tables should significantly add to the article and not duplicate the results.
9. Figures should illustrate your results and methods and only be included if highly valuable for the reader.
10. References need to be up to date and formatted with regard to the instructions for authors [20].

Checklist Structure
• Title
• Abstract
• Introduction
• Methods
• Results
• Discussion
• Conclusion

Before getting started, you need to complete a detailed review of the current literature with regard to your topic. When you are not familiar with the topic yourself, which is often the case for a junior researcher, it pays off to thoroughly review the topic of your paper. For a more advanced researcher, the topic and the papers published are often well known, which facilitates writing of the introduction. In addition, it makes you aware of the preexisting studies and similarities to your study. Then you might be able to differentiate your article from this group.

Starting with article writing is difficult, and it might be advisable to start with an outline. One should not worry about the correct syntax, grammar, or language. No one except you will ever read this draft. After having finished
your first draft of the manuscript, put it away
This chapter aims to give guidance for young for days or weeks and then revise again. Do not
residents aiming to publish their first scientific feel too attached to your writing, critically
articles. It is our purpose to guide you through the challenge the content of each sentence and
process of writing a scientific article in a case-­ paragraph, and meticulously revise it. Another
based approach. After having read the chapter, recommendation is that you show your paper
the reader should have a guide to start writing draft to a good colleague or collaborator not
such articles and enjoy the process of scientific being previously involved in the project.
writing. Ideally, this person is neither attached to the
topic nor the project itself allowing for an unbi-
ased review.
54.2 General Comments Writing a scientific article follows a clear
structure. It gives you as an author less freedom
Firstly, one should not be intimidated by the blank as to what and how you write it, but this also
piece of paper in front of you. You have to accept appears to be a chance for the beginner to excel.
that the first draft is not aiming for perfection. You Clearly, this helps you in getting started [8].
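Because the structure is fixed, an outline can be checked against the structure checklist before any prose is written. The following is only a minimal sketch: the function name `check_structure` and the exact-match comparison of headings are assumptions made for illustration, not something prescribed by the chapter or by any journal.

```python
# Section names taken from the "Checklist Structure" box of this chapter.
REQUIRED_SECTIONS = [
    "Title", "Abstract", "Introduction", "Methods",
    "Results", "Discussion", "Conclusion",
]


def check_structure(headings):
    """Compare the headings of a draft outline with the checklist.

    Returns a tuple (missing, in_order):
    - missing: checklist sections that do not appear in the outline
    - in_order: True if the sections that do appear follow the checklist order
    """
    normalized = [h.strip().lower() for h in headings]
    missing = [s for s in REQUIRED_SECTIONS if s.lower() not in normalized]
    positions = [
        normalized.index(s.lower())
        for s in REQUIRED_SECTIONS
        if s.lower() in normalized
    ]
    in_order = positions == sorted(positions)
    return missing, in_order


if __name__ == "__main__":
    outline = ["Title", "Introduction", "Results", "Methods"]
    missing, in_order = check_structure(outline)
    print("Missing sections:", missing)
    print("Sections in checklist order:", in_order)
```

An outline that lists Results before Methods, as in the small demonstration at the bottom, is reported as out of order, and the sections still to be drafted are listed as missing.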
54  How to Write a Scientific Article 563

Generally, a scientific paper is structured following the IMRAD acronym. IMRAD stands for Introduction, Methods, Results, and Discussion [16].

After the preparation of a proper study protocol (see Chap. 8), one should start the writing process with the Introduction and Methods followed by the Results and Discussion. The Introduction and Methods can be written after having finished the study protocol and might need to be adapted later on [16].

First step = detailed review of current literature

Start with a draft! Forget about correct syntax, grammar, or perfect language initially!

Introduction
Methods
Results
And
Discussion

For a clinical paper dealing with a prospective randomized controlled trial, the Consolidated Standards of Reporting Trials (CONSORT) should be used. It is an evidence-based, minimum set of recommendations for reporting randomized trials. It offers a standard way for authors to prepare reports of trial findings, facilitating their complete and transparent reporting, and aiding their critical appraisal and interpretation. The CONSORT Statement consists of a 25-item checklist and a flow diagram. The checklist items focus on reporting how the trial was designed, analyzed, and interpreted. The flow chart shows the screening and enrollment processes [19].

Use the CONSORT Statement when performing a prospective randomized controlled trial!

54.3 How to Write a Good Title

Although many authors believe that a good title is found easily, the opposite is true. A good title is brief and concise. However, it should give a brief but comprehensive summary of your study and most important findings. Letchford et al. investigated the benefits of having a shorter title and found that journals publishing papers with shorter titles are more frequently cited [13].

Jargon or colloquial wording has no place in the title. Do not use abbreviations or acronyms in the title. Never pose a question or use exclamation marks in the title. For easier identification in search engines, it is recommended to use words indexed as Medical Subject Headings (MeSH) in your title [1].

Checklist Title
• Brief and concise
• Comprehensive summary
• Important findings
• No jargon
• No abbreviations
• No acronyms
• No question
• No exclamation marks

54.3.1 Practical Case Example

Prospective randomized controlled study investigating patients with stiffness after isolated early versus late ACL reconstruction.

54.3.2 Poor Title

What is the difference between early versus late ACL reconstruction? A single-center RCT.

54.3.3 Good Title

Early isolated ACL reconstruction shows increased risk of arthrofibrosis: a prospective randomized controlled study.
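Some items of the title checklist can be turned into a quick self-check. The following is a rough sketch, not a validated tool: the function name `check_title`, the limit of 15 words, and the rule that two or more consecutive capital letters count as an abbreviation are all assumptions made for illustration. The chapter itself gives no word limit.

```python
import re


def check_title(title, max_words=15):
    """Return a list of title checklist violations (empty if none are found).

    max_words is an arbitrary illustrative threshold, not a journal rule.
    """
    problems = []
    if "?" in title:
        problems.append("poses a question")
    if "!" in title:
        problems.append("uses an exclamation mark")
    # Heuristic: two or more consecutive capitals are treated as an
    # abbreviation or acronym (e.g., "RCT").
    acronyms = sorted(set(re.findall(r"\b[A-Z]{2,}\b", title)))
    if acronyms:
        problems.append("contains abbreviations or acronyms: " + ", ".join(acronyms))
    if len(title.split()) > max_words:
        problems.append(f"longer than {max_words} words")
    return problems


if __name__ == "__main__":
    poor = ("What is the difference between early versus late "
            "ACL reconstruction? A single-center RCT.")
    for problem in check_title(poor):
        print("-", problem)
```

Applied to the poor title above, the sketch reports the question and the abbreviations. It would also flag "ACL" in the good title, so its output is better read as a prompt for judgment than as a hard rule.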

54.4 How to Write a Good Abstract

Together with the title, the abstract is often the first and only part read by other researchers. Hence, writing of a good abstract directly influences the penetration and scientific power of your study. It decides if your study is acknowledged and cited by your research colleagues.

However, in practice often only limited time is spent for the preparation of a good abstract. Although the abstract is the last written part of a scientific article, in a last effort, one should take enough time to prepare it.

The abstract should be a comprehensive summary of the scientific article. Therefore, you should carefully check the instructions for authors of your target journal before submitting the abstract.

Most journals require structured rather than unstructured formatting. Length is generally limited to 150–350 words, highlighting the importance of being brief and concise [1].

Checklist Abstract
• Comprehensive summary
• Avoid abbreviations
• No passive voice
• Sample size if percentages are reported
• Effect size with confidence intervals
• Abstract can be read independently from the main text
• Purpose/introduction/background: What is known? Why is this study needed?
• Methods: What did I do?
• Results: What did I find?
• Discussion: What does it mean?

54.4.1 Practical Case Example

Prospective study using 3D-CT investigating the influence of the approach in total knee arthroplasty (TKA) on TKA component rotation.

54.4.2 Poor Abstract

Purpose: TKA is a successful treatment option for end-stage osteoarthritis of the knee. Outcome after TKA is influenced by numerous surgery and patient related factors. The purpose of this study was to investigate which factors influence TKA position.

Methods: This study included 200 patients after TKA using either a parapatellar medial or parapatellar lateral approach with tibial tubercle osteotomy. TKA components’ position and the whole leg axis were assessed on 3D reconstructed CT scans (sagittal, coronal, and rotational). Mean values of TKA component position and the whole leg alignment of both groups were compared. The tibial and femoral components were graded as internally rotated, neutral rotation, and externally rotated.

Results: Two groups were investigated: patients who underwent a medial parapatellar approach (MPA) and a lateral parapatellar approach using a tibial tubercle osteotomy (LPA). Means of tibial component rotation were 2.7323° ER ± 6.12323 (MPA) and 7.62323° ER ± 5.4232 (LPA). Patients of group LPA presented a significantly less internally rotated (LPA, 18.43%; MPA, 48.83%) and more externally rotated (LPA, 52.63%; MPA, 22.83%) tibial component (p < 0.001). No significant differences were seen for the femoral component position, tibial valgus/varus, and tibial slope.

Conclusion: Patients of group LPA presented a significantly less internally rotated and more externally rotated tibial component. It appears that a LPA tends to externally rotate the tibial TKA component.

54.4.3 Good Abstract

Purpose: The purpose of this study was to investigate if the type of approach [medial parapatellar approach (MPA) versus lateral parapatellar approach with tibial tubercle osteotomy (LPA)] influences rotation of femoral and/or tibial component in total knee arthroplasty (TKA). It
was the hypothesis that MPA leads to an internally rotated tibial TKA component.

Methods: This study included 200 consecutive patients in whom TKA was performed using either a parapatellar medial (n = 162, MPA) or parapatellar lateral approach with tibial tubercle osteotomy (n = 38, LPA). All patients underwent clinical follow-up, standardized radiographs, and computed tomography (CT). TKA components’ position and the whole leg axis were assessed on 3D reconstructed CT scans (sagittal, coronal, and rotational). Mean values of TKA component position and the whole leg alignment of both groups were compared using a t-test. The tibial component was graded as internally rotated (<3° of external rotation (ER)), neutral rotation (equal or between 3° and 6° of ER), and externally rotated (>6° ER). The femoral component was graded as internally rotated [>3° of internal rotation (IR)], neutral rotation (equal or between −3° IR and 3° of ER), and externally rotated (>3° ER).

Results: There was no significant difference in terms of whole leg axis after TKA between both groups (MPA, 0.2° valgus ±3.4; LPA, 0.0° valgus ±3.5). Means of tibial component rotation were 2.7° ER ± 6.1 (MPA) and 7.6° ER ± 5.4 (LPA). Patients of group LPA presented a significantly less internally rotated (LPA, 18.4%; MPA, 48.8%) and more externally rotated (LPA, 52.6%; MPA, 22.8%) tibial component (p < 0.001). No significant differences were seen for the femoral component position, tibial valgus/varus, and tibial slope.

Conclusion: The type of approach (medial versus lateral) significantly influenced tibial TKA component rotation. It appears that a MPA tends to internally rotate the tibial TKA component and a LPA tends to externally rotate the tibial TKA. The anterior cortex should not be used as landmark for tibial TKA component placement when using the lateral approach with tibial tubercle osteotomy.

54.5 How to Write a Good Introduction

This is the part in which you introduce the topic to your reader. Typically, it consists of one page and guides the reader into the topic of your article. The introduction should answer the two key questions: What is the paper about? Why is it worth being read and published? Arrange your paper from basic to more complex. You should start with a paragraph giving very basic information on the background of your topic. Imagine the structure of the introduction as a funnel. The background information represents the broadest part at the top, which is narrowing down to the specific information of your research topic [2]. However, this first paragraph should not start from “Adam and Eve” but rather jump more directly into the topic covered.

Checklist Introduction
• Background information from basic to complex
• What is known and what is unknown in this specific topic?
• Why is the study needed?
• Hypothesis (what did we want to know?)
• Study aim (how did we answer the research question?)

Don’t lose the red thread of your introduction!

54.5.1 Practical Case Example

Prospective randomized controlled study investigating patients with stiffness after isolated early and late ACL reconstruction.

54.5.2 Poor Introduction

“ACL tears are common injuries. ACL reconstruction is the most frequently performed surgery in orthopedics. There are many surgical methods described. The following ACL grafts can be used. The purpose of this study was…”.

54.5.3 Good Introduction

An unsolved problem in patients suffering from ACL insufficiency is the timing of ACL reconstruction. [Then describe what is known about it!]
Stiffness is an associated problem with early ACL reconstruction… It was our hypothesis that… [Finally end with “The purpose of this study was...”].

Proceed with explaining what is already known, and describe open questions with regard to this topic. The reader should be able to follow the red thread of your introduction.

Like in a good novel, you should guide the reader from basic to complex and climax with your study hypothesis and purpose of the study undertaken.

The last paragraph should include a clear description of your hypothesis and purpose of your study. This part needs to answer the question why you undertook this study and which open research question should be answered.

Please consider the following tips for writing a good introduction (Table 54.1):

1. Perform a detailed and thorough review of current literature.
2. Do not include unnecessary background information! Do not give well-known information! (“Do not start from Adam and Eve!”)
3. Keep it as short as possible and as long as needed. Be brief and concise!
4. Do not overstate the clinical value and importance of your work!
5. Ask yourself what you need to know to understand the research topic and purpose of your study. Try to obtain the reader’s perspective.
6. Explain to the reader what is novel in your study, which makes it worth being read or published. Therefore, you need to perform a thorough literature review.
7. End with a clear hypothesis and purpose of your study. The clearer and simpler it is, the better. One or at most two study questions are optimal.
8. Clearly separate the major from the minor research question.
9. Leave comparison with other studies for discussion [2, 20].

Ask yourself: “Will my introduction sell my paper to readers, reviewers, and editors?”

Table 54.1 What you write in your introduction and what the reader probably understands [12]

Introduction
What you write…                              What the reader understands…
It has been known that…                      I haven’t been bothered to look up the
It is well known…                            original reference but…
To the best of our knowledge…
Of great theoretical and practical           Interesting to me…
importance…
While it has not been possible to provide    The experiment didn’t work out, but I
definite answers to these questions…         figured I could at least get a
                                             publication out of it…
Future studies need to clarify the
meaning of these results…

54.6 How to Write a Good Methods Section

This part of a scientific article could be written after having finished your study protocol (see Chap. 8). However, it might need to be adjusted once the study is finished.

Checklist Methods Section
• Study design
• Setting and subjects
• Data collection
• Data analyses
• Ethical approval

The methods section should meticulously describe the study design and ideally serve as an instruction for the readership to redo the study. Consider your research study as a specific dish and your methods section as its recipe. You need to list all ingredients and give a detailed description of the cooking process. Only then can the dish be prepared repeatedly with a reproducible result [9].

Try to create a clear story line. The methods section links the introduction with the results. Hence, it should be structured from basic to more complex. Generally, one should start with the study design. Here, the authors need to clarify the type of study done (Table 54.2).
Table 54.2 Possible study types for a scientific article [18]

Study type
  Primary research
    Basic research (experimental research)
      Theoretical: method development (analytical measurement procedure, imaging procedure, biometric procedure, test development, assessment procedure)
      Applied: animal study, cell study, genetic engineering, gene sequencing, biochemistry, material development, genetic studies
    Clinical research
      Experimental (interventional): clinical study (phase I, phase II, phase III, and phase IV study)
      Observational (noninterventional): therapy study, prognostic study, diagnostic study, observational study with drugs, secondary data analysis, case series, single case report
    Epidemiological research
      Experimental (interventional): intervention study (field study, group study)
      Observational (noninterventional): cohort study (prospective, historical), case control study, cross-sectional study, monitoring and surveillance, description with registry data
  Secondary research
    Meta-analysis
    Review: systematic, simple (narrative)
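For clinical studies, the "Data analyses" item of the methods checklist usually rests on a sample size calculation. As a minimal sketch, assuming a two-sided comparison of the means of two equally sized groups with a common standard deviation and a normal approximation, the number of patients per group can be estimated as

n = 2 · (z(1 − α/2) + z(power))² · SD² / Δ²

where Δ is the smallest difference considered relevant. The function name `sample_size_two_means` and the defaults (α = 0.05, power = 0.80) are illustrative assumptions. Other designs and outcome types need other formulas, and the result does not replace the advice of a statistician.

```python
import math
from statistics import NormalDist


def sample_size_two_means(delta, sd, alpha=0.05, power=0.80):
    """Patients per group needed to detect a mean difference `delta`.

    Assumes a two-sided test, two groups of equal size, a common standard
    deviation `sd`, and the normal approximation (no correction for the
    t-distribution and no allowance for dropouts).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g., about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # e.g., about 0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical planning values: detect a 5-point difference in a score
    # whose standard deviation is assumed to be 10 points.
    print(sample_size_two_means(delta=5, sd=10))
```

Halving the difference to be detected roughly quadruples the required number of patients per group, which is why the expected difference and its standard deviation should be stated together with the calculation.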
Then, you need to proceed with a clear description of your study sample. The study sample description should include the exact number of patients or subjects included. With regard to patients’ basic demographics, mean age ± standard deviation, gender, body mass index, alignment information, and other important variables should be given here [9].

54.6.1 Practical Case Example 1

Prospective study using 3D-CT investigating the influence of the approach in total knee arthroplasty (TKA) on TKA component rotation.

Poor description of study sample: A consecutive series of patients who underwent a computed tomography (CT) after primary TKA as clinical routine follow-up or because of knee pain in a university-affiliated hospital were analyzed. Patients were divided into two groups with regard to the used surgical approach.

Good description of study sample: A consecutive series of 200 patients who underwent computed tomography (CT) after primary TKA from 2013 to 2016 as clinical routine follow-up or because of knee pain in a university-affiliated hospital were prospectively collected and retrospectively analyzed. Patients were divided into two groups with regard to the used surgical approach. Group A (lateral parapatellar approach (LPA)) included 38 patients (male/female = 14:24; 67.5 ± 10.4 years), and Group B (medial parapatellar approach (MPA)) included 162 patients (male/female = 56:106; 67.2 ± 9.8 years).

A clear flow chart should allow the reader to understand how many patients were screened, how many were excluded, and how many were finally included in this study. It is important to give exact numbers here. Along with an exact description of inclusion and exclusion criteria, the article needs to allow the reader to judge a possible bias in patient selection.

54.6.2 Practical Case Example 2

Prospective study using 3D-CT investigating the influence of the approach in total knee arthroplasty (TKA) on TKA component rotation.

Poor description of inclusion and exclusion criteria: A consecutive series of patients who underwent a computed tomography (CT) after primary TKA as clinical routine follow-up or because of knee pain in a university-affiliated hospital were analyzed. Patients were divided into two groups with regard to the used surgical approach.

Good description of inclusion and exclusion criteria: A consecutive series of 200 patients who underwent a computed tomography (CT) after primary TKA from 2013 to 2016 as clinical routine follow-up or because of knee pain in a university-affiliated hospital were prospectively collected and retrospectively analyzed. Indication for TKA was end-stage osteoarthritis. Only primary TKA were included. Exclusion criteria were any history of infection or tumor. A team of two senior surgeons performed the surgeries using either cruciate retaining (CR) or posterior stabilized (PS) TKA. The decision to perform a CR or PS TKA was based on the integrity of the posterior cruciate ligament and was done independently from alignment (varus versus valgus knees). Patients were divided into two groups with regard to the used surgical approach.

As a general rule, this part should be sufficiently detailed so that an independent researcher could reproduce the results and thereby validate the study findings [20].

A common error of many authors is to describe the study sample in the results section, but it clearly belongs to material and methods and should be given here. This is also true for review papers [9].

The description of the study sample is followed by a detailed description of tests and experiments done for experimental studies and description of outcome instruments used for clinical studies. If a novel methodology is applied, a more detailed description is required. If standard methods are applied such as well-established outcome instruments, it is only necessary to refer to these [20].

For all measurements done, inter- and intra-observer reliability needs to be tested and presented. All measurements, in particular measurements on any image such as radiographs, CT, MRI, or other imaging modalities, should be
done by at least two independent blinded observers twice with an interval of 6 weeks [7]. Inter- and intra-observer reliability as well as accuracy values need to be presented. This could be done, for example, as intra-class correlation coefficients (ICCs), kappa values, or Bland-Altman plots [21].

A statement that ethical approval was obtained from the local ethical committee or institutional review board should be included. An increasing number of journals require a document showing ethical approval to be included in your submission. For clinical studies also state that informed consent was obtained from each patient or subject in the study [20].

Some journals require a statement that the study was done in agreement with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards [17].

Describe your study sample as detailed as possible!

Standard methods should be referred to! Only describe methods which are also presented in the results!

Ethical approval is required for every submission in a peer-reviewed journal!

The final paragraph should consist of a proper and complete description of the statistical methodology used [20].

First you have to state which data were presented and how. For example, “Continuous variables were described using means, standard deviations, and medians. Categorical variables were tabulated with absolute and relative frequencies.”

Another relevant part of the statistical methodology is the exact description of tests used. It is important to differentiate parametric from nonparametric tests. To use parametric tests, the data needs to be tested for normality. Please also report if this was done and the result. Every statistical test used needs to be mentioned here. The level of statistical significance needs to be reported as p-value. Generally, it is considered to be p < 0.05 [6].

A sample size calculation needs to be presented in all clinical studies. The sample size is an estimation of the number of subjects required to detect a significant difference in a study. If the sample size is too small, a true difference might fail to reach statistical significance. If the sample size is too large, this would mean that you unnecessarily waste scientific resources, and it might also make even small differences significant (Table 54.3) [5].

Table 54.3 What you write in your methods section and what the reader probably understands [12]

Methods
What you write…                              What the reader understands…
Three of the samples were chosen for         The results on the others didn’t make
detailed study…                              sense…
Accidentally strained during mounting…       Dropped on the floor…
Handled with extreme care throughout the     Not dropped on the floor…
experiment…

Report your statistical tests used, and explain which test was used to compare which measure with what!

Clinical studies need a sample size calculation!

54.7 How to Write a Good Results Section

This section should be brief and concise. Results are the only thing shown here. It should not contain any description of methodology. Do
not interpret your results; simply describe what you have found. Please organize your results in the same logical order and structure as previously reported in material and methods. This makes them more accessible for the reader (Table 54.4) [10].

Table 54.4 What you write in your results section and what the reader probably understands [12]

Results
What you write…                              What the reader understands…
Typical results are shown…                   The best results are shown…
Agreement with the predicted curve is:
  Excellent                                  Fair
  Good                                       Poor
  Satisfactory                               Doubtful
  Fair                                       Imaginary

Checklist Result
• Brief and concise
• Past tense
• Presentation without interpretation
• Match this section with methods section
• Use table and figures to highlight findings

54.7.1 Practical Case Example

Prospective study investigating the outcome and bone tracer uptake (BTU) in SPECT/CT (single-photon emission computerized tomography in combination with conventional CT) after high tibial osteotomy (HTO) due to symptomatic varus malalignment.

It was the hypothesis that BTU after HTO decreases in the medial compartment, that clinical outcome and the degree of correction correlate with BTU, and that asymptomatic patients after HTO reveal a significantly decreased BTU in the medial subchondral bone.

Twenty-two consecutive patients with 23 knees undergoing medial opening-wedge HTO for medial compartment overloading were assessed pre- and postoperatively (12 and/or 24 months) using Tc-99m-HDP-SPECT/CT including our 4D-SPECT/CT protocol. BTU was quantified and localized to specific biomechanically relevant joint areas. Maximum absolute and relative values (mean ± standard deviation, median, and range) for each area were recorded. Pre- and postoperative mechanical alignments were measured. At 24 months after HTO, the WOMAC score was used.

Poor results section: A decrease of BTU in the medial subchondral zones after HTO was found. BTU normalized in all asymptomatic patients. A decrease of BTU was partly seen in the lateral compartments, but the decrease was significantly higher in the de-loaded medial tibial and femoral joint compartment. The achieved average valgus correction of the tibiofemoral angle by HTO was 5.9° ± 2.8. There were no adverse events such as pseudoarthrosis, infection, loss of correction, or skin necrosis. The mean WOMAC score pain was 6.2 ± 5.6, WOMAC stiffness was 2.8 ± 2.4, and the WOMAC daily activities (0–68) was 17.4 ± 16.4. The mean total score was 25.4 ± 22.00 after HTO. Less stiffness with regard to the WOMAC score correlated significantly with SPECT/CT BTU. Higher postoperative bone tracer uptake significantly correlated with more pain. Interestingly, no statistical significant associations between SPECT/CT BTU and alignment correction by HTO were found.

Good results section: A significant decrease of BTU in the medial subchondral zones after HTO was found from preoperatively to 12 and 24 months follow-up (p < 0.01). BTU normalized in all asymptomatic patients within 24 months. The normalized grading of BTU in SPECT/CT for each anatomical area of the localization scheme is presented in Table 1 (values represent the difference between preoperative and postoperative measurements). A decrease of BTU was partly seen in the lateral compartments, but the decrease was significantly higher in the de-loaded medial tibial and femoral joint compartment (p < 0.0001, Fig. 4). The achieved average valgus correction of the tibiofemoral angle by HTO was 5.9° ± 2.8. The mean WOMAC score pain (0–20) was 6.2 ± 5.6, WOMAC stiffness (0–8) was
2.8 ± 2.4, and the WOMAC daily activities (0–68) was 17.4 ± 16.4. The mean total score (0–96) was 25.4 ± 22.00 after HTO (Fig. 5). Less stiffness with regard to the WOMAC score correlated significantly with a higher decrease in SPECT/CT BTU (p < 0.05). Higher postoperative bone tracer uptake significantly correlated with more pain (p < 0.05). A Spearman correlation analysis revealed no statistical significant associations between SPECT/CT BTU and alignment correction by HTO.

It should be clear that all data is reported and not only the data supporting your hypothesis. Do not report data which has not been mentioned in material and methods. No referencing is allowed in the results section.

Sometimes it appears difficult to decide what exactly is considered a result. For example, if inter- and intra-observer variability is tested to validate measurements on radiographs, but the main study question concerns the radiological results themselves, then the inter- and intra-observer reliability should be reported in material and methods rather than in results.

Figures, tables, and graphs might help to make the results more understandable to the reader. Do not duplicate results in text and figures, tables, or graphs. Often not much supporting text is needed here [10].

54.8 How to Write a Good Discussion

The discussion part should put your results into the context of the current literature. In contrast to the previous parts of a scientific article, here the findings need to be explained, interpreted, and debated.

The key questions to be answered are: What is similar and what is different to other studies published? How do your findings help to answer the study question posed? Generally, the discussion should start with a sentence such as “The most important findings of the present study were...”. However, a pure repetition of your results should be avoided. In the further course of the discussion, the current evidence with regard to the study question needs to be discussed. One common pitfall of the discussion part is to present a review of a considerable number of published studies without interpreting these papers with regard to your study question.

Finally, the strengths and, more importantly, the limitations of your study need to be discussed in detail [3].

Checklist Discussion
• Summary of main findings
• Comparison and interpretation with current literature
• Strength and limitations

54.8.1 Practical Case Example

One year clinical and MR imaging outcome after partial synthetic meniscal replacement in stabilized knees using a collagen meniscus implant.

Poor discussion: The purpose of the present study was to evaluate the clinical and radiological outcomes of patients who underwent a medial or lateral collagen meniscus implantation. We hypothesized that good functional results with maintained sport capacity and activity level could be achieved. We further hypothesized that MRI would show no changes in size and signal intensity of the CMI over time. The median preinjury Tegner score was 7 (range 2–10); it decreased preoperatively to 3 (range 0–123). At 1-year follow-up, the median Tegner score was 6 (range 2–10). The mean Lysholm score before surgery was 68 ± 20 and 93 ± 9 at 1-year follow-up. The mean flexion and extension ±125 standard deviation at 1-year follow-up was 140° ± 5° and 5° ± 1°, respectively. Meniscal substitution with the collagen meniscal implant showed excellent clinical 1-year results in a highly active patient group. Significant pain relief and functional
572 L. B. Moser and M. T. Hirschmann

improvement throughout all scores were noted at a minimum of 1-year follow-up. The collagen meniscus implant undergoes significant remodeling, degradation, and extrusion in most of the patients. No difference in outcomes between the medial and lateral CMI was observed.
Good discussion (shortened): The most important findings of this study were twofold: Firstly, the meniscal substitution with the collagen meniscal implant (CMI) showed excellent clinical 1-year results in this large patient series in which most patients also underwent ACL reconstruction. Significant pain relief and functional improvement throughout all scores were noted at a minimum of 1-year follow-up.
A variety of studies have been performed investigating the early to longer-term clinical experience using CMI [1–8]. In agreement with the present study, most of the authors reported that mean Tegner activity and Lysholm scores as well as pain values significantly improved from pre- to postoperatively [5, 6, 9]. There was no significant clinical difference whether a medial or lateral CMI was implanted. Monllau et al. investigated 25 non-consecutive patients at minimum 10 years after CMI implantation [2]. They included almost the same number of patients undergoing an ACL reconstruction and found significantly improved Lysholm and VAS pain scores at 1-year follow-up.
Secondly, with regard to the MRI results, the CMI undergoes significant remodeling, degradation, and extrusion. The new meniscus tissue was well integrated. The size of the meniscus tissue was reduced when compared to the normal meniscus which could be only partially explained by compressive joint loading forces. However, these findings are in agreement with Monllau et al. who found less meniscus volume than expected at minimum 10 years after CMI implantation [2].
One major limitation of this study is the lack of a control group. This study only presents the clinical and radiological outcomes of a consecutive series of patients undergoing collagen meniscus implantation in stable and unstable knees. However, it is one of the biggest case series of patients showing clinical and MRI results 1 year after CMI.
Another weakness is that the patients included also underwent a variety of concomitant surgeries. In this study, the majority of patients underwent CMI due to prophylactic reasons, which currently could be considered a dubious indication. This fact might have influenced the results of the study, although this was not shown statistically (Table 54.5).

Table 54.5 What you write in your discussion and what the reader probably understands [12]
Discussion
What you write… | What the reader understands…
It is generally believed that… | A couple of other fellows think so too.
Correct within an order of magnitude | Wrong!
It is clear that much additional work will be required before a complete understanding… | I don’t understand it!
Future studies are needed… | The results did not turn out to be good enough to make us understand the problem

54.9 How to Write a Good Conclusion

The conclusion should be not more than one or two sentences long and provide the reader with a summary of results and discussion. In particular, the clinical implications of the findings should be highlighted. Please explain why the study is significant to the research world. This part should provide the key message you want to convey with regard to your study. In addition, it is also possible to provide an outlook of future research direction. However, do not just write further investigation is needed or do not announce future studies that might not be completed [3].

54.9.1 Practical Case Example

A prospective, longitudinal, single-cohort study investigating the correlation of depression, control beliefs, anxiety, and a variety of other psychological factors with outcomes of patients undergoing total knee arthroplasty (TKA) in 104 consecutive patients.
54  How to Write a Scientific Article 573

Checklist Conclusion
• 1–2 sentences
• Summary of results and discussion
• Clinical implications
• Why is my study significant?

Poor conclusion: Self-efficacy did not influence clinical scores. More depressed patients showed higher pre- and postoperative WOMAC scores, but no difference in amelioration.
Good conclusion: Depression, anxiety, a tendency to somatize, and psychological distress were identified as significant predictors for poorer clinical outcomes before and/or after TKA. Standardized preoperative screening and subsequent treatment should become part of the preoperative work-up in orthopedic practice.

54.10 Tips and Tricks for Tables

Tables allow a large amount of data to be presented in a listed way. They are particularly beneficial for large datasets, which can hardly be described in narrative form, and they help to keep the results section concise and brief. Ask a colleague to check whether your table is self-explanatory. Tables should be referred to in the text and numbered in order of their citation. Avoid any disagreement between table data and information given in the text [11]!

54.10.1 Practical Case Example

(See Tables 54.6 and 54.7).

54.11 Tips and Tricks for Figures

Always consider illustrating your article with high-quality figures, which include photographs or drawings. When including any figure, ask yourself whether it really adds to the article. Often one is too attached to the photographs taken to have an objective view on this matter. It might be beneficial to get a different, independent perspective.
The number of figures accepted depends on the journal. You will find the actual limits, along with resolution and image quality requirements, in the instructions for authors of each journal. Often figures need to be cropped or enlarged to highlight the important part of the figure used. For this purpose, arrows can also be used.

Table 54.6 Bad example of a table included in a scientific article. Clinical outcome scoring preoperatively, 6 weeks, 4 months and 1 year after surgery
Clinical scoring | Pre-operative | p | 6 weeks p.o. | p | 4 months p.o. | p | 1-year p.o. | Overall p
WOMAC pain | 52 ± 24  49 | *** | 25 ± 14  22 | *** | 14 ± 16  9 | n.s. | 13 ± 17  6 | ***
WOMAC stiffness | 54 ± 26  50 | *** | 36 ± 20  30 | *** | 22 ± 20  15 | n.s. | 20 ± 20  15 | ***
WOMAC function | 48 ± 20  44 | *** | 27 ± 15  22 | *** | 16 ± 15  12 | n.s. | 16 ± 18  9 | ***
n.s., p > 0.05, *p < 0.05, **p < 0.01, ***p < 0.001

Table 54.7 Good example of a table included in a scientific article. Clinical outcome scoring preoperatively, 6 weeks, 4 months and 1 year after surgery
Clinical scoring | Preoperative M ± SD | Preoperative Md | p | 6 weeks p.o. M ± SD | 6 weeks p.o. Md | p | 4 months p.o. M ± SD | 4 months p.o. Md | p | 1-year p.o. M ± SD | 1-year p.o. Md | Overall p
WOMAC pain | 52 ± 24 | 49 | *** | 25 ± 14 | 22 | *** | 14 ± 16 | 9 | n.s. | 13 ± 17 | 6 | ***
WOMAC stiffness | 54 ± 26 | 50 | *** | 36 ± 20 | 30 | *** | 22 ± 20 | 15 | n.s. | 20 ± 20 | 15 | ***
WOMAC function | 48 ± 20 | 44 | *** | 27 ± 15 | 22 | *** | 16 ± 15 | 12 | n.s. | 16 ± 18 | 9 | ***
Mean ± standard deviation (M ± SD), Median (Md), p between measuring points (Wilcoxon-test), and overall p (Friedman-test)
n.s., p > 0.05, *p < 0.05, **p < 0.01, ***p < 0.001
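The difference between the two tables is that the good one labels every value (M ± SD, Md, p) and explains the significance symbols in the legend. A minimal Python sketch of how such labelled cells could be generated from raw scores is given below. The scores, the hard-coded p value, and the helper names `summarize` and `significance_label` are assumptions made for illustration only; they are not taken from the study, and the p value would in practice come from the test named in the legend.

```python
from statistics import mean, median, stdev


def significance_label(p: float) -> str:
    """Map a p value to the symbols explained in the table legend."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."


def summarize(scores: list[float]) -> str:
    """Return the cells 'M ± SD | Md' for one measuring point, rounded to integers."""
    return f"{mean(scores):.0f} ± {stdev(scores):.0f} | {median(scores):.0f}"


# Hypothetical WOMAC pain scores, for illustration only (not study data).
preop = [30, 45, 50, 55, 80]
six_weeks = [10, 20, 25, 30, 40]

# Placeholder p value; a real one would come from a paired test such as Wilcoxon.
p_value = 0.03

header = "Clinical scoring | Preoperative M ± SD | Md | 6 weeks p.o. M ± SD | Md | p"
row = (
    f"WOMAC pain | {summarize(preop)} | {summarize(six_weeks)} | "
    f"{significance_label(p_value)}"
)
print(header)
print(row)
```

Generating the cells from the raw data in this way is one option for keeping table values and the numbers quoted in the text in agreement, since both can be drawn from the same source.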

Finally, it is also important to refer to the image within your text [11].

Checklist Tables
• Title reflects content
• Title at the top
• Self-explanatory
• Clear and easy to read
• Use correct order
• Check if you have followed the instructions for authors

Checklist Figures
• Title reflects content
• Title at the bottom
• Use high-quality figures
• Does the figure add to the article?
• How many figures are accepted by the journal?
• Which file format is accepted?
• Are instructions for authors followed?

54.11.1 Practical Case Example

(See Figs. 54.1, 54.2, 54.3, 54.4, 54.5, 54.6, 54.7 and 54.8).

54.12 Tips and Tricks for Reference Section

Formatting of references needs to meticulously follow the individual instructions for authors of each journal. Formatting errors do not increase the confidence of the reviewer in the quality and diligence of your work. Some journals even rapidly reject your work without the possibility of later resubmission. Hence, it is important to follow exactly the instructions for authors. When using reference manager software such as EndNote, Reference Manager, or Mendeley, it is nevertheless important to check before submission that all references are formatted correctly.
One should consider the following principles:

Fig. 54.1 Bad example of a figure included in a scientific article

Fig. 54.2 Good example of a figure included in a scientific article [15]

Fig. 54.3 Bad example of a figure included in a scientific article

Fig. 54.4 Good example of a figure included in a scientific article [15]

1. Only use references for which you have read the entire article. Please do not just read the abstract.
2. Try to cite as few references as possible. If a reference is only cited once it must be exceptional. Otherwise delete it.
3. Always cite the original article making the statement you want to reference.
4. Check for duplicates and delete them.
5. Please provide a reference for every statement you make.
6. Please avoid excessive self-referencing. Only cite your own papers if they really contribute here. Some authors tend to cite as many of their own papers as possible.
7. Use the latest and updated references for your article. Recheck just before submission of the article.
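Principle 4 above, and the final recheck before submission, lend themselves to a quick automated pass in addition to the reference manager. The following Python sketch flags near-identical reference entries and listed references that are never cited in the text. It assumes numbered references and bracketed numeric citations such as [4], [5, 6, 9], or [1–8]; the function names and the sample entries are hypothetical and serve only as an illustration.

```python
import re
from collections import Counter


def normalize(entry: str) -> str:
    """Lowercase an entry and drop punctuation and spacing so near-identical copies match."""
    return re.sub(r"[^a-z0-9]+", "", entry.lower())


def find_duplicates(references: list[str]) -> list[str]:
    """Return the first occurrence of every entry that appears more than once."""
    counts = Counter(normalize(r) for r in references)
    seen: set[str] = set()
    duplicates = []
    for ref in references:
        key = normalize(ref)
        if counts[key] > 1 and key not in seen:
            seen.add(key)
            duplicates.append(ref)
    return duplicates


def cited_numbers(text: str) -> set[int]:
    """Collect reference numbers from bracketed citations, expanding ranges like 1–8."""
    numbers: set[int] = set()
    for group in re.findall(r"\[([0-9,\s\u2013-]+)\]", text):
        for part in group.split(","):
            bounds = [int(b) for b in re.split(r"[\u2013-]", part.strip()) if b.strip()]
            if len(bounds) == 2:
                numbers.update(range(bounds[0], bounds[1] + 1))
            elif len(bounds) == 1:
                numbers.add(bounds[0])
    return numbers


# Hypothetical reference list and manuscript text, for illustration only.
references = [
    "Author A. Hypothetical title one. J Example. 2013;1:1.",
    "Author B. Hypothetical title two. J Example. 2014;2:2.",
    "Author A.  Hypothetical title one. J Example. 2013;1:1",
    "Author C. Hypothetical title three. J Example. 2015;3:3.",
]
manuscript = "Tables should be self-explanatory [1]. Earlier work agrees [2, 3]."

print("Duplicates:", find_duplicates(references))
uncited = set(range(1, len(references) + 1)) - cited_numbers(manuscript)
print("Listed but never cited:", sorted(uncited))
```

Such a check only compares text and does not replace reading the final reference list against the journal's instructions for authors.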
Fig. 54.5 Bad example of a figure included in a scientific article

Fig. 54.6 Good example of a figure included in a scientific article [15]

Fig. 54.7 Bad example of a figure included in a scientific article

Fig. 54.8 Good example of a figure included in a scientific article [14]

8. Try to cite relevant papers of the target journal. Editors appreciate it as an interest in their journal, and it might improve citation scores [4].

Checklist References
• Use reference manager software
• Use the requested output style
• Cite the original reference
• Recheck the final reference list

Take-Home Message
• Writing the first scientific paper appears to be a difficult step to master. Nevertheless, scientific writing follows a well-defined structure. Your first step is to complete a detailed review of current literature with regard to your topic. Afterward it is advisable to start with an outline. Now you can benefit from your previously written comprehensive study protocol.
• Accurate, clear, and unambiguous expression of findings is crucial and improves the quality of your manuscript. Even though you have exciting and original results, your manuscript may not be accepted for publication in a peer-reviewed journal if the presentation and illustration are of mediocre quality. Therefore, it is of utmost importance to acquire good scientific writing skills for your research armamentarium.
• Finally, you need to be certain that you meticulously follow the instructions stipulated by your target journal prior to submission.

References

1. Cals JW, Kotz D. Effective writing and publishing scientific papers, part II: title and abstract. J Clin Epidemiol. 2013;66:585.
2. Cals JW, Kotz D. Effective writing and publishing scientific papers, part III: introduction. J Clin Epidemiol. 2013;66:702.
3. Cals JW, Kotz D. Effective writing and publishing scientific papers, part VI: discussion. J Clin Epidemiol. 2013;66:1064.
4. Cals JW, Kotz D. Effective writing and publishing scientific papers, part VIII: references. J Clin Epidemiol. 2013;66:1198.
5. Guller U, Oertli D. Sample size matters: a guide for surgeons. World J Surg. 2005;29:601–5.
6. Harris JD, Brand JC, Cote MP, Faucett SC, Dhawan A. Research pearls: the significance of statistics and perils of pooling. Part 1: clinical versus statistical significance. Arthroscopy. 2017;33:1102–12.
7. Hassink G, Testa EA, Leumann A, Hugle T, Rasch H, Hirschmann MT. Intra- and inter-observer reliability of a new standardized diagnostic method using SPECT/CT in patients with osteochondral lesions of the ankle joint. BMC Med Imaging. 2016;16:67.
8. Kotz D, Cals JW. Effective writing and publishing scientific papers, part I: how to get started. J Clin Epidemiol. 2013;66:397.
9. Kotz D, Cals JW. Effective writing and publishing scientific papers, part IV: methods. J Clin Epidemiol. 2013;66:817.
10. Kotz D, Cals JW. Effective writing and publishing scientific papers, part V: results. J Clin Epidemiol. 2013;66:945.
11. Kotz D, Cals JW. Effective writing and publishing scientific papers, part VII: tables and figures. J Clin Epidemiol. 2013;66:1197.
12. Lacroix JR. A key to scientific research literature. Can Med Assoc J. 1971;104:1080.
13. Letchford A, Moat HS, Preis T. The advantage of short paper titles. R Soc Open Sci. 2015;2:150266.
14. Mathis DT, Hirschmann A, Falkowski AL, Kiekara T, Amsler F, Rasch H, et al. Increased bone tracer uptake in symptomatic patients with ACL graft insufficiency: a correlation of MRI and SPECT/CT findings. Knee Surg Sports Traumatol Arthrosc. 2018;26(2):563–73. https://doi.org/10.1007/s00167-017-4588-5.
15. Mathis DT, Kaelin R, Rasch H, Arnold MP, Hirschmann MT. Good clinical results but moderate osseointegration and defect filling of a cell-free multi-layered nano-composite scaffold for treatment of osteochondral lesions of the knee. Knee Surg Sports Traumatol Arthrosc. 2018;26(4):1273–80. https://doi.org/10.1007/s00167-017-4638-z.
16. Peh WC, Ng KH. Basic structure and types of scientific papers. Singap Med J. 2008;49:522–5.
17. Rickham PP. Human experimentation. Code of ethics of the World Medical Association. Declaration of Helsinki. Br Med J. 1964;2:177.
18. Rohrig B, du Prel JB, Wachtlin D, Blettner M. Types of study in medical research: part 3 of a series on evaluation of scientific publications. Dtsch Arztebl Int. 2009;106:262–8.
19. Schulz KF, Altman DG, Moher D, CONSORT Group. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomized trials. Open Med. 2010;4:e60–8.
20. Vitse CL, Poland GA. Writing a scientific paper: a brief guide for new investigators. Vaccine. 2017;35:722–8.
21. Watson PF, Petrie A. Method agreement analysis: a review of correct methodology. Theriogenology. 2010;73:1167–79.
55 Common Mistakes in Manuscript Writing and How to Avoid Them

Eleonor Svantesson, Eric Hamrin Senorski, Kristian Samuelsson, and Jón Karlsson

E. Svantesson (*)
Department of Orthopaedics, Institute of Clinical Sciences, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden

E. H. Senorski
Department of Health and Rehabilitation, Institute of Neuroscience and Physiology, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden

K. Samuelsson · J. Karlsson
Department of Orthopaedics, Institute of Clinical Sciences, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
Department of Orthopaedics, Sahlgrenska University Hospital, Mölndal, Sweden
e-mail: jon.karlsson@telia.com

© ISAKOS 2019
V. Musahl et al. (eds.), Basic Methods Handbook for Clinical Orthopaedic Research, https://doi.org/10.1007/978-3-662-58254-1_55

55.1 Introduction

Conducting a study of high quality is challenging and requires both effort and discipline. It is a process that may take years. However, when the day finally arrives, the day when you have the results, you are of course eager to make them official and let them impact current practice and evidence-based medicine.
There is just one thing; the manuscript needs to be written. Considering all the work and time you have invested in conducting the study, you will likely feel ambitious about presenting your work in the best possible way and getting it published in a high-impact journal [1]. On the other hand, many researchers feel that final drafting of the manuscript, before crossing the finish line, feels more or less like climbing a mountain. Writing is something that may not come naturally to some researchers and clinicians. However, with discipline and some straightforward tools for writing, it may not be that complicated. And, when you start to master the art, you might find that it was not as hard as you thought at the beginning. And, in the end, you may even find writing a manuscript fun.

55.1.1 Dedication

The main key for success is to be dedicated to your work. If you really want to become a successful researcher, you need to be passionate about your research topic and be prepared to invest time and effort in your work. A high motivation and an ability to put your goals in front of you can help you defeat even the hardest struggles. To be dedicated also means that you are willing to learn and are able to acknowledge your shortcomings. Many researchers before you have experienced exactly the same struggle that you might be feeling. You should therefore view every challenge as an opportunity to learn from these experienced researchers. It is time to create new ground. You are now in a position of writing the manuscripts which will form the textbooks used by your future colleagues. Mentors that can share experiences and tips about how to prepare a
580 E. Svantesson et al.

manuscript are a highly valuable asset when aiming to develop a good skill for writing. Nevertheless, it is up to you whether you have an open mind and use this opportunity wisely.

55.2 Common Mistakes

Even though your research may be of good quality, there are some common mistakes that might increase the risk of your manuscript ending up in the “rejection box” instead of being published. Interestingly, some of these mistakes may sound obvious, but from an editor’s experience, the mistakes listed below keep getting repeated over and over again:

1. The manuscript is too comprehensive. A manuscript should be short and concise. This is an area where too much motivation to publish the manuscript may actually be your downfall. Considering all your effort in conducting the study, it is understandable if you would like to present every aspect of it, including all the data and all previous topic-related literature, and discuss all possible findings of your study. In other words, you might be tempted to write everything you know. However, ask yourself: what was really the main purpose and hypothesis of this study? What is new and what are the most important findings of this study? The production rate of research articles today is extremely high, and including too many results in a single article may cause the most important results to drown in text and data. Moreover, an excessive length of the manuscript may entail that fewer individuals will take time to sit down and read and reflect over the findings. Therefore, choose your focus points and stick to them. Sometimes less is more. In fact, most articles are too long. Most articles repeat information that is already well known, and this is hardly ever necessary. It has been said that “…a manuscript should be as long as necessary, but as short as possible…” and this is true. A manuscript should never be so long that it is boring to read.
2. There is no clear line of argument. An article should be enjoyable to read. The flow of your writing is crucial for this purpose. You need to decide for yourself before writing what your line of argument is and arrange your argument(s) in a logical flow. Logical flow will increase the readability of the manuscript and the chances of getting it published.
3. Unnecessary repetitions and statements. The vast number of published articles entails that you could theoretically repeat tangentially related results and well-established facts endlessly. Again, aim to have your manuscript focused and concise. There is no need to repeat your own findings or findings from topic-related literature all over again. Sometimes it is necessary to assume that the reader is already aware of some basic knowledge of the area and instead leave room for interested readers to review the literature in more depth themselves. Thus, use the opportunity to refer to other studies wisely so that the reader can find further information if wanted, without presenting the results of each study in the reference list in detail.
4. Instructions to authors. A practical key is to read and follow “Instructions to authors”. The time it takes to read them is always well invested. Way too often, it is obvious that authors have not read the instructions. Another mistake, which is very annoying to editors, is to resubmit a manuscript that has been rejected by another journal without answering the raised comments, changing the format, or bothering to look at the different instructions for the particular journal you have now chosen. Fact Box 55.1 summarizes common mistakes in manuscript writing.

Each manuscript is comprised of several essential sections that are worth a thorough review in order to present each section in the best possible way [2, 5]. Systematically writing each section of the manuscript usually facilitates writing since it allows the author to feel how the manuscript successively takes form under constant critical review. Therefore, let’s give some focus on each one of these sections.
55  Common Mistakes in Manuscript Writing and How to Avoid Them 581

Fact Box 55.1: Common Mistakes in Manuscript Writing
• The manuscript is too long and intends to cover too wide a range of topics
• There is no clear line of argument and the text is not organized in a logical flow
• Similar findings and statements are repeated over and over again
• The manuscript goes into details of basic knowledge instead of using the opportunity to refer to studies where the reader independently could find more information if necessary
• The authors have not taken the time to read the journal guidelines thoroughly
• Not answering reviewers’ comments in case of resubmission to a different journal

55.2.1 Title

The title should capture the reader’s interest immediately. The title should be as short as possible and give the reader an idea of the study and the main results. A common mistake is that the title is too neutral and only mirrors the area of investigation. Instead, let the title speak. Let it be loud and clear and shout out a direct finding of your study. The title should be a statement and never a question.

55.2.2 Abstract

In general, the abstract follows the same main structure for all journals; however, there may be some slightly different subheadings and word limits between journals. An abstract should include the study purpose, a brief presentation of the methodology, the main results, and a conclusion. Remember, the reader must be able to understand which material and methods have been applied in order to understand the results that are presented in the abstract. This means that details of the methodology can be limited in the abstract, but the full description should be included in the main text of the manuscript. The abstract also functions as an opportunity to raise interest in the study. Therefore, take some time to formulate the conclusion as directly as possible and, preferably, somewhat controversially. The ultimate goal of the abstract is to make the reader curious to find out more about how you have reached your conclusion and want to read the full-text article. Preferably, the conclusion of the abstract should be the same as in the text. Many clinical journals ask for level of evidence. This information should then be added at the end of the abstract if required.

55.2.3 Introduction

As the word implies, the introduction should introduce the reader to the topic of the study. However, some authors write a far too comprehensive introduction that may, paradoxically, dampen the reader’s interest in the study before even reaching the results. Another common mistake is to start discussing results in the introduction. The introduction should be used to raise some interesting discussion topics and highlight relevant questions regarding the topic. These questions will then be answered and discussed in the discussion section. The introduction should end by turning focus toward your study. A clear purpose and a hypothesis for your study should be stated at this point. These are essential elements that the reader will bear in mind when reading all the following sections of your manuscript. A good rule of thumb is that the introduction should be no more than one manuscript page in length.

55.2.4 Materials and Methods

You have probably heard it before, but it is worth repeating: The methods should be so well described that the reader should be able to repeat your study like a recipe from a cookbook without trouble. This means that the methods section should be detailed, clear, and honest. The text should have a good flow and a readable language. Instead of trying to create a literary masterpiece

of the methods section, keep it simple and precise. The most important aspects to focus on are the inclusion and the exclusion criteria, the description of the intervention or the experiment, and a clear presentation of the outcome measurements. Consider using flow diagrams or tables to illustrate your test setup. There are a number of published reporting guidelines that can help you structure your methods section depending on the type of study. Examples of such guidelines are the CONSORT [4] and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [3]. The statistical analysis should be described under a separate subheading, where all calculations should be clearly reported. This may include power, sample size, sensitivity analysis, and drop-out analysis. In fact, sample size calculation is always necessary in order to ensure that the statistical power is adequate. Statistics is really a science in itself; therefore, do not hesitate to ask for professional help in order to ensure that the statistical analysis is correctly presented. Two common mistakes are to leave out information about the sample size calculation and the IRB approval. It can hardly be stated enough that (almost) all studies need an IRB approval.

55.2.5 Results

Now you have reached the part where you finally are allowed to present the results of your study. The most important thing in this section is that the results are presented objectively. Preferably, you have prepared a thorough study protocol before conducting your trial where you already have prepared an outline for your results section. There should be no subjective influence in the results section whatsoever; save that for the discussion part. Take time to “get to know the data” and to decide how best to present it. There is no need to present all the findings in the text; choose the most important one(s) to present in detail in the text. The text is in turn complemented by tables and figures in order to display all data. Aim for writing a short results section where the results are not duplicated in text and in tables or figures. A good rule of thumb is that the results section should not be longer than one manuscript page.

55.2.6 Discussion

The way you start the discussion is important. In a few initial sentences, you should preferably summarize the most important findings of your study, which function as a foundation for the rest of your discussion. The discussion should be written based on your results, and these should be compared and contrasted with previous research, not the other way around.
Thus, the discussion is not a forum for a general discussion and presentation of other studies. Primarily ask yourself: What did your study show? Thereafter, discuss these findings in relation to previous findings. Is it likely that your results are true? What supports your findings compared with other studies and what findings of your study might be contradicted based on previous research? Also, focus on the clinical relevance of your findings in the discussion. This is especially important if you have conducted preclinical research and are aiming for a clinically oriented journal. Finally, all researchers know that no study is perfect. There are always limitations and confounding factors that could influence a result of a study. Being honest and humble about such potential factors increases the trustworthiness of a study. Furthermore, an understanding of limitations of a study will generate new ideas for future studies and encourage honest research. Therefore, think through your study limitations, and clearly present and discuss them at the end of the section. All too often, limitations are not as well reported as they should be.

55.2.7 Conclusion

The conclusion should be based on statistically significant findings from your study and nothing else. There is room for some slight speculation; however, such speculations should mainly be included in the discussion and never in the conclusion. The conclusion should be a brief, true, and concrete statement of the evidence that

your study has contributed to and nothing else. In certain journals, there is also room for a brief comment on the clinical relevance of your study, especially if you have conducted an experimental study. A common mistake is that the conclusion is an extended discussion. Fact Box 55.2 summarizes the general manuscript outline.

Fact Box 55.2: A General Outline of a Manuscript and the Contents of Each Section
Introduction | Raise some interesting discussion topics and highlight relevant questions regarding the topic. State a clear purpose and a hypothesis. Keep it short, approximately one manuscript page
Materials and methods | The reader should be able to repeat your study by reading this section. Keep it simple and precise. Consider it a cookbook
Results | Present the results objectively. Focus on presenting the details for the most important findings; present data in tables and figures to minimize the length of this section
Discussion | Start by summarizing the most important findings. Compare and contrast your results with previous research
Conclusion | Should be a brief, true, and concrete statement of the evidence that your study has contributed to. It is not an extended discussion

55.2.8 References

One of the easiest and most straightforward aspects of preparing a manuscript should be to get the references correct. Nevertheless, this is an area that is all too often defective among submitted manuscripts. The most common explanation for this is probably that the authors simply have not been thorough when reading the journal guidelines before writing the reference section of the manuscript. The most common mistakes are that the references are either in an incorrect format or that they are not up to date. Each journal has specific guidelines for how to prepare the references regarding the order and the format. Read the guidelines carefully (this is always well-invested time) and get to know your reference system so that you can adjust the references accordingly. To avoid submitting a manuscript with references that are not up to date, update your references just before submitting your manuscript to the journal. This is also logical, as you might have started the study a couple of years back and much could have happened during this time. Taken together, the two common mistakes are format errors and the use of non-updated references.

55.2.9 Figures and Tables

Again, this is an area where all journals have different preferences of how these should be submitted and where in the manuscript they should be located. Thus, the primary way to avoid mistakes is to read the guidelines of the journal. Another aspect to consider is that both tables and figures should be designed in a way that makes them understandable, independently of the rest of the article. It is important that figures and tables are accompanied by descriptive legends that are self-explanatory. Every figure should be able to be read as “stand-alone.” This includes, for example, a presentation of abbreviations and key ideas. When used correctly, tables and figures are valuable methods for presenting large volumes of data, visualizing the results, and keeping the results text short. Avoiding repetition is an important part of tables and figures. They should give the details of the results, but not repeat them. Another mistake is the figure quality; figures should be drawn by professional medical artists and not by amateurs.

Take-Home Message
• To write a manuscript is a process that takes time and effort.
• Aim for a short and concise manuscript, with a clear line of argument.
• Read the journal guidelines thoroughly and follow them in detail.

• Mentors who can share experiences and tips about how to prepare a manuscript are a highly valuable asset when aiming to develop good writing skills.

References

1. Katz MJ. From research to manuscript: a guide to scientific writing. New York: Springer; 2009.
2. MacArthur CA, Graham S, Fitzgerald J. Handbook of writing research. New York: Guilford; 2006.
3. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol. 2009;62(10):1006–12. https://doi.org/10.1016/j.jclinepi.2009.06.005.
4. Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c332. https://doi.org/10.1136/bmj.c332.
5. Swales JM, Feak CB. Academic writing for graduate students: essential tasks and skills. Ann Arbor: The University of Michigan Press; 2012.
