
Evidence-Based Practice
Bernadette Mazurek Melnyk, PhD, RN, CPNP/NPP, FAAN, Section Editor

Rapid Critical Appraisal of Randomized Controlled Trials (RCTs): An Essential Skill for Evidence-Based Practice (EBP)

Bernadette Mazurek Melnyk
Ellen Fineout-Overholt
Bernadette Mazurek Melnyk, PhD, RN, CPNP/NPP, FAAN, FNAP, is Dean and Distinguished Foundation Professor in Nursing, Arizona State University College of Nursing, Tempe, AZ.

Critical appraisal of research studies is the third step in evidence-based practice (EBP). It follows step one, asking clinical questions in PICO format (i.e., population, intervention or interest area, comparison intervention or group, and outcome of interest), and step two, searching for evidence to answer the questions.

Once research is found to answer a burning PICO clinical question (e.g., In teens, is cognitive-behavioral therapy or are relaxation techniques more effective in reducing anxiety symptoms?), the clinician must be able to rapidly appraise studies to determine their validity and applicability to clinical practice. For years, research courses in many educational curricula have taught critiquing studies as a very detailed, time-consuming process. Unfortunately, this method of teaching the appraisal process has often contributed to lasting negative attitudes about research. It has also contributed to clinicians' misperceptions about the length of time it takes to read and critically appraise a single research report. Many practicing clinicians also believe that critical appraisal is much too difficult, or that only experts in EBP and researchers can be competent in the process. Therefore, this paper presents a user-friendly, efficient approach to critically appraising a published randomized clinical trial (RCT). Implementing this method of critical appraisal should enhance clinicians' confidence in their ability to swiftly determine whether the findings from an RCT are valid and should be translated to their clinical practice settings.

Definition and Characteristics of an RCT

An RCT, or true experiment, is the strongest design for testing cause-and-effect relationships (e.g., whether a treatment or intervention impacts outcomes) (Melnyk & Fineout-Overholt, 2005). In rating systems that grade the strength of evidence to guide clinical decision making, RCTs are consistently identified as Level II evidence (i.e., the second strongest level of evidence to guide practice). They are ranked only behind systematic reviews of RCTs, which are considered Level I, or the strongest, evidence to guide clinical practice (Guyatt & Rennie, 2002; Melnyk & Fineout-Overholt, 2005).

For cause-and-effect relationships to be supported, three criteria must be met:
(a) the independent variable (i.e., the treatment or intervention) must occur before the dependent variable (i.e., the outcome) in time sequence;
(b) a strong relationship between the independent and dependent variables must exist; and
(c) the relationship between the independent and dependent variables cannot be explained by the influence of other variables.

For example, Melnyk and colleagues (2004) recently conducted a multi-site RCT of an educational-behavioral intervention program (COPE, Creating Opportunities for Parent Empowerment) delivered to mothers of critically ill young children. The trial tested whether COPE would positively affect the mothers' and their children's mental health/coping outcomes. Phase I of the COPE intervention was delivered in the pediatric intensive care unit. Phase II was delivered after transfer to the general pediatric unit, and Phase III shortly after discharge from the hospital to home. Outcome variables were then measured at 3, 6, and 12 months following hospitalization. These included maternal depression, anxiety, and post-traumatic stress disorder [PTSD] symptoms, as well as children's externalizing and internalizing behaviors. Findings indicated that, at certain follow-up assessments, COPE mothers reported less negative mood state, less depression, and fewer PTSD symptoms than control mothers. In addition, a significantly higher percentage of control group children (25.9%) exhibited clinically significant behavioral symptoms at 1 year following discharge, compared to only 2.3% of the children in the COPE group (Melnyk et al., 2004). The COPE and control subjects were similar on baseline demographic and clinical variables at the beginning of the study, and site was controlled for statistically in the analysis. It is therefore not likely that other variables (e.g., baseline maternal anxiety) contributed to the differences found on the dependent or outcome variables at 6 and 12 months following hospitalization.
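The COPE percentages above can also be restated as a number needed to treat (NNT), an index defined later in this article. The Python sketch below is illustrative arithmetic on the two percentages quoted in the text. It is not a result reported by Melnyk et al. (2004). The function names are hypothetical, and rounding NNT up to a whole person is a common convention assumed here.

```python
import math


# Hypothetical helper names: a sketch of the arithmetic, not the study's analysis.
def absolute_risk_reduction(control_rate: float, experimental_rate: float) -> float:
    """Proportion spared the bad outcome: control event rate minus experimental event rate."""
    return control_rate - experimental_rate


def number_needed_to_treat(control_rate: float, experimental_rate: float) -> int:
    """NNT = 1 / ARR, rounded up to the next whole person (an assumed convention)."""
    arr = absolute_risk_reduction(control_rate, experimental_rate)
    if arr <= 0:
        # No reduction in the bad outcome, so an NNT is not meaningful here.
        raise ValueError("experimental rate is not lower than control rate")
    return math.ceil(1 / arr)


# Proportions of children with clinically significant behavioral symptoms
# at 1 year after discharge, as quoted in the text (Melnyk et al., 2004).
control_rate = 0.259  # control group
cope_rate = 0.023     # COPE group

arr = absolute_risk_reduction(control_rate, cope_rate)
nnt = number_needed_to_treat(control_rate, cope_rate)
print(f"Absolute risk reduction: {arr:.3f}")
print(f"Number needed to treat: {nnt}")
```

Run as written, it prints an absolute risk reduction of 0.236 and an NNT of 5. Under these assumptions, about five families would need to receive COPE rather than the comparison program to prevent one additional child from showing clinically significant behavioral symptoms at 1 year. Because the inputs are rounded percentages, the result should be read as approximate.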
True experiments, or RCTs, are the gold standard design for determining whether an intervention or treatment affects patient outcomes. These experiments possess three characteristics:
(a) an experimental group that receives the experimental intervention or treatment;
(b) a control or comparison group that receives standard care or a comparison intervention that is different from the experimental treatment; and
(c) random assignment, or randomization, to experimental and control or comparison groups (e.g., subjects are assigned to the study groups by tossing a coin).

For example, in the RCT by Melnyk and colleagues (2004), mothers in the comparison group received information about the hospital's policies and procedures. They did not receive information about what behaviors and emotions their children would experience as they recovered from critical illness (i.e., the information contained in the COPE program). Randomly assigning subjects to experimental and comparison/control groups makes it likely that the groups will be similar on demographic and clinical variables at the start of the study. This, in turn, makes it less likely that other variables (e.g., maternal or child age) will explain any differences in outcomes at the end of the study.

Ellen Fineout-Overholt, PhD, RN, is Director, Center for the Advancement of Evidence-Based Practice and Associate Professor of Clinical Nursing, Arizona State University College of Nursing, Tempe, AZ.

The Evidence-Based Practice section focuses on the search for and critique of the best evidence to answer challenging clinical questions so that the highest quality, up-to-date care can be provided to children and their families. To submit questions or obtain author guidelines, contact Bernadette Mazurek Melnyk, PhD, RN, CPNP/NPP, FAAN; Section Editor; Arizona State University College of Nursing, PO Box 872602, Tempe, AZ 85287-2602, Ph: 480-965-6431; Fax: 480-965-6488; Email:

50 PEDIATRIC NURSING/January-February 2005/Vol. 31/No. 1

Table 1. Three Key Questions for Rapid Critical Appraisal of RCTs

Are the findings valid? (i.e., as close to the truth as possible?)
Are the findings important? (i.e., What is the impact of the intervention [i.e., the size of the effect or the extent to which the intervention or treatment worked]?)
Are the findings clinically relevant or applicable to the patients for whom I am caring?

Table 2. Rapid Critical Appraisal Checklist for a Randomized Clinical Trial (RCT)

1. Are the study findings valid?
   A. Were the subjects randomly assigned to the experimental and control groups?  Yes  No  Unknown
   B. Were the follow-up assessments conducted long enough to fully study the effects of the intervention?  Yes  No  Unknown
   C. Did at least 80% of the subjects complete the study?  Yes  No  Unknown
   D. Was random assignment concealed from the individuals who were first enrolling subjects into the study?  Yes  No  Unknown
   E. Were the subjects analyzed in the group to which they were randomly assigned?  Yes  No  Unknown
   F. Was the control group appropriate?  Yes  No  Unknown
   G. Were the subjects and providers kept blind to study group?  Yes  No  Unknown
   H. Were the instruments used to measure the outcomes valid and reliable?  Yes  No  Unknown
   I. Were the subjects in each of the groups similar on demographic and baseline clinical variables?  Yes  No  Unknown
2. What are the results of the study and are they important?
   A. How large is the intervention or treatment effect (NNT, NNH, effect size, level of significance)?  __________________
   B. How precise is the intervention or treatment effect (confidence interval)?
3. Will the results help me in caring for my patients?
   A. Are the results applicable to my patients?  Yes  No  Unknown
   B. Were all clinically important outcomes measured?  Yes  No  Unknown
   C. What are the risks and benefits of the treatment?  ______________________
   D. Is the treatment feasible in my clinical setting?  Yes  No  Unknown
   E. What are my patient's/family's values and expectations for the outcome that the treatment is trying to prevent and for the treatment itself?

Critical Appraisal Questions for RCTs

There are three major questions in the critical appraisal process (see Table 1). The first critical appraisal question in reviewing an RCT is whether the findings of the study are valid (i.e., as close to the truth as possible). To answer this first major question, a series of sub-questions should be asked (see Table 2). The first is "Was random assignment to the experimental and control groups conducted, and was random assignment concealed from the individuals who were first enrolling subjects into the study?" Concealing study group assignment from the individuals enrolling subjects keeps them from being biased when approaching potential subjects to participate in a study. For example, suppose a researcher knows which individuals are going to be enrolled in the experimental group. His or her level of enthusiasm might be greater when approaching subjects targeted for the experimental intervention group than when approaching subjects designated to receive the control intervention. This may result in a higher level of participation in the experimental intervention group. In addition, the experimental group subjects may enter the treatment protocol with a different set of expectations and enthusiasm, which could affect the study's outcome variables.

The second question concerning the validity of a study is "Were the follow-up assessments conducted long enough to study the effects of the intervention, and did all subjects complete the study?" Studies should include follow-up assessments over time to assess the sustainability of an intervention or treatment, so that both its short- and long-term outcomes can be determined. In some instances, follow-up periods are too short to determine the effects of the intervention, which will affect the study results. The retention rate is also critical. If a large percentage of subjects withdraw or are dropped from a study, the results could turn out differently than if all of the subjects had remained in the study. This is because individuals with a certain characteristic may be more likely to discontinue their participation.

The third question under the topic of validity concerns analysis of the data: "Were patients analyzed in the group to which they were originally assigned?" This is also called intention-to-treat analysis. For example, if patients who were given partial experimental information were then permitted to switch to the control group, they might contaminate the results of the pure control group and, therefore, alter the study's outcomes. Analyzing every patient in the originally assigned group, regardless of any switch, preserves the comparability that randomization created.

The next question under validity is "Was the control group appropriate?" Unfortunately, many intervention trials do not control for the time spent with subjects in the experimental group. Such control requires giving the control group some type of comparison intervention that does not contain the experimental treatment. Without it, positive effects of the intervention could be attributable to the experimental group receiving more time and attention, rather than to the intervention itself.

Another key question dealing with validity is "Were the instruments used to measure the dependent variables or outcomes valid and reliable?" An instrument's validity means that the tool measures what it is intended to measure (e.g., the State-Trait Anxiety Inventory truly measures anxiety, not depression). Reliability is the consistency of an instrument in measuring the underlying construct. For instruments with more than one item, reliability is typically reported as Cronbach's alpha (i.e., an estimate of the internal consistency or homogeneity of the instrument) (Melnyk & Fineout-Overholt, 2005). A Cronbach's alpha of .80 and above is preferred in research, but instruments with a value of .70 and above are generally regarded as adequate.

Additional questions under validity include: (a) "Were the patients and providers kept blind to treatment (i.e., was neither aware of the subjects' group assignment)?"; (b) "Were the groups treated equally aside from the experimental treatment?"; and (c) "Were the groups equal at the beginning of the study on demographic and clinical variables?"

The second major question in critically appraising a clinical trial is "What are the results of the study, and are they important?" When answering this question, it is important to look at the magnitude of the treatment or intervention effect (i.e., how well the intervention worked in the experimental versus the control group). Study reports use a variety of indices to express the magnitude of a treatment or intervention effect. The most common are:
- effect sizes;
- number needed to treat (NNT), i.e., the number of individuals who would need to receive the experimental treatment to prevent one bad outcome or cause one additional good outcome; and
- number needed to harm (NNH), i.e., the number of individuals who would need to receive the experimental treatment or intervention for one additional person to be harmed, compared with the clients in the control group.

An effect size is an estimate of the potency of the intervention or treatment (i.e., how large its effect is). It is most commonly calculated by subtracting the mean of the control group from the mean of the experimental group and dividing by the pooled standard deviation (Melnyk & Fineout-Overholt, 2005). Small, medium, and large effects are designated as .2, .5, and .8, respectively.

Unfortunately, effect size is not reported in many intervention trials. Most researchers report only the statistical significance of their findings rather than the strength of the findings (i.e., effect size). However, statistical significance is largely dependent on the sample size and statistical power of a study. The larger the sample, the greater the power and the probability of detecting significant differences between groups, even when the effect size of the treatment is small.

For example, a hypothetical study compared the effects of music therapy and distraction on state anxiety in 40 children aged 6 to 10 years undergoing radiation therapy. It found no statistically significant difference in anxiety between the two groups when an independent t-test was performed (p = .15). However, calculation of the effect size revealed a medium effect (.5) for the intervention. The most likely reason that the t-test was not statistically significant is that there were only 40 subjects in the study. A larger sample (e.g., 80 children) would presumably have produced a statistically significant difference between the two study groups. If the effect size had not been included in the report, a clinician would conclude that music therapy was not effective in reducing anxiety, whereas, in reality, it was effective. Effect sizes are therefore important to include in research reports: they are not dependent on sample size, and they are a critical indicator of the magnitude of the experimental intervention's effect. When interpreting the findings from a research report, it is always important to assess the clinical meaningfulness of the findings (e.g., a large mean difference in outcome scores between the two groups), not just statistical significance.

The next question to address in critically appraising an RCT is "How precise is the treatment effect?" Confidence intervals are the best indicators of the precision of the estimated outcome values. For example, a 95% confidence interval is the range of values within which one can be 95% confident that the true value lies for the whole population from which the study patients were selected (Melnyk & Fineout-Overholt, 2005). More precisely, if the study were repeated many times, 95% of the intervals computed this way would contain the true value. The narrower the confidence interval, the more confident the clinician can be that the study effect approximates the true effect of the intervention. In contrast, the wider the confidence interval, the less confident one can be about approximating the true effect.

The final major question in critically appraising an RCT is "Will the results help me in caring for my patients?" Sub-questions in this category include: (a) "Are the results applicable to my patients?"; (b) "What are the risks and benefits of treatment?"; (c) "Is the intervention or treatment feasible in my clinical setting?"; and (d) "What are my patients' values and expectations for both the outcome that the treatment is trying to prevent and the treatment itself?" These questions matter in critically appraising a study. For instance, if a study was conducted with 8- to 12-year-olds and you were practicing on an adolescent unit, the intervention may not be as effective for your patients as it was for the younger children. In addition, an intervention may be effective in producing positive outcomes and still be far too expensive to implement on a daily basis, or it may conflict with a patient's or family's values. These are therefore important factors to consider when deciding whether to implement the intervention in your clinical setting.

In summary, critical appraisal is an essential step in evidence-based decision making. Once research studies are evaluated, decisions can be made about whether to implement a certain treatment in a clinical practice setting. Evaluating the implementation of research evidence in practice is then of utmost importance. Reading and evaluating studies on a regular basis will increase the speed at which clinicians can critically appraise applicable findings and translate them into practice.

References

Guyatt, G., & Rennie, D. (2002). Users' guides to the medical literature. Washington, DC: American Medical Association (AMA) Press.

Melnyk, B.M., Alpert-Gillis, L., Feinstein, N.F., Crean, H.F., Johnson, J., Fairbanks, E., Small, L., Rubenstein, J., Slota, M., & Corbo-Richert, B. (2004). Creating opportunities for parent empowerment: Program effects on the mental health/coping outcomes of critically ill young children and their mothers. Pediatrics, 113(6), e597-e607. Retrieved from [ full/113/6/e597].

Melnyk, B.M., & Fineout-Overholt, E. (2005). Evidence-based practice in nursing & healthcare: A guide to best practice. Philadelphia: Lippincott Williams & Wilkins.
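The effect-size calculation described in the text can be sketched in Python: the experimental mean minus the control mean, divided by the pooled standard deviation. The scores below are made up and the function names are hypothetical. Treating .2, .5, and .8 as lower cut-offs for small, medium, and large is an assumption about how to apply those benchmarks. For outcomes where lower scores are better (such as anxiety), a beneficial effect yields a negative value, so the label is applied to its absolute value.

```python
import math
from statistics import mean, variance


# Hypothetical helper names: a sketch, not a standard library API.
def pooled_sd(experimental: list[float], control: list[float]) -> float:
    """Pooled standard deviation of two independent groups, using sample variances."""
    n_e, n_c = len(experimental), len(control)
    pooled_var = ((n_e - 1) * variance(experimental)
                  + (n_c - 1) * variance(control)) / (n_e + n_c - 2)
    return math.sqrt(pooled_var)


def effect_size(experimental: list[float], control: list[float]) -> float:
    """Experimental mean minus control mean, divided by the pooled SD."""
    return (mean(experimental) - mean(control)) / pooled_sd(experimental, control)


def describe(d: float) -> str:
    """Label |d| using .2/.5/.8 as lower cut-offs (an assumed reading of the benchmarks)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Made-up outcome scores for two small groups (illustration only).
experimental_scores = [12, 14, 16]
control_scores = [10, 12, 14]

d = effect_size(experimental_scores, control_scores)
print(f"d = {d:.2f} ({describe(d)})")
```

With these made-up scores the group means differ by 2 points and the pooled standard deviation is 2, so the sketch prints `d = 1.00 (large)`.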

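The hypothetical music therapy example can be illustrated by asking how a medium effect (d = .5) fares with 40 versus 80 children in total. This sketch makes three assumptions: equal group sizes, an observed effect of exactly .5, and the normal distribution in place of Student's t (to stay within Python's standard library). Its p-values are therefore approximations and will not reproduce the p = .15 quoted for the hypothetical study. The function name is hypothetical.

```python
import math
from statistics import NormalDist


# Hypothetical helper name: a large-sample sketch, not an exact t-test.
def approx_two_sided_p(d: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for a two-group comparison.

    Assumes equal group sizes and an observed standardized mean difference
    of exactly d, so the test statistic is d / sqrt(2 / n) = d * sqrt(n / 2).
    The standard normal replaces Student's t, which slightly understates p
    in small samples.
    """
    statistic = abs(d) * math.sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(statistic))


medium_effect = 0.5
for total_children in (40, 80):
    p = approx_two_sided_p(medium_effect, total_children // 2)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"N = {total_children}: approximate p = {p:.3f} ({verdict} at .05)")
```

The same medium effect is not significant at the .05 level with 20 children per group, but it is with 40 per group. This is the example's point: the p-value changed because the sample grew, not because the intervention became more potent.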
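The link between sample size and the width of a confidence interval can be sketched the same way. The example makes these assumptions:
- a hypothetical 5-point mean difference;
- a common, known standard deviation of 10 in both groups (so d = .5);
- equal group sizes;
- a normal-approximation interval.

The function name is hypothetical.

```python
import math
from statistics import NormalDist


# Hypothetical helper name: a normal-approximation sketch.
def ci_mean_difference(diff: float, sd: float, n_per_group: int,
                       level: float = 0.95) -> tuple[float, float]:
    """Confidence interval for a difference between two group means.

    Assumes both groups share the same known standard deviation `sd` and have
    equal sizes, so the standard error is sd * sqrt(2 / n_per_group).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    se = sd * math.sqrt(2 / n_per_group)
    return diff - z * se, diff + z * se


# Hypothetical: a 5-point mean difference with SD 10 (a medium effect, d = .5).
for n in (20, 40, 200):
    low, high = ci_mean_difference(5.0, 10.0, n)
    print(f"{n} per group: 95% CI {low:.1f} to {high:.1f} (width {high - low:.1f})")
```

With 20 children per group the 95% interval crosses zero, so no effect cannot be ruled out. With 40 per group it excludes zero, and with 200 per group it is narrower still, giving the clinician more confidence that the estimate approximates the true effect.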