Summary
• A number of hierarchies of evidence have been developed to enable different
research methods to be ranked according to the validity of their findings.
However, most have focused on evaluation of the effectiveness of interventions.
When the evaluation of healthcare addresses its appropriateness or feasibility,
then existing hierarchies are inadequate.
• This paper reports the development of a hierarchy for ranking of evidence
evaluating healthcare interventions. The aims of this hierarchy are twofold.
Firstly, it is to provide a means by which the evidence from a range of
methodologically different types of research can be graded. Secondly, it is to
provide a logical framework that can be used during the development of
systematic review protocols to help determine the study designs which can
contribute valid evidence when the evaluation extends beyond effectiveness.
• The proposed hierarchy was developed based on a review of literature,
investigation of existing hierarchies and examination of the strengths and
limitations of different research methods.
• The proposed hierarchy of evidence focuses on three dimensions of the
evaluation: effectiveness, appropriateness and feasibility. Research that can
contribute valid evidence to each is suggested. To address the varying strengths
of different research designs, four levels of evidence are proposed: excellent,
good, fair and poor.
• The strength of the proposed hierarchy is that it acknowledges the valid
contribution of evidence generated by a range of different types of research.
However, hierarchies only provide a guide to the strength of the available
evidence and other issues such as the quality of research also have an important
influence.
Introduction
The past two decades have seen a growing emphasis on basing healthcare decisions on the best available evidence. This evidence encompasses all facets of healthcare, and includes decisions related to the care of an individual, an organization or at the policy level. Attention has also focused on the quality of the scientific basis of healthcare and, with this, recognition that not all evidence is equal in terms of its validity.

To aid the interpretation and evaluation of research findings, hierarchies of evidence have been developed which rank research according to its validity. The major focus of these hierarchies has been effectiveness and, as a result, the randomized controlled trial (RCT) has been commonly viewed as providing the highest level of evidence. While many valid approaches to research exist, they are often ranked at a level lower than the RCT, although each approach provides its own unique perspective.

To address this, a hierarchy for ranking research evaluating healthcare interventions was developed. This hierarchy differs from existing models in that it recognizes the contribution of evidence generated by a range of research methodologies.

Correspondence to: David Evans, Lecturer, Department of Clinical Nursing, University of Adelaide, South Australia 5005 (tel.: +61 8 8303 6288).

Grading the evidence

It has long been recognized that not all research designs are equal in terms of the risk of error and bias in their results. When seeking answers to specific questions, some research methods provide better evidence than that provided by other methods. That is, the validity of the results of research varies as a consequence of the different methods used. For example, when evaluating the effectiveness of an intervention, the RCT is considered to provide the most reliable evidence (Muir Gray, 1997; Mulrow & Oxman, 1997; Sackett et al., 1997). It is considered the most reliable evidence because the processes used during the conduct of an RCT minimize the risk of confounding factors influencing the results. As a result, the findings generated by RCTs are likely to be closer to the true effect than the findings generated by other research methods.

This confidence in the findings of research has important implications for those developing practice guidelines and clinical recommendations, or implementing the results of research in their area of practice. The aim during this development and implementation is to use the best available evidence. The problem that arises from this situation is how to determine the best evidence. To address this, hierarchies of evidence have been developed to allow research-based recommendations to be graded. These hierarchies or levels are used to grade primary studies according to their design, and so reflect the degree to which different study designs are susceptible to bias [National Health Service (NHS) Centre for Reviews and Dissemination, 1996]. Ranking research designs according to their internal validity not only grades the strength of the evidence, but also indicates the confidence the end-user can have in the findings.

Hierarchies of evidence

Hierarchies of evidence were first popularized by the Canadian Task Force on the Periodic Health Examination in the late 1970s, and since that time many different hierarchies have been developed and used (Canadian Task Force on the Periodic Health Examination, 1979; Sackett, 1986; Woolf et al., 1990; Cook et al., 1992, 1995; Guyatt et al., 1995; Wilson et al., 1995). Until recently, these focused on effectiveness, and for this reason the RCT was most commonly listed as providing the highest level of evidence.

These hierarchies have used a range of different approaches to grading research. For example, one hierarchy for clinical recommendations used levels A1 through to C2 (Guyatt et al., 1995). Level A1 represented RCTs with no heterogeneity and a confidence interval (CI) all on one side of the threshold number needed to treat (NNT). Level C2, at the other end of this scale, was assigned to observational studies with a CI overlapping the threshold NNT. NNT is the number of patients who have to be treated to prevent one event occurring (see Information Point in Vol. 10, no. 6, p. 783). Another hierarchy used a scale of level I through to level IV [National Health and Medical Research Council (NHMRC), 1995]. Level I was assigned to evidence obtained from a systematic review of all relevant randomized controlled trials, while level IV comprised opinions of respected authorities, descriptive studies, or reports from expert committees. The Cochrane Collaboration ranks the validity of studies on a scale of A to C, with A indicating that the study met all quality criteria (Mulrow & Oxman, 1997). One hierarchy that was used during the development of clinical guidelines used an alpha-numerical approach to rank both evidence and recommendations (Sackett, 1986; Meltzer et al., 1998). The highest ranking in this hierarchy was 'Grade A Recommendations supported by Level I evidence' (Cook et al., 1992).

With the increasing popularity of systematic reviews, these are starting to replace the RCT as the best source of evidence (NHMRC, 1995). More recently, one hierarchy listed N of 1 randomized trials as the highest level of evidence (Guyatt et al., 2000). N of 1 randomized trials use a single patient who is randomly allocated to the treatment and comparison interventions. Hierarchies have now been developed to address a range of other areas, including prevention, diagnosis, prognosis, harm and economic analysis (Carruthers et al., 1993; Ball et al., 1998; Meltzer et al., 1998).

Ultimately, these hierarchies aim to provide a simple way to communicate a complex array of evidence generated by a variety of research methods. From the perspective of healthcare decision-makers, they provide a measure of the trust that can be placed in the recommendations, or alert the user when caution is required. However, the exact format and order of rank for research designs within these hierarchies have not been determined, and existing systems have used a range of different approaches.

Determining best evidence

A limitation of current hierarchies is that most focus solely on effectiveness. Effectiveness is concerned with whether an intervention works as intended. While this is obviously vital, the scope of any evaluation should be broader. For example, it is also important to know whether the intervention is appropriate for its recipient. From this perspective, the evidence on appropriateness concerns the psychosocial aspects of the intervention, and so would address questions related to its impact on a person, its acceptability, and whether it would be used by the consumer. A third dimension of evidence relates to its feasibility, and so involves issues concerning the impact it would have on an organization or provider, and the resources required to ensure its successful implementation. Feasibility encompasses the broader environmental issues related to implementation, cost and practice change.

Evidence on effectiveness, appropriateness and feasibility provides a sounder base for evaluating healthcare interventions, in that it acknowledges the many factors that can have an impact on success. This highlights the range of dimensions that evidence should address before healthcare interventions can be adequately appraised. It also means that, no matter how effective an intervention is, if it cannot be adequately implemented, or is unacceptable to the consumer, its value is questionable. The risk with available hierarchies is that, because of their single focus on effectiveness, research methods that generate valid information on the appropriateness or feasibility of an intervention may be seen to produce lower level evidence.

In response to these limitations of existing frameworks, a new hierarchy of evidence was developed that acknowledges the legitimate contribution of a range of research methodologies for evaluating healthcare interventions (see Fig. 1). This approach addresses the multidimensional nature of evidence and accepts that valid evidence can be generated by a range of different types of research. It does not attempt to diminish the value of RCTs, or the importance of determining effectiveness; rather, it accepts that RCTs answer only some of the questions. Importantly, this framework acknowledges the contribution of interpretive and observational research.

From a slightly different perspective, the hierarchy was also developed to serve as a framework during the production of systematic review protocols. In this context, the aim of the hierarchy was to help formulate review questions and to assist in determining what research could provide valid evidence when questions extended beyond the effectiveness of an intervention.
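The number needed to treat (NNT) used by the A1-to-C2 grading scheme discussed above can be made concrete with a small calculation. The sketch below is illustrative only and is not from the paper: the event rates, the confidence interval for the NNT and the clinical threshold are all invented for the example.

```python
# Illustrative sketch: computing the NNT (number needed to treat) and
# checking it against a decision threshold, as in the A1-to-C2 grading
# scheme described in the text. All numbers below are assumptions.

def nnt(control_event_rate: float, treatment_event_rate: float) -> float:
    """NNT = 1 / absolute risk reduction: patients treated per event prevented."""
    arr = control_event_rate - treatment_event_rate
    if arr <= 0:
        raise ValueError("treatment shows no risk reduction")
    return 1.0 / arr

# Example: 20% of controls vs. 15% of treated patients experience the event,
# so the absolute risk reduction is 0.05 and about 20 patients must be treated
# to prevent one event.
point_estimate = nnt(0.20, 0.15)
print(round(point_estimate, 6))  # 20.0

# Level A1 in the scheme required the whole confidence interval for the NNT
# to fall on one side of the threshold NNT (hypothetical numbers below).
ci_lower, ci_upper = 12.0, 35.0   # assumed 95% CI for the NNT
threshold = 50.0                  # assumed clinical threshold NNT
print(ci_upper < threshold)       # True: CI entirely on one side of the threshold
```

A CI that straddled the threshold (e.g. 30 to 80 against a threshold of 50) would drop the evidence towards the C2 end of that scale.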
Figure 1 Hierarchy of evidence: ranking of research evidence evaluating health care interventions.
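To make the shape of this framework concrete, the ranking described in the text can be sketched as a small lookup table. The cell contents below are inferred from the prose of this paper, not transcribed from Fig. 1, so the exact placement of each study design is an assumption; the feasibility column in particular is extrapolated from the other two dimensions.

```python
# Sketch of the proposed hierarchy as a data structure. The three dimensions
# and four levels come from the Summary; which designs sit in which cell is
# inferred from the body text and should be treated as an assumption.

HIERARCHY = {
    "effectiveness": {
        "excellent": ["systematic review", "multicentre study"],
        "good": ["RCT", "observational study"],
        "fair": ["non-randomized controlled trial", "uncontrolled trial",
                 "study with historical controls"],
        "poor": ["descriptive study", "case study", "expert opinion"],
    },
    "appropriateness": {
        "excellent": ["systematic review", "multicentre study"],
        "good": ["RCT", "observational study", "interpretive study"],
        "fair": ["survey", "questionnaire", "case study", "focus group"],
        "poor": ["expert opinion", "poor quality study"],
    },
    "feasibility": {  # assumed by analogy; this section is incomplete in the text
        "excellent": ["systematic review", "multicentre study"],
        "good": ["RCT", "observational study", "interpretive study"],
        "fair": ["survey", "case study", "focus group"],
        "poor": ["expert opinion", "poor quality study"],
    },
}

LEVELS = ["excellent", "good", "fair", "poor"]  # strongest to weakest

def grade(dimension: str, design: str) -> str:
    """Return the evidence level of a study design for one evaluation dimension."""
    for level in LEVELS:
        if design in HIERARCHY[dimension][level]:
            return level
    raise KeyError(f"{design!r} not ranked for {dimension!r}")

print(grade("appropriateness", "interpretive study"))  # good
print(grade("effectiveness", "expert opinion"))        # poor
```

The point of the structure is the paper's central claim: the same design can sit at different levels depending on the question, so the lookup is keyed by dimension first and design second.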
findings (McKee et al., 1999). However, these approaches can provide complementary evidence, and end-users must be aware that both methods have their strengths and weaknesses (McKee et al., 1999).

In addition to the studies already discussed, evidence is also produced by other methods such as non-randomized controlled trials, uncontrolled trials, and studies with historical controls; however, their results are at greater risk of error (Dawson-Saunders & Trapp, 1994). With quasi-experimental designs, such as the non-randomized controlled trial, it is more difficult to show that any difference in outcome is the result of the intervention rather than differences between groups (Elwood, 1998). These non-randomized studies differ from observational studies because the allocation to comparison groups is made by the researcher rather than healthcare workers who are independent of the study. As a result of these factors, the risk of error or bias is high.

Uncontrolled trials may also be used to evaluate an intervention, but the lack of any comparison group makes interpretation of findings difficult. The exception to this is studies with dramatic results, for example, the administration of oxygen to a hypoxic person or adrenaline to a person in shock. However, for most situations, the evidence generated by uncontrolled trials should be regarded with suspicion, and must also be ranked at a lower level than the findings of RCTs or observational studies.

Finally, evidence about the effectiveness of an intervention may be generated through descriptive studies, expert opinion, case studies or poorly conducted studies. This evidence is at the greatest risk of error and is inadequate for evaluating the effectiveness of an intervention. As a result, these methods are ranked as the lowest level of evidence.

APPROPRIATENESS

Appropriateness, in this context, addresses the impact of the intervention from the perspective of its recipient. It also relates to the impact of illness, to enable this information to be integrated into healthcare management and to assist in the prioritization of care. Appropriateness is concerned more with the psychosocial aspects of care than with the physiological and, with regard to the intervention, is reflected in questions such as:
• What is the experience of the consumer?
• What health issues are important to the consumer?
• Does the consumer view the outcomes as beneficial?

The range of research methods that can contribute valid evidence on the appropriateness of an intervention is broader than that addressing effectiveness (see Fig. 1). Firstly, as with effectiveness, results generated by multicentre studies and systematic reviews represent the best evidence on the appropriateness of an intervention. However, the systematic review and multicentre study need not be limited to RCTs, but would focus on all methods that can reasonably be used to evaluate the intervention from the perspective of appropriateness. Recommendations based on these sources of evidence would be at the least risk of error.

Good evidence can also be generated by a range of other research methods. As with effectiveness, a well-conducted single-centre RCT or observational study can provide valid evidence about the appropriateness of an intervention through a focus on psychosocial outcome measures. As previously stated, while experimental and observational studies evaluate the intervention from different perspectives, the evidence is complementary.

Interpretive studies can also contribute valid evidence, in that they represent the consumer's perspective on the treatment, illness or other such phenomenon, and thus help capture the subjective human experience that is often excluded from experimental research. This interpretive inquiry helps healthcare workers gain an understanding of everyday situations and experiences (Van Manen, 1990; Van der Zalm, 2000). While this information differs considerably from that generated by experimental or observational research, it contributes to our understanding of the impact of healthcare and is no less valid than that produced by other methods.

Evidence on appropriateness can also be generated by descriptive studies such as surveys, questionnaires and case studies. These contribute descriptive data related to interventions, their use and consumer responses. In addition to this, focus groups have emerged as a method for gathering information on the feelings and opinions of small groups of people, and so can aid in the evaluation of healthcare programmes (Beaudin & Pelletier, 1996; Robinson, 1999). This information offers another perspective on appropriateness and is valid evidence. However, its strength is less than that of the evidence produced by experimental, observational or interpretive research. Finally, evidence can also be generated by expert opinion or poor quality studies; however, this is at the greatest risk of error and as a result is ranked as the lowest level of evidence.

FEASIBILITY

Feasibility addresses the broader environment in which the intervention is situated and involves determining whether the intervention can and should be implemented. This focus acknowledges that the process of intentional
psychosocial impact of interventions and that consumers' priorities on important health needs may differ from those of the providers of care. While the views of the consumer have long been part of the rhetoric, to date they have fitted poorly within the evidence-based framework. Through the use of this hierarchy, evidence addressing this aspect of the evaluation of an intervention can be ranked at a more appropriate level.

While an intervention may be effective, it must also be feasible to implement. This relates to such things as cost, healthcare workers' acceptance and the resources which will be required to support the intervention. This hierarchy recognizes that evidence addressing the feasibility of an intervention is as important as that addressing effectiveness. A broader approach to the ranking of evidence will provide a more robust scientific base for healthcare, in that it moves beyond the single focus of effectiveness that has dominated the evidence-based healthcare movement since its inception.

The gold standard

In the context of this hierarchy it can be argued that there are two interpretations of the label 'gold standard'. The common use of this term refers to the optimal research design to answer a question. Over the past decade, this label has most commonly been applied to RCTs evaluating the effectiveness of interventions. However, for research questions addressing issues other than effectiveness, different methods will be needed. The optimal research method will be determined by the type of question, and it is the method that produces the most valid evidence that should become the standard to which others are compared.

Secondly, the use of this hierarchical structure for grading evidence provides another interpretation of what is meant by the gold standard. The concept of gold standard could move beyond research design and refer to evidence addressing all three dimensions of the evaluation of an intervention. That is, evidence demonstrates that the intervention works, can be implemented and fulfils the needs of its consumers. Only when all these dimensions have been subjected to investigation can an intervention be fully appraised and the evidence considered to be of a gold standard.

Cautionary note

It must be acknowledged that the use of any hierarchy is, at best, a guide rather than a set of inflexible rules. A hierarchy provides the end-user of research with a framework to judge the strength of available evidence. Other issues, such as what outcome measures were used and the populations studied, also exert a major influence on the usability of the evidence. While I have used this hierarchy to provide a logical framework for a review, it has not been subject to any formal evaluation and so caution is needed. Finally, and most importantly, hierarchies cannot be used to rank evidence without some consideration of the quality of research. Regardless of the research method, if the processes used during the study were poor, then the findings must be regarded with suspicion.

Conclusion

The proposed hierarchy of evidence provides a tool by which research addressing the many dimensions of an intervention can be ranked at an appropriate level. This approach takes the emphasis away from the RCT, to one that accepts that different research designs may be required for different clinical questions. The focus on effectiveness, appropriateness and feasibility provides a broader base for evaluating healthcare, and one that better fits the perspective of clinical practice.

This hierarchical approach recognizes the greater strength of evidence generated by systematic reviews and multicentre studies because the findings have been derived from multiple populations, settings and circumstances. In all three dimensions of the evaluation of an intervention, these sources provide the most valid and reliable evidence. Importantly, this hierarchy acknowledges that a range of research methods can contribute valid evidence.

The hierarchy provides a guide that helps determine the best evidence; however, factors such as research quality will also exert an influence on the value of the available evidence. Finally, for an intervention to be fully evaluated, evidence on its effectiveness, appropriateness and feasibility will be required.

References

Ball C., Sackett D.L., Phillips B., Haynes B. & Straus S. (1998) Levels of Evidence and Grading Recommendations. Centre for Evidence Based Medicine, http://cebm.jr2.ox.ac.uk/index.extras.
Basche C.E. (1987) Focus groups interview: an underutilized research technique for improving theory and practice in health education. Health Quarterly 14, 411–418.
Beaudin C.L. & Pelletier L.R. (1996) Consumer-based research: using focus groups as a method for evaluating quality of care. Journal of Nursing Care Quality 10, 28–33.
Benson K. & Hartz A.J. (2000) A comparison of observational studies and randomised controlled trials. New England Journal of Medicine 342, 1878–1886.
Black N. (1996) Why we need observational studies to evaluate the effectiveness of healthcare. British Medical Journal 312, 1215–1218.
Brewin C.R. & Bradley C. (1989) Patient preferences and randomised clinical trials. British Medical Journal 299, 313–315.
Canadian Task Force on the Periodic Health Examination (1979) The periodic health examination. Canadian Medical Association Journal 121, 1193–1254.
Carruthers S.G., Larochelle P., Haynes R.B., Petrasovits A. & Schiffrin E.L. (1993) Report of the Canadian Hypertension Society consensus conference: 1. Introduction. Canadian Medical Association Journal 149, 289–293.
Chalmers T.C., Celano P., Sacks H.S. & Smith H. (1983) Bias in treatment assignment in controlled clinical trials. New England Journal of Medicine 309, 1358–1361.
Colditz G.A., Miller J.N. & Mosteller F. (1989) How study design affects outcomes in comparisons of therapy. I: Medical. Statistics in Medicine 8, 441–454.
Concato J., Shah N. & Horwitz R.I. (2000) Randomised controlled trials, observational studies and the hierarchy of research designs. New England Journal of Medicine 342, 1887–1892.
Cook D.J., Guyatt G.H., Laupacis A. & Sackett D.L. (1992) Rules of evidence and clinical recommendations on the use of antithrombotic agents. Chest 102, 305s–311s.
Cook D.J., Guyatt G.H., Laupacis A., Sackett D.L. & Goldberg R.J. (1995) Clinical recommendations using levels of evidence for antithrombotic agents. Chest 108, 227s–230s.
Cook D.J., Mulrow C.D. & Haynes B. (1998) Synthesis of best evidence for clinical decisions. In: Systematic Reviews: Synthesis of Best Evidence for Health Care Decisions (eds Mulrow C.D. & Cook D.). American College of Physicians, Philadelphia.
Dawson-Saunders B. & Trapp R.G. (1994) Basic and Clinical Biostatistics. Prentice Hall International, London.
Elwood M. (1998) Critical Appraisal of Epidemiological Studies and Clinical Trials, 2nd edn. Oxford University Press, Oxford.
Feinstein A.R. (1989) Epidemiologic analysis of causation: the unlearned scientific lessons of randomised trials. Journal of Clinical Epidemiology 42, 481–489.
Guyatt G.H., Sackett D.L., Sinclair J.C., Hayward R., Cook D.J. & Cook R.J. (1995) Users guide to the medical literature: IX. A method for grading healthcare recommendations. JAMA 274, 1800–1804.
Guyatt G.H., Haynes R.B., Jaeschke R.Z., Cook D.J., Green L., Naylor C.D., Wilson M.C. & Richardson W.S. (2000) Users guide to the medical literature XXV. Evidence-based medicine: principles for applying the users guides to patient care. JAMA 284, 1290–1296.
Horwitz R.I., Viscoli C.M., Clemens J.D. & Sadock R.T. (1990) Developing improved observational methods for evaluating therapeutic effectiveness. American Journal of Medicine 89, 630–638.
McKee M., Britton A., Black N., McPherson K., Sanderson C. & Bain C. (1999) Interpreting the evidence: choosing between randomised and non-randomised studies. British Medical Journal 319, 312–315.
Meltzer S., Leiter L., Daneman D., Gerstein H.C., Lau D., Ludwig S., Yale J., Zinman B. & Lillie D. (1998) 1998 clinical practice guidelines for the management of diabetes in Canada. Canadian Medical Association Journal 159, S1–S29.
Meyer J. (2000) Using qualitative methods in health related action research. British Medical Journal 320, 178–181.
Miller J.N., Colditz G.A. & Mosteller F. (1989) How study design affects outcomes in comparisons of therapy. II: Surgical. Statistics in Medicine 8, 455–466.
Muir Gray J.A. (1997) Evidence-Based Healthcare. Churchill Livingstone, New York.
Mulrow C.D. (1987) The medical review article: state of the science. Annals of Internal Medicine 106, 485–488.
Mulrow C.D. & Oxman A.D. (1997) Cochrane Collaboration Handbook (database on disk and CDROM). The Cochrane Library, The Cochrane Collaboration, Oxford, Update Software.
NHMRC (1995) Guidelines for the Development and Implementation of Clinical Guidelines, 1st edn. Australian Government Publishing Service, Canberra.
NHS Centre for Reviews and Dissemination (1996) Undertaking Systematic Reviews of Research on Effectiveness. CRD Guidelines for Those Carrying Out or Commissioning Reviews. University of York, York.
Robinson N. (1999) The use of focus group methodology: with selected examples from sexual health research. Journal of Advanced Nursing 29, 905–913.
Sackett D.L. (1986) Rules of evidence and clinical recommendations on the use of antithrombotic agents. Chest 89, 2s–3s.
Sackett D.L., Richardson W.S., Rosenberg W. & Haynes R.B. (1997) Evidence Based Medicine: How to Practice and Teach EBM. Churchill Livingstone, New York.
Van der Zalm J.E. (2000) Hermeneutic-phenomenology: providing living knowledge for nursing practice. Journal of Advanced Nursing 31, 211–218.
Van Manen M. (1990) Researching Lived Experience: Human Science for an Action Sensitive Pedagogy. State University of New York, London.
Wilson M.C., Hayward R.S.A., Tunis S.R., Bass E.B. & Guyatt G. (1995) Users guide to the medical literature. VIII. How to use clinical practice guidelines; B. What are the recommendations and will they help you in caring for your patients. JAMA 274, 1630–1632.
Woolf S.H., Battista R.N., Anderson G.M., Logan A.G. & Wang E. (1990) Assessing the clinical effectiveness of preventative maneuvers: analytic principles and systematic methods in reviewing evidence and developing clinical practice recommendations. Journal of Clinical Epidemiology 43, 891–905.