Daniel L Riddle
DL Riddle, PhD, PT, is Associate Professor, Department of Physical Therapy, Virginia Commonwealth University, 1200 East Broad, Richmond, VA 23298-0224 (USA) (driddle@hsc.vcu.edu).
Professional or discipline orientation of system developer: Orthopedic surgery | Physical therapy | Physical therapy | Many medical and nonmedical disciplines
Type: Status index | Clinical guideline index | Clinical guideline index | Mixed index
Method of development(a): Judgment approach | Judgment approach | Judgment approach | Judgment approach
Purpose: To determine the pathology | To determine the | To determine the | For clinical decision making,
(a) A statistical approach to developing a classification system relies primarily on statistical procedures to guide decisions about how to group patients. A judgment approach relies primarily on the clinical experience of the developer or on commonly accepted clinical knowledge to assign patients to groups.
(b) LBP=low back pain.
(c) W=working, I=idle.
using a critical appraisal approach recommended by Buchbinder and colleagues.20,21 The 4 systems selected were proposed by Bernard and Kirkaldy-Willis,22 Delitto and colleagues,23,24 McKenzie,25 and the Quebec Task Force on Spinal Disorders (QTF)26 (Tab. 1). These 4 classification systems were judged to be most appropriate for critical evaluation because the systems are thoroughly described in the literature22-26 and in continuing education courses, they are reported to be used in
Type: Status index | Status index | Status index | Status index | Status index | Status index | Clinical guideline index
Method of development(a): Statistical approach | Statistical approach | Statistical approach | Statistical approach | Judgment approach | Judgment approach | Judgment approach
Purpose: To identify homogeneous groups of patients with similar physical impairments | To identify homogeneous groups of patients based on demographic and pain behavior | To identify homogeneous groups of patients diagnosed with a psychiatric disorder based on DSM-III criteria | To identify homogeneous groups based on trunk motion measures | To identify groups of patients with similar signs or symptoms indicating the presence of pathology | To identify groups of patients based on chronicity and pain distribution | To guide treatment for groups of patients with similar signs and symptoms
Setting: Outpatient | Outpatient | Outpatient | Not specified | Not specified | Not specified | Outpatient
Domain of interest: Most patients with LBP(b) | Patients with purely organic LBP | Patients with LBP who have a psychiatric disorder | All patients with LBP | All patients with LBP | All patients with LBP without serious pathology | Most patients with LBP
Patients excluded: Patients with pain below knee, neurological signs, fracture, stenosis, infection, pregnancy, surgery in past 3 mo, cancer, psychiatric disease | Patients with psychiatric disorders, pain below the gluteal folds, malignancy, infection, fracture, neurological signs | Patients with purely organic disorders, pain below the gluteal folds, malignancy, infection, fracture, neurological signs | None | None | None | Patients who have had spinal surgery
(a) A statistical approach to developing a classification system relies primarily on statistical procedures to guide decisions about how to group patients. A judgment approach relies primarily on the clinical experience of the developer or on commonly accepted clinical knowledge to assign patients to groups.
(b) LBP=low back pain.
clinical practice,5,25 or they use diagnostic terms that are familiar to physical therapists.22,26

Prior to examining the 4 classification systems selected for critical appraisal, I will review some background material. I will present arguments as to why classification systems should enhance the care of patients with LBP. I will review the terminology proposed by Buchbinder et al20 to standardize the descriptions of the classification systems discussed in this article. Some of the work of Feinstein27 relating to the types of classification systems and how they are derived will be reviewed. I will use Feinstein's work to discuss other classification systems not selected for critical review. Classification systems described by Moffroid et al,28 Coste et al,29,30 Marras et al,31 Binkley et al,32 Mooney,33 and Sikorski34 will be discussed and are summarized in Table 2.

Why Classify?
Perhaps the most compelling argument for developing and using classification systems is that our current system for grouping patients appears to be inadequate. The most common classification used by physicians and physical therapists is the International Classification of Diseases (ICD). The ICD is a taxonomy of diagnostic labels used by many practitioners for the purposes of standardizing the nomenclature for patient diagnoses for statistical and administrative purposes.35 Because the ICD does not describe the procedures used to apply diagnostic labels, the reliability and validity of assigning ICD codes are quite low. The ICD, therefore, would appear to have very limited use for making judgments about treatment, prognosis, or the presence of pathology. The ICD-9, for example, lists 66 codes for use on patients with LBP.37 This large number of codes would appear to be excessive and impractical for routine clinical use.

The use of clearly described classification systems may enhance the effectiveness of treatment. Data suggest that patients treated with an approach based on an assigned classification do better than patients whose treatment is not based on their pretreatment classification. Although these studies should be considered to be preliminary, they suggest that patients classified using a system designed to guide treatment may be treated more effectively than patients treated without regard to classification.

Some researchers contend that randomized clinical trials (RCTs) could be better conducted if patients with idiopathic LBP were placed into homogeneous groups prior to treatment. Most RCTs have lumped apparently heterogeneous patients with either acute or chronic LBP into one group prior to randomly assigning the patients for treatment. Because most RCTs have considered patients with LBP as belonging to a homogeneous group, these studies probably have not measured a treatment effect that might be expected from a truly homogeneous sample of patients. Not all researchers, however, agree that the identification of homogeneous subgroups of patients with LBP for RCTs is necessary. Faas argued, for example, that no evidence exists to support the argument for classification prior to randomly assigning patients to exercise therapy groups in an RCT. Based on the research priorities established by the International Forum for Primary Care Research on Low Back Pain, other researchers7 apparently do not agree
Figure 3 (partial; entries recoverable from the figure):
Flexion Dysfunction: criteria not recovered
Derangement Syndrome 5: Unilateral LBP is present, buttock or thigh pain may be present, pain extends below the knee, no spinal deformity
Derangement Syndrome 6: Unilateral LBP is present, pain usually constant and below the knee, lateral shift and reduced lordosis deformity are present, neurological deficits common
Derangement Syndrome 7: Unilateral or bilateral LBP is present, buttock or thigh pain may be present, accentuated lumbar lordosis is present
Figure 3.
An illustration of the domain, categories, and criteria for the classification system developed by McKenzie.25 LBP=low back pain, SI=sacroiliac.
Figure 4.
An illustration of the domain, categories, and criteria for the classification system developed by Moffroid and colleagues.28 LBP=low back pain.
with LBP who either do or do not have evidence of psychological impairment. They used an approach similar to that used by Moffroid and colleagues,28 a statistical approach to divide patients with LBP into homogeneous categories. Coste and colleagues, like Moffroid et al, relied entirely on statistical procedures for group assignments.

Coste and colleagues29,30 collected demographic and physical examination data on 330 patients referred for treatment of LBP. Patients reporting pain below the area of the gluteal folds, and patients with neurological involvement were not admitted to the studies. The DSM-III criteria were used to identify patients with evidence of psychiatric disease. Of the 330 patients admitted to the studies, 136 patients were found to have evidence of psychiatric disease. The authors divided the sample into those subjects with no evidence of psychiatric disease (purely organic LBP) and those subjects diagnosed with a psychiatric illness in addition to their LBP.
Physical Therapy Volume 78 . Number 7 . July 1998 Riddle . 719
Classification System of Binkley and Colleagues 32
Figure 8 (partial; entries recoverable from the figure):
Category not recovered: Positive radiograph/bone scan
Category not recovered: Positive CT scan/biopsy
Scheuermann: Positive radiograph
Spondylolisthesis: Positive radiograph
Category not recovered: Extension increases pain/paresthesia, positive CT scan, flexion relieves pain
Spinal Congenital: Positive radiograph
Category not recovered: Positive bone scan or biopsy
Posterolateral Disk: Pain increased with repeated flexion or sustained flexion, onset involved flexion
Figure 8.
An illustration of the domain, categories, and criteria for the classification system developed by Binkley and colleagues.32 LBP=low back pain, CT=computed tomography, ROM=range of motion, SLR=straight leg raise, SI=sacroiliac.
Figure 9 (partial; criteria recoverable from the figure): Low back and thigh pain for at least 3 mo; low back and leg pain for at least 3 mo (IIIC).
Figure 9.
An illustration of the domain, categories, and criteria for the classification system developed by Mooney.33

whether the system he proposed should be used for all patients with LBP or only for some patients.

The usefulness of Mooney's system appears to be very limited for physical therapists because it is based entirely on symptom duration and distribution. All of the other classification systems described in this article used many other variables. The domain was not clearly defined, and the system does not appear to account for patients with serious pathology or patients with pathology unrelated to the disk. The assumption that disk pathology is responsible for almost all cases of LBP appears to restrict the use of this system to an unclearly defined and relatively small group of patients.

Most of the classification systems that are in clinical use were developed using the judgment approach and not primarily a statistical approach. Clinicians, therefore, might be tempted to avoid using existing classification systems because they were not developed using approaches grounded in sound measurement science. Buchbinder and colleagues20,21 and Feinstein27 have suggested, however, that what ultimately determines the usefulness of a classification system is how well the classification system functions given the purpose for which it was designed.

Introduction to Classification Systems Selected for Critical Appraisal

The Classification System of Bernard and Kirkaldy-Willis
The classification system proposed by Kirkaldy-Willis and Hill and later modified by Bernard and Kirkaldy-Willis22 is a status index and a classic example of a pathology-based system (Tab. 1). This system is shown in Figure 11. In their original article in 1979, Kirkaldy-Willis and Hill briefly described what they believed to be the medical history, physical examination, and radiological
Figure 10.
An illustration of the domain, categories, and criteria for the classification system developed by Sikorski.34

Figure 11 (partial; entries recoverable from the figure):
Domain: All patients with (remainder not recovered)
Central Stenosis: Pain on walking, pain relieved by rest, feeling the legs are going to give way, feeling of leg numbness, night pain relieved by walking, SLR only slightly limited, slight leg muscle weakness after walking, positive radiologic tests
Category not recovered: Trigger points in predictable areas resulting in referred pain, ropy feeling to palpation of trigger point
Chronic Pain: Not described
Pseudarthrosis: Positive radiologic tests
Category not recovered: Not described
Postfusion Stenosis: Positive radiologic tests
Three categories not recovered: Positive radiologic tests (each)
Arachnoiditis: Positive radiologic tests
Nerve Entrapment: Not described
Figure 11.
An illustration of the domain, categories, and criteria for the classification system developed by Bernard and Kirkaldy-Willis.22 LBP=low back pain, SI=sacroiliac, SLR=straight leg raise, ROM=range of motion, PSIS=posterior superior iliac spine.
Figure 13.
A summary of the 3 levels of classification of Delitto and colleagues. (Reprinted with permission of the American Physical Therapy Association from Delitto et al.23)
level of clinical decision making requires the therapist to stage the patient into 1 of 3 groups (stage I, stage II, or stage III) based on the presence and severity of various functional limitations and disabilities, work status information, and scores on a disability scale. When making decisions at the second level, therapists use only historical and disability data obtained from the patient. The examination is not done until the therapist is prepared to make clinical decisions at the third level.

The third level of clinical decision making involves the assignment of the patient, after being assigned to a stage, to one of the syndromes (categories) described for each stage. The examination procedures for stage I were described in a recent article.23 A more elaborate description of the stage I categories and treatments as well as examination and treatment information for stages II and III appear in a recently published book chapter.24 The categories described in the recently published book chapter for stage I syndromes are slightly different from those described in the article. For example, the book chapter described 3 different extension syndrome categories, whereas the article described only 1 extension syndrome. Apparently, the classification system of Delitto and colleagues is undergoing a process of continued development. The categories for stage II and stage III have yet to be described in the peer-reviewed literature.

The Classification System of McKenzie
The McKenzie system is a clinical guideline index designed for most, but not all, patients with LBP (Tab. 1).25 The structure of the McKenzie system is shown in Figure 3. The medical history consists of questions related to symptom onset and symptom behavior associated with several different postures. The examination requires the therapist to observe the patient's posture and the alignment of several bony landmarks. Trunk movements are observed for limitations and frontal-plane deviations. Movements of the trunk are observed, and the patient is questioned about the effect of the movements on symptom location and intensity. The therapist is also required to complete a neurological examination and to examine the patient's hip and sacroiliac joints.

McKenzie's classification system requires the clinician to classify the patient's problem into 1 of 13 categories.25 The most commonly discussed categories are the postural syndrome, the 4 dysfunction syndromes, and the 7
Purpose
Are the purpose, population, and setting clearly specified?
Content validity
Are the domain and all specific exclusions from this domain clearly specified?
Are all relevant categories included?
Is the breakdown of categories appropriate, considering the purpose?
Are the categories mutually exclusive?
Was the method of development appropriate?
If multiaxial, are criteria of content validity satisfied for each additional axis?
Face validity
An Approach for Critically Appraising Existing Classification Systems
Buchbinder and colleagues20,21 developed an approach for appraising classification systems. This approach to critical appraisal consists of 7 concepts: (1) appropriateness of purpose, (2) content validity, (3) face validity, (4) feasibility, (5) construct validity, (6) reliability, and (7) generalizability. The authors adapted their approach for examining classification systems from the psychological literature67 and from work done to construct health status measures.

A summary of the approach to critical appraisal developed by Buchbinder et al is presented in Table 3. The table lists the items used to judge each of the 7 concepts. Some concepts have only one item (eg, purpose), whereas other concepts have several items (eg, content validity) to judge whether the classification system adequately meets the concept. Each item is written in the form of a question and is generally self-explanatory, although some items require elaboration.

Content validity deals with whether the instrument of interest includes everything needed to describe the concept of interest (ie, the thing being measured).71 For the concept of content validity, one item poses the following question: "Was the method of development appropriate?" Buchbinder and colleagues20,21 suggested that classification systems should undergo a development process similar to health status measures. The categories in a classification system, in their view, should be chosen based on the opinions of a committee of experts, not on the opinion of an individual. According to their system, a formal group consensus technique should be used to identify the categories. In addition, they contended that a review of the literature should be used to supplement the classification system and that statistical techniques should be used in the process of development.

For the concept of face validity, an item states, "Is the nomenclature used to label the categories satisfactory?" Some categories in a classification system imply the
Purpose
Are the purpose, population, and setting clearly specified? Yes Yes Yes Yes
Content validity
Are domain of interest and all specific exclusions specified? Yes Yes Yes Yes
Are all relevant categories included? Yes Unknown No No
Are the categories mutually exclusive? Yes Unknown Yes No
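
The item-by-item judgments in this table fragment can be held in a small data structure, which makes it easy to see at a glance where each system falls short. The following is a minimal sketch only. The column order (Bernard and Kirkaldy-Willis, Delitto and colleagues, McKenzie, QTF) is an assumption taken from the order in which the article lists the 4 systems, and the `tally` helper is hypothetical; the article itself does not sum judgments into a score.

```python
from collections import Counter

# Assumed column order: the table fragment carries no column headers, so this
# follows the order in which the text names the 4 systems.
SYSTEMS = ["Bernard and Kirkaldy-Willis", "Delitto and colleagues", "McKenzie", "QTF"]

# Judgments copied from the table fragment (purpose and content validity items).
JUDGMENTS = {
    "Are the purpose, population, and setting clearly specified?": ["Yes", "Yes", "Yes", "Yes"],
    "Are domain of interest and all specific exclusions specified?": ["Yes", "Yes", "Yes", "Yes"],
    "Are all relevant categories included?": ["Yes", "Unknown", "No", "No"],
    "Are the categories mutually exclusive?": ["Yes", "Unknown", "Yes", "No"],
}


def tally(judgments, systems):
    """Count how often each judgment value occurs for each system (illustrative only)."""
    counts = {system: Counter() for system in systems}
    for item, values in judgments.items():
        if len(values) != len(systems):
            raise ValueError(f"expected {len(systems)} judgments for item: {item}")
        for system, value in zip(systems, values):
            counts[system][value] += 1
    return counts


if __name__ == "__main__":
    for system, counter in tally(JUDGMENTS, SYSTEMS).items():
        print(system, dict(counter))
```

A count of this kind is only a reading aid. Each judgment still has to be interpreted in light of the purpose for which the system was designed.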
presence of a specific pathology. For example, in the classification system of Bernard and Kirkaldy-Willis,22 there is a "piriformis syndrome" category. Feinstein,27 who is a physician, suggested that diagnostic labels are appropriate only for entities that can be verified with valid diagnostic tests. Buchbinder and colleagues agreed and suggested that categories implying the presence of unverifiable pathology should not be used in classification systems. Diagnostic tests for piriformis syndrome have not been studied for validity. The use of the term "piriformis syndrome," therefore, would appear to be inappropriate.

An item under the concept of construct validity asks, "Does it discriminate between entities that are thought to be different in a way appropriate for the purpose?" A construct is a conceptual idea that might be used to explain a phenomenon. Construct validation may demonstrate that a proposed construct actually exists or that a new classification system differs from an existing system. When determining whether a classification system discriminates between entities that are thought to be different, hypotheses should be tested. To test hypotheses, data need to be collected and examined for relationships. For example, if a classification system were designed to identify an effective treatment for patients, a study demonstrating that the treatment was more effective than other treatments would need to be done. This would be a study of prescriptive validity.75

The critical appraisal approach proposed by Buchbinder and colleagues is used in this article to critique 4 of the more commonly discussed classification systems for patients with LBP: the systems proposed by Bernard and Kirkaldy-Willis,22 Delitto and colleagues,23,24 McKenzie,25 and the QTF.26 A summary description of the 4 classification systems is presented in Table 1. Readers should note that I was the only person who reviewed the classification systems. The reliability of judgments made using the critical appraisal approach was not assessed for this article. For a more thorough description of the critical appraisal, the reader is referred to the article by Buchbinder et al.21

Purpose
The purpose is well-defined for the 4 classification systems (Tab. 1). Ultimately, each classification must be judged in the context of the purpose for which it was designed. A summary of judgments related to the purpose of the 4 classification systems is given in Table 4.

Content Validity

Domain of interest and inclusion of relevant categories. The 4 classification systems differ with respect to the method of development and inclusivity of the system. All classification systems clearly defined the domain of interest, although only the system of Bernard and Kirkaldy-Willis22 appeared to include all relevant categories based on the purpose. The QTF system used an additional axis to classify patients as either working or not working. The QTF did not report why they chose not to use the axis related to work status for all categories, as work status has been shown to influence [...] Atlas and colleagues concurred that work status should be assessed in other categories of the QTF system.

The systems developed by Bernard and Kirkaldy-Willis22 and the QTF26 both include a category that accounts for patients who do not meet the criteria of the other
Face validity
Is nomenclature used to label categories satisfactory?(a) Yes Yes Yes
Are criteria for inclusion into categories specified? Unknown Yes Yes
If yes, are the criteria reasonable?(b) Yes Yes
Do the criteria have demonstrated validity? No No
criteria for development. Delitto and colleagues relied on the input of approximately a dozen clinicians, including physical therapists, physicians, and chiropractors, when developing the medical history and examination portions of their classification system. Delitto and colleagues, however, also relied on personal experience to develop decision rules for classifying patients into various treatment categories. The McKenzie system and the Bernard and Kirkaldy-Willis system appear to be based primarily on the clinical experiences of the developers and therefore, in my view, have not undergone an appropriate method of development. A summary of the content validity judgments for the 4 classification systems is given in Table 4.

Face Validity
Face validity is judged from a variety of different perspectives. Most of the items (see Tab. 5) relate to the criteria used to place patients into the various categories. One item addresses the category labels. The nomenclature used to label categories was judged to be unsatisfactory only for the Bernard and Kirkaldy-Willis system.22 Because the Bernard and Kirkaldy-Willis system relies on pathology-based diagnostic labels, users must deduce the presence of a pathology when using the system. Because many of these category labels have not been studied for validity, I contend that the terms are unsatisfactory.

I believe that all 4 classification systems have unsatisfactory data supporting the reliability and validity of the criteria. Some data exist to support the reliability of some of the criteria in the classification systems of Delitto and colleagues and McKenzie, but many of the criteria in the 4 classification systems have not been studied for reliability. The definitions for all of the criteria are not clearly specified for all 4 classification systems. Delitto and colleagues, for example, did not define how to interpret performance on the side-bending test, a critical examination procedure used in stage I. McKenzie25 did not clearly define how to differentiate among accentuated, normal, and reduced lumbar lordoses. Bernard and Kirkaldy-Willis22 did not define procedures for the majority of categories in their classification system. The developers of the QTF system26 did not define the procedures used to determine when a patient should be assigned to the "other diagnoses" category. A patient, for example, may have pain in the area of the lumbar spine without radiation (category 1)
Construct validity
Does it discriminate between entities thought to be different in a way appropriate for the purpose?(a) No Partially Partially Partially
Does it perform satisfactorily(b) compared with other systems with similar purposes? Unknown Unknown Unknown Unknown
Reliability
(a) This item asks whether there are any data to suggest the classification system can be used for its intended purpose.
(b) Satisfactory, in this context, relates to whether data support the use of a classification system for clinical decision making.
studied by several groups and appears to have the strongest evidence for generalizability of the 4 classification systems that were reviewed. Table 6 summarizes the generalizability judgments for the 4 systems.

Summary of Critical Appraisal
The critical appraisal of the 4 classification systems demonstrates that each classification system has strengths and weaknesses. The 4 classification systems have a clearly defined purpose, and the population of interest and setting are either clearly defined or implied. In the area of content validity, the system of Delitto and colleagues23,24 appears to hold promise, but much is unknown because the system has yet to be fully described in the peer-reviewed literature. The McKenzie system25 demonstrated some problems in the area of content validity, primarily because of the issue of exhaustiveness. The QTF system26 does not have mutually exclusive categories, and the work status and symptom duration axes are missing for some categories.

the authors artificially controlled for a major source of error. In addition, the number of patients and therapists participating in the study was small, which further limits the usefulness of the study.

Riddle and Rothstein examined the intertester reliability of classifications made on 363 patients with LBP referred to 1 of 8 clinics. Therapists (N=49) were given written summaries of the McKenzie system that were based on McKenzie's book.25 Randomly paired therapists examined each patient independently. The kappa coefficient and percentage of agreement were used to describe reliability. Therapists agreed 39% of the time (κ=.26) on which syndrome was present. Therapists with postgraduate training in the McKenzie system agreed on the type of syndrome 27% of the time (κ=.15). These data suggest that classifications made using the McKenzie system are unreliable. Modifications of the criteria and definitions appear to be needed to enhance the reliability of classifications. Table 6 summarizes the reliability judgments for the 4 systems.
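
The percentage of agreement and the kappa coefficient used to describe intertester reliability can be illustrated with a short calculation. The following is a minimal sketch, assuming the unweighted two-rater (Cohen) form of kappa, which the text does not specify. The function names, category labels, and ratings are hypothetical and are not data from any study cited here.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of patients assigned the same category by both raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)  # observed agreement
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: sum over categories of the product of marginal proportions.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical classifications of 10 patients by 2 therapists
# (D = derangement, F = dysfunction, P = postural).
therapist_1 = ["D", "D", "F", "P", "D", "F", "D", "P", "F", "D"]
therapist_2 = ["D", "F", "F", "P", "D", "D", "D", "D", "F", "P"]

if __name__ == "__main__":
    print(f"agreement = {percent_agreement(therapist_1, therapist_2):.2f}")
    print(f"kappa     = {cohen_kappa(therapist_1, therapist_2):.2f}")
```

Because kappa subtracts the agreement expected by chance from the observed agreement, it is lower than the raw percentage of agreement whenever chance agreement is greater than zero and observed agreement is less than perfect. This is consistent with pairs of values such as 39% agreement and κ=.26.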
Generalizability
Generalizability is the final concept in the critical appraisal approach. To assess generalizability, I reviewed the literature to determine whether the classification systems had been used in other studies and settings. No other studies were found that examined the usefulness of the Bernard and Kirkaldy-Willis system.22 The system of Delitto and colleagues23,24 has been examined in other settings, but these studies were conducted by the system developers. The generalizability of the system of Delitto and colleagues has yet to be demonstrated. One group of independent investigators has examined the QTF system. The McKenzie system25 has been

The face validity is generally weak for all systems because of the lack of data supporting the reliability and validity of the criteria used to form the categories. Buchbinder et al reported similar findings for classification systems of the neck and upper limb. The Bernard and Kirkaldy-Willis system22 was especially weak in the area of face validity. With the exception of the system of Delitto and colleagues,23,24 all systems scored fairly high for the concept of feasibility. More description of the system by Delitto et al is needed to make judgments related to feasibility.

Construct validity, reliability, and generalizability are concepts that require published data for making judg-