
Best Practice & Research Clinical Rheumatology 24 (2010) 181–191


Treatment-based subgroups of low back pain: A guide to appraisal of research studies and a summary of current evidence
Steven J. Kamper, BAppSc, PhD, Physiotherapist a,*, Christopher G. Maher, PhD, Professor of Physiotherapy a, Mark J. Hancock, PhD, Physiotherapist, Lecturer b, Bart W. Koes, PhD, Professor of General Practice c, Peter R. Croft, PhD, Professor of Primary Care Epidemiology d, Elaine Hay, MD, Professor of Community Rheumatology d

a The George Institute for International Health, University of Sydney, PO Box M201, Missenden Rd, Camperdown, NSW 2050, Australia
b Faculty of Health Sciences, University of Sydney, Lidcombe, Australia
c Erasmus University Medical Center, Rotterdam, the Netherlands
d Keele University Primary Care Research Centre, Keele, UK

Keywords: low back pain; subgroups; effect modification; interaction; prognosis

There has been a recent increase in research evaluating treatment-based subgroups of non-specific low back pain. The aim of these sub-classification schemes is to identify subgroups of patients who will respond preferentially to one treatment as opposed to another. Our article provides accessible guidance on how to interpret this research and determine its implications for clinical practice. We propose that studies evaluating treatment-based subgroups can be interpreted in the context of a three-stage process: (1) hypothesis generation – proposal of clinical features to define subgroups; (2) hypothesis testing – a randomised controlled trial (RCT) to test that subgroup membership modifies the effect of a treatment; and (3) replication – another RCT to confirm the results of stage 2 and ensure that findings hold beyond the specific original conditions. At this point, the bulk of research evidence in defining subgroups of patients with low back pain is in the hypothesis generation stage; no classification system is supported by sufficient evidence to recommend implementation into clinical practice.
© 2009 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +61 2 8238 2413; fax: +61 2 9657 0301.
E-mail address: skamper@george.org.au (S.J. Kamper).

1521-6942/$ – see front matter © 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.berh.2009.11.003

For about 3 decades, the position adopted in evidence-based treatment guidelines has been that the
source of pain cannot be determined for most patients (up to 90%) presenting to primary care with
low back pain (LBP). Most guidelines recommend that such patients be assigned to the classification
‘non-specific LBP’ [1–3] and be provided with generic treatment. Recently, there has been some
reconsideration of this position and it has been suggested that it may be better to divide patients with
non-specific LBP into treatment-based subgroups that inform the choice of specific treatment for that
individual [4]. Importantly, this is also the position adopted by many clinicians who use a subgroup
approach to direct treatment [5].
Some subgroups are based to some extent on putative pathoanatomy [6], while others are based on
clinical findings such as psychosocial characteristics (or yellow flags) [7] or characteristic patterns of
signs and symptoms [8]. What unifies most schemes is an underlying belief that the effect of treatment
will be greater when patients receive the specific treatment that matches their subgroup. Proponents
of treatment-based subgroups argue that this approach offers the possibility of much larger treatment
effects than are typically observed after applying generic treatments to all patients with non-specific
LBP. The argument is that mean group treatment effects may be diluted by the inclusion of subgroups of
LBP subjects for whom the treatment is not effective [9]. If treatment-based subgroups could be reliably
identified, it would represent an important advance in LBP treatment, and the pursuit of this goal has
been identified as a priority for LBP researchers [10].
The aim of this article is to illustrate the key methodological issues in this area, provide clinicians
with a better understanding of the literature in LBP and thus present implications for clinical practice
and future research. We begin by defining some key concepts and then describe the process to identify
and test the existence of LBP subgroups which respond differently to a treatment. We conclude with
a brief summary of the state of evidence so far in relation to subgroups of subjects with LBP.

Key concepts

Treatment effect modification

The effect of treatment is the difference in outcome between the treatment and control groups.
A system for treatment-based subgroups needs to reliably identify patients where the effect of
treatment is consistently greater than it would be for the whole group. A characteristic that defines the
subgroup, for example, gender or high pain intensity, is described as a treatment effect modifier.
Subgroups may be defined by the presence of one or several effect modifiers.
The potential for treatment-based subgroups is often justified by reference to the variability of
patient outcomes observed in clinical practice and also within the treatment arm of clinical trials.
However, variability in treatment outcomes can arise for reasons other than treatment effect modifi-
cation. For example, variability in outcomes can be due to patients having variable prognoses
(regardless of treatment) or because of random variation in a patient’s response to treatment. Vari-
ability in outcome due to either of these reasons would not contribute to defining a subgroup of
patients for which the effect of treatment is consistently greater. What is required is treatment effect
modification where subgroups of patients reliably exhibit greater effects of treatment.
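
To make the distinction concrete, the following sketch (Python, with entirely hypothetical numbers of our own) simulates a trial in which the true benefit of treatment is larger in one subgroup than in another. The treatment effect in each subgroup is the difference between arm means, and effect modification is the difference between those subgroup-specific effects; individual outcomes vary within every cell regardless of whether modification is present.

    import numpy as np
    rng = np.random.default_rng(0)

    n = 200  # patients per arm within each subgroup (hypothetical)
    # Hypothetical mean improvement (0-100 scale): treatment helps subgroup A much more
    # than subgroup B; both control groups improve by about 10 points.
    true_means = {("A", "treatment"): 30, ("A", "control"): 10,
                  ("B", "treatment"): 15, ("B", "control"): 10}

    effect = {}
    for sub in ("A", "B"):
        treated = rng.normal(true_means[(sub, "treatment")], 15, n)  # noisy individual outcomes
        control = rng.normal(true_means[(sub, "control")], 15, n)
        effect[sub] = treated.mean() - control.mean()   # subgroup-specific treatment effect

    print(effect)                       # roughly {'A': 20, 'B': 5}
    print(effect["A"] - effect["B"])    # ~15: the effect modification (interaction)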

Distinguishing treatment effect modifiers and prognostic factors

It is important to distinguish between factors predictive of patient outcomes (prognostic factors) and those that predict treatment effects (treatment effect modifiers). Prognostic factors relate to the susceptibility of a patient’s condition to time, while treatment effect modifiers relate to the susceptibility
of their condition to a specific treatment. An important point is that single-arm studies cannot quantify
treatment effects (difference in outcomes between experimental and control groups) and so cannot
identify treatment effect modifiers. Clinically, there may be value in identifying patients with good
prognoses; this information may be used to reassure patients and can limit the implementation of
unnecessary interventions. However, recognition of the difference between the two concepts is crucial.
Illustrative example: A single-arm study incorrectly interpreted as providing evidence of effect
modification.

Predicting response of patients with neck pain to cervical manipulation. Tseng and colleagues con-
ducted a prospective cohort study on 100 patients with neck pain, all of whom received cervical
manipulation [11]. Outcome was assessed with subjective global improvement or changes in pain
rating and patients were classified as ‘responders’ or ‘non-responders’ based on these variables. The
authors used regression analyses to identify baseline demographic and clinical characteristics asso-
ciated with outcome. They reported that the following factors predicted the outcome: low disability
score, bilateral symptoms, not performing sedentary work, feeling better while moving, not feeling
worse when extending the neck and diagnosis of spondylosis without radiculopathy. Their conclusion,
however, that these variables predict response to treatment (cervical manipulation) is not supported.
While this study does enable us to identify factors associated with a favourable prognosis, we do not
know that it is the effect of manipulation that drives the improved outcome for patients with these
characteristics.
While prognostic factors and treatment effect modifiers may overlap in some instances [12], in
other cases they do not [13]. More importantly, there are examples where the same factor predicts
a favourable response to treatment but an unfavourable outcome over time. For example, in
Stewart’s study of whiplash [14], high baseline pain predicted a greater response to exercise treatment
(when compared with advice) but, by itself, high pain is an adverse prognostic factor for spinal pain
[15,16]. Accordingly, the use of single-arm studies to generate information on treatment effect modi-
fication is unwise [17].
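
The same point can be shown with simple hypothetical numbers (ours, not Stewart’s data): a characteristic can be associated with worse outcomes in both arms (adverse prognosis) and, at the same time, with a larger between-arm difference (favourable effect modification); a single-arm study would register only the worse outcomes.

    # Hypothetical mean disability at follow-up (lower is better); illustrative values only.
    followup = {("high baseline pain", "exercise"): 40, ("high baseline pain", "advice"): 55,
                ("low baseline pain", "exercise"): 25, ("low baseline pain", "advice"): 30}

    for group in ("high baseline pain", "low baseline pain"):
        effect = followup[(group, "advice")] - followup[(group, "exercise")]
        print(group, "-> treatment effect of exercise:", effect)
    # High-pain patients end up worse in both arms (adverse prognostic factor)
    # yet gain 15 points from exercise versus 5 for low-pain patients (effect modifier).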

Study design considerations

While there has been a sharp rise in the amount of research evaluating treatment-based subgroups,
unfortunately, not all of it is methodologically sound. To establish that subgroup membership influ-
ences the effect of treatment, we need a design whereby patients are classified in one subgroup or
another, and they receive the treatment or the control, represented by the four cells of a 2 × 2 table
(Fig. 1a). A well-known study that used this design is the Childs et al. [12] trial, which reported that the
effect of spinal manipulation was greater in those who were positive on a clinical prediction rule than
in those who were negative. As shown in Fig. 1b, the subjects in the trial could be divided into the four
cells based upon the treatment they received and their rule status. A modified version of this approach
compares the outcomes of patients who were randomised to receive treatment matched to their
classification (subgroup) with patients who received treatment not matched to their classification. An
example of this is the study of Long et al. [18] where subjects with a directional preference were
allocated to exercise in the matched direction, the opposite direction, or all directions. In this case the
design can be represented in a 3 × 3 table (Fig. 1c); this more complex design is discussed further in
a later section.
Unfortunately, many researchers have used flawed designs to assess whether subgroups influence
treatment effects. One mistake is to give all subjects the same treatment and to compare outcomes
between those in the subgroup and those not. As there is no control group, this design cannot estimate the effect
of treatment and so cannot establish if subgroup membership influences the effect of treatment. The
study of Tseng and colleagues mentioned above is an example of this design (Fig. 1d). The second
mistake is to only enroll subjects who fit the subgroup and then allocate subjects to receive or not
receive the treatment. While this design can estimate the effect of treatment, it cannot establish if the
effect is greater in those within the subgroup than in those not in the subgroup. An example of this is
the O’Sullivan et al. [19] randomised controlled trial (RCT) of stabilisation exercise (Fig. 1e). This trial is
frequently misrepresented as providing evidence that stabilisation exercise works best in the subgroup
of patients with instability but, as all subjects enrolled in the O’Sullivan trial had instability, this
conclusion is erroneous.
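
Using the cell labels of Fig. 1a, a rough arithmetic sketch (hypothetical cell means of our own) shows what each design can and cannot estimate:

    # Hypothetical mean improvements in the four cells of Fig. 1a.
    A, B = 30, 10   # subgroup 1: treatment, control
    C, D = 15, 10   # subgroup 2: treatment, control

    # Full design (Fig. 1a): both treatment effects and their difference are estimable.
    effect_subgroup1 = A - B                            # 20
    effect_subgroup2 = C - D                            # 5
    interaction = effect_subgroup1 - effect_subgroup2   # 15 -> effect modification
    print(effect_subgroup1, effect_subgroup2, interaction)

    # Single-arm design (Fig. 1d): only A and C are observed. A - C mixes prognosis with
    # any modification, and neither treatment effect is estimable.
    # Subgroup-only design (Fig. 1e): only A and B are observed. A - B is the effect in
    # subgroup 1, but nothing can be said about whether it differs from subgroup 2.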

Process of developing treatment-based subgroups

The process of developing treatment-based subgroups can be divided into three stages (Fig. 2):
(1) hypothesis generation – proposal of factors/variables that may be treatment effect modifiers;
(2) hypothesis testing – establishing preliminary evidence that subgroups respond differently to one
treatment as opposed to another; and (3) replication – testing whether preliminary observations are
generalisable when tested outside the bounds of the original RCT.

Fig. 1a. Design to evaluate treatment subgroups.
                 Treatment              Control
   Subgroup 1    A                      B
   Subgroup 2    C                      D

Fig. 1b. Design used in the Childs et al. trial.
                 Treatment              Control
   Rule +ve      Manipulation           Exercise
   Rule -ve      Manipulation           Exercise

Fig. 1c. Design used in the Long et al. trial.
                       Matched                 Unmatched               Non-specific
   Extension (n=191)   Extension exercises     Flexion exercises       Non-specific exercises
   Flexion (n=16)      Flexion exercises       Extension exercises     Non-specific exercises
   Lateral (n=23)      Lateral exercises       Lateral exercises       Non-specific exercises
                       (direction a)           (direction b)

Fig. 1d. Inadequate design to evaluate treatment subgroups. This design provides information on prognosis, not treatment effects, so it cannot provide information on treatment effect modification (e.g., Tseng et al. study).
                 Treatment
   Subgroup 1    Manipulation
   Subgroup 2    Manipulation

Fig. 1e. Inadequate design to evaluate treatment subgroups. This design provides information on treatment effect but not treatment effect modification (e.g., O’Sullivan et al. study).
                 Treatment                  Control
   Subgroup 1    Stabilization exercises    Usual care


1. Hypothesis generation–proposal of potential effect modifiers

The aim of the hypothesis generation step is to obtain a list of plausible characteristics that are
worth investigating as potential treatment effect modifiers. Candidate variables may be identified via
generalisation, theoretical/biological rationale or clinical lore.

• Generalisation: It is possible that prognostic factors may also be treatment effect modifiers or that
effect modifiers for related treatments may generalise; so, candidate effect modifiers may be
sourced from previous research; for example, Tseng et al. [11] (example above) found that bilateral
symptoms predict better outcome in patients receiving manipulation – this finding may be worth
investigating as a potential effect modifier in an RCT.


• Biological rationale: Where there is a strong physiological, anatomical or psychological theory as to why some patients should respond to a particular treatment, for example, non-steroidal anti-inflammatory drugs being more effective in patients with inflammatory pain [20].
• Clinical lore: the experience of respected clinicians or established texts may be used to propose candidate variables, for example, centralisation as a clinical feature that predicts the effect of the McKenzie treatment [6].

Fig. 2. Flowchart describing the process of defining subgroups of patients based on treatment response.

Hypothesis Generation
Aim: Identify a small number of variables (effect modifiers) to define a subgroup and a plausible reason as to why this subgroup would respond to a particular treatment.
Methods: Variables may be identified via previous research, biological rationale or clinical lore.

Hypothesis Testing
Aim: Evaluate whether subgroups of patients defined by the candidate variable respond differently to a particular treatment.
Methods: Randomised controlled trial with attention to pre-specified analyses, adequate power, a limited number of comparisons and appropriate analysis (interaction tests).

Replication and Generalisation
Aim: Confirm the results found in the previous stage (replication) and test the extent to which they will hold outside the conditions of the original RCT (generalisation).
Methods: Repeat of the RCT as above. Replication: similar patients, setting, therapists and interventions. Generalisation: slightly different patients, setting, therapists and interventions.

Guidelines for testing treatment effect modification emphasise the need to constrain testing to
a limited number of plausible effect modifiers [21,22]. Limiting the number of candidate variables
reduces the likelihood of chance findings due to multiple comparisons (Type 1 error).
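
A short simulation illustrates why (our construction, not taken from the cited guidelines): with 30 candidate characteristics that truly modify nothing, the chance of at least one spuriously 'significant' interaction at p < 0.05 is far above 5%.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, n_candidates, n_sims = 200, 30, 500
    trials_with_false_positive = 0

    for _ in range(n_sims):
        treat = rng.integers(0, 2, n)                        # randomised allocation
        y = 5 * treat + rng.normal(0, 15, n)                 # real main effect, no effect modification
        candidates = rng.integers(0, 2, (n, n_candidates))   # 30 unrelated binary characteristics
        p_values = []
        for j in range(n_candidates):
            g = candidates[:, j]
            cells = [y[(treat == t) & (g == s)] for t in (1, 0) for s in (1, 0)]
            # difference-in-differences estimate of the interaction and its normal-theory p value
            did = (cells[0].mean() - cells[1].mean()) - (cells[2].mean() - cells[3].mean())
            se = np.sqrt(sum(c.var(ddof=1) / len(c) for c in cells))
            p_values.append(2 * stats.norm.sf(abs(did / se)))
        trials_with_false_positive += any(p < 0.05 for p in p_values)

    print(trials_with_false_positive / n_sims)   # typically around 0.7-0.8, not 0.05
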
Illustrative example: A large number of variables tested in a small sample. Predicting response to
lumbopelvic manipulation in subjects with patellofemoral pain.
Iverson and colleagues [23] recruited a sample of 50 subjects with patellofemoral pain. Over 30
demographic and clinical variables were collected before subjects received a manipulative treatment
followed by assessment of outcome. Regression analyses were used to develop a clinical prediction rule
comprising five variables; subjects positive on at least three items were found to have a significantly more
favourable outcome than those positive on two or fewer. In this study, the large number of candidate predictors
tested raises the likelihood that some associations will be found merely by chance (Type 1 error); that is,
the findings are an artefact of the data and peculiar to that study. As outlined previously, the fact that this
was a single-arm study also means that the authors’ conclusion that status on the rule predicts response
to treatment is not supported.
In preparation for the hypothesis testing stage, researchers should review the theoretical rationale
for each potential effect modifier; in particular, the case for why the variable should logically be

associated with the effect of a specific intervention. For example, researchers may posit that patients
with high levels of fear of pain will see greater improvements in pain and disability outcomes with
a graded exposure programme than patients with low levels of fear [24]. At this point it is useful to
distinguish between variables which are themselves targets for intervention (in the earlier example:
fear as the target of a graded exposure intervention) and those which potentially modify other
treatments (e.g., fear may nullify the effect of advice to exercise).
A practical goal of the hypothesis-generating stage is to limit the number of candidate variables
tested in the second stage (hypothesis testing) of the process.

2. Hypothesis testing – testing of the potential effect modifiers

The second stage involves performing an RCT with sufficient power to detect an interaction between
each of the potential effect modifiers and treatment allocation. The optimal design for testing effect
modifiers is a high-quality and large RCT with subjects randomised to the index treatment or a control.
The appropriate statistical test is an interaction term between the potential effect modifier and treat-
ment either within a single RCT or as part of a meta-analysis. Ideally, the analysis plan for testing
treatment interaction is pre-specified and recorded in the published trial protocol and registry [25].
These last two steps allow a reader to check whether selective reporting of results from a larger set of
analyses is a possibility. However, subgroup analyses will often be published as a post-hoc/secondary
analysis of the original trial [26]; these reports provide a less convincing form of evidence, and vali-
dation in a new sample is particularly important.
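
As a hedged illustration of the analysis itself (simulated data; the variable names and figures are ours, and any regression package with an interaction term would serve), the coefficient on the treatment-by-modifier product carries the evidence for effect modification:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(2)
    n = 300
    df = pd.DataFrame({"treatment": rng.integers(0, 2, n),   # randomised allocation
                       "modifier": rng.integers(0, 2, n)})   # pre-specified candidate modifier
    # Simulated outcome with a 10-point extra benefit of treatment when the modifier is present.
    df["outcome"] = (5 * df["treatment"] + 3 * df["modifier"]
                     + 10 * df["treatment"] * df["modifier"] + rng.normal(0, 15, n))

    fit = smf.ols("outcome ~ treatment * modifier", data=df).fit()
    print(fit.params["treatment:modifier"])            # estimated interaction (true value 10)
    print(fit.conf_int().loc["treatment:modifier"])    # 95% CI for the interaction
    print(fit.pvalues["treatment:modifier"])
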
Illustrative example: A post-hoc analysis of data from an RCT testing for effect modifiers. Possible effect
modifiers for a workplace-graded activity programme in patients with occupational LBP.
To investigate effect modifiers, Steenstra and colleagues conducted a secondary analysis of data
from an RCT that tested a workplace-graded activity programme versus usual care in patients off work
with LBP [27]. The authors chose the following six potential effect modifiers based on the literature on
prognosis and rationale: age, gender, pain, functional status, heavy work and sick leave in the past 12
months. They used suitable statistical methods (interaction tests), the number of variables (six) was
appropriate given the sample size (n = 196) and there is reason to believe at least some of the variables
(e.g., heavy work and functional status) would be amenable to the intervention. They found that the
intervention was more effective in older subjects and those who reported sick leave in the past 12
months. In recognition of the post hoc nature of the study, the authors correctly concluded that their
findings are exploratory and still require formal testing.
To lessen the potential for spurious findings, it is advised to limit testing to the primary outcomes
and a small number of pre-defined, plausible treatment effect modifiers. Where it is not clear that this
has occurred, readers should view positive findings with caution. Negative results from hypothesis
testing are also not straightforward. This is because RCTs will have less power to test for treatment
effect modification than to test for the main effects of treatment. As a rule of thumb, the sample size
needed to test for an interaction is approximately 4 times that required for a test of the main effect [28].
Ideally, the analyses should obtain a sufficiently precise estimate of the interaction between treatment
and subgroup. This is best judged by inspecting the 95% confidence interval (CI) for the interaction
effect. A common result is that the observed p value for the interaction test is greater than the critical
value (e.g., 0.05) but the CI includes clinically important effect modification. In such a case, it would be
premature to rule out the possibility of treatment-effect modification.
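
A back-of-envelope version of that rule of thumb, under the assumption of a two-arm comparison and a standardised effect of 0.4 SD (statsmodels' power routines; the figures are illustrative only):

    from statsmodels.stats.power import TTestIndPower

    # Sample size per group to detect a 0.4 SD main effect with 80% power at alpha = 0.05.
    n_per_group = TTestIndPower().solve_power(effect_size=0.4, power=0.8, alpha=0.05)
    print(round(n_per_group))          # ~100 per group, ~200 in total

    # The interaction is a difference of two differences, so its estimate has roughly four
    # times the variance of the main-effect estimate; detecting an interaction of the same
    # size with the same power therefore needs roughly four times the total sample.
    print(round(4 * 2 * n_per_group))  # ~800 participants in total
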
If the treatment effect modification analyses are robust, it is important to then consider the
methodological quality of the RCTs. Methodological quality is important in the light of evidence that
low-quality trials may report exaggerated effect sizes [29]. It is essential to note that any treatment
subgroup identified in this stage may only be specific to the treatment contrast used in the study. For
example, having identified a subgroup that recovers more quickly following an exercise intervention
than following ultrasound, it does not necessarily follow that the same subgroup will recover more
quickly with exercise than with joint mobilisation. Lastly, when interpreting subgroup analyses, it is
important to acknowledge that confounding is possible because comparisons between subgroups are not
randomised. For example, treatment subgroups formed on the basis of level of fear

avoidance may differ on other clinical (e.g., pain intensity) or psychological (e.g., self-efficacy) char-
acteristics and it could be these other characteristics that moderate the effect of treatment.
Treatment effect modification observed in a large well-conducted RCT provides preliminary
evidence that subgroups of patients may respond differently to the same treatment. Completion of this
stage, however, is not sufficient to recommend application of the results to clinical practice. Replication
is necessary to verify and substantiate the findings.
Illustrative example: Appropriate methods for testing a hypothesis within an RCT. Predicting response to
lumbar manipulation in subjects with LBP.
Childs and colleagues [12] set out to validate the rule developed previously by the same group [30].
The rule was designed to predict response to manipulative treatment among subjects with LBP. The
hypothesis was generated based on a previous single-arm trial by Flynn et al. [30], who found that
subjects with certain characteristics (rule positive) had a better prognosis after receiving Spinal
Manipulative Therapy (SMT). The authors evaluated the rule as an effect modifier within a methodologically
sound RCT. The likelihood of spurious findings was reduced by limiting the number of potential
effect modifiers tested to one (i.e., positive or negative on the rule), pre-planning the analysis of the
interaction term and using a power analysis to guide the sample size requirement. The study found a significant inter-
action between status on the rule and the effect of manipulation.
RCTs where subjects are assigned to a treatment regimen ‘matched’ or ‘unmatched’ to their classifi-
cation offer an alternate approach to subgroup identification. In these studies, subjects are categorised
based on their clinical presentation according to a particular classification system. Two systems that have
been investigated in this way are the McKenzie method [18] and the system described by Brennan, Fritz
and colleagues [31,32]. The aim of this design is to test the effectiveness of the classification system as
a whole, as opposed to the effect of a treatment in one subgroup compared with another. These studies
therefore address a related, but slightly different, research question. While this approach provides some
information about the classification system, there are important limitations that prevent such studies
from elucidating the specific effects of a particular intervention on patients belonging to a subgroup. To
understand the interaction between a subgroup of patients and an intervention, we need a large enough
number of the members of each subgroup to be randomised to either a treatment or a control. In studies of
this design, however, the numbers of subjects within any one subgroup may be small and those rando-
mised to control may receive any of several interventions. Results are presented as the mean difference
between the effect of receiving a matched treatment and that of an unmatched treatment across the whole cohort.
Illustrative example: Testing the effectiveness of a complete classification system. Predicting response to
a McKenzie programme in LBP patients.
Long and colleagues [18] conducted a study to test the McKenzie protocol. Subjects were placed in
one of three directional preference (DP) categories during the baseline assessment and they were then
randomised to receive a treatment which was matched or unmatched to their DP. Analysis compared
outcomes in those receiving treatment matched to their classification with those receiving one of three
unmatched (control) treatments and found better outcomes in the former group. While the design of
this study involves randomisation, sufficient data are not provided to enable readers to assess the
interaction between subgroups (e.g., extension DP) and a specific treatment contrast (e.g., extension
exercises versus non-specific exercises). For example, we do not know whether and by how much
a patient classified with extension DP will do better with extension exercises than with non-specific
exercises. Presentation of results for subjects allocated to each treatment regimen (extension, flexion,
lateral flexion and evidence based) divided according to subgroup (extension, flexion and lateral flexion)
would enable readers to assess this interaction. It is likely, given the small proportion of the sample in
two of the subgroups (see Fig. 1c), that the study would not have sufficient statistical power to detect
such an interaction were it to exist.

3. Replication – assessing generalisability

Replication of initial findings involves re-testing the interaction between the effect modifier(s) and
treatment in an independent RCT. All the same issues regarding a priori specification of analyses, power and
methodological quality outlined above are relevant at this stage too. Replication is necessary to confirm
that initial results are not chance findings and also to test the extent to which the findings generalise [21].

Given the variability in effect sizes reported in RCTs testing the same intervention [33], we might
similarly expect that there is variability in treatment interaction effect sizes. As with the main effects of
treatment, some of this variability from study to study will be due to chance and some may be due to
study-level factors because true replication of an RCT is difficult to accomplish. Further, studies in
which models are derived commonly report greater predictive ability than those in which the models
are validated [34]. This expectation reinforces the view that replication of results is a critical step in
confirming the usefulness of the subgroups as first identified. Replication may be achieved through
conduct of a new trial or meta-analysis of existing trials, if suitable trials are available. On a similar theme,
collaboration between research groups offers the opportunity to increase the power of analyses by
combining individual patient data from multiple studies.
Illustrative example: Combining data sets to improve power. Predicting response to conservative treat-
ment in subjects with tennis elbow.
Although not specifically related to back pain, this study provides a worthwhile example of how
data from multiple studies can be combined for the purpose of identifying subgroups. Bisset and
colleagues [35] pooled individual patient data from two RCTs evaluating physiotherapy, corticosteroids
and wait list for patients with lateral epicondylalgia (tennis elbow). This technique increases the
statistical power of the analyses, improving the capacity to identify relevant interaction effects. The
authors formed subgroups based on four baseline variables and performed pre-planned analyses
assessing the interactions between the subgroups and treatment allocation. While they found statis-
tically significant effects in two of their analyses, the authors were cautious in attributing relevance to
them given the small effect sizes and likelihood of type 1 error due to multiple comparisons.
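
A hedged sketch of the pooling idea (hypothetical data standing in for two trials; all names and numbers are ours): individual patient data are stacked with a study indicator and the interaction is tested in the combined, better-powered sample.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)

    def simulate_trial(n, study):
        # Hypothetical individual patient data for one RCT.
        treatment = rng.integers(0, 2, n)
        modifier = rng.integers(0, 2, n)
        outcome = 4 * treatment + 6 * treatment * modifier + rng.normal(0, 15, n)
        return pd.DataFrame({"study": study, "treatment": treatment,
                             "modifier": modifier, "outcome": outcome})

    pooled = pd.concat([simulate_trial(120, "trial_1"), simulate_trial(150, "trial_2")],
                       ignore_index=True)
    # Adjust for study as a fixed effect; pooling gives the interaction test more power
    # than either trial alone would have.
    fit = smf.ols("outcome ~ treatment * modifier + C(study)", data=pooled).fit()
    print(fit.params["treatment:modifier"], fit.pvalues["treatment:modifier"])
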
Generalisation is different and involves testing effect modification in different scenarios, for example,
different clinics, different therapists and different treatment doses. Some authors refer to replication and
generalisation as narrow and broad validation studies. Narrow validation refers to replication in an RCT
as similar as possible to the original in terms of setting, patient group, therapists and interventions.
Broad validation is intended to test the extent to which the findings will hold as these factors are
modified, for example, the Childs (2004) [12] and Hancock (2008) studies [36].
Illustrative example: Broad validation of a hypothesis. Predicting response to mobilisation in subjects
with LBP.
Hancock et al. [36] performed a pre-planned analysis on data from an RCT to test the generalisability
of the Flynn/Childs prediction rule. The aim was to determine whether the findings held in a typical
primary care setting, with the choice of manipulative technique left to the clinician. The study failed to
show a significant interaction between rule status and allocation to manipulative treatment. This study
can be considered a broad validation test of the above rule. At present, the rule to define subgroups of
LBP patients that respond preferentially to manipulative treatment has not been shown to generalise
beyond the conditions of the original RCT; this may be due to differences in the subject sample or the
intervention used.

Current evidence

Considerable research efforts have gone into attempting to identify subgroups within the pop-
ulation of patients with non-specific LBP. The vast majority of research to date however falls into the
hypothesis generation stage of investigation. The classification system that has undergone the most
thorough investigation thus far is the rule designed to predict response to manipulation (Childs and
colleagues [12,30,36]), this being part of a more extensive classification system described by Fritz and
colleagues [8,32,37]. While results have been encouraging, there is at present insufficient evidence to
recommend that either the rule or the wider classification system be adopted in clinical practice.
Psychosocial risk factors (yellow flags) have been identified as prognostic factors in studies on
patients with LBP [38] and some treatment guidelines recommend screening for these factors early in
the course of the condition [39]. This being the case, it is believed that specific interventions designed
to modify these factors might improve outcome in those patients with a suitable clinical profile.
However, findings from studies that apply targeted treatment to patients with psychological
dysfunction have so far been equivocal [24,40–42]. In general, evidence remains far from conclusive
and no classification systems have passed through the three-stage process.

At this point, research has failed to demonstrate the utility of any classification system with suffi-
cient certainty to recommend incorporation into clinical practice. There are some promising lines of
research but all require further refinement and testing.

Summary

In brief, although guidelines place the vast majority of patients with back pain in one homogeneous
category, many believe that it is possible to divide patients into smaller subgroups. The aim of some
classification schemes is to match subgroups of patients with particular treatments in the belief that
they will experience better outcomes than with a generic management course. To define subgroups, it
is necessary to identify factors that differentiate between those in a particular subgroup and those not.
The factors that define a subgroup based on response to a treatment are called ‘effect modifiers’ and
their identification requires a particular empirical approach. We provide some background to the issue
of defining subgroups and present a three-stage process by which effect modifiers can be identified and
tested. The stages are: hypothesis generation, hypothesis testing and replication. Hypothesis genera-
tion involves the proposal of a limited number of candidate effect modifiers: this may be conducted
through any number of study designs but the proposed factors should be linked by a plausible rationale
as to why they should interact with a particular treatment. Hypothesis testing requires an RCT to test
for the interaction between candidate effect modifiers and the selected treatment. The final stage
requires replication of the RCT testing the effect modifiers; this is necessary to confirm the results and ensure
that the findings hold outside the confines of the original trial. The process presented here is intended
as a guide to design of new research in the area and to assist in interpretation of published studies.

What does this mean for clinical practice?

The evidence for treatment-based subgroups is not compelling and this is unlikely to change in the
near future. At present, the best estimate of the likely effect of treatment for an individual is the group
effect from a large high-quality trial or systematic review. In clinical scenarios where there are multiple
effective treatments available, treatment selection can be based upon consideration of factors such as
patient preferences, treatment availability, likely cost and inconvenience of competing treatments and
the likely risk of side effects. With complex interventions such as cognitive behavioural treatment or
motor control exercise, the expertise of the clinician should also be borne in mind. Health-care
professionals should continue to rely on their clinical judgement when making management decisions
for patients with LBP.

Research agenda

• Research should concentrate on prospective evaluation of existing hypotheses regarding candidate effect modifiers. This would involve pre-planned analyses of appropriately powered RCTs.

Conflicts of interest statement

The authors have no conflicts of interest to declare.

References

[1] Chou R, Qaseem A, Snow V, et al. Diagnosis and Treatment of Low Back Pain: a joint clinical practice guideline from the
American College of Physicians and the American Pain Society. Ann Intern Med 2007;147(7):478–91.
[2] Rossignol M, Arsenault B, Dionne C, et al. Clinic in Interdisciplinary Practice (CLIP) guidelines. Montreal: Direction de
santé publique, Agence de la santé et des services sociaux de Montréal; 2007.

[3] van Tulder MW, Becker A, Bekkering T, et al. European Guidelines for the management of acute non specific low back pain
in primary care. Eur Spine J 2006;15(Suppl. 2):S169–91.
*[4] Kent P, Keating J. Do Primary-Care Clinicians Think That Nonspecific Low Back Pain Is One Condition? Spine 2004;29(9):
1022–31.
*[5] Kent P, Keating JL. Classification in nonspecific low back pain: what methods do primary care clinicians
currently use? Spine 2005 Jun 15;30(12):1433–40.
[6] McKenzie R, May S. In: The lumbar spine: mechanical diagnosis and therapy. 2nd edn. Waikanae, New Zealand: Spinal
Publications Ltd; 2003.
[7] Gheldof ELM, Vinck J, Vlaeyen JWS, et al. Development of and recovery from short- and long-term low back pain in
occupational settings: A prospective cohort study. Eur J Pain 2007;11(8):841–54.
[8] Fritz J, Clelland J, Childs J. Subgrouping patients with low back pain: evolution of a classification approach to physical
therapy. J Orthop Sports Phys Ther 2007;37(6):290–302.
*[9] Delitto A. Research in low back pain: time to stop seeking the elusive "magic bullet". Phys Ther 2005;85(3):
202–4.
[10] Guccione A, Goldstein M, Elliott S. Clinical research agenda for physical therapy. Phys Ther 2000;80(5):499–513.
[11] Tseng Y, Wang W, Chen W, et al. Predictors for the immediate responders to cervical manipulation in patients with neck
pain. Man Ther 2006;11:306–15.
*[12] Childs J, Fritz J, Flynn T, et al. A clinical prediction rule to identify patients with low back pain most likely to benefit from
spinal manipulation: a validation study. Ann Intern Med 2004 Dec 21;141(12):920–8.
*[13] Underwood M, Morton V, Farrin A, UK BEAM trial team. Do baseline characteristics predict response to treatment for low back pain?
A secondary analysis of the UK BEAM data set. Rheumatology (Oxford) 2007;46:1297–302.
[14] Stewart MJ, Maher CG, Refshauge KM, et al. Randomized controlled trial of exercise for chronic whiplash-associated
disorders. Pain 2007;128(1-2):59–68.
[15] Kent P, Keating J. Can we predict poor recovery from recent-onset nonspecific low back pain? A systematic review. Man
Ther 2008;13:12–28.
[16] Scholten-Peeters GGM, Verhagen AP, Bekkering GE, et al. Prognostic factors of whiplash-associated disorders:
a systematic review of prospective cohort studies. Pain 2003;104:303–22.
[17] Thiel H, Bolton J. Predictors for immediate and global responses to chiropractic manipulation of the cervical spine.
J Manipulative Physiol Ther 2007;31:172–83.
[18] Long A, Donelson R, Fung T. Does it matter which exercise? A randomized control trial of exercise for low back pain. Spine
2004;29(23):2593–602.
[19] O’Sullivan PB, Twomey LT, Allison GT. Evaluation of specific stabilizing exercises in the treatment of chronic low back pain
with radiologic diagnosis of spondylolysis or spondylolisthesis. Spine 1997;22(24):2959–67.
[20] Koes B, Scholten R, Mens J, Bouter L. Efficacy of non-steroidal anti-inflammatory drugs for low back pain: a systematic
review of randomised clinical trials. Ann Rheum Dis 1997;56(4):214–23.
[21] Klebanoff M. Subgroup analysis in obstetrics clinical trials. Am J Obstet Gynecol 2007:119–22.
*[22] Rothwell PM. Treating individuals 2. Subgroup analysis in randomised controlled trials: importance, indications, and
interpretation. Lancet 2005;365:176–86.
[23] Iverson C, Sutlive T, Crowell M, et al. Lumbopelvic manipulation for the treatment of patients with patellofemoral pain
syndrome: development of a clinical prediction rule. J Orthop Sports Phys Ther 2008;38(6):297–309.
[24] Klaber Moffett J, Carr J, Howarth E. High fear-avoiders of physical activity benefit from an exercise program for patients
with back pain. Spine 2004;29(11):1167–73.
[25] Hay E, Dunn K, Hill J, et al. A randomised clinical trial of subgrouping and targeted treatment for low back pain compared
with best current care. The STarT Back Trial Study Protocol. BMC Musculoskelet Disord 2008;9.
[26] Petersen T, Larsen K, Jacobsen S. One-year follow-up comparison of the effectiveness of McKenzie treatment and
strengthening training for patients with chronic low back pain. Spine 2007;32(26):2948–56.
[27] Steenstra I, Knol D, Bongers P, et al. What works best for whom? Spine 2009;34(12):1243–9.
*[28] Brookes S, Whitely E, Egger M, et al. Subgroup analyses in randomized trials: risks of subgroup-specific analyses; power
and sample size for the interaction test. J Clin Epidemiol 2004;57(2):229–36.
[29] Schulz K, Chalmers I, Hayes R, Altman D. Empirical evidence of bias. Dimensions of methodological quality associated
with estimates of treatment effects in controlled trials. J Am Med Assoc. 1995;273:408–12.
[30] Flynn T, Fritz J, Whitman J, et al. A Clinical Prediction Rule for Classifying Patients with Low Back Pain Who Demonstrate
Short-Term Improvement With Spinal Manipulation. Spine 2002;27(24):2835–43.
[31] Brennan G, Fritz J, Hunter S et al. Identifying subgroups of patients with acute/subacute "nonspecific" low back pain:
results of a randomized clinical trial. Spine 2006.
[32] Fritz J, Delitto A, Erhard R. Comparison of classification-based physical therapy with therapy based on clinical practice
guidelines for patients with acute low back pain. Spine 2003;28(13):1363–72.
[33] Machado LA, de Souza MS, Ferreira PH, Ferreira ML. The McKenzie method for low back pain: a systematic review of the
literature with a meta-analysis approach. Spine 2006 Apr 20;31(9):E254–62.
*[34] Toll D, Janssen K, Vergouwe Y, Moons K. Validation, updating and impact of clinical prediction rules: A review. J Clin
Epidemiol 2008;61:1085–94.
*[35] Bisset L, Smidt N, Van der Windt DAWM, et al. Conservative treatments for tennis elbow – do subgroups of patients
respond differently? Rheumatology 2007;46.
*[36] Hancock MJ, Maher CG, Latimer J, et al. Independent evaluation of a clinical prediction rule for spinal manipulative
therapy: a randomised controlled trial. Eur Spine J 2008 Jul;17(7):936–43.
[37] Fritz J, Lindsay W, Matheson J, et al. Is there a subgroup of patients with low back pain likely to benefit from mechanical
traction? Results of a randomized clinical trial and subgrouping analysis. Spine 2007.
[38] Hockings RL, McAuley JH, Maher CG. A systematic review of the predictive ability of the Örebro Musculoskeletal
Pain Questionnaire. Spine 2008 Jul 1;33(15):E494–500.
[39] WorkCover NSW. Management of soft tissue injuries: Guidelines for treatment providers. WorkCover; 2006.

[40] Gatchel R, Polatin P, Noe C, et al. Treatment- and cost-effectiveness of early intervention for acute low back pain patients:
A one-year prospective study. J Occup Rehabil 2003;13(1):1–9.
[41] George S, Zeppieri G, Cere A, et al. A randomized controlled trial of behavioral physical therapy interventions for acute
and sub-acute low back pain. Pain 2008;140:145–57.
[42] Werneke M, Hart D, George S, et al. Clinical outcomes for patients classified by fear-avoidance beliefs and centralization
phenomenon. Arch Phys Med Rehabil 2009;90:768–77.
