Research suggests that low back pain (LBP) causes more years lived with disability than any other condition.1 Many patients with acute LBP recover quickly, but the pain becomes a chronic condition for some, with significant consequences related to restricted functioning.2 The measurement of functioning is complex and covers multidimensional constructs.3 Functioning is divided into 3 domains according to the International Classification of Functioning, Disability, and Health (ICF). The body functions and structures domain refers to psychological and physiological processes and anatomical structures; the activity domain refers to people’s ability to

The objective of this study was to systematically review the level of evidence of the measurement properties of physical capacity tasks that are designed to assess functioning in patients with LBP.

Methods
The study protocol was registered at the International Prospective Register of Systematic Reviews (PROSPERO; http://www.crd.york.ac.uk/PROSPERO; registration number: CRD42016042011). We followed the working procedure developed by the Consensus-Based Standards for the Selection of Health Measurement Instruments

review. Two authors (M.J. and A.G., R.S., or M.L.) then independently reviewed the full-text articles for eligibility, and a third author (A.G., R.S., or M.L.) was consulted if needed for consensus.

Articles that met 4 criteria (target population, construct, outcome measure, and article type) were included.

Target population. Patients in the study population were at least 18 years old and had had LBP for 6 weeks or more.28 Articles that contained pregnant participants or participants with fibromyalgia, confirmed rheumatic

Construct. The test was a measure of “capacity” under the ICF activity domain, defined as the ability to execute a task or an action in a standardized environment.3

Outcome measure. The test was a physical capacity task, defined as a standardized test that is used for an evaluative purpose and that is administered by an observer, includes an activity (as classified by the ICF) that is performed in a standardized setting, and requires readily available, low-cost, and portable equipment. Articles that exclusively investigated test batteries and did not present results for individual physical capacity tasks were excluded. If an article cited an original test manual that could then not be obtained, the test was excluded.

Article type. The article presented original data reporting the reliability (including reliability, measurement error, and internal consistency), validity (including content validity, construct validity, and criterion validity), or responsiveness of physical capacity tasks.29

Data Extraction and Quality Assessment
A pilot version of a data extraction form was developed by the first author (M.J.) and piloted by 3 authors (M.J., R.S., and M.L.) on 5 randomly selected, included articles. The data extraction form was modified in response to the pilot to include the following 6 data items: (1) patient sample characteristics; (2) eligibility criteria; (3) setting; (4) procedure and equipment for performing the physical capacity tasks; (5) results of the measurement properties, and (6) minimal important change. Minimal important change is not considered a measurement property in the COSMIN taxonomy29 but was included in the data extraction form because this measure is necessary to determine the level of evidence for measurement error.30 Data from all included articles were subsequently extracted by 1 author (M.J.) and then independently checked for accuracy by a second author (A.G., R.S., or M.L.).

Two authors (M.J. and M.L.) independently assessed all of the included studies for methodological quality using the COSMIN 4-point checklist,31,32 and a third author (A.G., R.S., or L.B.M.) was consulted if needed to reach consensus. The COSMIN 4-point checklist is specifically designed to determine methodological quality scores of studies of measurement properties. The scores (excellent, good, fair, or poor) are assigned for each measurement property using the “worst score counts method.”31,32 The item for sample size in the checklist was excluded from the “worst score counts method” as recommended when performing systematic reviews of measurement properties

Data Synthesis and Analysis
A “best-evidence synthesis” approach was performed by consensus of all authors to synthesize the level of evidence of measurement properties per physical capacity task.35 This procedure is similar to the Grading of Recommendations Assessment, Development, and Evaluation (GRADE),36 but different in that the best-evidence synthesis determines the level of evidence for the results of measurement properties, rather than the results of clinical trials.23 First, the results of each measurement property for each physical capacity task were rated as “positive,” “negative,” or “indeterminate” according to result rating criteria accepted with consensus in an international Delphi study (Tab. 1).30 Then, the level of evidence for the ratings of measurement properties was determined on the basis of the consistency of the result ratings of the measurement properties, the sample size of the combined eligible articles, and the methodological quality of the articles. Multiple articles were combined if they concerned the same physical capacity task and included samples with comparable characteristics. The possible levels of evidence were assigned according to 5 criteria (strong, moderate, limited, unknown, and conflicting) used in previous systematic reviews of the reliability, validity, and responsiveness of physical capacity tasks.33,35

Strong. This criterion encompassed consistent result ratings in at least 2 good-quality articles or at least 1 excellent-quality article, with a total sample size of eligible articles equal to or greater than 100.

Moderate. This criterion encompassed consistent result ratings in at least 2 fair-quality articles or 1 good-quality article, with a total sample size of eligible articles equal to or greater than 50.

Limited. This criterion encompassed at least 1 fair-, good-, or excellent-quality article, with a total sample size of eligible articles of 25 to 49.
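
As an illustration only, the quality scoring and the level-of-evidence criteria described above can be sketched in Python. The function and field names (`worst_score_counts`, `level_of_evidence`, `sample_size`) are hypothetical and not part of COSMIN. Several choices are assumptions rather than statements from the review: "good" is read as "good or better," mixed result ratings are labeled conflicting, a body of evidence that misses a higher tier falls through to the next one, and anything that meets no listed criterion is labeled unknown (the definitions of the unknown and conflicting criteria are not given in this text).

```python
QUALITY_ORDER = ["poor", "fair", "good", "excellent"]  # worst to best


def worst_score_counts(item_scores, excluded_items=("sample_size",)):
    """Return the lowest item score, skipping the excluded checklist items."""
    considered = [score for item, score in item_scores.items()
                  if item not in excluded_items]
    return min(considered, key=QUALITY_ORDER.index)


def level_of_evidence(studies):
    """Classify one measurement property of one task.

    studies: list of (quality, rating, sample_size) tuples,
    where rating is '+', '-', or '?'.
    """
    # Poor-quality studies are left out of the synthesis.
    eligible = [s for s in studies if s[0] != "poor"]
    if not eligible:
        return "unknown"  # assumption: nothing left to synthesize
    if len({rating for _, rating, _ in eligible}) > 1:
        return "conflicting"  # assumption: inconsistent result ratings
    total_n = sum(n for _, _, n in eligible)
    qualities = [q for q, _, _ in eligible]
    excellent = qualities.count("excellent")
    good_or_better = excellent + qualities.count("good")
    fair_or_better = len(qualities)
    if total_n >= 100 and (excellent >= 1 or good_or_better >= 2):
        return "strong"
    if total_n >= 50 and (good_or_better >= 1 or fair_or_better >= 2):
        return "moderate"
    if total_n >= 25:  # assumption: fall-through tier; the text states 25 to 49
        return "limited"
    return "unknown"  # assumption: total sample size below 25
```

For example, a single fair-quality study with 33 participants and a positive rating would come out as "limited" under this sketch, whereas two good-quality studies with consistent ratings and 110 participants in total would come out as "strong."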
Table 1.
Criteria for Result Ratings of Measurement Propertiesa
? No hypotheses defined
Responsiveness + 75% of the results in accordance with the hypotheses or AUC ≥ 0.70
? No hypotheses defined
[Flow diagram: 12,301 articles identified through the first database search (inception – May 11, 2017); 2302 articles identified through the second database search (Jan 1, 2016 – Aug 29, 2018); 1557 articles identified through hand search of 362 reference lists.]
Figure.
Flow of information through the different phases of the systematic review.
validity (hypothesis testing), criterion validity, and responsiveness.

Quality Assessment and Data Synthesis

Reliability. The results for reliability are shown in Table 4. Regarding methodological quality, 11 articles investigated reliability.5,17,37,39,40,42,46–50 Three studies of reliability in 1 article48 were rated as poor for methodological quality and therefore excluded from the best-evidence synthesis. The poor scores were due to the fact that patients received treatment between the first and second administration of the tasks.

With regard to best-evidence synthesis, a strong level of evidence for positive ratings for test-retest reliability was found for 3 tasks (50-ft [∼15.3-m] walk, 5-minute walk, and 5-repetition sit-to-stand tasks). A moderate level of evidence for positive ratings for test-retest reliability was found for 4 tasks (1-minute stair-climbing, Progressive
Physical Therapy, Volume 99, Number 4, 2019
Systematic Review of Physical Capacity Tasks
Downloaded from https://academic.oup.com/ptj/article/99/4/457/5252375 by guest on 19 January 2021

Table 2.
Characteristics of Included Studies in Alphabetical Order of First Authors’ Namesᵃ

Study | Criteria for Patient Inclusion | Sample Size (% Women) | Age, Mean (SD) [Range] | Setting (Country) | Back Pain Durationᵇ | Back Pain Intensityᵇ | Patient-Reported Disability, Mean (SD) | Eligible Physical Capacity Task
Andersson et al51 (2010) | Nonspecific LBP for >3 mo | 198 (47) | 41.9 (9.1) | Tertiary care (the Netherlands) | Median: 24.0 mo (IQR: 12.0–73.0 mo) | VAS: 48.9 (24.7) | RMDQ: 13.8 (3.6) | 1 min of stair climbing, 5-min walk, 50-ftᶜ walk, PILE, sit-to-stand
Armstrong et al52 (2005) | LBP for ≥6 mo | 10 (80) | 40.8 (11.3) | Tertiary care | 77.1 (35.8) | VAS: 46 (23) | | Shuttle walking test
Campbell et al38 (2006) | LBP for >12 mo, considered for surgical stabilization of the spine (unspecified LBP, spondylolisthesis, or postlaminectomy) | 250 (44) | 40 (8.7) [19–55] | Hospitals (United Kingdom) | 7.6 (6.7) y | | ODI for improved: 42.6 (13.6); ODI for stable: 46.8 (13.7); ODI for deteriorated: 51.8 (13.5) | Shuttle walking test
Conway et al20 (2011) | Lumbar spinal stenosis | 12 (25) | 66.3 (9.8) [53–81] | Tertiary care (United States) | 4.7 (1.5) y | VAS: 39.9 (27.0) | ODI: 48.9 (11.6); QBPDS: 48.2 (16.8) | Self-paced walking test
Deen et al39 (2000) | Lumbar spinal stenosis | 28 (39.3) | 73.8 [57–91] | | | | | Treadmill examination (only the test parameter “total ambulation”)
Gautschi et al15 (2016) | Lumbar disk herniation, lumbar spinal stenosis, and lumbar degenerative disk disease scheduled for lumbar spine surgery | 253 (42.3) | 58.4 (16.1) | Hospitals (Switzerland) | | VAS: 39.1 (28.5); VAS for the leg: 48.8 (29.0) | RMDQ: 11.8 (5.2); ODI: 48.4 (20.8) | Timed “Up & Go” Test
Gautschi et al45 (2016) | Lumbar disk herniation, lumbar spinal stenosis, and lumbar degenerative disk disease scheduled for lumbar spine surgery | 136 (44.1) | 57.7 (15.8) | Hospitals (Switzerland) | | VAS: 45.6 (17.6); VAS for the leg: 5.5 (2.6) | ODI: 45.6 (17.6); RMDQ: 11.3 (5.1) | Timed “Up & Go” Test
Gautschi et al44 (2017) | Lumbar disk herniation, lumbar spinal stenosis, and lumbar degenerative disk disease scheduled for lumbar spine surgery | 100 (43) | 56.2 (16.1) | Hospitals (Switzerland) | | VAS: 38.0 (10); VAS for the leg: 55.0 (28) | RMDQ: 12 (5); ODI: 46 (18) | Timed “Up & Go” Test
Kahraman et al47 (2016) | Nonspecific LBP for >12 wk | 38 (37) | 35 (10) | | 7 (3) mo | VAS with rest: 21.3 (15.1); VAS with activity: 64.5 (20.1) | ODI: 22.6 (14.2) | 30-s chair stand test
Lee et al19 (2001) | LBP with mechanical, structural, or nonspecific origins | 83 (57.8) | 45.64 (10.12) [24–65] | Orthopedic spine clinic (United States) | 108.2 (127.9) mo | VAS: 36.9 (27.0) | RMDQ: 10.4 (6.1) | 5-min walk, 50-ft walk, 5-repetition sit-to-stand
Magnussen et al37 (2004) | LBP for >8 wk | 32 (66%) | 38.1 (9.2) | Tertiary care (Norway) | | | RMDQ: 10.0 (4.0); FFbH-R: 22.5 (4.9) | Lift test
Ocarino et al53 (2009) | Nonspecific LBP for ≥3 mo | 30 | 43.16 (11.23) | University clinic (Brazil) | 42.3 (80.6) mo | | RMDQ: 9.9 (3.7) | 50-ft walk, sit-to-stand
Odebiyi et al54 (2007) | LBP for ≥3 mo with radiating leg pain | 23 | [25–65] | Two hospitals (Nigeria) | | VAS for men: 45.2 (19.4); VAS for women: 64.1 (22.4) | RMDQ for men: 8.4 (5.3); RMDQ for women: 9.6 (4.9) | 50-ft walk, 5-min walk, 5-repetition sit-to-stand
Pratt et al40 (2002) | Lumbar spinal stenosis | 29 (41) | 69 [49–82] | Orthopedic hospital (United Kingdom) | | | ODI: 39.8 (16.5) | Shuttle walking test
Rainville et al41 (2012) | Lumbar spinal stenosis | 50 (42) | 68 (7.9) [48–86] | Spine center of a hospital (United States) | | VAS: 61; VAS for the leg: 54 | ODI: 35 | Motorized treadmill test, self-paced walking test
Simmonds et al17 (1998) | Nonspecific, mechanical LBP | 44 (63.6) | 42.6 | Orthopedic clinic (United States) | 12.4 (20.8) [1–72] mo | VAS: 27.3 (23.9) [0.0–7.2] | RMDQ: 8.3 (6.0) | 1 min of stair climbing, 5-min walk, 50-ft walk, 50-ft walk (preferred speed), 5-repetition sit-to-stand
Smeets et al5 (2006) | Nonspecific LBP for >3 mo | 53 (52.8) | 43.19 (9.27) | Tertiary care (the Netherlands) | 53.4 (67.7) mo | VAS: 44.5 (23.5) | RMDQ: 13.2 (4.2) | 1-min of stair climbing, 5-min walk, 50-ft walk, PILE, 5-repetition sit-to-stand
Soer et al55 (2006) | LBP for >3 mo | 53 (39.6) | 38.5 (9.8) | | 250 (375) wk | | RMDQ: 9.2 (5.5) | PILE
Staartjes and Schroder46 (2018) | Lumbar disk herniation, lumbar spinal stenosis, spondylolisthesis, degenerative disk disease | 157 (49) | 49.9 (14.1) | Outpatient spine surgery clinic (the Netherlands) | | VAS: 5.8 (2.8); VAS for the leg: 7.4 (2.0) | ODI: 43.0 (17.6); RMDQ: 11.7 (5.3) | 5-repetition sit-to-stand
Strand et al56 (2002) | LBP for >3 mo | 114 (60) | 43.9 (10.6) | Tertiary care (Norway) | 10 (9.0) y | VAS: 52.6 (20.5) | Disability Rating Index: 56.2 (13.3) | Lift test
Strand et al48 (2011) | LBP for >3 mo | 98 (49.0) | 37.4 (10.4) [18–60] | Tertiary care (Norway) | | | RMDQ: 11.8 (4.4); FFbH-R: 7.9 (5.3) | 15-m walk test, lift test (modified), PILE
Taylor et al49 (2001) | Mechanical LBP with or without sciatica for ≥6 mo | 44 (59.2) | 48.2 (9.1) | Tertiary care (Great Britain) | | | ODI: 28.5 |
Teixeira da Cunha-Filho et al50 (2010) | LBP for >3 mo | 30 (63.3) | 33.0 (10.6) | Tertiary care (Brazil) | | VAS: 44 (26) | RMDQ: 9.8 (5.8) | 50-ft walk, 5-min walk, 5-repetition sit-to-stand, Timed “Up & Go” Test
Tomkins et al42 (2009) | Lumbar spinal stenosis | 45 (57.8) | 66.9 (9.6) | | 10 (11) y; leg pain: 7 (9) y | | | Self-paced walking test, treadmill protocol
Whitehurst et al43 (2001) | Lumbar spinal stenosis | 57 (56.1) | Men: 69 (5); women: 70 (3) | | | | | Treadmill walking test, weight-carrying test

ᵃ FFbH-R = Hannover Functional Ability Questionnaire; IQR = interquartile range; LBP = low back pain; ODI = Oswestry Disability Index; PILE = Progressive Isoinertial Lifting Evaluation; QBPDS = Quebec Back Pain Disability Scale; RMDQ = Roland-Morris Disability Questionnaire; SD = standard deviation; VAS = 100-mm visual analog scale.
ᵇ Reported as mean (SD) [range] unless otherwise indicated.
ᶜ 50 ft ≈ 15.3 m.
Table 3.
Brief Descriptions of Included Physical Capacity Tasks

30-s chair stand test | Rising from a chair | No. of repetitions (sitting to standing) performed in 30 s | Chair, stopwatch
5-repetition sit-to-stand | Rising from a chair | Seconds to complete 5 repetitions of sitting to standing | Chair, stopwatch
50-ftᵃ walk | Walking | Seconds to complete a 50-ft course at maximum speed | Measuring tape, stopwatch, markers for indicating track end points
5-min walk | Walking | Meters walked in 5 min | Measuring tape, stopwatch, markers for indicating track end points
Lift test | Lifting | Ordinal scale of no. of repetitions of lifting a box with a sandbag from floor to table and back | Table, 1.35-kg box with 5-kg sandbag
Lift test, modified | Lifting | No. of repetitions of lifting a box from floor to table and back in 1 min | Table, 1.35-kg box with 5-kg sandbag (4 kg for women)
Motorized treadmill test | Walking | Total walking time and distance walked at preferred speed (nonmodifiable during the test) at the moment when walking-related symptoms make the participant stop | Treadmill, stopwatch
Progressive isoinertial lifting evaluation | Lifting | Weight (in kg) of the box during the last completed cycle48 / no. of completed lifting cycles5,51 | Standardized box, an assortment of weights
Self-paced walking test | Walking | Total walking time, speed, and distance walked at the moment when walking-related symptoms make the participant stop | Distance instrument, stopwatch
Shuttle walking test | Walking | Meters walked until the participant fails to complete a predefined “shuttle” in the time allocated | Standardized audiotape, measuring tape, markers for indicating track end points
Timed “Up & Go” Test | Rising from a chair and walking | Seconds to complete rising from a chair, walking 3 m, turning around, walking back to the chair, and sitting down | Chair, stopwatch
Treadmill examination, 1.2 mphᵇ | Walking | Total time walked at 1.2 mph on a treadmill at the moment when walking-related symptoms make the participant stop (time limit: 15 min) | Treadmill, stopwatch
Treadmill examination, preferred speed | Walking | Total time walked at preferred speed on a treadmill at the moment when walking-related symptoms make the participant stop (time limit: 15 min) | Treadmill, stopwatch
Treadmill protocol | Walking | Total distance, time, and average speed walked on a treadmill at preferred speed (modifiable during the test) at the moment when symptoms of lumbar spinal stenosis or other reasons make the participant stop (time limit: 30 min) | Treadmill, stopwatch
Treadmill walking test | Walking | Distance walked at 53.6 m/min on a treadmill at the moment when pain or fatigue makes the participant stop or when 70% of the predicted maximum heart rate (70[220 – age]/100) is reached | Treadmill, stopwatch, heart rate monitor
Weight-carrying test | Walking and carrying | Time needed to walk 20 m as quickly as possible while carrying dumbbells equaling 10% of the person’s weight | Stopwatch, an assortment of dumbbells

ᵃ 50 ft ≈ 15.3 m.
ᵇ 1.2 miles ≈ 1.9 km.
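
The stopping rule of the treadmill walking test and the speed conversion in the footnote involve simple arithmetic, which can be sketched as follows. This is only an illustration; the function names `target_heart_rate` and `mph_to_kmh` are hypothetical and do not come from the original test protocols.

```python
def target_heart_rate(age_years, percent=70.0):
    """Heart rate limit from the table: percent * (220 - age) / 100."""
    return percent * (220 - age_years) / 100


def mph_to_kmh(mph):
    """Convert miles per hour to kilometers per hour (1 mile = 1.609344 km)."""
    return mph * 1.609344


# A 60-year-old participant would stop at 70 * (220 - 60) / 100 = 112 beats/min.
print(target_heart_rate(60))
# The 1.2-mph treadmill examination corresponds to roughly 1.9 km/h.
print(round(mph_to_kmh(1.2), 1))
```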
30-s chair stand Kahraman et al47 Intrarater 38 Goodf ICC(2,1): Intrarater: limited No information
test (2016) 0.94 (+) (+)
(95% CI:
0.89–0.97)
5-repetition Andersson et al51 Test-retest 134 Test-retest: strong Poord SDC: 11.6 se Test-retest: strong −4.1 s
sit-to-stand (2010) (+); interrater: (−) (−); interrater:
unknown unknown
tested on the
same day)
Smeets et al5 Test-retest 53 Good ICC(1,1): Good LoA: ±/7.6 s
(2006) 0.91(+) (−)
(95% CI:
0.84–0.94)
Physical Capacity Task | Study | Study Designᵇ | Sample Size | COSMIN Score32 | Result Ratings (+/?/−)30 | Best-Evidence Synthesis: Level of Evidence | COSMIN Score32 | Result Ratings (+/?/−)30 | Best-Evidence Synthesis: Level of Evidence | MICᶜ
Staartjes and Test-retest 66 Fair 0.97 (95% SDC: 4.1 se
Schroder46 (2018) CI:
0.94–0.98)
(2010)
50-ftb,g walk Andersson et al51 Test-retest 133 Test-retest: strong Poord SDC: 3.0 se Test-retest: strong −0.7 s
(2010) (+); interrater: (−) (−); interrater:
unknown unknown
Strand et al48 (2011) Test-retest 9 Poord ICC(2,1): Poord SDC: 0.6 s (+)
0.77 (+)
(95% CI:
0.24–0.94)
Table 4.
Continued
50-ft walkg , Simmonds et al17 Test-retest 44 Good ICC(1,1): Test-retest: Goodf SDC: 7.10 se Unknown
preferred walking (1998) (participants 0.64 (−) conflicting; (?)
speed tested on interrater:
different days) unknown
5-min walk Andersson et al51 Test-retest 132 Test-retest: strong Poord SDC: 105.1 Test-retest: 21.4 m
(2010) (+) me (−) moderate (−)
(95% CI:
0.81–0.93)
Lift test Magnussen37 (2004) Interrater 32 Goodf κ: 1.00 (+) Interrater, subacute No information
(95% CI: LBP: limited (+);
1.00–1.00) test-retest,
subacute LBP:
limited (−)
Magnussen37 (2004) Test-retest 28 Goodf κ: 0.55 (−)
(95% CI:
0.51–0.59)
Lift test, modified Strand et al48 (2011) Test-retest 9 Poord ICC(2,1): Test-retest: Poord SDC: 6.5 Unknown 2.5
0.87 (+) unknown lifts/min (−) lifts/min
(95% CI:
0.50–0.97)
Progressive Andersson et al51 Test-retest 126 Test-retest: Poord SDC: 4.2 Test-retest: 1.5
isoinertial lifting (2010) moderate (+) stagese (−) moderate (−) stages
evaluation
Self-paced walking Tomkins et al42 Test-retest 33 Fair ICC(3,1): Test-retest, LSS: No information
test (2009) 0.98 (+) limited (+)
Shuttle walking Armstrong et al52 Test-retest Test-retest: limited Goodf LoA: ±22 m Unknown
test (2005) (+); test-retest, (?)
LSS: limited (+)
Timed “Up & Go” Gautschi et al44 Test-retest 18–62 Test-retest: Poord Average SDC: Test-retest:
Test (2017) moderate (+); 4.9 s (?) unknown; LDH/LSS/
interrater: test-retest, DDD:
unknown; LDH/LSS/DDD: 3.4 s
test-retest, unknown
LDH/LSS/DDD:
unknown
Treadmill Deen et al39 (2000) Test-retest 28 Fairf CCC: 0.90 Test-retest, LSS: No information
examination, 1.2 (+) limited (+)
mi/hh
Treadmill Deen et al39 (2000) Test-retest 28 Fairf CCC: 0.96 Test-retest, LSS: No information
examination, (+) limited (+)
preferred speed
ᵃ CCC = concordance correlation coefficient (comparable to ICC67); COSMIN = Consensus-Based Standards for the Selection of Health Measurement Instruments; DDD = lumbar degenerative disk disease; ICC = intraclass correlation coefficient; LBP = low back pain; LDH = lumbar disk herniation; LoA = limits of agreement; LSS = lumbar spinal stenosis; MIC = minimal important change; SDC = smallest detectable change. + = positive rating: ICC or weighted κ ≥ 0.70 (reliability), or SDC or LoA < MIC (measurement error); ? = indeterminate rating: ICC or weighted κ not reported (reliability) or MIC not defined (measurement error); − = negative rating: criteria for “+” not met.
ᵇ Intrarater reliability = when presenting repeatedly the same observations to 1 observer; interrater reliability = when presenting the same observations to 2 or more observers; test-retest reliability = when presenting the same task to the same subjects 2 or more times.68
ᶜ The MIC (defined as “the smallest change in score that is perceived as important by patients”23) was added for reference because this value is needed to determine the result ratings for measurement error.66
ᵈ Not included in the data synthesis because of the poor methodological score.23
ᵉ The SDC (defined as “the smallest change that can be detected by the measurement instrument, beyond measurement error”23) was calculated23 by the authors of the present systematic review by multiplying the standard error of measurement presented in the original article by 1.96 × √2.
ᶠ The rating changed after removal of the sample size item from the “worst score counts” summary in the COSMIN 4-point checklist.
ᵍ 50 ft ≈ 15.3 m.
ʰ 1.2 miles ≈ 1.9 km.
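
The SDC calculation and the rating rules in the footnotes above can be written out as a short Python sketch. It is meant only as an illustration; the function names are hypothetical. Comparing absolute values is an assumption made here because some MIC values in the table carry a minus sign (a reduction in seconds), which the footnotes do not address.

```python
import math


def smallest_detectable_change(sem):
    """SDC = 1.96 * sqrt(2) * SEM (standard error of measurement)."""
    return 1.96 * math.sqrt(2) * sem


def rate_reliability(coefficient):
    """'+' if ICC or weighted kappa >= 0.70, '?' if not reported, else '-'."""
    if coefficient is None:
        return "?"
    return "+" if coefficient >= 0.70 else "-"


def rate_measurement_error(sdc_or_loa, mic):
    """'+' if SDC or LoA < MIC, '?' if the MIC is not defined, else '-'.

    Assumption: magnitudes are compared, so a MIC of -4.1 s counts as 4.1 s.
    """
    if mic is None:
        return "?"
    return "+" if abs(sdc_or_loa) < abs(mic) else "-"


# Values taken from rows of the table above:
print(rate_reliability(0.94))              # ICC(2,1) of the 30-s chair stand test
print(rate_measurement_error(11.6, -4.1))  # 5-repetition sit-to-stand, SDC vs MIC
print(rate_measurement_error(7.10, None))  # 50-ft walk at preferred speed, MIC unknown
```

With the table's values, these three calls give "+", "-", and "?", which agrees with the ratings printed in the corresponding rows.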
Isoinertial Lifting Evaluation, and Timed “Up & Go” tasks). Four tasks were specifically evaluated for patients with back-related diagnoses known to severely affect walking capacity (self-paced walking test, shuttle walking test, and treadmill examination at 1.2 mph [∼1.9 km/h] and at preferred speed), and all of these had limited evidence for positive ratings for test-retest reliability. Intrarater reliability was assessed for 1 task (30-second chair stand test), which had limited evidence for a positive rating. Most of the tasks investigated for interrater reliability displayed unknown evidence due to small sample sizes.

affect walking capacity (Timed “Up & Go” task, motorized treadmill test, treadmill walking test, and weight-carrying test).

Criterion validity. The results for criterion validity are shown in eTable 1.

With regard to methodological quality, 1 article investigated criterion validity42 and was rated as good for methodological quality and was therefore included in the best-evidence synthesis.
Table 5.
Overview of the Level of Evidence Per Measurement Property Per Physical Capacity Taskᵃ

5-repetition sit-to-stand | Strong (+) | Unknown | 0 | Strong (−) | Moderate (+) | 0 | Limited (+)
50-ftᵇ walk | Strong (+) | Unknown | 0 | Strong (−) | Moderate (+) | 0 | Moderate (−)
Progressive isoinertial lifting evaluation | Moderate (+) | 0 | 0 | Moderate (−) | Moderate (+) | 0 | Moderate (−)
Shuttle walking test | Limited (+), Limited (+)ᶜ | 0 | 0 | Unknown | 0 | 0 | Limited (+), Limited (+)ᶜ
Timed “Up & Go” Test | Moderate (+) | Unknown | 0 | Unknown | Moderate (+), Limited (+)ᶜ | 0 | Limited (+)ᶜ

Of the walking tasks that were specifically investigated for patients with diagnoses known to severely affect walking capacity, the Timed “Up & Go” task and shuttle walking test were the only tasks that displayed positive ratings for more than 1 measurement property.

Discussion
To our knowledge, this is the first study to determine the level of evidence for the reliability, validity, and responsiveness of physical capacity tasks designed to assess functioning for patients with LBP. The 5-repetition sit-to-stand, 50-ft walk, 5-minute walk, Progressive Isoinertial Lifting Evaluation, and Timed “Up & Go” tasks displayed moderate to strong evidence for positive ratings for both test-retest reliability and construct validity. These results suggest that a clinician or researcher can be confident that the scores of these tasks have a high degree of reliability and are likely to measure the constructs they are designed to measure.23 A previous systematic review investigated the reliability of physical capacity tasks, but not the validity and responsiveness.58 That review reported similar results for reliability, despite a different method for determining the level of evidence.58

The 1-minute stair climbing task, 5-repetition sit-to-stand task, shuttle walking test, and Timed “Up & Go” displayed limited evidence for positive ratings for responsiveness. These results suggest that these tasks appear to have the ability to detect changes in functioning after a treatment, although the evidence is limited. In contrast, most included tasks that were investigated for responsiveness displayed limited to moderate evidence for negative ratings. An important aspect of the definition of responsiveness in the COSMIN taxonomy is the ability to detect change over time specifically in the construct to be measured.29 However, most of the included articles investigated responsiveness by comparison with global rating scales of constructs such as “LBP-associated disability,”51 “general health improvements,”38 and “return to work,”56 all constructs that are different from those found in the physical capacity tasks. In contrast, several authors have recommended using global rating scales of the same constructs as the outcome measures of interest.23 For instance, the responsiveness of a walking task could

We argue that the results of the systematic review are generalizable to most individuals with chronic LBP. Research suggests that a representative sample is required to generalize the results of measurement properties beyond the research setting.23 The patients in the systematic review comprise a heterogeneous sample, which is partly reflected in the differences in levels of pain and disability seen between the included studies (Tab. 2). Although we acknowledge that such heterogeneity can influence the results of physical capacity testing (eg, meters walked or weight lifted), there seems to be little evidence that the heterogeneity would affect the results of measurement properties. Not least, this is seen in the almost unanimous positive results for test-retest reliability and construct validity regardless of the characteristics of the study populations. A subset of the results is, however, primarily generalizable to patients with back-related diagnoses known to severely affect walking capacity such as lumbar spinal stenosis and spondylolisthesis (see the level of evidence marked in italics in Tab. 5).62–64 In the data synthesis of walking tasks, we therefore did not combine studies of such patients with other studies.

Another strength is the clearly defined construct that formed the basis for the selection of articles, ie, “capacity” in the ICF activity domain.3 This approach generated more homogeneous test content, which can make the results of the systematic review easier to interpret. Capacity denotes a patient’s ability to perform an activity in a standardized environment and is useful to assess and compare patients’ functioning in a standardized way.3 However, it is important to note that results from a physical capacity task are not directly transferable to a patient’s environment outside the test conditions, but rather signify the patient’s potential of performing the activity outside the test conditions.

This review also has a few limitations. First, the COSMIN 4-point checklist was originally developed for PROMs,32 which could compromise its validity for assessing physical capacity tasks. However, we argue that the checklist is indeed appropriate for physical capacity tasks because the items of the checklist are phrased in a general manner and draw upon the researchers’ knowledge of the test at hand, such as the appropriate time interval between the first and second administrations in a reliability study. The checklist

Writing: M. Jakobsson, A. Gutke, L.B. Mokkink, R. Smeets, M. Lundberg
Data collection: M. Jakobsson, M. Lundberg
Data analysis: M. Jakobsson, A. Gutke, L.B. Mokkink, R. Smeets, M. Lundberg
Project management: M. Jakobsson, M. Lundberg
Fund procurement: M. Lundberg
Providing facilities/equipment: M. Lundberg
Providing institutional liaisons: R. Smeets, M. Lundberg
Consultation (including review of manuscript before submitting): A. Gutke, L.B. Mokkink, R. Smeets, M. Lundberg

The authors of the systematic review would like to acknowledge the medical librarians Anders Wändahl and Klas Moberg at Karolinska Institutet University Library for their valuable help in developing the search strategy and performing the search.

10. Kopec JA, Esdaile JM, Abrahamowicz M et al. The Quebec Back Pain Disability Scale: conceptualization and development. J Clin Epidemiol. 1996;49:151–161.
11. Roland M, Morris R. A study of the natural history of back pain. Part I: development of a reliable and sensitive measure of disability in low-back pain. Spine (Phila Pa 1976). 1983;8:141–144.
12. Food and Drug Administration Center for Drug Evaluation and Research. Guidance for Industry. Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. Silver Spring, MD: Food and Drug Administration; 2009.
13. Brodke DS, Goz V, Lawrence BD, Spiker WR, Neese A, Hung M. Oswestry Disability Index: a psychometric analysis with 1,610 patients. Spine J. 2017;17:321–327.
nonspecific low back pain in primary care. Eur Spine J. 2006;15:S169–S191.
29. Mokkink LB, Terwee CB, Patrick DL et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63:737–745.
30. Prinsen CAC, Vohra S, Rose MR et al. How to select outcome measurement instruments for outcomes included in a “Core Outcome Set”—a practical guideline. Trials. 2016;17:449.
31. Mokkink L, Terwee C, Patrick D et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010;19:539–549.
45. Gautschi OP, Joswig H, Corniola MV et al. Pre- and postoperative correlation of patient-reported outcome measures with standardized Timed Up and Go (TUG) test results in lumbar degenerative disc disease. Acta Neurochir (Wien). 2016;158:1875–1881.
46. Staartjes VE, Schroder ML. The five-repetition sit-to-stand test: evaluation of a simple and objective tool for the assessment of degenerative pathologies of the lumbar spine. J Neurosurg Spine. 2018;29:380–387.
47. Kahraman T, Ozcan Kahraman B, Salik Sengul Y, Kalemci O. Assessment of sit-to-stand movement in nonspecific low back pain: a comparison study for psychometric properties of field-based and laboratory-based methods. Int J Rehabil Res. 2016;39:165–170.
48. Strand LI, Anderson B, Lygren H, Skouen JS, Ostelo R,
57. Gautschi OP, Joswig H, Corniola MV et al. Pre- and postoperative correlation of patient-reported outcome measures with standardized Timed Up and Go (TUG) test results in lumbar degenerative disc disease. Acta Neurochir (Wien). 2016;158:1875–1881.
58. Denteneer L, Van Daele U, Truijen S, De Hertogh W, Meirte J, Stassijns G. Reliability of physical functioning tests in patients with low back pain: a systematic review. Spine J. 2018;18:190–207.
59. Maughan EF, Lewis JS. Outcome measures in chronic low back pain. Eur Spine J. 2010;19:1484–1494.
60. van der Roer N, Ostelo RW, Bekkering GE, van Tulder MW, de Vet HC. Minimal clinically important change for pain intensity, functional status, and general health status in patients with nonspecific low back pain. Spine (Phila Pa