
[ research report ]

Terese L. Chmielewski, PT, PhD, SCS1 • Michael J. Hodges, PT, MHS2 • MaryBeth Horodyski, EdD, ATC, LAT3
Mark D. Bishop, PT, PhD, CSCS1 • Bryan P. Conrad, M Eng4 • Susan M. Tillman, PT, CSCS, SCS5

Investigation of Clinician Agreement in Evaluating Movement Quality During Unilateral Lower Extremity Functional Tasks: A Comparison of 2 Rating Methods
Downloaded from www.jospt.org at on November 7, 2015. For personal use only. No other uses without permission.

Journal of Orthopaedic & Sports Physical Therapy®
Copyright © 2007 Journal of Orthopaedic & Sports Physical Therapy®. All rights reserved.

Study Design: Nonexperimental.

Objectives: To determine interrater and intrarater agreement for 2 methods of evaluating movement quality during 2 lower extremity functional tasks, and to descriptively compare levels of agreement between the 2 methods.

Background: Clinicians typically use observational analysis to evaluate movement quality during functional tasks, but the extent of agreement is unknown.

Methods and Measures: Twenty-five uninjured subjects performed 3 trials of unilateral squat and lateral step-down tasks. Three clinicians evaluated the trunk, pelvis, and hips for coronal plane and transverse plane movement deviations. Two rating methods were used: assessment of the entire movement (“overall method”) and rating each segment individually (“specific method”). Movement deviation severity was rated using basic clinical guidelines and ratings were repeated from videotape. Percent agreement and weighted kappa coefficients were calculated between rater pairs and rating sessions. Generalized kappa coefficients were calculated across raters.

Results: Interrater and intrarater percent agreement were higher using the overall method. Interrater weighted kappa coefficients were similar between rating methods (overall method, 0-0.55; specific method, 0.23-0.53). Intrarater weighted kappa coefficients were higher for the specific method (0.38-0.68) compared to the overall method (0.13-0.50). Generalized kappa coefficients were also higher for specific method compared to the overall method (unilateral squat, 0.19 and 0.01, respectively; lateral step-down, 0.22 and 0.18, respectively) and 95% confidence intervals remained above zero.

Conclusions: Rating movement at body segments appears to result in agreement among raters that is better than chance. Neither rating method produced high agreement, indicating a need to develop more explicit criteria for rating movement deviation severity. J Orthop Sports Phys Ther 2007;37(3):122-129. doi:10.2519/jospt.2007.2457

Key Words: functional testing, hip, knee, movement analysis, neuromuscular, reliability

Biomechanical analysis has been used to identify faulty lower extremity movement patterns for the purposes of preventing and improving the rehabilitation of knee injuries. Coronal plane knee movement is a key focus area. For example, increased coronal plane motion at the knee (peak knee valgus angle) when landing from a jump is associated with higher risk for anterior cruciate ligament injury in females.9 In addition, reduced peak coronal plane knee motion during a stepping task has been reported to accompany an improvement in patellofemoral pain.11 The knee is part of a kinetic chain and, as such, movement at joints above and below can influence knee position. The influence of the hip appears to be particularly important to coronal plane knee position. Hip adduction accompanies a valgus knee position during a variety of functional tasks,9,11,18 and hip internal rotation is associated with valgus knee moment during a side-stepping task.12 Trunk position may exert an additional, indirect influence on coronal plane knee position by altering the location of the center of gravity relative to the lower extremity and by affecting the length-tension relationship of muscles that control the hip.17 For this reason, several studies have included the trunk in the biomechanical analysis of lower extremity tasks.6,18

Clinicians rarely have access to the equipment and expertise needed for biomechanical analysis of movement and, instead, rely on visual analysis to make judgments about movement quality. Most clinicians receive basic instruction in visually analyzing movement patterns for common functional tasks (eg, gait) during their formal education. Instruc-

1 Assistant Professor, Department of Physical Therapy, University of Florida, Gainesville, FL. 2 Staff Physical Therapist, UF&Shands, Gainesville, FL. 3 Associate Professor, Director of Research, Department of Orthopaedics and Rehabilitation, University of Florida, Gainesville, FL. 4 Biomedical Engineer, Department of Orthopaedics and Rehabilitation, University of Florida, Gainesville, FL. 5 Clinical Coordinator, Shands Rehab at the Orthopaedics and Sports Medicine Institute, Gainesville, FL. This work was supported by a grant from the University of Florida Research Opportunity Fund. The University of Florida Institutional Review Board approved the protocol for this study. Address correspondence to Terese L. Chmielewski, University of Florida, Department of Physical Therapy, PO Box 100154, HSC, Gainesville, FL 32610. E-mail: tchm@ufl.edu

122 | march 2007 | volume 37 | number 3 | journal of orthopaedic & sports physical therapy
tion is typically focused on identifying the presence of specific movement deviations (eg, increased knee flexion during midstance) and final assessments are often based on the identification of multiple possible deviations. In contrast, instruction does not usually include guidelines for rating the severity of movement deviation, nor does it encompass the wide variety of functional tasks that may be observed. Without structure for visually analyzing movement it is possible that clinicians could make dissimilar conclusions about the same patient and, ultimately, apply different treatment plans.

Recently, greater emphasis has been placed on visually analyzing movement patterns during functional tasks to identify candidates for knee injury prevention programs or neuromuscular control interventions during knee rehabilitation.3,10,17 The unilateral squat1,10,17 and lateral step-down tasks14 have been used because they can elicit compensations that lead to knee valgus.10,17 Understanding the extent of clinician agreement in rating movement quality for these functional tasks is a necessary step toward developing standardized visual analysis methods. In this study, we compared 2 rating methods. The first method involved an assessment of the entire movement pattern (“overall method”), for which quality of movement was classified into 1 of 3 categories (“good,” “fair,” or “poor”). The second rating method entailed identifying specific movement deviations and assigning a severity rating at individual body segments (“specific method”). The purpose of this study was to (1) determine interrater and intrarater agreement for the 2 methods of rating quality of movement and (2) descriptively compare levels of agreement between the 2 rating methods. We hypothesized that the specific method would result in higher interrater and intrarater agreement.

METHODS

Subjects

Twenty-five subjects between the ages of 18 and 25 years participated in this experiment (7 males, 18 females; mean ± SD age, 22.4 ± 1.3 years; mean ± SD height, 168.9 ± 9.1 cm; mean ± SD body mass, 66.4 ± 13.6 kg). All subjects were free from pain in their low back or lower extremities at the time of testing. Subjects were excluded from participation if they reported (1) a history of significant low back or lower extremity injury that required physician care, (2) balance or vestibular problems, or (3) an incidence of concussion in the month prior to testing. The test limb was the limb used for stance when kicking a ball. Each subject provided written informed consent. The protocol for this study was approved by the University of Florida Institutional Review Board.

Testing Protocol

The functional tasks chosen for this study were the unilateral squat and the lateral step-down, which are shown in Figures 1 and 2. These tasks were chosen because they are used clinically, are easy to administer, are performed unilaterally, and can induce coronal plane and transverse plane changes at the pelvis and hip if weakness is present.7,10,17 In addition, initial pilot testing of other functional tasks demonstrated that rater vision was obscured during reaching tests and that tasks performed in bilateral stance were not sufficiently demanding to uninjured subjects without the inclusion of external weight.

FIGURE 1. A subject performing the unilateral squat task.

FIGURE 2. A subject performing the lateral step-down task.

All subjects reviewed a written description of the functional tasks and were given standardized training that included verbal instruction in the task and a visual demonstration. Subjects were allowed 3 to 5 practice trials of each task, during which they were given verbal cues to remind them to maintain neutral trunk, pelvis, and hip position in the coronal and transverse planes. Subjects also received verbal feedback if they were performing the task incorrectly. Electromagnetic motion-tracking sensors (Fastrak; Polhemus, Colchester, VT) were then placed on the foot, shank, and thigh of the test lower extremity and on the pelvis. These sensors were used to standardize the knee flexion angle during the unilateral squat task. The order of functional tasks was randomized among subjects using a random-number generator. Three trials of each functional task were collected to obtain a general representation of subject performance. A trial was repeated if a subject experienced a loss of balance or failed to meet the performance requirements of the task (eg, achieving 60° of knee flexion during the unilateral squat or touching the contralateral heel to the floor during the lateral step-down). Clinician raters were positioned 3 m (9 ft) away from the subject, next to a digital camcorder that was used to record all trials for retrospective review.

Unilateral Squat  The starting position for the unilateral squat was standing on the test leg with the hip and knee in a neutral anatomical position. The trunk was upright, without rotation or lateral flexion, and the contralateral leg was positioned with the hip in neutral and the knee in approximately 90° of flexion. Sub-

jects moved at a self-selected pace into a squat position, were given a verbal signal when the test knee reached 60° of flexion, and then returned to the starting position. Trials were repeated if the subject did not reach 60° of knee flexion or if balance was lost, resulting in placement of the contralateral foot on the floor. A 60° knee flexion angle was chosen for this task because it represents an angle achieved during dynamic tasks like cutting,13 and pilot testing revealed that some subjects were unable to achieve 90° of knee flexion.

Lateral Step-Down  For the lateral step-down, subjects stood on the test leg, which was positioned on the edge of an adjustable step (Reebok Step System; Reebok International, Canton, MA), with the hip and knee in a neutral anatomical position. The trunk was upright without rotation or lateral flexion, the iliac crests were level, and the contralateral leg was unsupported, with the hip in a slightly flexed position and the knee extended (Figure 2). Subjects lowered themselves at a self-selected pace until the contralateral heel contacted the ground and then returned to the starting position. Step height ranged between 15.24 and 25.4 cm and was determined by the height of the subject (subjects under 163 cm in height used a 15.24-cm step; those between 163 and 180 cm in height used a 20.32-cm step; subjects over 180 cm in height used a 25.4-cm step). Trials were repeated if the subject failed to touch the contralateral heel to the ground or stepped off the adjustable step due to loss of balance.

Movement Assessment

Clinician Raters  Three clinicians (2 physical therapists and 1 athletic trainer with a mean ± SD of 14 ± 6 years of clinical experience) rated movement patterns for each functional task. Methods for rating movement quality were developed by the raters in conjunction with another physical therapist with 10 years of clinical experience and were meant to reflect current clinical practice. After initial development, the raters used the scoring methods to assess movement quality for 11 uninjured subjects and subsequently met to discuss and resolve any confusion regarding the rating guidelines. For this study, the raters assessed movement patterns at 2 different times: the first scoring occurred while the subject performed the functional task and the second scoring occurred when the raters viewed the videotaped performance. Testing occurred over a span of 6 weeks and scoring of the videotape occurred an average (±SD) of 10.0 ± 1.5 weeks after the completion of testing. During the first scoring, raters refrained from making comments related to the subject performance and were required to complete scoring within 30 seconds after the subject completed the functional task. While reviewing the videotaped performance, raters were asked to limit themselves to 1 opportunity for rewinding the videotape.

Movement Deviations  For both functional tasks, trunk and hip movement away from the neutral starting position in the coronal and transverse planes were considered a movement pattern deviation. Pelvic drop in the coronal plane was also considered a movement pattern deviation for the lateral step-down task. Furthermore, oscillatory movements of these segments (repetitive movement away from and back toward neutral) were considered representative of a movement pattern deviation, with greater frequency and magnitude of oscillation representing a more severe deviation. Recent biomechanical studies support lower extremity oscillations as a movement deviation.5,13

Overall Method Scoring System  The overall method of evaluating movement quality required categorization of the movement pattern as good, fair, or poor. In keeping with current clinical practice, raters were only given basic guidelines for scoring the severity of movement deviation. Raters were instructed to consider the regions of the body discussed above in scoring the movement. A good score represented no deviation in the movement pattern; a fair score was given if movement deviations were barely observable; and a poor score indicated marked deviations in the movement pattern. Raters were not given instruction in resolving situations where a more severe deviation might be present at 1 segment compared to the other segments.

TABLE 1. Scoring Used for the Specific Method of Evaluating Movement Quality*

Symbol   Point Value   Criteria
0        0             No deviation from neutral alignment
|        5             A small-magnitude or barely observable movement out of a neutral position and/or low frequency of segment oscillation
√        10            A moderate-magnitude or marked movement out of a neutral position and/or moderate frequency of segment oscillation
X        15            Excessive or severe magnitude of movement out of a neutral position and/or high frequency of segment oscillation

*A symbol was used to express the rater’s assessment of adherence to a neutral position at each segment (trunk, hips, and pelvis) during the functional task performance. Each symbol corresponded to a point value that increased with increasing severity of deviation.

Specific Method Scoring System  The specific method involved scoring each segment (trunk, pelvis, and hip) individually, using a set of symbols to indicate the severity of deviation from a neutral position (Table 1). Each symbol corresponded to a numerical equivalent that was set in multiples of 5, with higher numbers indicating more severe deviation from a neutral position. Similar to the overall method, raters were only given basic instruction for determining “mild,” “moderate” and “severe” movement deviations (Table 1), so that the main difference between the specific and overall methods of evaluating movement quality was that the specific method required the scoring of individual body segments. In this way, the specific method reflected movement

deviations at all segments, whereas in the overall method there was potential for more severe deviations at 1 segment to be minimized in the overall score. Symbols were used to improve scoring efficiency and a trial score was computed by summing all deviations. Scoring movement deviations in multiples of 5 was used to assist in tallying the trial score.

Data Management and Statistical Analysis

The median score from the 3 trials was computed for each rater and used in all statistical analyses. The trial score from the specific method was used in statistical analyses, as opposed to individual segment scores, because the trial score reflected the entire movement pattern as rated with the overall method.

Data for each functional task were analyzed separately. A general indication of agreement was obtained for both scoring methods by calculating the percent agreement (number of exact agreements/number of possible scores). In addition, a symmetrically weighted kappa statistic was used to determine rater agreement greater than that expected by chance2:

weighted kappa = 1 – [∑(weighted observed values)/∑(weighted expected values)].

The weighting was configured such that larger discrepancies in agreement were assigned a greater penalty. We chose this analysis because, while an unweighted kappa treats all disagreement as equivalent, weighted kappa can penalize larger discrepancies in agreement16 that would likely result in different treatment plans. Figure 3 provides an example of the calculation of the weighted kappa statistic. Interrater agreement was determined between each combination of raters, creating 3 comparisons. Intrarater agreement was calculated for each rater by comparing scoring from the actual trials to scoring from videotape.

Additionally, a generalized kappa coefficient was calculated. Generalized kappa coefficients can be used to compare multiple raters.4 In contrast to the weighted kappa statistic that compares observed frequencies of agreement and those expected by chance, the generalized kappa represents a comparison of the proportion of possible rater agreements to the proportion of classifications in each category.

RESULTS

The left leg was the test leg for 24 of 25 subjects. Raters 1 and 2 evaluated movement patterns for all 25 subjects, whereas rater 3 only evaluated 22 of 25 subjects due to scheduling constraints.

Ratings for both functional tasks were clustered, regardless of method used for evaluating movement (Tables 2 and 3). The majority of the overall method ratings fell into the fair and good categories (unilateral squat, ~96%; lateral step-down, ~89%). Similarly, using the specific method, almost 92% of the ratings for the unilateral squat were in the 5- to 15-point range, and 90.3% of the ratings for the lateral step-down were in the 5- to 20-point range.

TABLE 2. Frequency Distribution of Ratings for Each Functional Task Using the Overall Method of Evaluating Lower Extremity Movement Quality

                     Poor       Fair        Good        Total
Unilateral squat
  Rater 1            1          14          10          25
  Rater 2            2          18          5           25
  Rater 3            ...        10          12          22
  Total*             3 (4.2)    42 (58.3)   27 (37.5)   72 (100)
Lateral step-down
  Rater 1            ...        17          8           25
  Rater 2            7          15          3           25
  Rater 3            1          18          3           22
  Total*             8 (11.2)   50 (69.4)   14 (19.4)   72 (100)

*The percentage of all ratings is included in parentheses.

TABLE 3. Frequency Distribution of Ratings for Each Functional Task Using the Specific Method of Evaluating Lower Extremity Movement Quality*

                     0         5           10          15          20         25        30        Total
Unilateral squat
  Rater 1            2         8           9           6           ...        ...       ...       25
  Rater 2            ...       6           8           8           2          1         ...       25
  Rater 3            1         9           5           7           ...        ...       ...       22
  Total†             3 (4.2)   23 (31.9)   22 (30.5)   21 (29.2)   2 (2.8)    1 (1.4)   ...       72 (100)
Lateral step-down
  Rater 1            ...       8           10          7           ...        ...       ...       25
  Rater 2            ...       2           5           4           7          5         2         25
  Rater 3            ...       3           10          7           2          ...       ...       22
  Total†             ...       13 (18.1)   25 (34.7)   18 (25)     9 (12.5)   5 (6.9)   2 (2.8)   72 (100)

*Based on ratings of live performance.
†The percentage of all ratings is included in parentheses.

Although percent agreement between

TABLE 4. Interrater Agreement for the Overall and Specific Methods of Evaluating Movement Quality*

Overall Method
                       Rater 1 Versus 2       Rater 2 Versus 3       Rater 1 Versus 3
Percent agreement
  Unilateral squat     60                     64                     41
  Lateral step-down    56                     82                     55
Weighted kappa
  Unilateral squat     0.37 (0.27 to 0.47)    0.18 (0.05 to 0.26)    0 (–0.11 to 0.11)
  Lateral step-down    0.32 (0.21 to 0.43)    0.55 (0.49 to 0.61)    0.21 (0.12 to 0.31)
Generalized kappa
  Unilateral squat     0.01 (–0.27 to 0.25)
  Lateral step-down    0.19 (–0.15 to 0.53)

Specific Method
                       Rater 1 Versus 2       Rater 2 Versus 3       Rater 1 Versus 3
Percent agreement
  Unilateral squat     36                     41                     32
  Lateral step-down    20                     50                     27
Weighted kappa
  Unilateral squat     0.42 (0.34 to 0.60)    0.29 (0.02 to 0.59)    0.41 (0.16 to 0.66)
  Lateral step-down    0.23 (0.07 to 0.37)    0.53 (0.41 to 0.69)    0.31 (0.21 to 0.55)
Generalized kappa
  Unilateral squat     0.18 (0.04 to 0.32)
  Lateral step-down    0.22 (0.07 to 0.36)

*Based on ratings of live performance. Percentage agreement and weighted kappa coefficients are given for each combination of raters. The generalized kappa coefficient, which summarizes agreement across raters, is given along with 95% confidence intervals in parentheses.
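
The percent agreement and weighted kappa entries for rater 1 versus rater 2 (unilateral squat, overall method) can be reproduced from the cross-tabulated counts given in the Figure 3 example. The following Python sketch is illustrative and is not the authors' analysis code: the function names are hypothetical, and generalizing the Figure 3 weights (0, 1, 4) to squared category distance is an assumption.

```python
# Cross-tabulated overall-method ratings for the unilateral squat, copied from
# the Figure 3 example. Category order along both axes: poor, fair, good.
OBSERVED = [
    [1, 0, 0],
    [1, 11, 2],
    [0, 7, 3],
]


def percent_agreement(table):
    """Share of subjects on the diagonal (exact agreement), in percent."""
    total = sum(sum(row) for row in table)
    exact = sum(table[i][i] for i in range(len(table)))
    return 100.0 * exact / total


def weighted_kappa(table):
    """1 - sum(w * observed) / sum(w * expected).

    Assumption: w = (i - j) ** 2, which yields the 0, 1, 4 weights of Figure 3.
    """
    n = len(table)
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n)) for j in range(n)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2
            # Expected count under chance: row total x column total / grand total.
            expected = row_totals[i] * col_totals[j] / total
            weighted_observed += weight * table[i][j]
            weighted_expected += weight * expected
    return 1.0 - weighted_observed / weighted_expected


print(round(percent_agreement(OBSERVED)))  # 60
print(round(weighted_kappa(OBSERVED), 2))  # 0.37
```

With these counts the sketch gives 60% agreement and a weighted kappa of 0.37, matching the corresponding Table 4 entries.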

raters was higher using the overall method, values from the weighted and generalized kappa statistics tended to be higher for the specific method (Table 4). Percent agreement between rater pairs ranged from 41% to 82% for the overall method and 20% to 50% for the specific method. Weighted kappa coefficients for rater pairs ranged from 0.00 to 0.55 for the overall method compared to 0.23 to 0.53 for the specific method. Generalized kappa coefficients for the overall method were 0.01 and 0.19 (unilateral squat and lateral step-down, respectively), and 0.18 and 0.22 (unilateral squat and lateral step-down, respectively) for the specific method. The 95% confidence intervals for generalized kappa coefficients included zero for both functional tasks using the overall method, whereas they did not for the specific method.

Results for intrarater agreement followed a similar trend. Percent agreement was higher for the overall method and weighted kappa coefficients were more favorable for the specific method (Table 5). The intrarater percent agreement ranged from 56% to 76% for the overall method and 32% to 60% for the specific method. In contrast, weighted kappa coefficients for intrarater agreement ranged from 0.13 to 0.50 for the overall method and 0.38 to 0.68 for the specific method.

DISCUSSION

The purpose of this study was to determine interrater and intrarater agreement for 2 methods of rating quality of movement during unilateral lower extremity functional tasks and to compare levels of rater agreement between the 2 rating methods. We hypothesized that evaluating movement at individual body segments (specific method) would result in better interrater and intrarater agreement than a more global assessment of movement (overall method). Our data partially support our hypothesis. The percent agreement was higher between and within raters using the overall method compared to the specific method. In contrast, the values generated by kappa statistics, which determine agreement beyond that expected by chance alone, tended to be higher for the specific method.

Based on our results, the clinical practice of rating movement patterns into good, fair, or poor categories (overall method) without explicit guidelines leads to clinician agreement a little over half the time. Results from the kappa statistics, however, are less favorable. For example, raters 2 and 3 had 64% agreement in rating movement patterns during the unilateral squat, but the weighted kappa coefficient (indicating agreement beyond that expected by chance) was only 0.18. In addition, 95% confidence intervals for the generalized kappa coefficients of the overall method include zero, which indicates that no agreement beyond that expected by chance is a potential result.

Requiring clinicians to rate specific body segments separately (specific method) had the opposite effect. Agreement for rater pairs never exceeded 50% with the specific method, which is lower than the overall method. This result is not unexpected because the greater number of scoring options in the specific method compared to the overall method increases the probability of disagreement. On the other hand, point estimates for weighted kappa statistics reached similar magnitude between the 2 rating methods, and the lowest weighted kappa coefficient of the specific method had a higher value than the overall method. More importantly, 95% confidence intervals for generalized kappa coefficients of the specific method did not contain zero. This indicates, at the very minimum, that rater agreement using the specific method will almost always produce agreement better than expected by chance alone.

Results for intrarater agreement followed the trends of interrater agreement. That is, the overall method produced

TABLE 5. Intrarater Agreement for the Overall and Specific Methods of Evaluating Movement Quality*

Overall Method
                       Rater 1                Rater 2                Rater 3
Percent agreement
  Unilateral squat     64                     60                     59
  Lateral step-down    56                     76                     68
Weighted kappa
  Unilateral squat     0.42 (0.12 to 0.72)    0.29 (–0.06 to 0.64)   0.18 (–0.25 to 0.62)
  Lateral step-down    0.13 (–0.02 to 0.29)   0.50 (0.14 to 0.95)    0.35 (0.07 to 0.64)

Specific Method
                       Rater 1                Rater 2                Rater 3
Percent agreement
  Unilateral squat     40                     48                     32
  Lateral step-down    40                     60                     50
Weighted kappa
  Unilateral squat     0.53 (0.31 to 0.75)    0.38 (0.11 to 0.65)    0.35 (0.11 to 0.60)
  Lateral step-down    0.56 (0.36 to 0.77)    0.68 (0.39 to 0.97)    0.57 (0.30 to 0.84)

*Percent agreement and weighted kappa values are given for each rater.
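
The specific-method values compared in Table 5 are built from the Table 1 symbols: each trial score is the sum of the segment point values, and the median of the 3 trial scores is what enters the agreement statistics. A minimal Python sketch of that arithmetic follows. The function names and the example ratings are hypothetical, and it assumes a single symbol is recorded per segment per trial, which the text does not state explicitly.

```python
from statistics import median

# Symbol-to-point mapping from Table 1.
POINTS = {"0": 0, "|": 5, "√": 10, "X": 15}


def trial_score(symbols_by_segment):
    """Sum the point values of the symbols recorded for one trial."""
    return sum(POINTS[symbol] for symbol in symbols_by_segment.values())


def subject_score(trials):
    """Median of the trial scores for one rater and one subject."""
    return median(trial_score(trial) for trial in trials)


# Hypothetical ratings for one subject (three trials), not data from the study.
example_trials = [
    {"trunk": "|", "pelvis": "0", "hip": "√"},  # 5 + 0 + 10 = 15
    {"trunk": "|", "pelvis": "|", "hip": "√"},  # 5 + 5 + 10 = 20
    {"trunk": "0", "pelvis": "0", "hip": "|"},  # 0 + 0 + 5 = 5
]

print([trial_score(trial) for trial in example_trials])  # [15, 20, 5]
print(subject_score(example_trials))                     # 15
```

Using the median rather than the mean keeps the summary score on the same multiples-of-5 scale as the individual trial scores when 3 trials are rated.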
higher percentage agreement compared to the specific method, but weighted kappa coefficients tended to be higher for the specific method than the overall method. The highest kappa coefficients were obtained for the lateral step-down task using the specific method.

This study was conducted as a first step toward developing a standardized method of evaluating movement quality during lower extremity functional tasks. In designing our study, we made 3 important decisions. First, we chose to evaluate clinician agreement for 2 functional tasks commonly used to drive clinical decision making for lower extremity injuries.10,14 It was not our intention to create a scoring system for all lower extremity functional tasks, which may require consideration of other movement deviations. Second, we specifically chose not to include measures associated with balance (eg, touching the contralateral foot to the floor) in our rating criteria. In this way, our study purely investigated agreement in discerning movement quality. Finally, raters were not given detailed instruction for rating severity of the movement deviation, such as using anatomical or joint angle reference points. Intuitively, providing greater structure to rating severity of deviation should assist in higher rater agreement. We decided, however, that the first step should be to understand the strengths and limitations of current methods used for visually analyzing movement.

                          Rater 1
Rater 2         Poor                Fair                  Good                Row Total
Poor            1 (0.1), w = 0      0 (0.7), w = 1        0 (0.2), w = 4      1
Fair            1 (1.1), w = 1      11 (10.1), w = 0      2 (2.8), w = 1      14
Good            0 (0.8), w = 4      7 (7.2), w = 1        3 (2.0), w = 0      10
Column Total    2                   18                    5                   25

∑ observed frequencies (fo) = 0×1 + 1×0 + 4×0 + 1×1 + 0×11 + 1×2 + 4×0 + 1×7 + 0×3 = 10
∑ expected frequencies (fe) = 0×0.1 + 1×0.7 + 4×0.2 + 1×1.1 + 0×10.1 + 1×2.8 + 4×0.8 + 1×7.2 + 0×2.0 = 15.8
weighted κ = 1 – (fo/fe) = 1 – (10/15.8) = 0.37

FIGURE 3. Example calculation of the weighted kappa statistic. Data presented are ratings from the unilateral squat task. Each cell gives the observed frequency, the expected frequency (in parentheses), and the weighting (w) assigned to an observation in that cell. Expected frequencies are calculated by multiplying the column total by the row total and dividing by the grand total. For example, the expected frequency of observations in the “poor-poor” cell is calculated as follows: (2 × 1)/25. The weighting matrix is configured such that greater penalty is given for larger discrepancies in agreement. For example, the value 4 for a “poor-good” combination indicates a higher penalty than the value 1 for “fair-good” combination. Note that perfect agreement receives a weighting of 0.

As a general point, level of agreement between and within raters was not high for either method of evaluating movement. However, caution must be taken when comparing the rating methods and interpreting the results of this study. The number of rating categories was greater in the specific method than the overall method to allow for a more responsive rating method. The difference in the number of rating categories affects the kappa statistic, meaning that the rating methods are not directly comparable. Second, ratings tended to be clustered (tight distribution) for both rating methods. Such clustering increases chance agreement and can lower the values for kappa statistics.16 Including subjects with greater movement compensations (eg, patients with lower extremity pathology) might have resulted in higher kappa values for both rating methods. Thirdly, kappa values are affected by the distribution of disagreements.16 If disagreements are distributed such that 1 rater tends to rate deviations as more severe than another rater, kappa values will be higher than if the disagreements are more

random. We observed that in the lateral step-down task rater 1 typically scored movement deviations less severely than the other raters, whereas rater 2 scored them as more severe, possibly inflating the kappa values for this task. The use of explicit guidelines for rating movement deviation severity should improve consistency across raters, eliminating the effect of rater bias on the data. It is important to remember that the primary concern in using visual observation to evaluate movement deviations is the avoidance of large discrepancies in ratings that could result in different treatment plans. A review of our data demonstrates that there was only 1 trial for the lateral step-down task and 2 trials for the unilateral squat task where rater pairs had polar-opposite ratings using the overall method (eg, good versus poor). For the specific method, if disagreement by 1 cell (eg, 10 versus 15) was included in the calculation, percent agreement ranged from 50% to 95% for all rater pairs. Despite fairly low rater agreement in this study, it is questionable whether the disagreements would have resulted in remarkably different treatment approaches.

A few other studies have investigated rater agreement in evaluating movement during lower extremity functional tasks. Because of differences in study design, results from these studies cannot be directly compared to the current study, but they indicate the status of visual analysis development. Riemann et al15 described interrater agreement for a single-leg hop stabilization test in which subjects made 10 hops and were rated on landing and balance errors. Almost all criteria were related to performance or loss of balance with the exception of 1 movement quality rating in the balance error category (nondominant limb moving into greater than 30° flexion, extension, or abduction). A lower intraclass correlation coefficient for the balance score (0.74) than the landing score (0.92) may have resulted from the difficulty in evaluating movement quality. Harrison and colleagues8 investigated clinician agreement for evaluating single-leg standing ability in 78 uninjured subjects and 17 patients. Trials were scored as good, fair, and poor, according to whether compensatory movements were made at the trunk and extremities. Guidelines were provided for judging compensatory movements; for example, poor grade meant there were 3 or more compensatory movements and postural sway fell outside the base of support. This rating method, which is similar in structure to our overall method, resulted in a weighted kappa coefficient of 0.70. The larger subject pool, inclusion of patient subjects (potentially increasing the distribution of ratings), and use of specific guidelines for rating severity of movement deviation may all have contributed to the high kappa value. Bjorklund and colleagues1 reported interrater agreement for observational analysis of subjects performing a wide variety of lower extremity functional tasks, including running and hopping. Each task had 5 different criteria and each criterion had 2 choices. Using an unweighted kappa statistic, interrater coefficients ranged between 0.62 and 0.78 for all tests, whereas intrarater coefficients ranged from 0.34 to 0.64. Not unexpectedly, intrarater agreement was worse for tests that involved subjective judgments of movement quality, including the “springiness” and “rhythm” during hops and jumps. Finally, Piva et al14 reported an interrater kappa coefficient of 0.67 (95% confidence interval [CI]: 0.58-0.76) for movement quality assessment during a lateral step-down task performed by individuals with patellofemoral pain. The 5 rating criteria included trunk, pelvis, and knee position, and use of arms for balance, and loss of balance. Criteria were scored dichotomously (1 point if the deviation was present), except knee position, for which deviation severity was rated based on anatomical reference points. Total scores were then grouped into 1 of 3 categories. Results of this study indicate that collapsing the ratings of individual body segments into categories, providing fewer rating choices for deviation severity, and using anatomical reference points assist rater agreement.

The main limitations of this study that merit discussion are the use of symbols for scoring (specific method), inclusion of only uninjured subjects, and the distribution of responses for both rating methods. We arbitrarily chose 3 easily written symbols to denote severity of movement deviation. It is possible that raters may have confused the different symbols, which would have affected our results; but because each of the raters assisted in developing the scoring methods and participated in pilot testing, it is unlikely that confusion between the symbols skewed our results. In retrospect, a more intuitive scoring method should have been implemented, especially if the ultimate goal was to develop a standardized scoring method for widespread use in clinical practice. We also tested uninjured subjects, which resulted in tight clustering of the ratings. It is possible that including a patient population would have provided a better indication of agreement across a wide range of movement patterns; however, important points must be made in this regard. Observational analysis of movement quality is performed in rehabilitation and as part of screenings to identify individuals who are predisposed to lower extremity injury. Our study results have direct relevance for the latter clinical application. Furthermore, Piva et al14 reported that for patients with patellofemoral pain who performed a lateral step-down task only 17% of the movement ratings fell into the poor category. This is comparable to the 11.5% poor ratings for the lateral step-down task in our study. It is possible that movement patterns in either injured or uninjured populations tend to be acceptable or have minor deviations. Assuming that very severe movement deviations would be easily discriminated, the challenge may be to develop standardized scoring methods to discern minor movement deviations.

Continued research is needed to develop and validate standardized methods for observational analysis of movement

quality. Future efforts should include stricter criteria with explicit instruction (eg, knee moves to the inside of the great toe) as a means to improve agreement between raters. In addition, rating categories should be limited to as few choices as possible, even dichotomous, if it is not necessary to have a more sensitive measure (ie, more categories) to quantify movement deviation severity. Intuitive documentation methods, such as using “+/++/+++” to denote severity of movement deviation or a 0/1 scoring system for dichotomous rating would also be beneficial. Analysis methods that consist of ratings for individual body segments, strict rating criteria, the fewest number of rating categories, and the inclusion of more easily observed balance criteria (eg, stepping down with contralateral leg) should raise rater agreement to acceptable levels. Once reliable evaluation methods are developed, validation of ratings by comparing to objective measures (eg, motion capture data) or clinical outcomes will be necessary.

CONCLUSION

Rating movement at individual body segments appears to result in agreement among raters that is better than chance. Neither the overall method nor the specific method of movement analysis produced high rater agreement, indicating a need to develop more explicit criteria for rating movement deviation severity.

ACKNOWLEDGEMENTS

The authors acknowledge physical therapists Chu Soh and Mary Gregory for assisting with the development of this project. In addition, the authors acknowledge Heidi Betz, Yadira DeJesus, and Michelle Sanchez for their assistance with data processing.

REFERENCES

1. Bjorklund K, Skold C, Andersson L, Dalen N. Reliability of a criterion-based test of athletes with knee injuries; where the physiotherapist and the patient independently and simultaneously assess the patient’s performance. Knee Surg Sports Traumatol Arthrosc. 2006;14:165-175.
2. Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull. 1968;70:213-220.
3. Cook G, Burton L, Hoogenboom B. Pre-participation screening: the use of fundamental movements as an assessment of function-Part I. North Am J Sports Phys Ther. 2006;1:62-72.
4. Crewson PE. Reader agreement studies. AJR Am J Roentgenol. 2005;184:1391-1397.
5. Ford KR, Myer GD, Smith RL, Vianello RM, Seiwert SL, Hewett TE. A comparison of dynamic coronal plane excursion between matched male and female athletes when performing single leg landings. Clin Biomech (Bristol, Avon). 2006;21:33-40.
6. Griffin LY, Agel J, Albohm MJ, et al. Noncontact anterior cruciate ligament injuries: risk factors and prevention strategies. J Am Acad Orthop Surg. 2000;8:141-150.
7. Hardcastle P, Nade S. The significance of the Trendelenburg test. J Bone Joint Surg Br. 1985;67:741-746.
8. Harrison EL, Duenkel N, Dunlop R, Russell G. Evaluation of single-leg standing following anterior cruciate ligament surgery and rehabilitation. Phys Ther. 1994;74:245-252.
9. Hewett TE, Myer GD, Ford KR, et al. Biomechanical measures of neuromuscular control and valgus loading of the knee predict anterior cruciate ligament injury risk in female athletes: a prospective study. Am J Sports Med. 2005;33:492-501.
10. Kibler WB, Press J, Sciascia A. The role of core stability in athletic function. Sports Med. 2006;36:189-198.
11. Mascal CL, Landel R, Powers C. Management of patellofemoral pain targeting hip, pelvis, and trunk muscle function: 2 case reports. J Orthop Sports Phys Ther. 2003;33:647-660.
12. McLean SG, Huang X, van den Bogert AJ. Association between lower extremity posture at contact and peak knee valgus moment during sidestepping: implications for ACL injury. Clin Biomech (Bristol, Avon). 2005;20:863-870.
13. McLean SG, Lipfert SW, van den Bogert AJ. Effect of gender and defensive opponent on the biomechanics of sidestep cutting. Med Sci Sports Exerc. 2004;36:1008-1016.
14. Piva SR, Fitzgerald K, Irrgang JJ, et al. Reliability of measures of impairments associated with patellofemoral pain syndrome. BMC Musculoskelet Disord. 2006;7:33.
15. Riemann BL, Caggiano NA, Lephart S. Examination of a clinical method of assessing postural control during a functional performance task. J Sport Rehabil. 1999;8:171-183.
16. Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther. 2005;85:257-268.
17. Willson JD, Dougherty CP, Ireland ML, Davis IM. Core stability and its relationship to lower extremity function and injury. J Am Acad Orthop Surg. 2005;13:316-325.
18. Zeller BL, McCrory JL, Kibler WB, Uhl TL. Differences in kinematics and electromyographic activity between men and women during the single-legged squat. Am J Sports Med. 2003;31:449-456.
