
Clinical Reliability of Manual Muscle Testing

Middle Trapezius and Gluteus Medius Muscles


ETHEL FRESE,
MARYBETH BROWN,
and BARBARA J. NORTON

The purposes of this study were to develop a protocol to examine the reliability
of manual muscle testing in a clinical setting and to use that protocol to assess
the interrater reliability of manually testing the strength of the middle trapezius
and gluteus medius muscles. One hundred ten patients with various diagnoses
participated as subjects, and 11 physical therapists participated as examiners in
this study. The results showed that interrater reliability for right and left middle
trapezius and gluteus medius muscles was low. The percentage of therapists
obtaining a rating of the same grade or within one third of a grade ranged from
50% to 60% for the four muscles. This study indicates that using manual muscle
testing to make accurate clinical assessments of patient status is of questionable
value.
Key Words: Manual muscle testing, Muscle hypotonia, Physical therapy.

Manual muscle testing is an important clinical tool used by physical therapists to determine a patient's muscle strength. Muscle testing originated in the United States in the early 1900s during the study of muscle function in patients with poliomyelitis. Despite the change in the role of manual muscle testing with the end of the last poliomyelitis epidemic in this country, it remains an important clinical tool for assessing the muscular causes of movement dysfunction. Testing of muscles is considered to be an essential prerequisite for treatment program planning and modification. The results of manual muscle testing also are used to make clinical judgments concerning the patient's progress or deterioration, as well as to assess the effectiveness of a particular treatment.

The study of the reliability of examiners performing manual muscle tests is necessary if the tests are to be used. Manual muscle testing reliability in a clinical setting has been studied minimally. Lilienfeld et al found muscle test grades from Zero to Normal assigned by 12 to 39 examiners in four different trials to be within one grade, although the testing method was controlled because the examiners were trained by the same instructor.1 Iddings et al also found manual muscle testing to be reliable among 10 examiners whose ratings were within one grade in 90.6% of the trials.2 All of the subjects in both of these studies had the diagnosis of poliomyelitis, and the examiners were highly skilled in manual muscle testing.

The reliability of manual muscle tests has been the most difficult to achieve for grades greater than Fair because of the examiner's subjective judgment of the amount of resistance applied during the test. One of the problems central to manual muscle testing is the variable "frame of reference" for making an assessment. Such subjective judgments include determining what is normal muscle strength for an individual given the person's age and size, in addition to the relative strengths of the tester and patient.3-6

Many other factors influence the reproducibility of a manual muscle test. The testing method may vary among therapists (eg, Kendall and McCreary7 vs Daniels and Worthingham8), both because the therapists' training may have differed and because physical therapists tend to develop their own techniques and standards for grading muscle strength. Other variables that influence the accuracy of a muscle test are 1) the point and line of force application, 2) the magnitude of resistive force, 3) the speed of resistive force application, 4) the duration of the contraction, 5) the degree of cooperation from the patient, 6) fatigue, 7) various distracting influences, 8) the type of instructions given, 9) the tone of the therapist's voice, and 10) the amount of interaction between the therapist and patient.4,9-15

Beasley attempted to increase objectivity in manual muscle testing by developing a standardized scale of norms for muscle strength.16 Using an electronic myodynagraph, Beasley found a discrepancy between the percentage of Normal strength assigned in a manual muscle test and the percentage of strength found by a quantitative measure.16 The Good muscle strength group, usually rated at 75% of Normal in the manual muscle testing system,7 had only 43% of the Normal value on Beasley's standardized scale. The Fair group had a rating of only 9% of Normal, rather than 50% of Normal usually assigned. The Poor group, ordinarily rated at 25% of Normal on the manual scale, had a rating of only 2.6% of Normal on the standardized scale. The standard deviations showed considerable overlap in the percentage of Normal scores in grades below Fair, indicating poor differentiation in grades below Fair, the range in which manual muscle testing supposedly is more accurate.16

The purposes of this study were to develop a protocol to examine the reliability of manually testing muscle strength in a physical therapy department and to use that protocol to assess the interrater reliability for manually testing the middle trapezius and gluteus medius muscles. We chose the two muscles indicated 1) because we wanted to examine muscles from both the upper and lower extremities and 2) because the selected muscles are difficult to test owing to the stabilization required by other muscle groups during testing. In addition, the two muscles selected for study frequently are found to be weak in patients. The hypothesis was that a staff of physical therapists working together in a physical therapy department would demonstrate interrater reliability in testing the middle trapezius and gluteus medius muscles.

Mrs. Frese is Instructor, Department of Physical Therapy, St. Louis University, 1504 S Grand Blvd, St. Louis, MO 63104 (USA). She was a master's degree student, Program in Physical Therapy, School of Medicine, Washington University, St. Louis, MO, when this study was completed.
Dr. Brown is Instructor, Program in Physical Therapy, PO Box 8083, School of Medicine, Washington University, 660 S Euclid Ave, St. Louis, MO 63110.
Mrs. Norton is Instructor, Program in Physical Therapy, School of Medicine, Washington University.
This study was completed in partial fulfillment of the requirements for Mrs. Frese's master's degree, Washington University.
This article was submitted April 14, 1986; was with the authors for revision 10 weeks; and was accepted August 27, 1986. Potential Conflict of Interest: 4.

1072 PHYSICAL THERAPY


RESEARCH

METHOD

Subjects

One hundred ten patients, who were referred for physical therapy at St. Louis University Hospital, participated in the study. The patients had various musculoskeletal and neurological disorders including low back pain, degenerative joint disease, cervical pain, gunshot wound, chondromalacia, rheumatoid arthritis, and connective tissue disease. The patients had to exhibit sufficient range of motion to allow the body part to be placed in the test position and either pain-free motion or pain that did not interfere with the muscle test. The test group consisted of 50 female and 60 male subjects, aged 15 to 76 years, with a mean age of 41 years (± 15 years).

Examiners

Eleven staff physical therapists at St. Louis University Hospital served as the examiners. All examiners were graduates of accredited university programs. Seven were graduates of the same university, 2 others graduated from another university, and the remaining 2 therapists graduated from two other different schools. The mean number of years of experience of the staff members was 2.3 ± 1.2 years. Eight of the therapists preferred the Kendall and McCreary muscle testing technique,7 2 preferred that of Daniels and Worthingham,8 and 1 used both.

Each therapist received a work sheet with 10 spaces for 10 different patients. Next to each space was the name of the therapist with whom the examiner had been paired randomly for that particular patient. A different therapist's name appeared in each space so every examiner was paired with 1 of 10 different therapists. Each therapist also received a second work sheet with 10 spaces to be used for recording muscle grades of another therapist's patient when her name appeared on that therapist's list for that patient. Each examiner then selected 10 patients to be included in the study. The Appendix gives the muscle testing scale and definitions that all of the therapists used.

Testing Procedures

Manual muscle testing was performed during the patient's daily treatment session. A rest period of at least three minutes was allowed between the two examiners' tests and the two therapists kept their results confidential. The examiners used a "break" test, and for the gluteus medius muscle test, the patient's hip was placed in as much extension as possible.

Testing Sequence

The testing sequence involved the following steps:
1. The examiner first identified a patient suitable for the study.
2. The examiner performed the middle trapezius and gluteus medius muscle tests bilaterally. The side and muscle to be tested first was assigned randomly before the beginning of the test phase. The examiner used her accustomed technique of muscle testing to determine the appropriate grade and repeated the test several times, if needed, to assign a grade. She then recorded the grades in the appropriate space on her work sheet.
3. A second therapist, who had been paired randomly with her for that patient, then performed the same two muscle tests in the same order, but using her own testing technique. The second therapist also repeated the test several times, if necessary, to determine a grade. She then recorded the grades on her work sheet.

Data Analysis

Cohen's weighted Kappa (Kw) determination17 was used as an index of agreement for interrater reliability. This index weighs disagreements by the amount of disagreement. A weight matrix (Tab. 1) was designed for the scores obtained in the study and gave the ratio-scaled degrees of disagreement assigned to each cell. Each cell in the matrix represents one score for each examiner. For example, the cell for Normal-Normal had a weight value of 1.0, the cell for Good-Normal was 0.7, and the cell for Poor minus-Normal was 0.0.

To determine whether eliminating the pluses and minuses would improve the reliability coefficient, we compressed the original scores into a five-point scale. Pluses and minuses were assigned the same score as the main grade (eg, Fair plus and Fair minus became Fair), and a weight matrix was designed for these scores.

The muscle test scores of every patient whom Therapist 1 examined were compared with the scores of each of the other 10 therapists with whom she was paired. An interrater reliability coefficient then was computed for Therapist 1. This process was repeated for each therapist so that an interrater reliability coefficient was computed for all 11 examiners. By doing so, we wanted to determine whether any particular therapist appeared to be less reliable compared with the other 10, and whether the school the therapist graduated from or her years of experience were factors affecting reliability.

RESULTS

Table 2 summarizes the percentages of the total number of subjects on which the examiners agreed, in addition to percentages of agreement within several ranges of disparity (ie, fractions of grades they were apart). The percentage of subjects on whom the same grade was obtained by two examiners ranged from 28% to 45% for the four muscles, and for 89% to 92% of the subjects we found either complete agreement or agreement within one grade.

The percentage of patients who were rated Fair plus or above by one or both examiners was 88% for the right middle trapezius muscle, 90% for the left middle trapezius muscle, 91% for the right gluteus medius muscle, and 95% for the left gluteus medius muscle. One or both examiners assigned a grade of Normal in 50% of the tests for the right middle trapezius muscle, in 44% of the tests for the left middle trapezius muscle, in 67% of the tests for the right gluteus medius muscle, and in 70% of tests for the left gluteus medius muscle.

Table 3 gives the interrater reliability coefficients for both the original and the

Volume 67 / Number 7, July 1987 1073


compressed muscle testing scores. The reliability for the original scores was low, ranging from .11 to .58. Compressing the scores did not change the interrater reliability coefficient appreciably (.26-.42). Even for grades below Fair, we found poor interrater reliability.

Table 4 summarizes the results of comparing each of the examiners with every other examiner for each test. Reliability coefficients ranged from .04 to .66 with no pattern of high reliability being established by any one therapist. Those therapists with more clinical experience did not demonstrate any greater level of reliability than those who had graduated more recently. The school from which the therapist graduated did not appear to affect reliability because those therapists who graduated from the same university did not demonstrate any greater reliability among each other than the therapists who graduated from different schools. Therapist 3 demonstrated low reliability coefficients on all four tests (.08-.19).

TABLE 1
Weight Matrix for Original Scores^a

Muscle Test
Scores for           Muscle Test Scores for Examiner 2
Examiner 1    P-   P    P+   F-   F    F+   G-   G    G+   N-   N
P-            1.0  0.9  0.8  0.7  0.6  0.5  0.4  0.3  0.2  0.1  0.0
P             0.9  1.0  0.9  0.8  0.7  0.6  0.5  0.4  0.3  0.2  0.1
P+            0.8  0.9  1.0  0.9  0.8  0.7  0.6  0.5  0.4  0.3  0.2
F-            0.7  0.8  0.9  1.0  0.9  0.8  0.7  0.6  0.5  0.4  0.3
F             0.6  0.7  0.8  0.9  1.0  0.9  0.8  0.7  0.6  0.5  0.4
F+            0.5  0.6  0.7  0.8  0.9  1.0  0.9  0.8  0.7  0.6  0.5
G-            0.4  0.5  0.6  0.7  0.8  0.9  1.0  0.9  0.8  0.7  0.6
G             0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0  0.9  0.8  0.7
G+            0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0  0.9  0.8
N-            0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0  0.9
N             0.0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0

^a Eleven possible scores ranging from P- to N.

TABLE 2
Percentage of Agreement Among Scores for Subjects^a

                                            Muscles^b
                                  RMT        LMT        RGM        LGM
Grade                            n    %     n    %     n    %     n    %
Same grade                      31   28    32   29    52   47    50   45
1/3 grade apart                 24   22    27   25    11   10    17   15
2/3 grade apart                 19   17    23   21    24   22    15   14
1 grade apart                   25   23    15   14    14   13    16   15
1 1/3 grades apart               6    5     8    7     4    4     7    6
1 2/3 grades apart               5    5     4    4     1    1     5    5
Within 1 grade apart            68   62    65   60    49   45    48   44
Same grade or within 1 grade    99   90    97   89   101   92    98   89

^a Each grade was divided into thirds with the use of pluses and minuses; therefore, the difference between 2 and 2+ was considered 1/3, the difference between 2- and 2+ was 2/3, and the difference between 2 and 3 was one grade.
^b RMT = right middle trapezius, LMT = left middle trapezius, RGM = right gluteus medius, LGM = left gluteus medius.

TABLE 3
Interrater Reliability of Original and Compressed Scores

                           Muscles^a
                     RMT    LMT    RGM    LGM
Conditions     N     Kw^b   Kw     Kw     Kw
Original      110    .58    .29    .25    .11
Compressed    110    .26    .26    .30    .42

^a RMT = right middle trapezius, LMT = left middle trapezius, RGM = right gluteus medius, LGM = left gluteus medius.
^b Kw = weighted Kappa coefficient.

DISCUSSION

Using Cohen's weighted Kappa determination, we found interrater reliability for manually testing the strength of middle trapezius and gluteus medius muscles in a clinical setting to be poor. When the results were expressed as percentages of agreement, however, they were similar to the findings of Lilienfeld et al1 and Iddings et al2 who reported good reliability within one grade among experienced examiners (more experienced than those in our study). The results (28%-47% agreement) did not agree with those of Williams,10 who found that two examiners agreed completely between 60% and 75% of the time. The examiners in our study agreed more frequently on the gluteal muscle tests than on the middle trapezius muscle tests for reasons we could not determine. We also found poor interrater reliability in grades below Fair, which agrees with Beasley's16 finding of poor differentiation in grades below Fair.
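The percentage-of-agreement figures discussed above count how many thirds of a grade separate the two examiners' scores for each subject. The following is a minimal sketch of that tabulation, not the authors' procedure: the function names and the example pairs are hypothetical, and only the 11 grades listed in Table 1 (P- through N) are assumed.

```python
from collections import Counter
from fractions import Fraction

# The 11 grades that appear in Table 1, ordered from Poor minus to Normal.
GRADES = ["P-", "P", "P+", "F-", "F", "F+", "G-", "G", "G+", "N-", "N"]
STEP = {grade: i for i, grade in enumerate(GRADES)}


def disparity(grade_a, grade_b):
    """Distance between two grades, in grades (one plus/minus step = 1/3)."""
    return Fraction(abs(STEP[grade_a] - STEP[grade_b]), 3)


def agreement_table(pairs):
    """Percentage of rating pairs found at each disparity."""
    counts = Counter(disparity(a, b) for a, b in pairs)
    return {d: 100 * n / len(pairs) for d, n in sorted(counts.items())}


def within_one_grade(pairs):
    """Percentage of pairs that are the same grade or at most one grade apart."""
    return 100 * sum(disparity(a, b) <= 1 for a, b in pairs) / len(pairs)


# Hypothetical (examiner 1, examiner 2) scores, for illustration only.
example_pairs = [("G", "G"), ("G", "G+"), ("F+", "G"), ("N", "G"), ("N", "F+")]
for gap, percent in agreement_table(example_pairs).items():
    print(f"{gap} grade(s) apart: {percent:.0f}%")
print(f"same grade or within 1 grade: {within_one_grade(example_pairs):.0f}%")
```

Using exact fractions keeps the 1/3 and 2/3 categories distinct without floating-point rounding.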
Compressing the scores by eliminating pluses and minuses did not appreciably change the interrater reliability coefficients. The coefficient for the right middle trapezius muscle decreased, possibly because the interval widened between grades with pluses and minuses when they were compressed (eg, Fair plus-Good minus was changed to Fair-Good).

The distribution of the scores might have affected the reliability or agreement coefficient. Because the majority of the subjects' scores were Fair plus or greater for all of the muscles (88%-95%), the scores were not well distributed across all possible muscle grades. This skewed distribution might have reduced spuriously the magnitude of the Kappa coefficient. A broader range of scores should improve the chances of demonstrating an accurate measure of agreement. Because we established the criterion of pain not interfering with the muscle test, some of the weaker patients may have been excluded from the study.

One procedural problem that could have affected our results was the difficulty of positioning some of the patients for a particular test. Different therapists adjusted the procedure differently to solve the problem.



The patients' age did not appear to be a factor in the low interrater reliability coefficients because the scores for the youngest and the oldest subjects in the study were not consistently any farther apart than those of the subjects in the middle age range.

Achieving reliability within one grade, as in this study, has questionable clinical value, especially when considering the differences between Poor and Fair, or Fair and Good, versus the difference between Good and Normal. The interval between each of these pairs of grades is one grade, although the therapists' subjective judgments of patient function may have been quite different. The accuracy of assessments of patient progress or deterioration, therefore, would be questionable despite reliability within one grade.

Manual muscle testing is an inexpensive, relatively quick, and convenient method for assessing a patient's muscle strength. In view of the results of this study, however, physical therapists should consider supplementing their manual muscle test scores with isokinetic testing, dynamometry, or tensiometry. Griffin et al compared the results of manual muscle testing with isokinetic testing for knee extensor muscles in patients with neuromuscular disease and found that a lack of strength improvement or a decrease in strength was demonstrated by both manual muscle testing and isokinetic testing.18 They also found, however, that in patients with a manual muscle test score of 9 to 10 (Normal minus-Normal), isokinetic testing revealed either muscle strength deficits or improvement not detectable with manual muscle testing methods. They concluded that isokinetic testing adds valuable information when patients have manual muscle test scores of Normal. Bohannan found a significant reliability correlation between manual muscle test scores and dynamometer test scores for knee extensor muscles, which indicated that both testing methods measure muscle strength similarly.19 He found a significant difference, however, between theoretical percentage manual muscle test scores and calculated dynamometer percentage test scores, which indicated that theoretical percentage scores based on manual muscle testing are likely to overestimate a patient's muscle strength. Supplementing manual muscle test scores with isokinetic testing, dynamometry, or tensiometry would decrease the subjectivity in assessing a patient's disability.

Further study is needed in this area with each therapist being paired more than twice with another therapist. One potential study might incorporate several staff in-service training sessions before the start of testing to help standardize the muscle testing techniques among the staff members as much as possible. Reliability then could be reassessed to determine whether any improvement is noted. Garraway et al were able to increase the proportion of examinations for stroke assessment, which included motor function, in which total agreement was reached from 41% to 68% after standardizing definitions, discussion and interpretation of instructions by the examiners, and practice.20

TABLE 4
Interrater Reliability Among Therapists

                     Muscles^a
              RMT    LMT    RGM    LGM
Therapist     Kw^b   Kw     Kw     Kw
 1            .22    .15    .31    .58
 2            .21    .52    .34    .55
 3            .19    .16    .08    .13
 4            .06    .33    .52    .44
 5            .42    .25    .48    .38
 6            .28    .30    .26    .34
 7            .42    .63    .50    .29
 8            .15    .47    .66    .37
 9            .37    .46    .25    .29
10            .14    .28    .56    .49
11            .62    .04    .20    .11

^a RMT = right middle trapezius, LMT = left middle trapezius, RGM = right gluteus medius, LGM = left gluteus medius.
^b Kw = weighted Kappa coefficient.

APPENDIX
Muscle Testing Scale and Definitions^a

Normal (5)          able to hold the test against gravity and maximum resistance, or to move the part into the test position and hold against gravity and maximum pressure
Normal minus (5-)   same as for Normal except slightly less resistance can be given
Good plus (4+)      same as for Good but slightly more resistance can be given
Good (4)            same as for Normal except able to hold against moderate resistance
Good minus (4-)     same as for Good but slightly less resistance can be given
Fair plus (3+)      able to hold the test position against gravity, or to move the part into the test position and hold against gravity and slight resistance
Fair (3)            able to hold the test position against gravity, or to move the part into the test position and hold against gravity
Fair minus (3-)     able to release gradually from the test position against gravity, or to move the part toward the test position against gravity almost through full range
Poor plus (2+)      able to move the part through full range with gravity eliminated, but against slight resistance
Poor (2)            able to move the part through full range with gravity eliminated
Poor minus (2-)     able to move the part through partial range with gravity eliminated
Trace (1)           muscle contraction can be palpated
Zero (0)            no contraction can be elicited

^a Adapted from Kendall and McCreary7 and Daniels and Worthingham.8



CONCLUSIONS

The results of this study do not support the research hypothesis that staff physical therapists can perform manual muscle tests reliably in a clinical setting. The results do demonstrate that the therapists are reliable within one grade; however, this degree of reproducibility may not be adequate for making clinical judgments. Supplementing muscle test scores with isokinetic testing, dynamometry, or tensiometry is suggested. The development of a standardized method of muscle testing is needed so that different examiners can obtain comparable results in a clinical setting. Standardizing the resistance given in grades of Good and Normal so that subjective judgment is minimized is an area in which further study is needed.

Acknowledgments. We thank the physical therapy staff of St. Louis University Hospital for their cooperation and Carolyn Heriza for her advice in planning the study.

REFERENCES

1. Lilienfeld AM, Jacobs M, Willis M: A study of the reproducibility of muscle testing and certain other aspects of muscle scoring. Phys Ther Rev 34:279-289, 1954
2. Iddings DM, Smith LK, Spencer WA: Muscle testing: Part 2. Reliability in clinical use. Phys Ther Rev 41:249-256, 1961
3. Molnar GE, Alexander J, Grutfield N: Reliability of quantitative strength measurements in children. Arch Phys Med Rehabil 60:218-221, 1979
4. Editorial: The accuracy of the manual muscle test. Arch Phys Med Rehabil 35:515-517, 1954
5. Bechtol CO: Grip test: The use of a dynamometer with adjustable hand spacing. J Bone Joint Surg [Am] 36:820-824, 1954
6. Nicholas JA, Sapega A, Kraus H, et al: Factors influencing manual muscle tests in physical therapy. J Bone Joint Surg [Am] 60:186-190, 1978
7. Kendall FP, McCreary EK: Muscles: Testing and Function, ed 3. Baltimore, MD, Williams & Wilkins, 1983
8. Daniels L, Worthingham C: Muscle Testing: Techniques of Manual Examination, ed 4. Philadelphia, PA, W B Saunders Co, 1980
9. Smidt GL, Rogers MW: Factors contributing to the regulation and clinical assessment of muscular strength. Phys Ther 62:1283-1290, 1982
10. Williams M: Manual muscle testing: Development and current use. Phys Ther Rev 36:797-805, 1956
11. Wintz MN: Variations in current manual muscle testing. Phys Ther Rev 39:466-475, 1959
12. Johannson CA, Kent BE, Shepard KF: Relationship between verbal command volume and magnitude of muscle contraction. Phys Ther 63:1260-1265, 1983
13. Westers BM: Factors influencing strength testing and exercise prescription. Physiotherapy 68:42-44, 1982
14. Gonnella C, Harmon G, Jacobs M: The role of the physical therapist in the gamma globulin poliomyelitis prevention study. Phys Ther Rev 33:337-345, 1953
15. Trombly CA: Occupational Therapy for Physical Dysfunction, ed 2. Baltimore, MD, Williams & Wilkins, 1982, pp 173-229
16. Beasley WC: Quantitative muscle testing: Principles and applications to research and clinical services. Arch Phys Med Rehabil 42:398-425, 1961
17. Cohen J: Weighted Kappa: Nominal scale agreement and provision for scaled disagreement or partial credit. Psychol Bull 70:213-220, 1968
18. Griffin JW, McClure MH, Bertorini TE: Sequential isokinetic and manual muscle testing in patients with neuromuscular disease. Phys Ther 66:32-35, 1986
19. Bohannan RW: Manual muscle test scores and dynamometer test scores of knee extension strength. Arch Phys Med Rehabil 67:390-392, 1986
20. Garraway WM, Akhtar AJ, Gore SM, et al: Observer variation in the clinical assessment of stroke. Age Ageing 5:233-240, 1976
