Validity of The Retrospective Application of Oxford Hip and Knee Scores

Journal of Musculoskeletal Research, Vol. 18, No. 1 (2015) 1550003 (8 pages)

© World Scientific Publishing Company
DOI: 10.1142/S0218957715500037
Travis M. Falconer¶, Julie Headford†, Stephen Edmondston‡ and Piers J. Yates§

Perth Orthopaedic & Sports Medicine Center
Sir Charles Gairdner Hospital
31 Outram Street, West Perth, WA 6008, Australia

† Fremantle Hospital, Alma Street
Fremantle WA 6160, Australia

‡ University of Notre Dame, Western Australia

§ Fremantle Hospital, Kaleeya Hospital
St. John of God Hospital Murdoch

¶ travisfalconer@me.com

Received 14 December 2014

Accepted 6 April 2015
Published 25 June 2015
ABSTRACT
The Oxford Hip Score (OHS) and Oxford Knee Score (OKS) are validated, reliable and reproducible outcome measures; however, their retrospective use has not been examined. The aim of this prospective cohort study was to examine the accuracy and reliability of patients’ ability to recall their OHS and OKS in a retrospective manner. A total of 137 patients undergoing primary hip (40) or primary knee (97) arthroplasty with a mean age of 70.8 years (range, 47–88) and a mean time to follow-up of 27.2 months (range, 6–46) were included in the study. The mean retrospective OHS and OKS decreased compared to the pre-operative score (OHS: 1.6 points, p = 0.36; OKS: 4.7 points, p < 0.001). There was only a weak positive relationship between the actual pre-operative scores and the retrospective scores (OHS: r² = 0.30, OKS: r² = 0.19). Bland–Altman analysis demonstrated 95% limits of agreement between scores of −19.9 to 23.1 for the OHS and −15.3 to 24.8 for the OKS. This study shows that patients are poor at retrospectively recalling their pre-operative OHS and OKS and therefore these scores should not be used in a retrospective manner.

¶ Correspondence to: Travis M. Falconer, Perth Orthopaedic & Sports Medicine Center, Sir Charles Gairdner Hospital, 31 Outram Street, West Perth, WA 6008, Australia.

Keywords: Retrospective; Hip; Knee; Oxford; Arthroplasty; Recall.
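
As an illustrative aside, the 95% limits of agreement quoted in the abstract follow the usual Bland–Altman construction: the mean of the paired differences plus or minus 1.96 standard deviations of those differences. The minimal Python sketch below shows that calculation on invented paired scores (not study data) and, assuming the lower limits reported in the abstract are negative, back-calculates the standard deviation of the differences implied by the reported limits. The function names `limits_of_agreement` and `implied_sd` are hypothetical and are not taken from the study, which used SPSS.

```python
from statistics import mean, stdev


def limits_of_agreement(pre, retro, z=1.96):
    """Return (mean difference, lower limit, upper limit) for paired scores."""
    if len(pre) != len(retro):
        raise ValueError("paired scores must have equal length")
    # Difference is pre-operative minus retrospective, so a positive
    # mean difference means the recalled score is lower.
    diffs = [a - b for a, b in zip(pre, retro)]
    d_bar = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return d_bar, d_bar - z * sd, d_bar + z * sd


def implied_sd(lower, upper, z=1.96):
    """SD of the differences implied by a reported pair of 95% limits."""
    return (upper - lower) / (2 * z)


# Hypothetical paired scores, invented for illustration only (not study data).
pre_op = [42, 38, 50, 45, 30, 47]
recalled = [40, 45, 36, 44, 33, 39]
d_bar, lower, upper = limits_of_agreement(pre_op, recalled)
print(f"mean difference {d_bar:.1f}, limits {lower:.1f} to {upper:.1f}")

# Limits as reported in the abstract, with the lower limits ASSUMED negative.
for name, lo, hi in [("OHS", -19.9, 23.1), ("OKS", -15.3, 24.8)]:
    midpoint = (lo + hi) / 2
    print(f"{name}: midpoint {midpoint:.2f}, implied SD {implied_sd(lo, hi):.1f}")
```

Under that sign assumption, the midpoints of the reported limits (1.6 for the OHS and 4.75 for the OKS) agree with the reported mean differences of 1.6 and 4.7 points, and the implied standard deviations of the differences are roughly 11 and 10 points.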
INTRODUCTION

Since their inception in the mid to late 1990s [4,5], the Oxford Hip Score (OHS) and Oxford Knee Score (OKS) have become useful tools for the overall clinical assessment of patients undergoing hip and knee arthroplasty. These patient-centered outcome measures have shifted the focus from morbidity and complication-based outcomes to patient-focused functional ability and satisfaction-based scores as measures of successful surgery [1]. Quantifying functional and symptomatic improvements with the use of such outcome measures is paramount for well-designed clinical research [10].

The OHS and OKS are reliable, internally valid and simple assessments that have a high response rate and eliminate surgeon bias [4–6,8]. Because they are patient derived, quick and easy, they can be reliably performed via a postal questionnaire, making them very time efficient and cost effective for clinicians [8]. They are also joint specific and relatively eliminate the effect of patient co-morbidities [7,11], allowing a standardized comparison across many applications.

These outcome measures are not only useful when comparing the outcomes of arthroplasty between patient groups, institutions and prostheses, but can also be highly useful when scores are coupled pre- and post-operatively to examine individual patients’ outcomes and eliminate other sources of bias [10]. There are, however, many instances where the pre-operative score has not been recorded. In such cases, if a pre-operative score were to be obtained, patients would have to retrospectively recall their symptoms, pain and functional status preceding their arthroplasty. While a similar concept has been explored for the Oxford Shoulder Score (OSS) [10], to our knowledge there have been no studies examining this application of the OHS and OKS. This study examines the reliability and accuracy of the retrospective application of the OHS and OKS.

MATERIALS AND METHODS

All patients who undergo a joint replacement at Fremantle Hospital Health Service (FHHS) are pre-assessed medically and surgically. They are evaluated using a pre-operative outcome score relevant to the type of joint replacement that they are having, at the time of their surgical consent process approximately 1–2 weeks before surgery. The medical records for all patients undergoing primary hip or knee replacements by the Department of Orthopaedics at the FHHS between May 2007 and August 2010 were examined. Those who had successfully completed a pre-operative OHS or OKS and were having a primary arthroplasty were included.

Patients with documented cognitive impairment and children under 16 years of age were excluded. Approval from the Clinical Governance Unit at the FHHS was obtained before taking consent from participants.

Once these patients were identified, they were invited to take part in the study by postal mail. Each patient was sent an instructional letter, consent form, the relevant OHS or OKS and an SF-12 Health Outcomes Score. Patients who did not respond after six weeks were contacted by phone to request their participation. Patients were asked to complete the score as if it were the
day before their joint replacement. They would have to remember their symptoms prior to their hip or knee replacement as opposed to filling the survey out based on current symptoms.

A power calculation was performed based upon a similar previous study looking at the retrospective use of the OSS [10]. Wilson et al. assumed that a clinically significant change was 5 points across the OSS. They calculated the minimum number of participants to be 10, giving a power of 90% to detect a five-point difference in the OSS. In the present study, we are examining two sets of data and have focused upon the level of agreement rather than the statistically significant differences between the pre-operative and retrospective scores. The five-point difference is important in the interpretation of differences in OHS and OKS, as this has been defined as the threshold of Minimal Clinically Important Difference [8].

Statistical Analysis

Statistical analysis was carried out using SPSS 13.0 (SPSS Inc.). The two sets of scores were examined for normality of distribution, and paired t-tests were used to test for significant differences between pre-operative and retrospective scores. Bland–Altman analysis [3] was used to define the 95% limits of agreement between pre-operative and retrospective scores, and linear regression analysis was used to examine the linear relationship between the two scores. To examine the influence of time since surgery on recall accuracy, each patient group was stratified in annual increments and one-factor analysis of variance was used to test for the effect of time on recall accuracy. For all analyses, the criterion for statistical significance was set at p < 0.05.

RESULTS

A total of 42 of 70 (60%) patients successfully completed the retrospective OHS and 97 of 156 (62.2%) patients completed the OKS. This gave an overall completion rate of 61.5% (139/226). The mean time to follow-up was 26.1 months (range 9–46) for the hip group and 27.6 months (range 6–46) for the knee group. The mean age of the patients was 70.3 years (range 47–88) and 71.0 years (range 53–88) for the hip and knee groups, respectively.

Frequency histograms showing the distribution of pre-operative and retrospective OHS and OKS are shown in Figs. 1 and 2, and descriptive statistics are summarized in Table 1. The mean difference between the pre-op and retrospective OHS was 1.6 points, which was not significantly different (p = 0.36). There was also a decrease of 4.7 points between the pre-operative (42.4)
Fig. 1
Fig. 2
Table 1. Summary of Pre-Operative and Retrospective OHS and OKS.

Score                        Minimum   Maximum   Mean   Standard Dev.
OHS Actual Pre-op                 20        57   42.6             9.2
OHS Retrospective Estimate        14        55   41.0            11.1
OKS Actual Pre-op                 13        57   42.4             7.8
OKS Retrospective Estimate        13        57   37.7            10.2

and retrospective (37.7) mean OKS scores (p < 0.001). This difference is consistent with the MCID for the OHS and OKS proposed by Murray et al. [8], who proposed that clinically significant differences in the OHS and OKS are likely to be between 3 and 5. Table 1 displays the pre-operative and post-operative (retrospective) means and associated standard deviations. It is important to note that the post-operative standard deviations are larger than those of the actual pre-operative scores, showing the considerable variability between the two sets of scores in both the OHS and OKS.

When using regression analysis to determine the strength of the relationship between the pre-operative and retrospective scores for both groups, Pearson’s correlation coefficient shows only a weakly positive relationship in both the OHS (r² = 0.30) and OKS (r² = 0.19), as displayed in Fig. 3. This shows that the retrospective estimates explain only 30% and 19% of the variance in the actual pre-operative scores for the OHS and OKS, respectively, making them a poor predictor of the actual pre-operative score. Adjusting for age, gender and health (SF-12 mental and physical scores) did not improve the model fit.

The level of agreement between the pre-operative and retrospective scores, which is arguably the most important factor in this analysis, is best displayed using the Bland–Altman plot, first described in 1986 [2]. This technique plots the difference between each pair of scores against the mean of the two scores, as shown in Fig. 4. While the mean differences for both the OHS and OKS were small (1.6 and 4.7, respectively), the 95% limits of agreement are wide in both cases (−19.9 to 23.1 for the OHS, and −15.3 to 24.8 for the OKS). This suggests that while the level of agreement across the group is good, the limits of agreement are too wide to permit accurate estimation of individual scores.

The data were also analyzed for the influence of time, with patients grouped according to the time between surgery and recall. The sub-groups
Fig. 3
were 0–12 months, 13–24 months, 25–36 months and 37–48 months for both the OHS and OKS, as shown in Figs. 5 and 6. This was done using one-factor ANOVA comparing the means in the sub-groups over the four time periods. It is important to note that the number of patients in each time interval is not equal in either analysis, and each OHS sub-group is small. There was no significant difference between means across all four periods for both the OHS and OKS (p > 0.05). When examining the OKS group more closely, there was a tendency for the spread of scores to be smaller in the 0–12 month group, as indicated by a lower standard deviation compared with the other time periods, suggesting a more accurate recall during the first year.

DISCUSSION

The OHS and OKS underwent extensive assessment of validity, reliability and responsiveness in many prospective trials, as described by Wilson et al. [10]. However, their retrospective use has not yet been addressed in the literature. Knowing the strengths and limitations of a scoring system is integral to its overall applicability as well as its validity and reliability [9]. The simple, responsive and patient-centered nature of the OHS and OKS
Fig. 4
Fig. 5
makes them ideal for retrospective application, especially in the setting of trauma. Since they require no clinician contact or radiological measurements and can be easily applied by mail, investigating their validity in this retrospective manner is important for overall clinical practice.

Selection bias was minimized in this study by including all patients who had undergone a primary hip or knee replacement over a three-year period and had no documented cognitive impairment. This allowed patients from a number of different surgeons, using different approaches
Fig. 6
and different implants to be included. As the participants were invited to take part by mail, it could be assumed that only those who were mentally capable of understanding the study were included, with an overall response rate of 60.6%.

In the present study, we found no significant influence of time since surgery on accuracy of recall when comparing across different post-surgical time increments. When comparing the four time intervals, it is apparent that there is lower variability in the difference between pre-operative and retrospective scores in the first 12 months following surgery, which can likely be attributed to better recall. This conclusion is supported by Wilson et al. [10], who examined retrospective recall of the OSS. With a recall period of between 6 and 8 weeks, they reported a strong correlation between the two scores (r² = 0.81). Further, the limits of agreement for individual scores were ±8.8 points, which is less than half that recorded in the present study, where the mean time since surgery was 27 months. This suggests that the most suitable time to perform retrospective analysis of surgical outcome using the OHS and OKS is within the first 12 months after surgery.

When examining the scores overall, it was interesting to note that there was a trend in both groups for patients to underestimate their pre-operative scores. There was a statistically significant difference in the OKS group, with a mean decrease of 4.7 points, as compared to the OHS group, which had a mean decrease of only 1.6 points. This is in contrast to the study examining the OSS by Wilson et al. [10], who found that patients tended to overestimate their pre-operative shoulder symptoms when asked to recall them. While the mean difference for the OKS is within the range considered clinically significant (3–5 points) [8], the spread of scores about this mean is large, making this difference unreliable. Similarly, the Bland–Altman analysis showed 95% limits of agreement of about 20 points for both OHS and OKS. These results suggest that while the OHS and OKS may be used retrospectively to evaluate surgical outcomes of groups of patients, this approach would not be sufficiently accurate for estimation of the pre-operative status of individual patients.

This is again reflected in the Bland–Altman plots examining the level of agreement between the two scores. While the mean differences are low for both the OHS and OKS, the 95% limits of agreement are wide, ranging from −19.9 to 23.1 for the OHS and from −15.3 to 24.8 for the OKS. Therefore, there is a 95% chance that an individual’s retrospective estimate could be approximately 20 points higher or lower than their actual pre-op score, displaying the inaccurate nature of individuals’ ability to recall their symptoms. This finding is also consistent with those made by Wilson et al. when examining the OSS [10].

Some limitations of this study should be considered in the interpretation of the results. While the number of patients would provide sufficient power to address the research question, the issue of short-term recall could have been addressed more specifically if the number of patients providing recall within the first year following surgery had been larger. The current levels of pain and physical function were not measured at the time the recall questionnaire was administered. Consequently, the association between current functional status and recall accuracy could not be examined. Finally, this study examined the issue of recall accuracy using only one type of outcome questionnaire. Accuracy of recall may vary according to the nature of the questionnaire, and inclusion of another function or quality-of-life questionnaire may have helped to address the research objective more broadly.

These findings suggest that individuals are unreliable at accurately recalling their pre-operative
symptoms by way of the OHS and OKS. Patients tend to underestimate their pre-operative symptoms according to these outcome measures and there is considerable variability between individuals’ ability to recall. While the overall mean difference between the pre-operative and retrospectively recalled scores is low, and there is a tendency for patients to have more reliable recall within the first 12 months, it is our recommendation that the OHS and the OKS are not used as retrospective tools.

We represent that this submission is original work, and is not under consideration for publication with any other journal.

DISCLOSURE STATEMENT

There were no financial disclosures relating to this study.

References

1. Ahmad MA, Xypnitos FN, Giannoudis PV. Measuring hip outcomes: Common scales and checklists. Injury 42(3): 259–264, 2011.
2. Altman DG, Bland JM. Comparison of methods of measuring blood pressure. J Epidemiol Commun Health 40(3): 274–277, 1986.
3. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1(8476): 307–310, 1986.
4. Dawson J, Fitzpatrick R, Murray D, Carr A. Comparison of measures to assess outcomes in total hip replacement surgery. Qual Health Care 5(2): 81–88, 1996.
5. Dawson J, Fitzpatrick R, Murray D, Carr A. Questionnaire on the perceptions of patients about total knee replacement. J Bone Joint Surg Br 80(1): 63–69, 1998.
6. Impellizzeri FM, Mannion AF, Leunig M, Bizzini M, Naal FD. Comparison of the reliability, responsiveness, and construct validity of 4 different questionnaires for evaluating outcomes after total knee arthroplasty. J Arthroplasty 26(6): 861–869, 2011.
7. McMurray R, Heaton J, Sloper P, Nettleton S. Measurement of patient perceptions of pain and disability in relation to total hip replacement: The place of the Oxford hip score in mixed methods. Qual Health Care 8(4): 228–233, 1999.
8. Murray DW, Fitzpatrick R, Rogers K, Pandit H, Beard DJ, Carr AJ et al. The use of the Oxford hip and knee scores. J Bone Joint Surg Br 89(8): 1010–1014, 2007.
9. Scholtes VA, Terwee CB, Poolman RW. What makes a measurement instrument valid and reliable? Injury 42(3): 236–240, 2011.
10. Wilson J, Baker P, Rangan A. Is retrospective application of the Oxford Shoulder Score valid? J Shoulder Elbow Surg 18(4): 577–580, 2009.
11. Wylde V, Learmonth ID, Cavendish VJ. The Oxford hip score: The patient’s perspective. Health Qual Life Outcomes 3: 66, 2005.