Validation of the Health of the Nation Outcome Scales
PAUL BEBBINGTON, TERRY BRUGHA, TREVOR HILL, LUCY MARSDEN and SUZANNE WINDOW

[...] laboriously gathered information. The current paper describes an extensive validation of HoNOS in an ordinary clinical setting, and summarises a more detailed technical report to the Department of Health (available from the first author upon request).
METHOD
In order to assess overall item performance, we examined all 196 measures of agreement relating to individual items between keyworkers and research workers and between keyworkers and item equivalents (details available from the first author upon request). No item had a mean kappa value above the 'fair' level. Eight items can be regarded as having 'fair' agreement (items 1-3, 5, 6, and 9-11). The remainder fell into the 'poor' range. There are particular problems with item 8, which is a catch-all, with several different possible symptoms included. Items 4 (Cognitive problems), 7 (Problems with depressed mood), and 12 (Problems with occupational activities) also performed badly. Item 8 covers a number of additional symptom areas, including anxiety, eating disorders, sleep disorders, sexual problems, and a section for rating problems not specifically covered. There were very few positive endorsements of eating, sexual and 'other' problems. Agreement about whether these problems should be rated was very variable, but in view of their rarity it is difficult to make any definitive statements. Anxiety and sleep disorders did occur frequently enough to give an overall view of the performance of item 8 in relation to these sub-items. The item performed badly, although it was marginally better for anxiety than for sleep disturbance.

In Table 4 we examine the relationship between the total scores provided by each of the criterion instruments and the HoNOS scores and sub-scores. The most consistent correlations were, as predicted, between the Social functioning sub-score of HoNOS and the SBS and SRPS total scores. The correlations of the other HoNOS sub-scores were trivial and non-significant, except in the London follow-up data. There were few significant correlations between the SCAN total score and the HoNOS sub-scores and total score.

The implication is that the best-performing part of HoNOS is the Social functioning sub-score. The performance of the other sub-scores against the 'gold standard' instruments is inconsistent and generally poor. The HoNOS total score correlates best with measures of social behaviour and performance. HoNOS appears inadequate as a measure of symptoms in these analyses.

As described above, we used the items of the SBS to generate sub-scores approximately equivalent to the sub-scores in HoNOS. We were able to do this for the Behavioural problems, Symptoms and Social functioning sub-scores. Many of the resulting correlations were substantial and significant, but there were still appreciable inconsistencies: several analyses showed virtually no correlation (Bebbington et al, 1998).

Sub-score                    Initial data   Follow-up data
London                       [...]
Leicester                    [...]
Total
  A (Behavioural problems)   0.59           0.58
  B (Impairment)             0.52           0.39
  C (Symptoms)               0.22*          0.56
  D (Social functioning)     0.59           0.58
  Total score                0.43           0.70
All are significant at the 1% level, except * (significant at the 5% level) and † (where P=0.064).

Table 2 Keyworkers' and research workers' ratings. Correlations of HoNOS sub-scores and total scores with the equivalents derived by computation from the criterion instruments
                      HoNOS scores
                      Behavioural  Impairment  Symptoms  Social            Total
                      (A)          (B)         (C)       functioning (D)   score
Initial data
London
  Keyworker           [...]
  Research worker     [...]
Leicester
  Keyworker           [...]
  Research worker     [...]
Follow-up data
London
  Keyworker           0.36**       0.20        0.36**    [...]             0.45**
  Research worker     0.76**       0.41**      0.67**    0.7?              0.7?
Leicester
  Keyworker           0.61*        0.42**      0.61**    0.67**            0.71**
  Research worker     0.7?         0.45**      0.7?      0.6?              0.91

We tested for systematic bias in keyworkers' and research workers' ratings by comparing mean HoNOS total scores and sub-scores as generated by the keyworkers and researchers. Although the additional information available to the research
workers enabled them to rate items more accurately, there was no overall tendency to rate items either higher or lower than the keyworkers did. The lack of bias in these independently collected ratings is remarkably clear (details available from the first author upon request).

If HoNOS are truly to reflect outcome, the changes in scores between induction and follow-up should correlate with the score changes calculated from the reference instruments. The reference instruments provided two kinds of score for such comparison: the computed equivalent scores, and the scores and sub-scores obtained directly from the reference instruments. Using change scores is, in fact, an extremely stringent test of validity, because it compounds the error component of both initial and follow-up scores. For this reason we present data relating only to correlations between HoNOS scores and the computed equivalents, as the content overlap is thus maximised.

In Table 5 we show the results of these correlations for the HoNOS rated by the keyworkers and by the researchers. The performance of HoNOS in the hands of the keyworkers was particularly poor in London and only a little better in Leicestershire. The performance of the research workers, as assessed in this way, was once more noticeably superior to that of the keyworkers.

Table 3 Summary of measures of agreement on HoNOS items
Kappa value              London                Leicester
                         Initial  Follow-up    Initial  Follow-up
Keyworkers and research workers
0-0.20 (poor)            5  6  0
0.21-0.40 (fair)         6  2  6
0.41-0.60 (moderate)     0  3  6
0.61-0.80 (good)         0  0  0
Keyworkers' scores and the computed criterion instrument equivalents compared
0-0.20 (poor)            5  8  7
0.21-0.40 (fair)         6  2  4
0.41-0.60 (moderate)     1  2  1
0.61-0.80 (good)         0  0  0
Research workers' scores and the computed criterion instrument equivalents compared
0-0.20 (poor)            5  3  7
0.21-0.40 (fair)         4  4  1
0.41-0.60 (moderate)     3  3  4
0.61-0.80 (good)         0  1  0

Table 4 Correlation of keyworker HoNOS sub-scores and total scores with total scores on criterion instruments
Criterion      HoNOS scores
instruments    Behavioural    Impairment  Symptoms  Social            A, B+D   Total
               problems (A)   (B)         (C)       functioning (D)            score
London (initial data)
  SBS          -0.08          0.08        0.06      0.42**            0.26     0.23
  SRPS         -0.11          0.17        0.18      0.3?              0.23     0.24
  SCAN         -0.17          -0.09       -0.02     0.00              -0.12    -0.12
Leicester (initial data)
  SBS          0.21           0.09        0.08      0.47**            0.50**   0.44**
  SRPS         0.04           0.11        -0.02     0.44**            0.40**   0.3?
  SCAN         0.26           -0.05       0.06      -0.17             0.00     -0.01
London (follow-up data)
  SBS          0.51**         0.29        0.54**    0.25              0.49**   0.51**
  SRPS         0.35**         0.26        0.49**    0.45**            0.48**   0.49**
  SCAN         0.08           0.03        0.54**    0.32*             0.36**   0.42**
Leicester (follow-up data)
  SBS          0.24           0.22        0.20      0.31              0.45**   0.44**
  SRPS         0.17           0.13        0.29      0.5?              0.47**   0.47**
  SCAN         0.08           0.20        0.32*     0.04              0.11     0.28
*P<0.05, **P<0.01.

DISCUSSION

Limitations of the study
The intention of HoNOS is that they should be acceptable, feasible, reliable and valid when used by ordinary clinical staff in ordinary clinical practice. This basically requires them to perform in imperfect conditions, something of a tall order. The validation of an instrument like this is particularly stringent, as it implies that the error in ratings is small despite the imperfect circumstances in which they are made. In the ordinary clinical situation, the information available to a keyworker is likely to be incomplete, sometimes very much so. These are the circumstances in which keyworkers in our study completed their ratings. Thus, we were testing the keyworkers' ability to approximate to the 'real' picture, given the imperfect state of their knowledge. It could be said that our study was [...] one of the performance of the keyworkers making the rating, but because the instrument is supposed to be capable of accommodating the limitations of raters in clinical situations, it was properly a test of the instrument itself.

We chose the criterion instruments because they were detailed, standardised and clinically based. They all have good psychometric properties. However, no criterion instrument is perfect, and there will always be some divergence between the conceptual basis of a criterion instrument and that of the instrument being tested. One cannot, therefore, expect a validation exercise to be underwritten by perfect agreement. Our intention was that the chosen criterion instruments should, between them, be equivalent to the conceptual domain of HoNOS. Thus, the Social
Role Performance Schedule has items which relate purely to the four items in the Social functioning section of HoNOS. The SBS covers the Behavioural problems, Symptoms and Social functioning sections of HoNOS. SCAN, on the other hand, relates primarily to the Symptoms items (6-8), although it also shares content with the Behavioural problems covered by HoNOS items 1-3. Because of this imperfect overlap between the instruments, one would not expect more than good agreement between total scores on any one of the reference instruments and on HoNOS. The validation exercise can, however, be amplified by analyses based on sub-scores.

The rationale for using HoNOS ratings by the research workers as a test of validity is that the research workers had access to the criterion instrument ratings before making their HoNOS ratings.¹ In practice, this meant they had a wider knowledge base against which to rate the HoNOS. The discrepancy between keyworkers' and research workers' HoNOS ratings could be due to the keyworkers' relative clinical inexperience, or to the keyworkers having less information available to them. In either case, the research workers' ratings are likely to be more valid than the keyworkers'. As we thus assign unequal status to the ratings involved in the comparison, it is more a test of validity than of reliability.

Because of the incomplete overlap between criterion instruments and HoNOS, we constructed item equivalents based on the individual items in the three reference instruments that corresponded most closely to the HoNOS glossary definitions. The aim of doing this was to get as close as possible to the conceptual domain of HoNOS. The resulting algorithms were complex, although we do feel that our item equivalents correspond quite closely to the HoNOS items. In our view, they are closer to being a validation criterion than the research workers' direct HoNOS ratings.

HoNOS is a simple instrument, intended to be quick and easy to use by busy clinicians. Nevertheless, its use does require some training, however limited. The training of research workers and keyworkers in the current study was constrained by the busy services in which the study took place. The training of keyworkers was more rigorous in Leicester than in London, where the service was particularly busy and it was difficult to arrange for keyworkers to put aside the necessary time. Because of changes in keyworkers, the initial and follow-up interviews were not always done by the same person. Moreover, the keyworkers were not equivalent at the two sampling times: primary nurses on the ward are not the same as community-based keyworkers (in seniority, knowledge of patients, etc.). This may account in some part for the poor results we obtained. The results from the research workers indicate how HoNOS might perform when used by experienced clinicians with detailed information, and when initial and follow-up assessments are made by the same person. Thus, the circumstances of our validation study are particularly informative about the limitations of HoNOS and the procedural aspects of its use.

The performance of HoNOS
The general pattern of our results suggests that HoNOS does not provide a good assessment of symptoms, and performs best as a measure of social functioning. Even this may be an artefact, insofar as the SBS is completed from interviews with informants, [...] This tends to maximise similarities, and may mean that there are inadequacies in the SBS ratings corresponding to the inaccuracies in the keyworkers' HoNOS ratings. SCAN, on the other hand, is based on direct evaluation of patients' symptoms.

There are three types of output from HoNOS: total score, sub-scores and ratings on individual items. For the originators, this represents a gradient of utility, with the total score being the most useful (Wing et al, 1996). In the hands of our keyworkers, the performance of the instrument also followed this gradient, being just about acceptable for total score and very poor for individual items. The items for which agreement was particularly poor were 4 (Cognitive problems), 7 (Problems with depressed mood), 8 (Other mental and behavioural problems) and 12 (Problems with occupation and activities). It is of interest that keyworkers (who were mostly nurses, although some were social workers) had particular difficulties with two of the three items relating to Symptoms.

These difficulties with individual HoNOS items almost certainly relate to the fact that most can be rated in relation to several different circumstances and phenomena. This is a necessary process of conflation, to render the instrument succinct enough to be practicable in busy clinical situations, but the downside is that accuracy of rating is lost.

The use of HoNOS as an outcome measure depends on repeated applications to generate change scores. The HoNOS change scores obtained from the keyworkers performed poorly in relation to the computed equivalents, whereas the performance of the research workers was considerably better. When the patient is discharged between ratings, the keyworker's role in ordinary clinical services will often be transferred to someone else. In consequence, the necessary information for making the ratings will reside with different raters, who may apply different rating thresholds, unless unusual steps are taken to ensure rating by the same person. In the current study, although follow-up keyworker ratings were often made by different people, the follow-up research worker ratings were always made by the person conducting the initial assessment. This contrast is reflected in the relative performance of HoNOS as an outcome measure in the hands of the keyworkers and of the research workers.

Our results are in considerable contrast to the reliability studies between raters in the HoNOS Research and Development [...]

¹ In fact, this was not always the case in Leicester, as the task of completing criterion measures was often divided up among different research staff.

Table 5 Spearman correlation of score changes with computed equivalents
                    Behavioural  Impairments  Symptoms  Social        Total
                    problems                            functioning
Keyworker
  London            [...]
  Leicester         [...]
Research worker
  London            0.46**       0.11         0.32*     0.63**        0.52**
  Leicester         0.54**       0.36**       0.64**    0.13          0.41**
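The two statistics on which the agreement and change-score analyses rest, Cohen's kappa (Table 3) and Spearman's rank correlation of change scores (Table 5), can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' analysis code: it assumes the unweighted kappa of Cohen (1960), gives tied values their average rank, labels kappas with the bands of Table 3 (treating negative values as 'poor'), and groups the twelve HoNOS items into sub-scores as the text implies (items 1-3 Behavioural problems, 4-5 Impairment, 6-8 Symptoms, 9-12 Social functioning). All function names and example figures are hypothetical.

```python
from collections import Counter

# Hypothetical item-to-sub-score grouping inferred from the text:
# 1-3 Behavioural problems (A), 4-5 Impairment (B), 6-8 Symptoms (C),
# 9-12 Social functioning (D).
SUBSCALES = {"A": (1, 2, 3), "B": (4, 5), "C": (6, 7, 8), "D": (9, 10, 11, 12)}


def honos_scores(items):
    """Sub-scores A-D plus the total from a list of 12 item ratings."""
    if len(items) != 12:
        raise ValueError("expected ratings for 12 HoNOS items")
    scores = {name: sum(items[i - 1] for i in idx) for name, idx in SUBSCALES.items()}
    scores["Total"] = sum(items)
    return scores


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / (var_x * var_y) ** 0.5


def cohen_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    n = len(rater1)
    if n == 0 or n != len(rater2):
        raise ValueError("need two non-empty rating lists of equal length")
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def kappa_band(kappa):
    """Label a kappa with the Table 3 bands (negatives count as 'poor')."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "above 0.80"


if __name__ == "__main__":
    # Hypothetical totals for four patients (illustration only, not study data).
    honos_initial, honos_follow_up = [14, 9, 20, 11], [10, 9, 12, 13]
    equiv_initial, equiv_follow_up = [30, 22, 41, 25], [24, 21, 28, 27]
    honos_change = [b - a for a, b in zip(honos_initial, honos_follow_up)]
    equiv_change = [b - a for a, b in zip(equiv_initial, equiv_follow_up)]
    print("rho for change scores:", spearman_rho(honos_change, equiv_change))

    # Hypothetical ratings of one item by a keyworker and a research worker.
    keyworker = [0, 1, 2, 2, 0, 3, 1, 0]
    researcher = [0, 2, 2, 1, 0, 3, 1, 1]
    k = cohen_kappa(keyworker, researcher)
    print(f"kappa = {k:.2f} ({kappa_band(k)})")
```

The change-score step simply subtracts each patient's initial rating from the follow-up rating, so measurement error from both occasions carries into the correlation, which is why the text treats correlations of change scores as a stringent test.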
[...] precondition for adequate rating of HoNOS.

In fact, the widespread introduction of HoNOS would serve more than one purpose. The original intention was to measure outcome in relation to Health of the Nation targets for mental illness. Our results suggest serious problems in using the instrument in this way. Completion by individual keyworkers is likely to lead to serious imprecision, and one way of dealing with this is to make the ratings in multidisciplinary care planning reviews, so that the maximum information and expertise can be brought to bear. Even so, it is not clear over what period outcome should be measured, and whether the inevitable acute episodes characteristic of many patients with severe mental illness should be excluded from overall consideration. Attempting to use the outcome of patients as a measure of the performance of individual keyworkers (as has apparently been suggested) is likely to lead to biased ratings, for obvious reasons. It might be feasible to use the scales to establish comparative case loads in different service locations, but only in broad-brush terms, as the training needed to establish true comparability would be extensive and expensive.

Nevertheless, we do not think the instrument should be abandoned. This is because one less overt purpose that it might serve is the inculcation of a culture of assessment in community mental health teams, in a way that would complement the recently fostered culture of care. This underlines the need to improve the training of keyworkers in making clinical [...]. Such training should be general, in relation to the appraisal of mental and social status, and also specific, in relation to using HoNOS. Finally, consideration should be given to whether further investment should be made in improving the structure of the instrument.

(First received 6 April 1998, final revision 24 August 1998, accepted 2 December 1998)

REFERENCES
Cohen, J. A. (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.
Department of Health (1991) The Health of the Nation: a Strategy for England. London: HMSO.
Hurry, J. & Sturt, E. (1981) Social performance in a population sample: relation to psychiatric symptoms. In What is a Case? The Problem of Definition in Psychiatric Community Surveys (eds P. E. Bebbington, L. Robins & J. K. Wing), pp. 202-213. London: Grant McIntyre.
Hurry, J., Sturt, E., Bebbington, P., et al (1983) Sociodemographic associations with social disablement in a community sample. Social Psychiatry, 18, 113-121.
Hurry, J., Bebbington, P. E. & Tennant, C. (1987) Psychiatric symptoms and social disablement as determinants of illness behaviour. Australian and New Zealand Journal of Psychiatry, 21, 68-74.
NHS (1996) Planning and Priorities Guidance for the NHS, 1997/98. Leeds: Department of Health.
[...] The Leicester Housing Schedule: background and questionnaire format. Working paper. Section of Social & Epidemiological Psychiatry, Department of Psychiatry, University of Leicester, Leicester.
Wing, J. K., Babor, T., Brugha, T., et al (1990) SCAN: Schedules for Clinical Assessment in Neuropsychiatry. Archives of General Psychiatry, 47, 589-593.
Wing, J. K., Curtis, R. H. & Beevor, A. S. (1996) HoNOS: Health of the Nation Outcome Scales. Report on Research and Development, July 1993-December 1995. London: Royal College of Psychiatrists Research Unit.
Wing, J. K., Beevor, A. S., Curtis, R. H., et al (1998) Health of the Nation Outcome Scales (HoNOS). Research and development. British Journal of Psychiatry, 172, 11-18.
World Health Organization (1992) SCAN: Schedules for Clinical Assessment in Neuropsychiatry. Geneva: WHO.
World Health Organization (1992) The ICD-10 Classification of Mental and Behavioural Disorders: Clinical Descriptions and Diagnostic Guidelines. Geneva: WHO.
Wykes, T. & Sturt, E. (1986) The measurement of social behaviour in psychiatric patients: an assessment of the reliability and validity of the SBS schedule. British Journal of Psychiatry, 148, 1-11.
Wykes, T. & Hurry, J. (1991) Social behaviour and psychiatric disorders. In Social Psychiatry: Theory, Methodology and Practice (ed. P. E. Bebbington). New Brunswick, NJ: Transaction.
Validation of the Health of the Nation Outcome Scales.
P Bebbington, T Brugha, T Hill, L Marsden and S Window
BJP 1999, 174:389-394.
DOI: 10.1192/bjp.174.5.389