
Clinical Psychology and Psychotherapy, Vol. 3 (4), 249-258 (1996)

The Reliability and Validity of the Outcome Questionnaire

Michael J. Lambert*
Gary M. Burlingame
Val Umphress
Nathan B. Hansen
David A. Vermeersch
Glenn C. Clouse
Stephen C. Yanchar
Brigham Young University, Provo, USA

With the rise in efforts to evaluate the quality of mental health care and
its outcomes, the measurement of change has become an important
topic. This paper tracks the creation of a new instrument designed to
assess psychotherapy outcome. The Outcome Questionnaire (OQ) was
designed to include items relevant to three domains central to mental
health: subjective discomfort, interpersonal relations, and social role
performance. This study describes the theoretical development and
psychometric properties of the OQ. Psychometric properties were
assessed using clinical, community, and undergraduate samples. The
OQ appears to have high reliability and evidence to suggest good
concurrent and construct validity of the total score. The data presented
show that it distinguishes patient from non-patient samples, is sensitive
to change, and correlates with other measures of patient distress.

INTRODUCTION

Current trends in health care reform show the managed health care system to be an increasingly popular alternative to traditional health care delivery. As some kind of managed care or national health coverage eventually becomes a reality, a growing body of literature indicates that health policy planners are concerned with the relationship between mental health care cost containment and the quality of services rendered (Bloom, 1987; Richardson and Austad, 1991; Mirin and Namerow, 1991; Brokowski, 1991). In the midst of debate surrounding a definition of proper mental health care delivery lies the issue of determining criteria for, and assessing the effectiveness of, therapeutic systems. As indicated elsewhere (Lambert, 1983; Ahmed and Smith, 1991; Mirin and Namerow, 1991; Moses-Zirkes, 1993), outcome assessment provides a reliable means of defining treatment goal criteria and monitoring efficacy of treatments. But in contrast to traditional scientific studies of the effects of psychotherapy, much current research is being undertaken to assess the quality of services and track patient progress.

Formal outcome research is a manifold enterprise, ideally incorporating numerous measures of patients' subjective discomfort, clinician ratings, trained observer ratings, physiological indices, and environmental information (Lambert et al., 1992). While it is commonly accepted that such a multidimensional approach offers greatly improved means of charting patient progress in terms of both scientific rigour and comprehensive assessment, practical considerations encountered in routine clinical practice limit a clinical researcher's ability to conduct comprehensive assessments that integrate the aforementioned criteria. Despite this, outcome-minded third-party payers show a continued interest in brief measures that tap a variety of potential outcomes, without the attendant methodological complexities of formal outcome research.

*Addressee for correspondence: Professor Michael J. Lambert, Department of Clinical Psychology, Brigham Young University, 284 Taylor Building, PO Box 28626, Provo, Utah 84602-8626, USA.

CCC 1063-3995/96/040249-10
© 1996 by John Wiley & Sons, Ltd.

The Outcome Questionnaire (OQ) is a brief self-report instrument, designed for repeated measurement of client status through the course of therapy and at termination. Ease of administration and scoring, low cost, sensitivity to changes in psychological distress over short periods of time, and an ability to tap a wide array of symptomatology and role functioning may make this instrument useful in a variety of clinical and counselling applications.

The OQ was formulated in accordance with Lambert's (1983) organizational scheme for outcome assessment, suggesting that three dimensions or content areas (called domains) be evaluated: intrapersonal (subjective) discomfort or symptomatic distress, interpersonal functioning, and social role performance. Use of this conceptualization seems justified, in that its breadth affords a comprehensive review which encompasses inner life as well as progress in applied situations like work and school. In addition some items were included to tap positive states of mental health and life functioning. It was believed that these items would not only assess quality of life as perceived by the client, but increase the range of measurement so that the test did not suffer from an artificially low ceiling as is true in tests that only measure the presence or absence of psychopathology rather than aspects of healthy functioning.

The primary aim of the present study was to examine the psychometric properties of the OQ. Specifically, both internal consistency and test-retest reliability were assessed with undergraduate and patient samples. Concurrent and construct validity were examined using undergraduate, community, and several clinical samples. After describing the item selection procedure and the development of the OQ, the Methods section will address efforts to acquire normative data and to assess the ability of the OQ to measure patient change and to distinguish clinical from non-clinical samples.

METHODS

The selection of all items and domains for the OQ was based on two methods: (1) rational selection and (2) empirical data analysis.

Domain: Symptomatic Distress

The choice of items to measure psychological symptoms was derived from a 1988 National Institute of Mental Health (NIMH) epidemiological survey which identified the most prevalent types of mental disorders across five catchment areas and a review of the most frequently occurring DSM-III-R diagnoses in a national managed care corporation (Human Affairs International). The 1988 epidemiological study of 18571 people across the United States showed that 15.4% of the population over 18 years of age fulfilled diagnostic criteria for a mental disorder. Approximately 12% of the total population received either an anxiety diagnosis or an affective disorder classification (Regier et al., 1988). This pattern was also present in data provided by Human Affairs International (HAI). For instance, examination of the diagnosis of 2145 HAI outpatients treated in the first quarter of 1992 showed that nearly 1/3 of all diagnoses involved a form of affective disorder. An additional 1/3 of the patients were given some kind of anxiety disorder diagnoses, including post-traumatic stress disorder. These data suggest that the most prevalent intrapsychic symptoms are depression and anxiety based. Therefore, the OQ is heavily loaded with such items. Following affective and anxiety disorders, substance abuse problems were the next most frequently occurring diagnoses, and thus relevant items were included in the symptomatic distress subscale of the OQ.

Individual items in this domain were selected according to the following criteria: (1) fit to DSM-III-R criteria; (2) frequent citation in the literature as a descriptor or symptom of the selected psychopathology; and (3) statistical analysis of individual items (specifically, inter-item correlations). Sample items are: I feel no interest in things; I tire quickly; I have difficulty concentrating.

Domain: Interpersonal Relations

The inclusion of items for assessing interpersonal relations was based on research suggesting that people consider relationships essential to happiness and life satisfaction (Beiser, 1973; Andrews and Withey, 1974; Blau, 1977; Veit and Ware, 1983; Diener, 1984). Other research has shown that the most common problems addressed in therapy are interpersonal in nature (Horowitz, 1979; Horowitz et al., 1988). While factors associated with quality of life vary from study to study, most emphasize the importance of intimate relationships and their central contribution to well-being (Diener, 1984). In addition, interpersonal problems are clearly related to intrapersonal distress, either as a direct cause of psychopathology, as a result of psychopathology, or as both a cause and a result (Klerman, 1974; Horowitz et al., 1988). Therefore items dealing
with friendships, family, family life, and marriage were developed. These included attempts to measure friction, conflict, isolation, inadequacy, and withdrawal, which are all common complaints addressed in therapy sessions. These items were derived from the existent marital and family relations literature, as well as research on interpersonal problems most often reported by patients who are undergoing psychotherapy (Horowitz et al., 1991). As with the symptomatic distress questions, item analyses were performed on individual items aimed at measuring interpersonal functioning to ensure their suitability. Sample items are: I feel loved and wanted; I have trouble getting along with friends and close acquaintances; I am satisfied with my relationships with others.

Domain: Social Role

Social role performance was assessed by focusing on the patient's level of dissatisfaction, conflict, distress, and inadequacy in tasks related to their employment, family roles, and leisure life. Assessment of social roles suggests that a person's intrapsychic problems and symptoms can affect their ability to work, love, and play. This is supported by the aforementioned quality of life research as well as the assumption that once people start to develop symptoms it is common for these symptoms to have an effect on their personal and work lives (Frisch et al., 1992). Thus, items were developed that measure performance in these three areas. Satisfaction in these areas is highly correlated with ratings of overall life satisfaction (Beiser, 1973; Blau, 1977; Veit and Ware, 1983; Frisch et al., 1992). Sample items are: I enjoy my spare time; I feel that I am doing well at work/school; I feel angry enough at work/school to do something I might regret.

Preliminary Data Analysis

After initial development of items for each content domain, preliminary reliability analyses were conducted on the Outcome Questionnaire using 86 undergraduate psychology students. Item analysis was conducted to determine each item's reliability by examining its point biserial correlations with the total test score (Cronbach, 1951). Problematic items (e.g. low reliability, reverse discrimination) were either discarded or reworded, and alterations were made as were needed to improve the item.*

*Data on the original version including reliability data on each item are available from the first author.

Subjects

Subjects included two normal (student and community) and three clinical groups. Several student samples totaling 424 undergraduate students from a large western university were collected. A community sample of 102 subjects was selected at random from several medium sized cities within a single western state. The clinical samples were collected from clinics in Arizona, Florida, Pennsylvania, Georgia, Maryland, North Carolina, Ohio, Utah and Washington, DC. Clinical data were pooled for the large sample of patients who had sought treatment via their place of employment through their employee assistance programme (EAP). This clinical (EAP) sample consisted of 504 patients. All the patients in this sample had been given a DSM-III-R diagnosis by their clinician. Two additional clinical samples were collected. The community mental health centre sample (CMH) was composed of 100 patients seeking treatment at a community mental health centre serving a rural catchment area. The outpatient clinic sample (university outpatient clinic, N = 76) was collected from patients attending therapy in a public training clinic sponsored by a large private university. The clinic patients came from both urban and rural neighbourhoods. Demographic information on these various samples is presented in Table 1.

Table 1. Demographic characteristics of the normative samples used to evaluate the Outcome Questionnaire

Sample                        N    Age (M/SD)   Gender   Marital status                Ethnicity          H.S. grad
                                                M   F    Married  Divorced/separated   Caucasian  Other
University students           424  21.13/3.66   38  62   15%      2%                   96%        4%      100%
Community sample              102  42.51/10.08  43  57   87%      13%                  n/a        n/a     95%
EAP                           504  37.23/10.13  59  41   n/a      n/a                  82.5%      17.5%   n/a
Community mental health       100  39.55/10.04  39  61   n/a      n/a                  n/a        n/a     n/a
University outpatient clinic  76   29.58/9.69   68  32   48%      35%                  97%        3%      n/a
Note. In some cases demographic data were not available (denoted by n/a) due to differences in forms used to collect the data at different localities.

Measures

All samples completed the OQ. The OQ requires subjects to rate their feelings on a five-point Likert scale ranging from 0 to 4.†

†Copies of this test will be made available upon request. The user licensing agreement makes it available to individual clinicians for a nominal fee of $10, after which it may be reproduced without additional cost. Larger networks of providers pay a higher licensing fee, but enjoy the same privileges. Practically speaking, the OQ costs little more than the three to five cents it costs to reproduce it.

Measures of Symptomatic Distress

The Symptom Checklist 90-Revised (SCL-90-R) is a 90-item self-report questionnaire that assesses common psychiatric symptoms in nine areas. The SCL-90-R has high internal consistency (0.77 to 0.90) and test-retest reliability (0.78 to 0.90) (Derogatis, 1977). Its validity as a measure of symptomatic distress has also been supported (Derogatis et al., 1976). It is frequently used in psychotherapy outcome research as a measure of improvement (McRoberts and Lambert, 1993). The typical SCL-90-R score used in outcome research is the General Severity Index (GSI).

The Beck Depression Inventory (BDI) is a 21-item questionnaire that was developed through clinical observation of 21 attitudes and symptoms common to depressed psychiatric clients (Beck et al., 1961, 1988). These symptoms are rated by clients on a 4-point, anchored scale, from 0 to 3. The BDI has been shown to have good psychometric properties with coefficient alpha ratings between 0.73 and 0.92 and test-retest coefficients ranging from 0.48 to 0.86, depending on the time intervals for retesting and sample characteristics (Beck et al., 1988). The BDI is a frequently used outcome measure for examining changes in depressive symptomatology (McRoberts and Lambert, 1993).

The State-Trait Anxiety Inventory (STAI) is a 40-item test for measuring anxiety. The STAI (form Y) is divided into two 20-item parts: the Y-1, for assessing 'state' anxiety; and the Y-2, for assessing 'trait' anxiety. The test allows subjects to answer on a 4-point Likert scale ranging from 1 to 4 (Spielberger, 1983). The internal consistencies for both scales range from 0.83 to 0.92. Test-retest coefficients vary by form: state anxiety from 0.16 to 0.54; trait anxiety from 0.76 to 0.86 (Dreger and Brabham, 1987). These scales are frequently used measures of anxiety (Froyd et al., 1996).

The Zung Self-Rating Depression Scale (ZSDS) is a widely-used 20-item questionnaire which assesses the frequency of depressive symptoms (Lambert, 1983). Specifically, the ZSDS examines three basic aspects of depression: pervasive affect, physiological symptoms, and psychological symptoms (Zung, 1965). Reliability and validity data support the value of the ZSDS as a measure of depression (Zung, 1965, 1972).

The Zung Self-Rating Anxiety Scale (ZSAS) is a 20-item self-report instrument based on diagnostic criteria for anxiety (Zung, 1971). The ZSAS is answered using a Likert scale ranging from 1 to 4 which allows a patient to rate the amount of time they are distressed by a particular symptom. Reliability and validity of the ZSAS appear to be quite high (Corcoran and Fischer, 1987).

The Taylor Manifest Anxiety Scale (TMAS) consists of 50 items that were taken from a pool of 200 anxiety-related items found on the Minnesota Multiphasic Personality Inventory (MMPI). These 50 items were statistically analysed and found to be the most indicative of manifest anxiety (Lambert, 1983). The TMAS has excellent psychometric properties (Spielberger, 1983; Lambert, 1983).

Validity Measure of Interpersonal Relations

The Inventory of Interpersonal Problems (IIP) is a 127-item self-report test that was designed to measure the type of interpersonal difficulties patients experience as well as the corresponding degree of discomfort (Horowitz et al., 1988). Patients rate the level of distress experienced for each problem on a 5-point Likert scale, ranging from 0 to 4. The IIP demonstrates high internal consistency (0.82 to 0.93). Test-retest reliability is also high (0.81 to 0.98). Discriminant validity is also high (Horowitz et al., 1988). The IIP was included in this study because it is the most frequently recommended measure of outcome for interpersonal problems (Horowitz et al., 1994).

Validity Measure of Social Role Functioning

The Social Adjustment Scale (SAS) is a self-report form based on the Social Adjustment Scale semi-structured interview. The SAS covers seven role areas: work (as a worker, homemaker, or student); social and leisure activities; relationships with extended family; marital role and parental role; family unit role; and economic role. Reliability and validity were both found to be adequate for this self-report outcome measure (Weissman and Bothwell, 1976).

Procedures

The undergraduate samples were tested in a classroom setting, with a proctor administering the instruments. Retest administration used the same procedure 3 weeks following the initial testing period.

The community sample was first contacted by phone by choosing each tenth name in the local phone directory. At this time adults in the household were asked if they would fill out questionnaires in order to help the researchers better understand some psychological tests and how people respond to them. If they consented to participate they were mailed questionnaires along with a consent form and a self-addressed stamped return envelope. Responses were anonymous to encourage candid reporting. Approximately 55% of those contacted returned questionnaires.

Data from the clinical samples were typically collected by clinic receptionists who administered the OQ prior to the patient's first therapy session. Included in the test packet was information pertaining to subject confidentiality as well as a consent form. The OQ was made available to clinicians on a voluntary basis. The employee assistance programme patients were drawn from a data base supplied by Human Affairs International based on their routine use of the instrument. The various samples that made up the subject pool were tested in multiple settings with various testing conditions, and different samples completed subsets of the entire test battery. In some cases, subjects completed only the OQ (i.e. the EAP, university clinic, and community mental health subjects), while in other cases test batteries aimed at assessing the validity of the OQ were also administered (i.e. college undergraduates). The Results (Table 3) section dealing with validity specifies which test batteries were administered to particular normative samples.

RESULTS

Normative Data

Outcome Questionnaire scores for the five study groups are presented in Table 2 for OQ total score and for domain scores. Mean scores were broken down to display gender differences. For those groups where gender data were available, t-tests for independent samples were conducted, indicating that no statistically significant differences existed between the average scores of males and females. This finding held up across patient as well as non-patient samples. This result suggests that it is appropriate to develop norms for all subjects combined rather than developing separate male/female interpretive graphs.

Sufficient data were available to analyse the relationship between age and OQ scores. This was done using a Pearson product-moment correlation (Cohen and Cohen, 1983). Within the broad age range of the EAP database (ages 17-74) no significant relationship was found between age and OQ score (N = 504, r = 0.01).

Table 2. Comparison of normative samples on the Outcome Questionnaire

Sample                          N    Total score     Distress        Interpersonal   Social
                                     Mean (SD)       Mean (SD)       Mean (SD)       Mean (SD)
Undergraduate                   238  42.33 (16.60)   23.08 (10.53)   8.95 (5.39)     10.37 (3.62)
  Male                          91   42.73 (15.89)   22.71 (10.07)   9.81 (6.24)     10.43 (3.63)
  Female                        147  42.10 (17.21)   23.43 (10.89)   8.31 (4.72)     10.35 (3.65)
Community                       102  48.16 (18.23)   25.73 (10.26)   10.81 (5.74)    9.81 (3.91)
  Male                          46   49.20 (17.59)   25.37 (9.70)    11.51 (5.83)    10.43 (3.39)
  Female                        56   48.43 (18.48)   26.52 (10.85)   10.52 (5.39)    9.48 (3.95)
EAP                             504  73.02 (21.05)   41.83 (14.15)   17.13 (6.03)    13.76 (4.83)
  Male                          198  73.52 (21.87)   41.64 (14.48)   17.49 (6.26)    14.41 (5.04)
  Female                        306  72.70 (20.70)   41.96 (13.96)   16.90 (5.88)    13.33 (4.64)
University outpatient clinic    76   78.01 (25.71)   42.88 (14.72)   17.25 (6.61)    14.24 (5.72)
  Male                          23   76.27 (26.53)   40.86 (15.08)   17.86 (6.42)    14.27 (5.75)
  Female                        53   81.82 (23.58)   45.34 (13.82)   17.80 (6.17)    14.70 (5.62)
Community mental health centre  100  86.07 (19.33)¹
¹Gender and subscale information not available.
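
The gender comparisons can be approximated from the summary statistics in Table 2 alone. The sketch below assumes a pooled-variance (equal variances) independent-samples t-test; the article does not say which variant was used, and the function name `pooled_t_from_summary` is hypothetical.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t (equal variances assumed) from summary statistics.

    Returns the t statistic and its degrees of freedom (n1 + n2 - 2).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df


# OQ total score in the EAP sample (Table 2): males vs. females.
t_gender, df_gender = pooled_t_from_summary(73.52, 21.87, 198, 72.70, 20.70, 306)
print(f"t({df_gender}) = {t_gender:.2f}")
```

For the EAP sample the statistic is well under 1 in absolute value, in line with the reported lack of a gender difference.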

Table 3. Test-retest reliability and internal consistency values for the Outcome Questionnaire total and domain scores

                   Test-retest¹     Internal consistency²
                   (Student)        (Student)        (Patients)
Symptom distress   0.78 (N = 157)   0.92 (N = 157)   0.91 (N = 298)
Interpersonal      0.80 (N = 157)   0.74 (N = 157)   0.74 (N = 294)
Social role        0.82 (N = 157)   0.70 (N = 157)   0.71 (N = 295)
OQ total           0.84 (N = 157)   0.93 (N = 157)   0.93 (N = 289)³
¹Pearson product-moment correlation coefficient (Cohen and Cohen, 1983).
²Coefficient alpha (Cronbach, 1951).
³All coefficients significant at the 0.01 level of confidence.
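
The internal consistency column of Table 3 reports coefficient alpha. As a minimal sketch of how alpha is obtained from item-level responses, the code below applies the usual formula, k/(k - 1) times one minus the ratio of summed item variances to total-score variance. The function name and the small response matrix are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Coefficient alpha for a respondents-by-items score matrix."""
    k = len(responses[0])                  # number of items
    items = list(zip(*responses))          # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)


# Hypothetical 0-4 Likert responses: 5 respondents x 4 items (not study data).
demo = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [4, 4, 3, 4],
]
alpha = cronbach_alpha(demo)
print(f"alpha = {alpha:.2f}")
```

Alpha approaches 1 when items rise and fall together across respondents, which is why a broad, heterogeneous subscale can be expected to show a lower value than a narrow one.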

Reliability of the OQ

Both test-retest and internal consistency reliability were assessed using a subsample of 157 undergraduate students. Internal consistency was also calculated on the EAP patients. As is evident in Table 3, both the total score of the OQ and the symptom distress subscale demonstrated excellent internal consistency. More heterogeneity existed in the interpersonal relations and social role content domains, which is not surprising given the breadth of functioning that these latter subscales attempt to assess. The undergraduates retook the OQ under similar circumstances 21 days after the original administration. All domain and total scores appeared to be temporally stable using this 3-week time period (Table 3).

Validity of the OQ

Concurrent Validity

Concurrent validity as measured with the undergraduate samples involved computing Pearson product-moment correlations (Cohen and Cohen, 1983) between the OQ total score (and subscale scores) and the various criterion tests. The expectation was that a specific criterion would correlate most highly with particular domain scores. Specifically it was expected that the IIP and SAS would correlate most highly with the interpersonal relations and the social role subscales, respectively. Since the symptom subscale measured general intrapsychic distress the specific criterion tests measuring anxiety and depression as well as the Symptom Checklist-90R were expected to correlate most highly with this subscale.

As depicted in Table 4, moderate to high validity coefficients were found between the total score of the OQ and all criterion measures. As predicted, the strongest relationships for the total score (and symptom distress subscale) were found on the depression (ZSDS and BDI), anxiety (TMAS, STAI-Y2, and ZSAS), and global distress (GSI) criteria. The interpersonal relations and social role subscales yielded lower validity coefficients with these measures of symptomatic distress, but in every case, relationships between the OQ scales and criteria were statistically reliable (p < 0.05).

Pearson product-moment correlations were also calculated between the OQ subscales and the OQ total score. These correlations computed on the 504 patients from the EAP population showed significant relationships between the total score and the domains of interpersonal relations (r = 0.71), social role (r = 0.70), and symptomatic distress (r = 0.94). The lowest correlation was 0.36 between the domain of social role and interpersonal relations. At the mid-point were correlations between symptomatic distress and interpersonal relations (r = 0.51), and symptomatic distress and social role (r = 0.54).

Sensitivity to Change

The OQ's construct validity depends, in part, on the ability of the OQ to reflect change following interventions such as psychotherapy. While retest scores for individuals are not expected to fluctuate systematically over time, it is expected that the scores of patients receiving psychological or psychopharmacological interventions would become progressively lower over time. Past psychotherapy research shows that most patients typically improve in therapy and a portion improve even in placebo treatments. The greatest gains are expected to take place by about the eighth therapy session (Lambert and Bergin, 1994).

Given the consistent nature of these past findings the construct validity of the OQ would be supported if the scores for patients after seven sessions of therapy were lower than their pre-therapy levels. This hypothesis was tested by following a subset of patients in treatment at the University Outpatient Clinic. Of the 76 patients who

Table 4. Validity estimates for the Outcome Questionnaire

Criterion  OQ total score  Symptom distress  Interpersonal relations  Social role
GSI²  0.78 (N = 115)¹  0.61 (N = 115)¹
  0.72 (N = 238)  0.70 (N = 238)  0.50 (N = 238)  0.52 (N = 238)
BDI²  0.80 (N = 115)¹  0.63 (N = 115)¹
  0.62 (N = 238)  0.59 (N = 238)  0.44 (N = 238)  0.47 (N = 238)
ZSDS²  0.88 (N = 71)  0.89 (N = 71)  0.67 (N = 71)
ZSAS²  0.80 (N = 71)  0.81 (N = 71)  0.53 (N = 71)  0.71 (N = 71)
TMAS²  0.86 (N = 71)  0.88 (N = 71)  0.63 (N = 71)  0.64 (N = 71)
STAI(Y-1)²  0.64 (N = 115)¹  0.50 (N = 115)¹
STAI(Y-2)²  0.80 (N = 115)¹  0.65 (N = 115)¹
IIP²  0.60 (N = 71)  0.60 (N = 71)  0.54 (N = 71)  0.47 (N = 71)
  0.63 (N = 238)  0.58 (N = 238)  0.50 (N = 238)  0.60 (N = 238)
SAS²  0.62 (N = 71)  0.56 (N = 71)  0.65 (N = 71)  0.44 (N = 71)
  0.60 (N = 238)  0.52 (N = 238)  0.47 (N = 238)  0.41 (N = 238)
¹These values were obtained with a preliminary 43 item version of the current 45 item test.
²GSI = General Severity Index; BDI = Beck Depression Inventory; ZSDS = Zung Self Rating Depression Scale; ZSAS = Zung Self Rating Anxiety Scale; TMAS = Taylor Manifest Anxiety Scale; STAI = State Trait Anxiety Inventory (Y-1 = State Anxiety; Y-2 = Trait Anxiety); IIP = Inventory of Interpersonal Problems; SAS = Social Adjustment Scale.

Table 5. Pre-test and post-test scores for outpatients following seven sessions of psychotherapy

OQ score                  N   Pre-test        Post-test       t-value (df)
                              Mean (SD)       Mean (SD)
Total score               40  84.65 (24.14)   67.18 (27.12)   4.78 (39)¹
Symptom distress          40  46.20 (14.42)   36.65 (16.58)   4.26 (39)¹
Interpersonal relations   40  18.35 (5.75)    15.67 (6.08)    3.30 (39)¹
Social role performance   40  15.83 (6.0)     11.98 (5.68)    4.30 (39)¹
¹Significant beyond the 0.001 level of confidence.
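
The t-values in Table 5 come from a correlated-samples (paired) design, in which each patient's post-test score is subtracted from their own pre-test score. A minimal sketch of that computation is given below; the function name and the five pairs of scores are hypothetical, since the individual patient data are not reported.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t, df) for a correlated-samples t-test on pre/post scores."""
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return mean(diffs) / se, n - 1


# Hypothetical OQ total scores for five patients (not study data).
pre_scores = [90, 70, 85, 100, 60]
post_scores = [75, 65, 70, 80, 62]
t_stat, df = paired_t(pre_scores, post_scores)
print(f"t({df}) = {t_stat:.2f}")
```

With 40 patients the degrees of freedom are 39, as shown in the table. A positive t here means scores dropped from pre-test to post-test.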

took the OQ prior to entering therapy, 40 patients had at least seven therapy sessions. A t-test for correlated samples was used to test the hypothesis that OQ scores would decrease following a sufficient dose of individual psychotherapy. As expected the t-test between the mean of the patient pre-test scores and the mean of their post-test scores revealed statistically significant improvement. These data are presented in Table 5.

Ability to Discriminate Patient and Non-Patient Samples

Support for the construct validity of the OQ was also sought by comparing the three clinical samples' scores on the OQ with those of the community and undergraduate samples. It was assumed that statistically significant differences between the means of the clinical and normative samples would suggest that the OQ could reliably distinguish between these groups. Further it was expected that the mean scores for the groups would be ordered from most pathological to least pathological. We expected the community mental health group to be most disturbed followed by the university outpatient clinic sample, the EAP sample, the community sample and the undergraduate samples. A one-way analysis of variance was conducted to determine the difference between adjacent sample means. These results are presented in Table 6. Not only were the samples ranked in order of most to least disturbed, but in all but one comparison (EAP versus university outpatient clinic) these differences were significant at or beyond the 0.01 level of confidence.

A final consideration for the OQ 45 is related to the accuracy of the proposed cut-off scores that discriminate between the normal (n = 342) and clinical (n = 744) samples. Sensitivity and specificity indices were used to examine classification accuracy. Sensitivity is the probability of correctly classifying a person in the clinical population when they are in fact a member of the clinical sample. The OQ 45 was shown to have a sensitivity index of 0.85, thus the proposed cut-off scores would correctly classify 85 out of every 100 clinical subjects tested.

Specificity is the probability of correctly classifying a person in the normal population when they

Table 6. Comparison of level of psychopathology as measured by the OQ across patient and non-patient samples

Comparison group                 N    Mean (SD)        F-ratio
Undergraduate                    238  42.33 (16.60)    146.09¹
Community                        102  48.16 (18.23)
Employee assistance programme    504  73.02 (21.05)²
University outpatient clinic     76   78.01 (25.71)²
Community mental health centre   100  86.07 (19.33)
¹Significant beyond the 0.001 confidence level.
²All values differ significantly, except between the employee assistance programme sample and the university outpatient clinic sample, using Tukey's Honestly Significant Difference test.
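
A one-way analysis of variance can be approximated from the group sizes, means, and standard deviations in Table 6 without the raw scores. The sketch below does this under the usual assumptions; the function name `anova_from_summary` is hypothetical. Because the table values are rounded and the exact cases entering the published analysis are not fully specified, the result should only be expected to land near the reported F-ratio of 146.09, not to reproduce it exactly.

```python
def anova_from_summary(groups):
    """One-way ANOVA from (n, mean, sd) triples.

    Returns (F, df_between, df_within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-groups sum of squares: weighted spread of group means.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares: recovered from each group's SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# (N, mean, SD) for the five samples as printed in Table 6.
table6 = [
    (238, 42.33, 16.60),  # Undergraduate
    (102, 48.16, 18.23),  # Community
    (504, 73.02, 21.05),  # Employee assistance programme
    (76, 78.01, 25.71),   # University outpatient clinic
    (100, 86.07, 19.33),  # Community mental health centre
]
f_ratio, df_b, df_w = anova_from_summary(table6)
print(f"F({df_b}, {df_w}) = {f_ratio:.1f}")
```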

are indeed in the normal sample. The OQ 45 produced a specificity index of 0.74. Thus 74 out of 100 normal subjects are correctly identified by the proposed cut-off scores. Since specificity is the complement of the false positive rate (identifying someone as clinical when they are a member of the normal sample), one recent finding should be pointed out. Epidemiological studies have estimated that about 20% of a 'normal' population have clinically notable levels of psychopathology (Regier et al., 1988). Thus, given this relatively high percentage, one would expect a higher false positive rate on the OQ 45, as indicated by the lower specificity index.

DISCUSSION

This research reports on the development of a brief questionnaire that can be used to assess change in patients following psychological interventions. The goal was to create a cost-effective self-report scale that was both reliable and valid.

Results demonstrated that the overall questionnaire and its three subscales possess adequate stability, as demonstrated by the test-retest coefficients. The test-retest reliability of the OQ is comparable to other brief outcome measures such as those with which it was compared in this study (e.g. BDI, SCL-90-R). Future research should focus on the stability of the OQ with clinical samples prior to their undergoing therapy and over longer periods of time with normal controls. The internal consistency for the undergraduate as well as clinical samples appears to be very high for both the total OQ score and the domain scores. These results suggest that the test as a whole measures common variables.

Both undergraduate and clinical samples were used to assess the validity of the OQ. It is clear from the correlational data found in this study that the OQ has a strong relationship with other self-report measures whether they purport to measure anxiety (e.g. TMAS, ZSAS), depression (e.g. BDI, ZSDS), general symptoms (e.g. SCL-90-R), interpersonal problems (i.e. IIP) or social adjustment (i.e. SAS). The total OQ score correlated most highly with mono-symptomatic scales and the general severity index of the SCL-90-R. All these measures tap symptoms that are common in psychological disorders.

The total OQ score was least correlated with the two measures that did not specifically aim to measure psychological distress per se (IIP and SAS). These two instruments produced the lowest correlations with the total OQ score as well as the two domain scores with which we expected the highest correlations (Interpersonal Relations and Social Role Performance). This finding can be partially explained by the high overall internal consistency of the OQ, suggesting similarity of responses to items by the subjects. This high internal consistency might be due to item content and suggests the need to carefully examine and possibly revise the Interpersonal and Social Role subscales. It could also be a result of the homogeneous subject sample (i.e. undergraduates) and suggests the need to conduct further validity studies with patient samples since patients can be expected to have a greater range of role impairment and interpersonal difficulties.

The high intercorrelations among the subscale scores suggest a single underlying factor. This may reflect the fact that both problems in social role functioning and interpersonal problems are clearly related to internal distress, either as a direct cause of psychopathology, as a result of this psychopathology, or as both cause and effect. This intertwining of problems has been noted elsewhere (Klerman, 1974; Horowitz et al., 1988). It may be that these three domains covary so highly that they cannot be statistically separated. That is, despite differences in content, subscales may not provide distinct information. A factor analytic investigation of a larger data set has been completed that further
examines these issues (Meuler, 1995). A related topic for future research is to examine whether clients change at the same rate or show differential response to treatment across the three domains. It is important to note that the OQ was not designed as a multitrait test such as the MMPI. The OQ was instead designed as a single measure of patient recovery in mental health contexts that draws items from three theoretically interrelated domains, a presupposition supported by its high internal consistency.

It was anticipated that patients completing the OQ would score significantly higher than those from community and student samples. Such a difference was found in this study, strongly suggesting that the OQ measures what it purports to measure: psychological distress. It is interesting to note that the OQ could not only distinguish the community mental health patients from other clinical groups but also differentiate student samples from community volunteers. It is not common for outcome measures to be able to make these same distinctions (Grundy, Grundy and Lambert, 1996). It remains to be seen if outpatient and inpatient samples can also be successfully distinguished from each other.

Because psychotherapy should serve to decrease the client's levels of distress, the scores on the OQ should decline over treatment sessions. Changes in distress following therapy would therefore bolster the construct validity of the OQ. That such change was measurable with the OQ was confirmed following seven sessions of individual psychotherapy. Of course, such change cannot be attributed to therapy itself since the design of our study did not include any control groups or additional measures of change. Future research will address these shortcomings, but the data presented do suggest that the items that make up the OQ are rated differently over time by a majority of treatment participants. Further analysis of session-by-session change as measured by the OQ has been published by Kadera et al. (1996).

Problems typically noted with brief self-report tests (Derogatis et al., 1974; Boulet and Boss, 1991) apply to the OQ as a measure of patient distress and deserve mention. First, interpretation of OQ scores typically relies on the assumption that the client will be accurate in the assessment of their mental or emotional states. Either because of acquiescence, carelessness, boredom, lack of understanding, psychoticism, or numerous other factors, the client's responses may not be congruent with how (s)he is really feeling. The OQ has no control for response sets, of which social desirability is reported to be the most common and problematic (Edwards, 1957). While this problem may seem serious, it is a general problem in the scales that have typically been used to assess outcome. Research suggests no systematic bias as a consequence of using such self-report scales (e.g. Ogles et al., 1995) in outcome studies. However, it seems imperative to use the OQ only in settings in which clients are motivated to accurately report their psychological state.

This study provides evidence for the reliability and validity of the OQ as a measure of psychological distress. Specifically, high internal consistencies and test stability were found for the overall score as well as for each of the three domains. Evidence for concurrent validity was also found using undergraduate populations, and support for construct validity was demonstrated by the OQ showing sensitivity to changes across therapy and the ability to discriminate between community, student and clinical samples. Suggestions for future research include assessing its concurrent validity with a clinical sample with other well-known distress measures. Although certain limitations presently exist with the measure, the OQ will provide therapists with an instrument that is easily administered, available at low cost, measures general distress, and is sensitive to change in symptoms and behaviours over short periods of time, while maintaining sound psychometric properties.

REFERENCES

Ahmed, T. and Smith, R. (1991). Impacts of Managed Mental Health Care Employee Assistance Programs On Costs and Utilization. A report prepared for Aetna Health Plans.
Andrews, F. M. and Withey, S. B. (1974). Developing measures of perceived life quality: Results from several national surveys. Social Indicators Research, 1, 1-26.
Beck, A. T., Steer, R. A. and Garbin, M. G. (1988). Psychometric properties of the Beck Depression Inventory: Twenty-five years later. Clinical Psychology Review, 8, 77-100.
Beck, A. T., Ward, C. H., Mendelson, M., Mock, J. and Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychology, 4, 53-63.
Beiser, M. (1973). Components and correlates of mental well-being. Journal of Health and Social Behavior, 15, 320-327.
Blau, T. H. (1977). Quality of life, social interaction and criteria of change. Professional Psychology, 8, 464-473.
Bloom, A. (1987). Liability concern of utilization review and quality assurance programs. HMO, 1, 128-133.
Boulet, J. and Boss, M. W. (1991). Reliability and validity of the Brief Symptom Inventory. Journal of Consulting and Clinical Psychology, 3(3), 433-437.
Brokowski, A. (1991). Current mental health care environments: Why managed care is necessary. Professional Psychology: Research and Practice, 22, 6-14.
Cohen, J. and Cohen, P. (1983). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 2nd edn. New Jersey: Lawrence Erlbaum Associates.
Corcoran, K. and Fischer, J. (1987). Measures for Clinical Practice: A Sourcebook. New York: The Free Press.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
Derogatis, L. R. (1977). The SCL-90 Manual: Scoring, Administration and Procedures for the SCL-90. Baltimore: Johns Hopkins University School of Medicine, Clinical Psychometrics Unit.
Derogatis, L. R., Lipman, R. S., Rickels, K., Uhlenhuth, E. H. and Covi, L. (1974). The Hopkins Symptom Checklist (HSCL): A self-report symptom inventory. Behavioral Science, 19, 1-15.
Derogatis, L. R., Rickels, K. and Rock, A. F. (1976). The SCL-90 and the MMPI: a step in the validation of a new self-report scale. British Journal of Psychology, 128, 280-289.
Diener, E. (1984). Subjective well-being. Psychological Bulletin, 95(3), 542-575.
Dreger, R. M. and Brabham, J. L. (1987). Two clinical validation studies on the state form and types of reliability of the trait form of the State-Trait Anxiety Inventory. Multivariate Experimental Clinical Research, 8(2), 195-209.
Edwards, A. L. (1957). The Social Desirability Variable in Personality Assessment and Research. New York: Holt.
Frisch, M. B., Cornell, J., Villanueva, M. and Retzlaff, P. J. (1992). Clinical validation of the quality of life inventory: A measure of life satisfaction for use in treatment planning and outcome assessment. Psychological Assessment, 4, 92-101.
Froyd, J. E., Lambert, M. J. and Froyd, J. D. (1996). A survey and critique of psychotherapy outcome measurement. Journal of Mental Health, 5, 11-15.
Grundy, C., Grundy, E. and Lambert, M. J. (1996). Assessing clinical significance: Application to the Hamilton Rating Scale for Depression. Journal of Mental Health, 5, 25-33.
Horowitz, L. M. (1979). On the cognitive structure of interpersonal problems treated in psychotherapy. Journal of Consulting and Clinical Psychology, 47, 5-15.
Horowitz, L. M., Locke, K. D., Morse, M. B., Waikar, S. V., Dryer, D. C., Tarnow, E. and Ghannam, J. (1991). Self-derogations and the interpersonal theory. Journal of Personality and Social Psychology, 61, 68-79.
Horowitz, L. M., Rosenberg, S. E., Baer, B. A., Ureno, G. and Villasenor, V. S. (1988). Inventory of interpersonal problems: Psychometric properties and clinical applications. Journal of Consulting and Clinical Psychology, 56, 885-892.
Horowitz, L., Strupp, H. H. and Lambert, M. J. (1994). Report on the APA sponsored core battery conference. March 1994, Vanderbilt University, Nashville, Tennessee.
Kadera, S. W., Lambert, M. J. and Andrews, A. A. (1996). How much therapy is really enough: A session-by-session analysis of the psychotherapy dose-effect relationship. The Journal of Psychotherapy Practice and Research, 5, 132-151.
Klerman, G. L. (1974). Depression and adaptation. In R. J. Friedman and M. M. Katz (Eds), The Psychology of Depression: Contemporary Theory and Research. Washington, DC: Winston-Wiley.
Lambert, M. J. (1983). Introduction to assessment of psychotherapy outcome: Historical perspective and current issues. In M. J. Lambert, E. R. Christensen and S. S. DeJulio (Eds), The Assessment of Psychotherapy Outcome. New York: John Wiley.
Lambert, M. J. and Bergin, A. E. (1994). The effectiveness of psychotherapy. In A. E. Bergin and S. L. Garfield (Eds), The Handbook of Psychotherapy and Behavior Change, 4th edn. New York: John Wiley.
Lambert, M. J., Ogles, B. M. and Masters, K. S. (1992). Choosing outcome assessment devices: An organizational and conceptual scheme. Journal of Counseling and Development, 70(4), 527-532.
McRoberts, C. H. and Lambert, M. J. (1993). Current trends in psychotherapy outcome measurement. A poster presented at the Western Psychological Association, Phoenix, Arizona.
Meuler, R. (1995). An exploratory and confirmatory factor analysis of the Outcome Questionnaire. Unpublished doctoral dissertation. Department of Psychology, Brigham Young University: Provo, Utah.
Mirin, S. and Namerow, M. (1991). Why study treatment outcome? Hospital and Community Psychiatry, 42, 1007-1013.
Moses-Zirkes, S. (1993). Outcome research: Everybody wants it. American Psychological Association Monitor, March, p. 7.
Ogles, B. M., Lambert, M. J. and Swayer, J. D. (1995). The clinical significance of the NIMH treatment of depression collaborative research program data. Journal of Consulting and Clinical Psychology, 63, 317-325.
Regier, D. A., Boyd, J. H., Burke, Jr. J. D., Rae, D. S., Myers, J. K., Kramer, M., Robins, L. N., George, L. K., Karno, M. and Locke, B. Z. (1988). One-month prevalence of mental disorders in the United States. Archives of General Psychiatry, 45, 977-986.
Richardson, L. M. and Austad, C. S. (1991). Realities of mental health practice in managed care settings. Professional Psychology: Research and Practice, 22(1), 52-59.
Spielberger, C. D. (1983). Manual for the State-Trait Anxiety Inventory STAI (Form Y). Palo Alto, California: Consulting Psychologists Press.
Veit, C. T. and Ware, J. E. (1983). The structure of psychological distress and well-being in general populations. Journal of Consulting and Clinical Psychology, 51(5), 730-742.
Wiessman, M. M. and Bothwell, S. (1976). Assessment of social adjustment by patient self-report. Archives of General Psychiatry, 33, 1111-1115.
Zung, W. W. (1965). A self-rating depression scale. Archives of General Psychiatry, 12, 63-70.
Zung, W. W. (1971). A rating instrument for anxiety disorders. Psychosomatics, 6, 371-379.
Zung, W. W. (1972). The depression status inventory: An adjunct to the Self-Rating Depression Scale. Journal of Clinical Psychology, 28, 539-543.
