Psychological Medicine, 1990, 20, 195-208

Printed in Great Britain

'The problem of validity in field studies of psychological disorders' revisited¹
BRUCE P. DOHRENWEND²
From the New York State Psychiatric Institute and Columbia University, New York, USA

SYNOPSIS Since the turn of the century and up to about 1980, there have been two generations
of epidemiological studies of the true prevalence of psychiatric disorders: a pre-World War II first
generation and a post-World War II second generation. With the appearance of DSM-III in 1980
and the changes in epidemiological procedures coincident with it, it has become meaningful in the
US to talk about the beginnings of a new, third generation of studies in psychiatric epidemiology.
The purposes of this paper are: first, to briefly summarize the problems of validity with the
procedures for case identification and diagnosis in the first- and second-generation studies; second,
to consider some of the newer developments with regard to diagnostic instruments that either are
or should be influencing third-generation studies; third, to discuss some of the problems of validity
in the handful of third-generation studies done so far; and fourth, to describe and illustrate an
approach that seems to make sense in the context of gaps in knowledge of aetiology and pathogenesis
that leave us still dependent on interviews for case identification and classification.

INTRODUCTION

It is over 20 years since Barbara Dohrenwend and I published an article titled, 'The problem of validity in field studies of psychological disorder' (Dohrenwend & Dohrenwend, 1965). We had occasions to update this review in 1974 (Dohrenwend & Dohrenwend, 1974), in 1980 with other colleagues (Dohrenwend et al. 1980a), and in 1982 (Dohrenwend & Dohrenwend, 1982). In these publications, we were concerned with what we came to term the 'first generation' of pre-World War II studies and the post-war 'second generation' in which research workers attempted to investigate the true prevalence of psychiatric disorders in communities all over the world (Dohrenwend & Dohrenwend, 1982). For the most part, these studies focused on prevalence within periods of a few months to a year. There were too few longitudinal studies to permit an examination of investigations of true incidence as well.

Epidemiological research is dependent on the accuracy of diagnostic methods which is in turn dependent on the progress of laboratory and clinical research. Each of the first and second generation studies tended to pioneer its own unique methods and procedures for counting cases, with very little attention in any of them to problems of validity. This anarchy reflected the state of diagnostic affairs in the wider mental health community. However, a number of developments were under way that have brought about a rather different situation.

By 1980, with the appearance of DSM-III, and the changes in epidemiological research procedures coincident with it, it has become meaningful, I think, to talk about the beginnings of a new, third generation of studies in psychiatric epidemiology. My purposes in this paper are: to describe briefly as background the first and second generation studies and the problems of validity with their procedures for case identification and diagnosis; to consider some of the newer developments with regard to diagnostic instruments that either are or should be influencing third generation studies; to discuss some of the problems of validity in third-generation studies completed so far; and to offer some suggestions for the future.

¹ This paper was presented at the 1988 Annual Meeting of the American Psychopathological Association and is published in The Validity of Psychiatric Diagnosis (ed. L. N. Robins and J. E. Barrett), American Psychopathological Association Series, Raven Press: New York, 1989. It is reprinted here with the permission of Raven Press and includes some updating of the account of the epidemiological study in Israel.
² Address for correspondence: Dr B. P. Dohrenwend, NY State Psychiatric Institute, 722 West 168th Street, Box 8, New York, NY 10032, USA.

FIRST-GENERATION STUDIES

Sixteen studies, all of which took place between the turn of the century and World War II, comprise the first generation. Investigators in these studies tended to rely on key informants and agency records to supply the information that would enable them to identify cases. Such procedures tend to underestimate untreated cases of disorders that are characterized mainly by subjective distress that would be more likely to be revealed in direct interviews. Direct interviews were used in only six of these studies. Nevertheless, even in these six interview studies, where rates tended to be higher than in studies using key informants and agency records, the median for all types of disorders combined was only 3.6% as compared to a median of close to 20% in the second generation of studies conducted after World War II (Dohrenwend & Dohrenwend, 1974). The difference is a dramatic illustration of the effects of the tremendous expansion of psychiatric nomenclatures following World War II on rates of psychiatric disorders counted in community studies. The expansion itself reflected the experiences of the mental health professions with psychiatric screening for selective service and with subsequent psychiatric casualties in World War II (Raines, 1952). It marked the transition from the first to the second generation of epidemiological studies.

SECOND-GENERATION STUDIES

Unlike the research workers in the first generation of studies, most of the investigators in more than 60 second-generation studies conducted after World War II and up to about 1980 relied on direct interviews with all subjects. Only rarely in these studies were the interviews supplemented systematically by data from key informants and official records, although such information is extremely useful for identifying or confirming some types of psychopathology such as substance abuse and antisocial behaviour. Two different types of interview were used.

First, in most of the European and Asian research, a single psychiatrist or a small team headed by a psychiatrist personally interviewed community residents and recorded diagnostic judgements on the basis of these interviews. As a rule, the interview procedures were not made explicit in this type of approach.

In the second type, by contrast, standard and explicit data-collection procedures were used. Although the interviews were sometimes done by psychiatrists and clinical psychologists and sometimes by lay interviewers, in all instances case identification depended on psychiatrists' evaluations of protocols compiled from the interview responses and, sometimes, from ancillary data from key informants, official records, and interviewers' observations. The Midtown study and the Stirling County study (Srole et al. 1962; Leighton et al. 1963) pioneered this approach, and some others adopted their procedures (e.g. Gillis et al. 1965; Rin et al. 1966; Shore et al. 1973). The resulting classifications were made not in terms of diagnostic types but rather in terms of ratings of 'caseness' and 'impairment'.

Even more economical than having clinicians rate protocols constructed from data collected by lay interviewers is dispensing with clinical judgements altogether and using objectively scored measures of psychopathology. A number of the investigators in this second generation of studies took this route. The objective measure used most often is a 22-item screening instrument developed by Langner (1962) in the Midtown study on a purely actuarial basis to provide an approximation of their Mental Health Rating of psychiatric impairment. A similar, although less widely used, measure consisting of 20 Health Opinion Survey questions, was constructed by the Stirling County study research workers as well (Macmillan, 1957). Both have as their core a portion of the items from the Psychosomatic Scale of the Neuropsychiatric Screening Adjunct, developed as an aid to Selective Service screening during World War II (Star, 1950).

There exists by now a small family of these brief screening scales that appear highly similar in content and have been used in between a quarter and a third of the second generation of epidemiological studies to measure such things as 'mental health', 'mental illness', 'psychiatric disorders', 'emotional adjustment', 'symptoms
of stress' and 'psychophysiological symptoms' (Seiler, 1973, p. 257).

PROBLEM OF VALIDITY IN FIRST- AND SECOND-GENERATION STUDIES

There is little evidence of any of the usual types of validity in the first- and second-generation studies. Content validity was precluded because there was little consensus at any of the times these studies were done about the population of signs and symptoms to be sampled. Different nomenclatures were used by different investigators, and some investigators tended to bypass nomenclatures, substituting 'caseness' (Leighton et al. 1963) and 'impairment' (Srole et al. 1962) ratings. Nor is the picture much better for criterion-orientated or construct validity. Except for studies using the brief screening scales for case identification, no attempts were made to test the ability of the diagnostic procedures to identify and classify known cases of important types or to test whether the main measures agreed with very different measures of the same types of disorders.

There has been much more methodological research on the brief screening scales used in these studies. They show good internal consistency (typically between 0.80 and 0.85) and tend to correlate with each other as highly as their reliabilities permit (Link & Dohrenwend, 1980). They are all measures of much the same thing. It is not readily apparent, however, from their content (symptoms of depressed mood, anxiety, and psychophysiological disturbance) what this is. They certainly do not, for example, contain symptoms of all varieties of 'mental illness' or 'psychiatric disorders'; nor are they limited to 'psychophysiological symptoms'; nor do they exhaust the variety of stress reactions. Moreover, while whatever they measure frequently is accompanied by diagnosable mental disorders, it occurs with at least equal frequency in the absence of such disorders (Link & Dohrenwend, 1980). It is intriguing to enquire, therefore, into what it is that they are in fact measuring.

We have found that these brief screening scales have an extremely high correlation with measures of self-esteem, helplessness-hopelessness, dread, anxiety, sadness, and confused thinking (Dohrenwend et al. 1980b) all of which are major facets of what Frank (1973) has called 'demoralization'. In Frank's theoretical formulation, as well as in relevant research that we have reviewed with regard to the screening scales (Dohrenwend et al. 1979), this type of non-specific psychological distress is likely to occur in response to a variety of predicaments: severe physical illnesses, especially those that are chronic; a build up of recent stressful life events; attempts to cope with psychotic symptoms; and being low in social class position. It is something like physical temperature in that you know something is wrong when it is elevated, but you do not know what is wrong until you learn more about the context. Thus, while these measures of non-specific distress that I prefer to call 'demoralization' are interesting in their own right, they are often very imperfectly related to diagnosable mental disorders.

BEGINNINGS OF A THIRD GENERATION

Epidemiological studies are expensive and time consuming. It is unlikely that there have been more than a dozen since around 1980. There is lack of unanimity about the diagnostic procedures to be used. However, the procedures have tended to be very different from those used in the first- and second-generation studies.

There have been a number of developments in psychiatry and related sciences here and abroad that have changed the context of these studies. Concern with systematizing and refining diagnostic systems is no longer concentrated abroad, but, spear-headed by the Washington University group (Robins & Guze, 1970; Woodruff et al. 1974), it has spread in the US. Its embodiment is DSM-III. Semi-structured diagnostic interview and rating examinations such as the PSE (Wing et al. 1974) and SADS (Endicott & Spitzer, 1978), developed for clinical research with patients and designed to be used by skilled clinicians, have been adapted for epidemiological research. Psychometric instruments such as the SCL-90 (Derogatis, 1977) and the GHQ (Goldberg, 1972), also developed for clinical research and research with general practice patients, have been used in epidemiological studies. In addition, there have been attempts to build instruments specifically for epidemiological studies. These include psychometric instruments such as the CES-D scale
(Center for Epidemiological Studies-Depression scale) (Radloff, 1977) and the set of screening scales from the Psychiatric Epidemiology Research Interview (PERI) (Shrout et al. 1986). They include the most influential of all and the most directly related to the new DSM-III, the NIMH Diagnostic Interview Schedule or DIS (Robins et al. 1981); the DIS is a fully structured diagnostic interview designed to be administered by lay interviewers.

PROBLEM OF VALIDITY IN THE BEGINNINGS OF THE THIRD GENERATION

There has already been more research on the validity of third-generation case identification procedures than on those used in the first and second generation. This is, however, an instance of a little being a lot by comparison. I will start with the methodological research on semi-structured clinical examinations, move on to the more recently developed psychometric screening scales, and then the DIS.

Semi-structured clinical examination

In the semi-structured clinical interview, main reliance is placed on the experience and skill of the clinician to reduce measurement error. A degree of structure is introduced to increase reliability. Wing et al. (1974) describe the interviewing and rating procedure for one of the most prominent of these semi-structured instruments, the PSE, as follows.

    Each of the items or symptoms is defined in greater or lesser detail (in a glossary of definitions of symptoms). For most of the items or symptoms, a form of questioning is suggested, so that it would be possible to carry out the whole of the interview without deviating from the schedule at all. In practice, this would never happen since no two interviewees are alike and the examiner must be able to adapt his technique to the situation. The wording of each question depends on the answer to the previous one....

    A symptom should not be rated as present simply because the patient says 'yes'. A further description should be asked for, in the patient's own words, and further specific questions asked as necessary. Following this process of cross examination... the examiner should make up his own mind as to how the symptom should be rated. Similarly, the fact that the patient says 'no' to the standard question does not mean that the symptom should be rated as absent. All available cues, in behaviour and case record, and from all parts of the interview, should be used to determine whether a particular line of examination should proceed further.

There has been very little investigation of the validity of semi-structured clinical interviews in third-generation epidemiological research. By and large, they have been assumed to bring their credentials with them from their development with psychiatric patients and their use in clinical research, even when administered by pre-doctoral-level clinicians, as has been the case in some epidemiological studies (Weissman & Myers, 1978; Vernon & Roberts, 1982). Such validity is by no means assured in field studies of psychiatric disorders in general populations where the conditions under which diagnoses are made are very different from those that obtain in research with patient samples, and where the boundaries between normal and abnormal are an under-explored frontier.

Much of what I have learned about the matter comes from two methodological studies. One is my own previous research with a DSM-II era forerunner of SADS called the Psychiatric Status Schedule (PSS) developed by Spitzer et al. (1970) and used by my colleagues and me in New York City (Dohrenwend et al. 1978); the second is a study by Wing and his colleagues (Wing et al. 1978) with a shortened version of the PSE scored on their 'Index of Definition' to assess whether a respondent is a case in whom more detailed criteria of specific syndromes should be investigated. It is interesting to note that this type of case-non-case determination, used again by Brown & Harris (1978), harks back to the Stirling County study 'caseness' rating (Leighton et al. 1963) and has much in common with the Midtown Study mental health rating of impairment as well (Srole et al. 1962).

The PSS was an attempt to standardize interviews of the kinds used for intake and diagnosis in clinical settings and was designed to provide DSM-II diagnoses of each subject through the application of a computer program called DIAGNO (Spitzer & Endicott, 1968). The PSS consists of fixed questions, many of them open-ended, together with suggested probes. The actual responses to these questions and probes, however, are not recorded. Rather,
they form the basis for judgements by the interviewer as to whether each of the several hundred carefully described symptoms is 'true' or 'false' of the subject. These clinical judgements then become the basic data resulting from the interview.

A number of years ago, we chose to investigate the PSS on grounds that it was likely to have much in common with the less explicit and less reproducible types of clinical interviews used by a number of first- and second-generation epidemiological investigators, especially those working in Europe and Asia rather than in North America (e.g. Bash, 1967; Hagnell, 1966; Lin, 1953). It is similar in type to the PSE and SADS, which we noted above have only recently been used with samples from the general population. Like these instruments it was developed on the assumption that its users had clinical experience and would undergo intensive training in making the clinical judgements required. Thus, the interviewer of choice with such instruments is an experienced psychiatrist, clinical psychologist, or psychiatric social worker. In our own research, the interviewers who used it were psychiatrists.

Fortunately, the items in the PSS, unlike the PSE and SADS, were not contingent on each other. It is thus possible to test their internal consistency reliabilities and make direct comparisons on this basis with measures from such self-report interviews as PERI. In our research, we were able to test a large number of PSS scales - those developed by its authors as well as our own a priori symptom groups - on a small sample from the general population as well as on a sample of psychiatric patients (Dohrenwend et al. 1978). The most striking finding was the contrast in internal consistency reliabilities of the scales for psychiatric patient and non-patient samples. For example, we were able to replicate the findings of Spitzer and his colleagues (Spitzer et al. 1970) that a large number of PSS scales they developed showed good internal consistency reliability in psychiatric patients; we found, however, that most of the same PSS scales proved unreliable in the general population.

This lack of internal consistency reliability of the PSS scales in samples from the general population is accompanied by problems of validity. For example, we found that the computer program, DIAGNO, developed on the basis of research using the PSS with psychiatric patients, tended to grossly over-diagnose 'schizophrenia' in the community sample that we studied; the errors were most likely to occur among respondents who were black or Puerto Rican (Dohrenwend et al. 1980c). We found earlier that scores on the PSS can be misleading about rates of mental disorder in different social classes (Dohrenwend et al. 1971).

To illustrate, DIAGNO diagnosed 10 of the 133 community sample respondents as currently being schizophrenic. This rate of 7.5% is far higher than average prevalence rates of under 1% reported for a similar period of time in samples of adults from the general population. It is also considerably higher than rates of 4.5% diagnosed by the psychiatrists who interviewed the respondents and 3% by a psychiatrist who independently reviewed the transcripts of the interviews. Even the last rate of 3% seems high and may reflect the use of relatively broad pre-DSM-III definitions of schizophrenia. In any case, of the ten DIAGNO schizophrenics only three were also diagnosed as schizophrenic by psychiatrists who interviewed them, and only two of these three were diagnosed as schizophrenic by both the psychiatrists who interviewed them and a second psychiatrist who independently reviewed the transcripts of the tape recordings of the original interviews. Moreover, the psychiatrists who interviewed the community sample subjects diagnosed three respondents as schizophrenic who were not classified as schizophrenic by DIAGNO, and two of these were independently confirmed by the second psychiatrist who reviewed the transcripts. If we take as the most conservative identification of schizophrenia those instances where there was a consensus between the psychiatrist who interviewed the subject and the psychiatrist who reviewed the transcript, then DIAGNO converged with this clinical consensus in only two of its ten schizophrenic diagnoses.

I believe that these problems of reliability and validity of the PSS for use in the general population are not specific to this instrument but extend to other instruments modelled on the clinical examination and developed primarily on the basis of research with psychiatric patients. Consider in this regard some results reported by Wing and his colleagues (Wing et al. 1978). They
200 B. P. Dohrenwend

come from a study in which the Present State psychiatric out-patients did not have diag-
Examination (PSE) was used by trained psy- nosable disorders (Link & Dohrenwend, 1980).
chiatrists to investigate mental disorders among Later studies with the more recently developed
women sampled from the general population of CES-D also show very imperfect correspondence
a district in London. Like the PSS, the PSE data between the scale cut-offs and diagnosable
can also be reduced by a computer program, this disorder (Myers & Weissman, 1980, Roberts &
one called 'CATEGO' (Wing et al. 1974). The Vernon, 1983; Breslau, 1985). Thus, while these
results of the PSE for each subject are sum- scales are brief, easily administered, and highly
marized first in terms of an Index of Definition reliable in contrasting sex, class and ethnic
and, if the subject is above a cut-off on this groups, they do not converge closely with
index, into one or more psychiatric syndromes diagnoses based on clinical interviews and
that correspond to diagnostic groupings of the ratings.
mental disorders contained in the International However, we have shown that it is possible to
Classification of Diseases (ICD-9). Of the 123 develop symptom scales that measure not only
women in this sample, 22 were cases of 'de- non-specific psychological distress, but also a
pressive disorders' on the basis of their identi- variety of other meaningful dimensions of
fication as being above the threshold on the psychopathology. These are the symptom scales
Index of Definition and their categorization as in the Psychiatric Epidemiology Research In-
depressive disorders by CATEGO. However, terview (PERI) (Dohrenwend et al. 19806). We
when Wing and his colleagues (Wing et al. 1978) have also shown that subsets of seven or eight of
examined the PSE scores in terms of widely used these scales can discriminate cases from non-
criteria for depressive disorders developed by cases with much higher sensitivity and specificity
Feighner et al. (1972) for this sample and for a than a unidimensional scale of non-specific
sample of in-patients and out-patients, the distress (Shrout et al. 1986). However, the scales
results were as follows. are not very precise in screening individual cases
of particular disorders (Dohrenwend et al. 1986).
One of the 22 'depressive disorders' in the general
The best they can do so far as individual
population series meets the standard, while two are
probable. On the other hand, 16 of the 23 above disorders are concerned is isolate subsamples
threshold depressive disorders found in the in-patient with much higher rates than the general popu-
series are definite and three are probable, while one lation sample as a whole. Thus, while very
patient with severe depressive retardation could not economical of time and money to administer
be rated on the subjective symptoms. Of the 14 above (about 15 minutes for all the items in seven or
threshold depressive disorders in the out-patient eight scales by a lay interviewer or even in self-
series, seven are definite, five are probable, and two administered format if comprehension and mo-
show only three of the Feighner et al. (1972) criteria tivation can be assumed) and very reliable in
(Wing et al. 1978, p. 213). different gender, class and ethnic groups, the
So far as the depressive disorders are concerned, symptom scales cannot provide precise rates of
particular disorders in the general population.
the PSE and its Index of Definition and
CATEGO system of case identification and
classification are clearly not measuring the same The Diagnostic Interview Schedule (DIS)
thing in this general population sample as they The DIS was developed as ' a response to the
are in samples of psychiatric patients. desire to have an instrument that will, as closely
as possible, replicate a psychiatrist's diagnoses
Psychometric screening scales for situations when the use of psychiatrists is
It has been evident for some time that the impractical or impossible' (Helzer et al. 1985, p.
unidimensional screening scales of non-specific 666). Large-scale epidemiological studies are
distress developed in the second generation of assumed to be such situations by its developers.
epidemiological studies are very imperfectly The DIS is not a psychometric measure, nor is it
related to clinical psychiatric disorders. Results a clinical examination. It is a fully structured
that Bruce Link and I analysed from three interview administered by lay interviewers and
studies showed, for example, that at least half of designed to provide current and lifetime diag-
those registering distress as severe as that of noses of many DSM-III disorders, with adap-
Problem of validity in field studies revisited 201

tations that make it relevant to other nomen- report the investigators present kappas for
clatures as well. Unlike the other instruments agreement (Cohen, 1960) between the initial
that have been used for individual investigations, DIS and the follow-ups based, for most findings,
the DIS has been used in the EC A collaborative on subsample data suitably weighted to represent
programme which, in terms of number of settings the population sampled. For the Baltimore study
and cumulative sample sizes, is the largest reported by Anthony et al (1985), only the
undertaking at any time or place in psychiatric kappas based on weighted data are provided. In
epidemiology to date (Regier et al. 1984). the St Louis study reported by Helzer et al.
Moreover, the influence of the DIS has spread. (1985), kappas for both weighted data and
Now, translated into many languages, it is unweighted data are given for most of the
undoubtedly the most widely used procedure for results. The kappas for the unweighted data are
case identification and classification in psy- usually somewhat higher due to the over-
chiatric epidemiology in the US, and it is being sampling of cases and consequent higher rates of
adopted by a number of investigators abroad. disorders.
More research has been done on the validity In my discussion of the two studies, I will refer
of the DIS than on the validity of the semi- for the most part to the weighted results, using
structured clinical interviews used in third- unweighted results only when weighted data are
generation studies. Conducted concurrently with not provided, as is the case with some of the St
or following the ECA studies, these method- Louisfindings.Given the differing designs of the
ological investigations have usually taken the follow-up subsamples in the two studies, the
form of diagnostic follow-up interviews. A weighted results are more comparable. In ad-
number have focused on patient samples. How- dition, the weighted results are the only ones
ever, the most important of the investigations that can be generalized to the populations
have been done with general population samples. studied.
Some of these checks have involved closely In the Helzer (Helzer et al. 1985) study in St
spaced test-retest designs which varied the type Louis, the subsample of respondents with vari-
of interview and/or the type of interviewer. For ous types of DIS lifetime diagnoses (including
example, in a study reported by Helzer and his no disorder) based on lay interviews in the St
colleagues (1985) of a subsample of ECA Louis ECA site were re-interviewed within a few
respondents in the St Louis site, psychiatrists weeks to a few months after the initial interview.
using the DIS and a DSM-III checklist were The second interview was done by psychiatrists
compared to lay interviewers using the DIS using the DIS. However, after they made a DIS-
(Helzer et al. 1985). This study contrasts with a alone diagnosis, they also made a diagnosis
study reported by Anthony and his colleagues in based on a DSM-III checklist that could be
which the follow-up of the lay interview DIS based on additional questions and observations.
with a subsample of respondents in the Baltimore The first finding to note is that the psychiatrists
site was done by psychiatrists using a semi- agreed quite well with themselves, with a kappa
structured clinical interview and other infor- of 0-73 with unweighted data for diagnoses
mation (Anthony et al. 1985). Other studies combined versus none (kappas for weighted data
have been one year or more follow-ups with lay were not provided for all versus none com-
administered DIS interviews that permit checks parisons) and over 060 with weighted data for
on the accuracy of diagnoses of past disorders, most individual diagnoses. For example, the
essential to the DIS goal of estimating lifetime as kappa for major depression was 0-70 with
well as current prevalence (Pulver & Carpenter, weighted data. When the psychiatrists' DIS
1983; Anthony & Dryman, 1987). diagnoses were compared with the initial lay
The studies reported by Helzer et al. (1985) interviewers' diagnoses, agreement was con-
and Anthony et al. (1985) are particularly siderably less although overall agreement for
instructive. In these studies, the subsamples of any lifetime diagnosis versus no disorder re-
ECA respondents designated for follow-up after mained reasonably good, as indicated by a
the initial DIS lay interviews were drawn to kappa of 063 with unweighted data. It decreased
overrepresent cases. Although the subsample only slightly, to 0-59 with unweighted data,
selection procedures were different, in each when the initial DIS diagnoses by lay inter-
viewers were compared with the checklist diagnoses these psychiatrists were permitted to make following the psychiatrists' DIS interviews. The kappas for individual diagnoses based on unweighted data tend to be much lower and, with weighted data, satisfactory to good only for alcohol abuse, drug abuse and, possibly, antisocial personality. For major depression, with weighted data, for example, they are only 0.33 when the DIS is compared with itself, and only 0.28 when the checklist diagnosis is substituted for the DIS diagnosis. Kappas for most of the remaining disorders are even lower.

In the study by Anthony and his colleagues (1985), previous-month prevalence on DIS-DSM-III lay diagnoses were compared with previous-month prevalence diagnoses made on the basis of clinical examinations of a subsample of ECA cases from the Baltimore site. Two-thirds of the follow-up examinations took place within three weeks of the initial DIS interviews, 75% within four weeks, and 93% within 90 days. The results indicate that agreement is considerably worse than in the study by Helzer et al. (1985). Here, for example, alcohol abuse again does relatively well, but the kappa is only 0.35 in this comparison based on weighted data.

Robins has pointed out (personal communication of 31 May 1988) that a positive diagnosis according to the clinical examination used in Baltimore requires meeting full criteria for a particular disorder in the last month; by contrast, the DIS definition of one-month prevalence requires that the criteria have been met in one's lifetime and that there was at least one relevant symptom in the last month. These definitions are different, and it is hard to know how much they would overlap in practice if the same instrument were used to operationalize each definition. Moreover, no methodological study has yet been conducted with a semi-structured research diagnostic interview designed for making DSM-III diagnoses - an interview such as the SCID which is only now being developed by Spitzer et al. (1987). The PSE portion of the examination described by Anthony et al. (1985) that was used to cross-check the DIS was designed for making diagnoses other than DSM-III, and had to be adapted for that purpose. It may have been too much to expect that DIS diagnoses would converge well with this particular clinical examination, especially with the added problems of differing approaches to defining one month prevalence and a time lag of three weeks to three months for the clinical follow-up after the initial interview.

By the same logic, however, it would have been very reasonable to expect the Helzer study to show strong convergence for lifetime diagnoses in re-tests done with the DIS itself which, although the type of interviewer was varied, is more a test of reliability than validity. If the goal of the DIS, as Robins and colleagues (1981) have stated it, is to 'enable lay interviewers to obtain psychiatric diagnoses comparable to those a psychiatrist would obtain' (Robins et al. 1981, p. 386), then these results from the St Louis and Baltimore ECA studies cannot be considered reassuring. Where the problem lies, however, and what should be done about it, are other matters.

Robins, I think, is having second thoughts about the appropriateness of the goal itself as described above. She has argued in a remarkable paper that the tests so far conducted are flawed, that we cannot assess the accuracy of the lay-administered DIS with a test-retest design using clinician examiners (Robins, 1985). She gives three reasons:

The research diagnosis by the clinician is not a gold standard;
There are problems of time-gap or order effects in the design;
Available statistical methods for testing accuracy are inadequate (the base-rate problem).

Note that a test using the more appropriate SCID would not solve any of these problems. Robins argues further that we do not need all that much accuracy for two of the important purposes of epidemiological studies, estimating prevalence and investigating correlates of disorders in risk-factor studies. She notes that analyses of the St Louis data show that the discrepancies are most frequent for respondents whose DIS scores fall just short of meeting criteria or just barely meet the criteria - that is, are close to the cut point between the presence and absence of the disorder - and that these errors tend to be balanced among false positives and false negatives. She points out that unbiased estimates of rates require that there be equal numbers of false positive and false negative
cases so that they cancel each other out. When rates are low, as with most disorders, and you have good specificity, a modest sensitivity can bring you nearer this goal than a very high sensitivity, as she illustrates.

However, there is no assurance that error with an instrument such as the DIS is itself randomly distributed among subgroups of the population defined by such factors as age, sex and class. Perhaps most sobering in this regard is a comparison of initial DIS lifetime diagnoses made in four ECA sites with one-year follow-up diagnoses made again by lay interviewers using the DIS. In analysing these data, Anthony & Dryman (1987) defined as a discrepancy a positive lifetime diagnosis for a particular disorder at Time 1 that disappeared altogether at Time 2. On the basis of this measure, 69% of baseline cases were discrepant (2456 out of 3572 cases). For example, 61% or 322 out of 529 respondents who had a lifetime DIS diagnosis of major depression at baseline did not have a lifetime diagnosis of major depression on the basis of the one-year follow-up DIS.

Some of the marked discrepancies are due to the decreased reporting of positive diagnoses in general in the follow-up interview. This often occurs in repeated interviews over time with instruments of imperfect reliability and has been described as regression effects. The tendency appears to be particularly marked with the lay-interview-administered DIS; it did not occur in the test-retest designs used in St Louis and Baltimore, where some diagnoses were more frequent and some less frequent upon follow-up. More important with regard to the present issue, however, is the fact that the discrepancies are not randomly distributed according to age, gender, education, and ethnic status. As Robins (1985) pointed out, if you are interested in studying risk factors for particular types of disorder, you need high sensitivity to assure an adequate sample of cases. There is, it seems to me, no way around the need for highly accurate measures that can do the job of unbiased measurement in the diverse social and cultural groups that make up the US and many other modern societies. In the absence of biological tests and under the circumstances where interviews and observations of behaviour remain the tools available to us, how do we obtain such measures?

MULTI-METHODS APPROACHES

There are two very different ways of handling measurement error in interview approaches. One involves psychometric theory and method to develop and evaluate strong threads of truth in scales composed of self-report items which, taken individually, are error prone. The second involves cross-examination by expert clinicians and application of their clinical judgement to rating signs and symptoms. Each of these procedures was used in rather primitive forms in some of the first- and second-generation studies. As I reported earlier, the two approaches have different strengths and weaknesses even in their more sophisticated forms. For example, an instrument like the Psychiatric Epidemiology Research Interview (PERI), which is in the psychometric tradition and was developed through research with samples from the general population, can yield scales with high internal consistency reliabilities in contrasting subgroups and cover a wide variety of dimensions of psychopathology; but PERI does not provide psychiatric diagnoses on an individual basis. By contrast, we have diagnostic interviews such as the Psychiatric Status Schedule (PSS), SADS, and the Present State Examination (PSE) that were developed with psychiatric patients and yield reliable diagnoses for such patients but are unlikely to provide reliable measures over the whole range of various dimensions of psychopathology in samples from the general population, and can yield misleading results in such samples (Dohrenwend et al. 1971; 1978; 1980c). A possible reason is that the expertise of the clinician that helps reduce measurement error in the use of such instruments as the PSS and PSE with patients is more limited than has been recognized. It may not extend to the full range and variety of symptomatology that are found in groups from contrasting social and cultural background in the general population.

There are strategies, however, in which the strengths and weaknesses of the two approaches can be used symbiotically to complement and cross-check each other. For example, a self-report interview like the Psychiatric Epidemiology Research Interview (PERI), based on a psychometric approach to measuring dimensions of psychopathology, can be used to economically screen samples from the general
population. Such screening can be designed to yield subsamples of individuals with various types of severe symptomatology. Individuals screened by high scores on the screening scale can then be followed up in a second stage of the research and interviewed by experienced clinicians with diagnostic instruments like the Schedule for Affective Disorders and Schizophrenia (SADS) or the Present State Examination (PSE) to provide rates for particular types of disorder. Results with the Psychiatric Epidemiology Research Interview (PERI) from a study in Israel lend credence to the possibility that it could, with no more than seven of the 25 symptom scales, with a total of only 73 items, perform the first-stage screening function (Dohrenwend et al. 1986). Such a two-stage procedure capitalizes on the ability of a psychometric instrument to provide reliable measurement over the full range of important dimensions of psychopathology and on the ability of a clinical examination to provide reliable diagnoses in groups where the types of symptomatology involved are not rare.

While the potential practical advantages of two-stage procedures such as this have long been evident (cf. Cooper & Morgan, 1973), there have been only a handful of systematic attempts to use them for case identification and classification in psychiatric epidemiology. Duncan-Jones & Henderson (1978) have speculated that use of two-stage procedures is so rare because of fear of loss of respondents between the first screening stage and the follow-up diagnostic interview. They themselves found, however, that with careful planning, they were able to conduct interviews with 91% of the respondents designated for follow-up on the basis of initial screening. My colleagues and I are using this type of two-stage approach in an epidemiological study in Israel. Let me summarize the field operation in Israel to provide a further example.

Our choice of Israel as the setting for the epidemiological research was based mainly on two considerations. First, we needed an open-class, highly stratified urban society that contained a set of advantaged and disadvantaged ethnic groups to test theoretical issues having to do with class distributions of various types of disorder (Dohrenwend & Dohrenwend, 1981). Second, we needed a place with a Population Register that would make it possible to draw samples of birth cohorts from such ethnic groups.

Within this setting, we have focused on a full probability sample of about 5500 Israelis born in Israel between 1949, just after it became a state, and 1958. The goal was to contrast Israelis of European background with Israelis of North African background, when both were born in Israel during the same period. Our aim has been to identify and define cases of schizophrenia, major depression, substance abuse, antisocial personality, and severe nonspecific psychological distress or 'demoralization'. The first step was to draw a random sample of 19 000 Israel-born adults in the desired age range from the Population Register. Demographic prescreening of 98% of those 19 000 was completed to obtain information that would permit appropriate stratification into the approximately 5500-member study sample on the basis of gender, educational level, and ethnic background. Once selected into the stratified sample, respondents were given screening scales from the Psychiatric Epidemiology Research Interview (PERI) (Shrout et al. 1986; Dohrenwend et al. 1986). Developed in the US, these had been recalibrated in a previous pilot study in Israel (Shrout et al. 1986). Excluding the respondents who died or who are abroad, who are being studied separately, we obtained the relevant PERI screening data from 93.3% of our cohort sample, as Fig. 1 shows.

All of the screened positives (over 40% of the sample) and a subsample of over 15% of the negatives were given follow-up interviews by psychiatrists trained to administer a modification of the shorter version (SADS-L) of the Schedule for Affective Disorders and Schizophrenia (SADS) (Endicott & Spitzer, 1978) and to make diagnoses according to Research Diagnostic Criteria (RDC) (Spitzer et al. 1978). Since this instrument was modified to provide more introductory history and to permit the dating of onsets of episodes, we call it SADS-I (for Israel). The 40 or so psychiatrists involved in the research were intensively trained by Itzhak Levav, who was himself trained at the NY State Psychiatric Institute where SADS was developed. Their diagnostic interviews were tape recorded, permitting extensive quality and reliability checks. Our completion rate for this

Activity                                                   Notes

Demographic prescreening of 19 000 persons                 Completion rate 98 %
sampled from Population Register

Stratified probability sample

PERI symptom scales and anamnestic items                   Completion rate 93.3 %
to 4910 persons

All positives on PERI screen plus 15 % + of negatives

SADS-I diagnostic interview by psychiatrists               Completion rate 94.1 %
with 2752 persons

Definition of case groups (Schizophrenia,
Major depression, Substance abuse, Antisocial
personality, Demoralization) and definition
of well control group

Risk factor interview with 559 cases and                   Completion rate 91.2 %
controls defined above

Screening of at least one sibling of cases                 Completion rate 80.1 %
and controls with adult siblings

FIG. 1. Nature and status of field operations in Israel research.
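
The two-stage flow in Fig. 1 follows up all screened positives but only a fraction of screened negatives, so case rates in the diagnosed subsample cannot be read as population rates without some adjustment. The sketch below illustrates one common adjustment, weighting each screening stratum by the inverse of its follow-up fraction. It is a minimal illustration under stated assumptions, not the estimation procedure of the Israel study: the names `Stratum` and `weighted_prevalence` and all of the counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One stage-one screening stratum (hypothetical structure)."""
    screened: int      # persons in the stratum at stage one
    followed_up: int   # persons given the stage-two diagnostic interview
    cases: int         # followed-up persons diagnosed as cases


def weighted_prevalence(strata):
    """Estimate prevalence in the screened sample by weighting each
    stratum's diagnosed cases by the inverse of its follow-up fraction."""
    total = sum(s.screened for s in strata)
    estimated_cases = 0.0
    for s in strata:
        if s.followed_up == 0:
            raise ValueError("every stratum needs at least one follow-up")
        # Each followed-up person stands for screened / followed_up persons.
        weight = s.screened / s.followed_up
        estimated_cases += weight * s.cases
    return estimated_cases / total


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
positives = Stratum(screened=400, followed_up=400, cases=100)
negatives = Stratum(screened=600, followed_up=100, cases=2)

estimate = weighted_prevalence([positives, negatives])
# Unweighted rate in the followed-up subsample, shown for contrast.
naive = (positives.cases + negatives.cases) / (
    positives.followed_up + negatives.followed_up
)
print(f"weighted: {estimate:.3f}, naive: {naive:.3f}")
```

With these made-up counts the unweighted rate among those followed up is well above the weighted estimate, because screened positives are overrepresented at stage two. The cases found among followed-up negatives are the false negatives of the screen.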

diagnostic follow-up is 94.1%, as can be seen in Fig. 1.

From these diagnosed respondents, it was possible to select and re-interview a subsample of 361 persons for an intensive case/control study of risk factors for various types of disorder; most of the cases were chosen to have had a recent onset of the disorder. From those persons who were not screened as possible cases in the first stage, a stratified random subsample
of about 300 has also been chosen for diagnostic follow-up. This follow-up of screened negatives serves two purposes: (1) it provides a check on the number of false negatives from the first-stage screening interview; and (2) it provides a sample of wells for use in the case/control study. As Fig. 1 shows, completion rates were again excellent. It has also been possible to screen at least one sibling of the majority of cases and controls.

The two-stage procedure that we used in this research is, however, an economy. It would be best achieved following methodological research employing a multi-method strategy in which both types of interview were conducted with all subjects along, perhaps, with a third method based on reports from family members or other key informants. In such a multi-method investigation, there would be two means of establishing validity.

One would be by testing the convergence of the three methods. This is illustrated in Fig. 2. The first method might consist of screening scales with the screened positives followed up at Stage 2 by a clinical examination. When there is a divergence, categorization by a third method - perhaps informant reports - would be sought. The approach would be similar for screened negatives. 'Truth' in Fig. 2 would be defined as a convergence across at least two out of three different methods of measuring the same thing - in the instance above, the diagnoses or other classifications dictated by the nomenclature or other theoretical constructs being used. Other things equal, the majority would rule.

[Figure 2: flow chart with Stage 1, Stage 2 and Stage 3. At Stage 3, Method 3 classifies subjects on each branch as Case or Non-case, giving the outcomes True positive, False positive, False negative and True negative.]

FIG. 2. Flow chart for a multi-stage-multi-method procedure for case identification and classification. (From Dohrenwend, B. P. & Dohrenwend, B. S. (1982). Perspectives on the past and future of psychiatric epidemiology. American Journal of Public Health 72, 1271-1279.)

However, other things are not likely to be equal in a situation where the different methods have different strengths and weaknesses. Thus, a second procedure would be to establish, not a gold standard of validity but rather what has been called by Leckman et al. (1982) a 'best estimate' diagnosis in which the information from all three methods is assessed by two or more experts to arrive independently and then by consensus at a criterion diagnosis. It is something like the LEAD (Longitudinal, Expert, All Data) standard formulated by Spitzer (1983) for testing SCID, and would be better with the longitudinal component Spitzer requires. The relative contributions of the separate methods to these best-estimate diagnoses can then be assessed for different diagnostic types and in different subgroups of the population as a basis for designing more economical two-stage procedures.

The validity achieved by such multi-method procedures would be relative to the validity of the particular diagnostic or classification system that dictates how the data on symptoms and signs are to be combined. Further tests of the validity of each of these systems are in order; as advocated by Robins & Guze (1970), these could include longitudinal studies to test stability of particular classes or types of psychopathology, family studies to test whether the disorders 'breed true' as a function of genetic and/or environmental transmission, and outcome studies of the effectiveness of therapeutic trials.

Such tests can help those of us involved in epidemiological enterprises to use the more promising classification systems and even contribute to their selection. Whatever system is selected for a particular epidemiological study, however, epidemiologists will face the problem of how to validly classify people according to it. Perhaps some day there will be biological markers to provide gold standards. Meanwhile,
I shall be quite happy to settle for a combination of the democratic rule of the majority among methods and/or the authoritarian LEAD where a divergent minority method is too strong to be ignored.

This work was supported by grants K05-MH14663 and MH30710 from the US National Institute of Mental Health.

REFERENCES

Anthony, J. C. & Dryman, A. (1987). Analysis of discrepancy in lifetime diagnosis of mental disorders: results from the NIMH epidemiologic catchment area program. Presented at the September 1987 meeting of the World Psychiatric Association; Section on Epidemiology and Community Psychiatry, Reykjavik: Iceland.
Anthony, J. C., Folstein, M., Romanoski, A. J., Von Korff, M. R., Nestadt, G. R., Chahal, R., Merchant, A., Hendricks Brown, C., Shapiro, S., Kramer, M. & Gruenberg, E. (1985). Comparison of the lay diagnostic interview schedule and a standardized psychiatric diagnosis: experience in Eastern Baltimore. Archives of General Psychiatry 42, 667-675.
Bash, K. W. (1967). Untersuchungen über die Epidemiologie neuropsychiatrischer Erkrankungen unter der Landbevölkerung der Provinz Fars, Iran. Aktuelle Fragen der Psychiatrie und Neurologie 5, 162-178.
Breslau, N. (1985). Depressive symptoms, major depression and generalized anxiety: a comparison of self-reports on CES-D and results from diagnostic interviews. Psychiatry Research 15, 219-229.
Brown, G. W. & Harris, T. (1978). Social Origins of Depression. Free Press: New York.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37-46.
Cooper, B. & Morgan, H. G. (1973). Epidemiological Psychiatry. C. C. Thomas: Springfield, IL.
Derogatis, L. R. (1977). SCL-90R. (revised) Manual I. Clinical Psychometrics Research Unit, Johns Hopkins University School of Medicine: Baltimore, MD.
Dohrenwend, B. P. & Dohrenwend, B. S. (1965). The problem of validity in field studies of psychological disorder. Journal of Abnormal Psychology 70, 52-69.
Dohrenwend, B. P. & Dohrenwend, B. S. (1974). Social and cultural influences on psychopathology. Annual Review of Psychology 25, 417-452.
Dohrenwend, B. P. & Dohrenwend, B. S. (1981). Socioenvironmental factors, stress, and psychopathology - Part 1: Quasi-experimental evidence on the social causation-social selection issue posed by class differences. American Journal of Community Psychology 9, 146-159.
Dohrenwend, B. P. & Dohrenwend, B. S. (1982). Perspectives on the past and future of psychiatric epidemiology. American Journal of Public Health 72, 1271-1279.
Dohrenwend, B. P., Egri, G. & Mendelsohn, F. S. (1971). Psychiatric disorder in general populations: a study of the problem of clinical judgment. American Journal of Psychiatry 127, 1304-1312.
Dohrenwend, B. P., Yager, T. J., Egri, G. & Mendelsohn, F. S. (1978). The psychiatric status schedule (PSS) as a measure of dimensions of psychopathology in the general population. Archives of General Psychiatry 35, 731-739.
Dohrenwend, B. P., Oksenberg, L., Shrout, P. E., Dohrenwend, B. S. & Cook, D. (1979). What brief psychiatric screening scales measure. In Health Survey Research Methods: Third Biennial Research Conference, pp. 188-198. National Center for Health Services Research. US Department of Health and Human Services, DHHS pub. no. (PHS) 81-3268: Washington, D.C.
Dohrenwend, B. P., Dohrenwend, B. S., Gould, M. S., Link, B., Neugebauer, R. & Wunsch-Hitzig, R. (1980a). Mental Illness in the United States: Epidemiologic Estimates. Praeger: New York.
Dohrenwend, B. P., Shrout, P. E., Egri, G. & Mendelsohn, F. S. (1980b). Nonspecific psychological distress and other dimensions of psychopathology: measures for use in the general population. Archives of General Psychiatry 37, 1229-1236.
Dohrenwend, B. P., Yager, T. J., Egri, G. & Mendelsohn, F. S. (1980c). Some problems of validity with the Psychiatric Status Schedule as an instrument for case identification and classification in the general population (letter to the editor). Archives of General Psychiatry 37, 720-721.
Dohrenwend, B. P., Levav, I. & Shrout, P. E. (1986). Screening scales from the Psychiatric Epidemiology Research Interview (PERI). In Community Surveys of Psychiatric Disorders (ed. M. M. Weissman, J. K. Myers and C. Ross), pp. 349-375. Rutgers University Press: New Brunswick, NJ.
Duncan-Jones, P. & Henderson, S. (1978). The use of a two-phase design in a prevalence survey. Social Psychiatry 13, 231-237.
Endicott, J. & Spitzer, R. L. (1978). A diagnostic interview: The schedule for affective disorders and schizophrenia. Archives of General Psychiatry 35, 837-844.
Feighner, J. P., Robins, E., Guze, S. B., Woodruff, R. A., Winokur, G. & Munoz, R. (1972). Diagnostic criteria for use in psychiatric research. Archives of General Psychiatry 26, 57-63.
Frank, J. D. (1973). Persuasion and Healing. Johns Hopkins University Press: Baltimore, MD.
Gillis, L. S., Lewis, J. B. & Slabbert, M. (1965). Psychiatric Disturbance and Alcoholism in the Coloured People of the Cape Peninsula. University of Cape Town Department of Psychiatry: Cape Town.
Goldberg, D. P. (1972). The Detection of Psychiatric Illness by Questionnaire. Oxford University Press: London.
Hagnell, O. (1966). A Prospective Study of the Incidence of Mental Disorder. Svenska Bokförlaget Norstedts-Bonniers: Stockholm.
Helzer, J. E., Robins, L. N., McEvoy, L. T., Spitznagel, E. L., Stoltzman, R. K., Farmer, A. & Brockington, I. F. (1985). A comparison of clinical and diagnostic interview schedule diagnoses: physician reexamination of lay-interviewed cases in the general population. Archives of General Psychiatry 42, 657-666.
Langner, T. S. (1962). A twenty-two item screening score of psychiatric symptoms indicating impairment. Journal of Health and Human Behaviour 3, 269-276.
Leckman, J. F., Sholomskas, D., Thompson, D. W., Belanger, A. & Weissman, M. M. (1982). Best estimate of lifetime psychiatric diagnosis: a methodological study. Archives of General Psychiatry 39, 879-883.
Leighton, D. C., Harding, J. S., Macklin, D. B., Macmillan, A. M. & Leighton, A. H. (1963). The Character of Danger. Basic Books: New York.
Lin, T. (1953). A study of the incidence of mental disorder in Chinese and other cultures. Psychiatry 16, 313-336.
Link, B. & Dohrenwend, B. P. (1980). Formulation of hypotheses about the true prevalence of demoralization in the United States. In Mental Illness in the United States: Epidemiological Estimates (B. P. Dohrenwend, B. S. Dohrenwend, M. S. Gould, B. Link, R. Neugebauer, and R. Wunsch-Hitzig), pp. 114-132. Praeger: New York.
Macmillan, A. M. (1957). The health opinion survey: technique for estimating prevalence of psychoneurotic and related types of disorder in communities. Psychological Reports 3, 325-329.
Myers, J. K. & Weissman, M. M. (1980). Use of a self-report symptom scale to detect the depressive syndrome. American Journal of Psychiatry 137, 1081-1084.
Pulver, A. E. & Carpenter, W. T. (1983). Lifetime psychotic symptoms assessed with the DIS. Schizophrenia Bulletin 9, 377-382.
Radloff, L. S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement 1, 385-401.
Raines, G. N. (1952). Foreword in Committee on Nomenclature and Statistics of the American Psychiatric Association. In Diagnostic and Statistical Manual; Mental Disorders. American Psychiatric Association, pp. V-XI. Washington, DC.
Regier, D. A., Myers, J. K., Kramer, M., Robins, L. N., Blazer, D. G., Hough, R. L., Eaton, W. W. & Locke, B. Z. (1984). The NIMH epidemiologic catchment area program. Archives of General Psychiatry 41, 934-941.
Rin, H., Chu, H. & Lin, T. (1966). Psychological reactions of a rural and suburban population in Taiwan. Acta Psychiatrica Scandinavica 42, 410-470.
Roberts, R. E. & Vernon, S. W. (1983). The center for epidemiologic studies depression scale: its use in a community sample. American Journal of Psychiatry 140, 41-46.
Robins, E. & Guze, S. B. (1970). Establishment of diagnostic validity in psychiatric illness: Its application to schizophrenia. American Journal of Psychiatry 126, 983-987.
Robins, L. N. (1985). Epidemiology: reflections on testing the validity of psychiatric interviews. Archives of General Psychiatry 42, 918-924.
Robins, L. N., Helzer, J. E., Croughan, J. & Ratcliff, K. S. (1981). National Institute of Mental Health Diagnostic Interview Schedule: Its history, characteristics, and validity. Archives of General Psychiatry 38, 381-389.
Seiler, L. H. (1973). The 22-item scale used in field studies of mental illness: a question of method, a question of substance, and a question of theory. Journal of Health & Social Behavior 14, 252-264.
Shore, J. H., Kinzie, J. D., Hampson, J. L. & Pattison, E. M. (1973). Psychiatric epidemiology of an Indian village. Psychiatry 36, 70-81.
Shrout, P. E., Dohrenwend, B. P. & Levav, I. (1986). A discriminant rule for screening cases of diverse diagnostic types: preliminary results. Journal of Consulting and Clinical Psychology 54, 314-319.
Spitzer, R. L. (1983). Psychiatric diagnosis: are clinicians still necessary? Comprehensive Psychiatry 24, 399-411.
Spitzer, R. L. & Endicott, J. (1968). DIAGNO: a computer program for psychiatric diagnosis utilizing the differential diagnostic procedure. Archives of General Psychiatry 18, 746-756.
Spitzer, R. L., Endicott, J., Fleiss, J. L. & Cohen, J. (1970). The psychiatric status schedule: a technique for evaluating psychopathology and impairment in role functioning. Archives of General Psychiatry 23, 41-55.
Spitzer, R. L., Endicott, J. & Robins, E. (1978). Research diagnostic criteria: rationale and reliability. Archives of General Psychiatry 35, 773-782.
Spitzer, R. L., Williams, J. B. W., Gibbon, M. & First, M. (1987). Structured Clinical Interview for DSM-III-R (SCID). Biometrics Research Department, New York State Psychiatric Institute: New York.
Srole, L., Langner, T. S., Michael, S. T., Opler, M. K. & Rennie, T. A. C. (1962). Mental Health in the Metropolis. McGraw-Hill: New York.
Star, S. A. (1950). The screening of psychoneurotics in the army: technical developments of tests. In Measurement and Prediction (ed. S. A. Stouffer, L. Guttman, E. A. Suchman, P. F. Lazarsfeld, S. A. Star & J. A. Clausen), vol. 4, pp. 486-547. Princeton University Press: Princeton.
Vernon, S. W. & Roberts, R. E. (1982). Use of the SADS-RDC in a tri-ethnic community sample. Archives of General Psychiatry 39, 47-52.
Weissman, M. M. & Myers, J. K. (1978). Affective disorders in a US urban community: the use of research diagnostic criteria in an epidemiological survey. Archives of General Psychiatry 35, 1304-1311.
Wing, J. K., Cooper, J. E. & Sartorius, N. (1974). The Measurement and Classification of Psychiatric Symptoms. Cambridge University Press: London.
Wing, J. K., Mann, S. A., Leff, J. P. & Nixon, J. M. (1978). The concept of a 'case' in psychiatric population surveys. Psychological Medicine 8, 203-217.
Woodruff, R. A., Goodwin, D. W. & Guze, S. B. (1974). Psychiatric Diagnosis. Oxford University Press: New York.
