

Questionnaire Design Methods
Paul Oosterveld

This is Chapter 2 of the dissertation Questionnaire Design Methods. It was prepared at the
University of Amsterdam and published by Berkhout Nijmegen. It can be referred to as
follows:

Oosterveld, P. (1996). Questionnaire design methods. Nijmegen: Berkhout Nijmegen.

ISBN 90-6310-842-7

© 1996, Paul Oosterveld, Amsterdam


2. Methods for questionnaire design
INTRODUCTION

Since the early 1920s applied methods of questionnaire construction have been subject to
influences from both within and outside psychology. Initially, the empirical foundation of
the assumptions underlying questionnaire construction, and the content of the questionnaire,
were limited. The first instruments were constructed mainly on the basis of intuitive
knowledge about the concepts to be measured. Kelley, who devised one of the first interest
inventories in 1914, employed judges to determine which items should be used in the
measurement of interests (DuBois, 1970). Soon thereafter, item selection and weighting based
on external criteria were introduced (Strong, 1926; Bernreuter, 1931). Ream (1924) developed
an empirical key for a salesman success scale by contrasting the responses of successful
salesmen with those of unsuccessful salesmen. The development of statistical techniques
contributed to more complex construction approaches that take into
account the relationship between measurement and behavior (Likert, 1932). The emergence
of a formal test theory encouraged the development of methods combining theory
development and scale construction (Thurstone, 1929, 1931; Guttman, 1944, 1954). The
increase in psychological knowledge, and the shift from behaviorism to cognitive
psychology, contributed to the development of construction methods that are based on
content analysis and empirical research into relations between concepts (Cronbach & Meehl,
1955; Campbell & Fiske, 1959; De Groot, 1961). The introduction of computers has made it
possible to perform complex psychometric analyses on a routine basis. This in turn has
encouraged the development of models and accompanying methods. Although the methods
that have been developed over the last 70 years have been applied widely, a given method is
seldom adopted on the basis of an explicit assessment of its specific advantages and
disadvantages.
Information about the construction of questionnaires can be found in the literature (Aiken,
1994; Anastasi, 1988; Crocker & Algina, 1986; Cronbach, 1990; Dawis, 1987; De Groot, 1961;
DeVellis, 1991; Friedenberg, 1995; Gregory, 1996; Kaplan & Saccuzzo, 1989; Murphy &
Davidshofer, 1994; Spector, 1992; Walsh & Betz, 1995), standards for psychological testing
(APA, 1985), manuals of questionnaires (e.g., Jackson, 1984; Vorst, 1990), and documentation
of questionnaires and tests (Buros, 1978; Mitchell, 1985). It is remarkable that the discussion of
the construction methods in most of these sources is limited to a few pages. Furthermore, most
authors emphasize the requirements of a questionnaire, but hardly mention the procedures
followed in constructing it. Moreover, a distinction between methods is rarely made. In
manuals, for example, the method of construction is discussed in some detail, but such
discussions do not follow an existing scheme or employ an accepted nomenclature such as is
available in, for instance, experimental and quasi-experimental research (Campbell &
Stanley, 1963; Cook & Campbell, 1979).
A taxonomy of construction methods, including details on the threats to validity
associated with each method, would be very useful both to the constructor and to the users of
the questionnaire. Given such a taxonomy, the construction and application of questionnaires
could be based on explicit and rational considerations relating to validity. A taxonomy,
finally, would greatly facilitate communication about questionnaires and their development.

REVIEW OF METHODS

A construction method refers to the procedure followed in constructing a measurement
instrument. Such a procedure starts with the formulation of the measurement objectives
ends with the evaluation of the instrument. Often questionnaire construction is defined in
terms of item selection, or scale construction (e.g., Hase & Goldberg, 1967). The present
author considers item selection to be a single aspect of questionnaire construction, as the
items cannot be viewed as given.
Based on a review of the literature, 30 different procedures were distinguished
(Oosterveld & Vorst, 1995). In reviewing these procedures, it is striking that there is no
consensus about terminology. For example, the term rational method is used by several
authors to refer to radically different procedures (Broughton, 1984; Hase & Goldberg, 1967;
Kelly, 1967; Wilde, 1977). Furthermore, there are large differences in abstraction level of the
methods. The so-called deductive and inductive methods of Burisch (1984) are presented
explicitly as broad categories, whereas the methods discussed by Hermans (1969) refer to
quite specific methods of item selection.
On the basis of procedural similarities, the 30 distinguished procedures were subsumed
within twelve somewhat broader categories. Although this categorization summarizes the
procedures, it is not based on a rationale for the procedures, and does not give rise to such a
rationale. The question is whether differences in procedures necessitate the assumption of
different construction methods, especially when differences between the methods are rather
small.
In order to obtain a more meaningful categorization, the underlying assumptions of the
twelve procedures were assessed. Each step in the various procedures was evaluated with
respect to its supposed meaning in an underlying philosophy of construction. In other words,
a method is not defined by the procedures, but by the goal that is pursued in the
construction. In so doing, it emerged that each procedure was explicitly, or implicitly,
directed towards the optimization of a certain psychometric aspect of questionnaires. It also
emerged that the procedures could be subsumed within six broad categories that each related
to a specific psychometric aspect. Based on this assessment, six primary construction
methods were identified. These methods are: the rational, the prototypical, the internal, the
external, the construct, and the facet method.
These methods are discussed in detail in the remainder of this chapter. In discussing
each method, a number of steps, or phases, in the construction are distinguished. The
distinction of these steps facilitates the discussion and emphasizes the differences between
the methods. The steps distinguished are: the determination of the theoretical framework,
concept analysis, item specification, item production, item judgment, scale construction, and
validation. Although the methods are discussed in terms of these steps, not every method
contains all seven steps.
Note that no attention is given to compiling norms for the questionnaires as the data
collection for the norms is preferably executed after the questionnaire is made, and there are
no differences in requirements regarding the norms between the methods.

THE RATIONAL METHOD

The foundations of the rational method (Kelly, 1967; Wiggins, 1973) can be traced to
Woodworth’s Personal Data Sheet (Woodworth, 1918) and Laird’s Personality Inventory
(Laird, 1925). This method optimizes face validity and pays little attention to empirical
justification.
In this method, the judgment of the questionnaire constructor, or that of experts, forms
the criterion for the appropriateness of an item. The method is therefore also referred to as
the common sense strategy (Kelly, 1967). Wiggins (1973) identifies the correspondence point
of view of the early, introspective psychologists as the philosophy underlying this method.
This states that there is a one-to-one correspondence between verbal report and hypothetical
internal state. The correspondence point of view provides the justification for judging the
suitability of an item to measure a given internal state by its verbal content.
The term rational refers to the supposed rationality of the considerations of the experts.
The method is also known as the intuitive method (Hermans, 1969; Jackson, 1973), the pre-
theoretical or pre-constructual method (Wiggins, 1973), the non-theoretical method (Kelly,
1967), and the theoretical method (Hase & Goldberg, 1967). These last two terms illustrate
again the lack of consensus in the literature about the terminology.
Many ad hoc scales that are used in research are based on this method.

Theoretical framework

The theoretical framework of a rationally designed questionnaire is generally provided
by the constructor’s ideas about the concept. These ideas, which are usually expressed in a
working definition, are implicit hypotheses based on informal observations, empirical
results, or a review of the literature. Refinement of the working definition may be based on
interviews with experts, or members of the target population of the questionnaire. Other
sources such as essays, diaries, etc. may also contribute to the definition.
Concept analysis

It is characteristic of the rational method that the concept is specified in typologies,
complex syndromes, or global descriptions. The domain specifications are generally derived
from the knowledge of experts (clinicians, psychologists, psychiatrists, teachers, managers,
etc.) or members of the target population. Because the theoretical basis of a questionnaire
constructed according to the rational method is generally rather limited, the difference
between the specification of the theoretical framework and the concept analysis is small.

Item specification

The rational method poses few formal demands on a questionnaire. Specification of the
features of the items is limited to intuitive or informal criteria. The perceived coverage of the
concept domain generally forms the only criterion in determining the number of items
included in the questionnaire.

Item production

The item wording is based on the typologies or global descriptions. The material
collected by means of interviews, essays, etc. may also provide suggestions for item content.

Item judgment

A separate review phase may be incorporated in the process of construction. In this
phase, experts judge the preliminary item pool. To assure the face validity, the relevance of
the items to the measurement is assessed. If feasible, poor items are rewritten, otherwise they
are discarded.

Scale construction

The selection of items and the scale construction are based on the experts’ or
constructor’s judgment. In this step, each item has to be assessed with respect to its coverage
of the concept. Usually, the assessment is carried out by a team and the decision to exclude
an item is based on a vote. In addition, the experts may provide cut-off scores (pass/fail) or
interpretative categories (diagnostic criteria).
In view of the general availability of computers and widespread knowledge of statistics
and test theory, the sole reliance on the rationality of judges for the selection of items is
presently uncommon. Such selection could of course be aided by, for instance, a reliability
analysis. Such a procedure may improve the questionnaire. Wiggins (1973) points out that
such a questionnaire must be viewed as rational as long as the items are solely based on the
correspondence point of view and external correlates are not investigated.
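The reliability analysis mentioned above as an aid to item selection can be illustrated with Cronbach's alpha. The sketch below is not taken from the chapter; it uses simulated Likert-type responses driven by a single common trait, purely as an example of the kind of analysis meant:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Simulated responses: five hypothetical items, each the common trait plus noise.
rng = np.random.default_rng(0)
trait = rng.normal(size=200)
items = np.column_stack([trait + rng.normal(scale=0.8, size=200) for _ in range(5)])
alpha = cronbach_alpha(items)
```

In a rational construction, such a coefficient would only supplement the judges' content-based selection, not replace it.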

Validation

The validation is not a particularly elaborate phase. The experts’ judgments of the items
are supposed to provide a guarantee of the face validity of the instrument. In addition to
these judgments, comparisons are sometimes carried out between, say, results based on the
questionnaire and results based on clinical evaluation.

Comment

Construction according to the rational method is guided to a large extent by the
(informal) knowledge of experts. The empirical underpinning of this knowledge, and so of
the resulting instrument, is not of great concern. The method appears to be most appropriate
when the concepts of interest have been explored only superficially or when little formal
knowledge concerning the concepts is available. The method could also be used to construct
a preliminary instrument, upon which subsequent revisions, using other methods, could be
based.

THE PROTOTYPICAL METHOD

The basic assumptions of the prototypical method (Broughton, 1984), also known as the act
frequency approach (Buss & Craik, 1980, 1981, 1983, 1985), are derived from the prototype
theory of the representation of categories (Rosch, 1973, 1978). According to this theory,
categories are represented by fuzzy sets. Categories are defined by a number of prototypical,
or characteristic elements. Besides these prototypes, categories contain elements that are less
characteristic. For all these elements, category membership is probabilistic rather than
discrete (Broughton, 1990). Viewed spatially, the highly prototypical elements are
central and the less prototypical elements are peripheral. The categories do not
have clear boundaries: in the periphery, the elements have features in common with the
prototypes of other categories. In other words, the categories have overlapping peripheral
regions (Broughton, 1984, 1990).
A second assumption of the prototypical method is that psychological traits can be
represented by cognitive categories with a prototypical structure. Traits are viewed as
categories of behaviors (called acts), and some acts are considered to be more prototypical of
the trait than others. If questionnaire items are restricted to the prototypical acts, the subjects’
cognitive representation of the traits and the item content will coincide. Thus, it is believed,
the more prototypical items will be better understood and will be more valid than the less
prototypical items. As the prototypical method is directed toward the optimization of the
cognitive process of stimulus representation, the term process validity is suggested to denote
the feature optimized.
Buss and Craik (1980, 1983) applied the act frequency or prototypical method to
optimize criterion measures. Broughton (1984) argued that the method could also be applied
in the construction of measurement instruments. Examples of questionnaires designed
according to the prototypical method are given by Amelang, Herboth, and Oefner (1991), De
Jong (1988), Romero, Luengo, Carrillo de la Peña, and Otero-López (1994), and Visser and
Das-Smaal (1991).
The construction starts with the production of the items. Even if available, formal
theory concerning the concepts is not used in the prototypical method. Elaborate concept
definitions are not required, as the concepts are delineated by the respondents. Strictly
speaking, a clear-cut demarcation of the concepts is impossible as the concepts are assumed
to have a prototypical structure (De Jong, 1988).

Item production

The items are generated according to a procedure that is derived from the
prototypicality assumption. In the so-called act nomination, members of the target
population of the questionnaire are instructed to think of one or more persons having the
trait to be operationalized, and to write down behaviors that exemplify this trait. The items
generated in this procedure form the preliminary item pool.
In the act nomination phase, a large number of acts are generated. Buss and Craik
(1983), for example, continued with the act nomination phase until at least a hundred acts for
each concept were generated.

Item judgment

The items generated in the act nomination generally require some editing. This editing
consists of eliminating redundancies, non-act statements, general tendency statements, and
statements that are considered too vague to constitute an observable act. Furthermore,
grammatical errors are corrected, and the statements rephrased so that they are neutral with
respect to gender (Buss & Craik, 1983).
Editing is limited to minimize the constructor’s influence on the prototypicality of the
items. The actual appropriateness of the content of the acts is not a concern in this phase.
Items that are not prototypical, and therefore unsuitable, are removed in the scale
construction phase.

Scale construction

For the item selection, prototypicality ratings of the items are used. Judges, e.g.,
members of the target population of the questionnaire, rate the prototypicality of each item
on Likert type response scales. For each item the mean prototypicality rating is computed,
and the items with the highest mean ratings are included in the scale.
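The selection step just described can be sketched in a few lines. The acts, judges, and ratings below are entirely illustrative (not from the chapter); the point is only the mechanics of averaging prototypicality ratings and keeping the top-rated acts:

```python
# Hypothetical prototypicality ratings for candidate "extroverted acts":
# rows are judges, values on a 1-7 Likert-type scale.
ratings = {
    "talks to strangers at a party": [7, 6, 7, 6],
    "organizes a group outing":      [5, 6, 5, 5],
    "reads quietly at home":         [2, 1, 2, 2],
    "tells jokes in a meeting":      [6, 5, 6, 6],
}

def select_prototypical(ratings, n_items):
    """Keep the n_items acts with the highest mean prototypicality rating."""
    means = {act: sum(r) / len(r) for act, r in ratings.items()}
    return sorted(means, key=means.get, reverse=True)[:n_items]

scale = select_prototypical(ratings, n_items=2)
```

Peripheral acts such as "reads quietly at home" drop out automatically, which is exactly the intended effect of the prototypicality rating procedure.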

Validation

There are no specific validation procedures based on the prototypicality principle. The
prototypicality of the items, and process validity of the scales, are guaranteed by the act
nomination and prototypicality rating procedures. The instruments constructed according to
the prototypical method are usually validated by means of a peer-rating procedure
(Amelang, Herboth, & Oefner, 1991; De Jong, 1988), but other, generally accepted methods
for content, criterion, and construct validity can also be applied.

Comments

Construction according to the prototypical method is guided by the (informal)
knowledge and experience of the respondents. The prototypical method is recommended for
specification of implicit ideas (Visser & Das-Smaal, 1991), and operationalization of concepts
that are difficult to define (De Jong, 1988). In comparison with the rational method, the steps
in the construction are more elaborate. They are derived from a theoretically founded
philosophy of conceptualization, i.e., the prototypicality approach. As far as is known, the
instruments designed according to the prototypical method perform well (Broughton, 1984).

THE INTERNAL METHOD

The internal method has its origins in the Personality Schedule which Thurstone derived
from the items of Woodworth’s Personal Data Sheet using the internal consistency method
(DuBois, 1970). The assumption of this method is that constructs cannot be specified in
advance, but that they must be derived from the relations between the item responses (cf.
Cattell, Saunders, & Stice, 1957). Both the number of scales and the clustering of the items are
determined by analysis of the item responses. The variance that is shared by the items is
attributable to a common factor, which is interpreted to be the underlying construct of
interest.
The internal method is also known as the inductive (Burisch, 1984), or factor analytic
method (Kelly, 1967). Examples of questionnaires constructed according to the internal
method are the 16 Personality Factors Questionnaire (Cattell et al., 1957), the NEO-PI (Costa
& McCrae, 1985), and the 5 PFT (Elshout & Akkerman, 1975).
The internal method is not theory-oriented: the constructs are identified on the basis of
the results of statistical analyses. Therefore, the internal method does not include a
theoretical framework for the analysis of the constructs of interest. For this reason, the
specification of the item features, and the judgment of the items is rather limited in scope.

Item specification

The internal method requires that the items be homogeneous with respect to the global
concept domain. Thus, the initial item pool should consist of items referring to the content
domain of the questionnaire, such as personality, or attitudes. The homogeneity requirement
of the internal method is in stark contrast with the requirement of item heterogeneity of the
external method.

Item production

The construction process of the internal method starts with the compilation of the item
pool. The method does not provide specific guidelines for the production of items and
different procedures are conceivable. The reanalysis of an existing questionnaire or the
adoption of items from several questionnaires with comparable measurement aims are
commonly followed procedures (Briggs & Cheek, 1986), but application of the method is not
necessarily restricted to reanalysis of existing item pools (cf. Comrey, 1988).

Scale construction

Several techniques are available to identify items that form homogeneous scales.
Commonly used techniques are exploratory factor analytic and componential procedures
(Briggs & Cheek, 1986; Comrey, 1961, 1978, 1988; Lorr & More, 1980), multi-dimensional
scaling (Dolmans & De Grave, 1991), and cluster analysis (Finney, 1979; Kinsman, Dirks,
Wunder, & Carbaugh, 1989; Nunnally, 1967; Revelle, 1979).
The clusters of items identified in the analysis are interpreted post hoc. The meaning of
a given scale is derived from the content of the items in the scale. Items that do not cluster
well and items that are found to belong to more than one cluster are removed.
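The clustering step can be made concrete with a toy example. The sketch below is a minimal stand-in for the factor-analytic and cluster-analytic procedures cited above, not any specific published technique: it greedily merges items by average inter-item correlation, using simulated responses in which two latent factors each drive three items:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
# Simulated responses: items 0-2 share one latent factor, items 3-5 another.
f1, f2 = rng.normal(size=(2, n))
X = np.column_stack(
    [f1 + rng.normal(scale=0.6, size=n) for _ in range(3)]
    + [f2 + rng.normal(scale=0.6, size=n) for _ in range(3)]
)
R = np.abs(np.corrcoef(X, rowvar=False))

def cluster_items(R, n_scales):
    """Greedy agglomerative clustering on average absolute inter-item correlation."""
    clusters = [[i] for i in range(R.shape[0])]
    while len(clusters) > n_scales:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = np.mean([R[i, j] for i in clusters[a] for j in clusters[b]])
                if link > best:
                    best, pair = link, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)   # merge the most strongly related pair
    return [sorted(c) for c in clusters]

scales = cluster_items(R, n_scales=2)
```

The recovered item clusters would then be interpreted post hoc from the content of the items, as described above; in practice a factor or component analysis would also yield loadings on which cross-loading items can be screened.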

Validation

In the validation phase the stability of the clusters identified in the scale construction
phase is assessed. Cross-validation and confirmatory techniques are the preferred techniques
to carry out this assessment. Without cross-validation, the clustering may be flawed and
unstable, as the error covariance structure may have had a confounding effect in
the analysis. The cross-validation addresses this influence by testing the stability of the
clustering. If a confirmatory technique is not available, the stability can be evaluated to some
extent by repeating the exploratory technique in an independent sample.
Furthermore, the validity of the scales is examined. The interpretation of the identified
clustering can provide useful information. Common approaches to validity estimation are
applicable.

Comment

The internal method is an inductive or hypothesis generating approach. Taking a broad
perspective, selection methods based on various psychometric models (Guttman, Thurstone,
Birnbaum, Rasch, Mokken), could be included in this category, because the structure of item
responses is the main focus of these models.
The identification of the specific concepts to be measured follows from the analysis of
the item responses. The method is particularly useful if one cannot identify the specific
measurement aims beforehand, but one can specify the global domain. A second application
of the internal method is the reanalysis of existing questionnaires that are characterized by a
poor structure. The analysis and interpretation may help to identify a better, empirically
based, structure.

THE EXTERNAL METHOD

The external method (Burisch, 1984) has its roots in the development of the early interest
inventories (see DuBois, 1970). The method gained popularity in the 1950s when behaviorism
dominated psychology. The fundamental principle of the method is that responses to
questionnaire items are themselves interesting pieces of behavior that may be related to
several non-test behaviors (Meehl, 1945). The actual item content is not important, because
neither psychological theory nor intuition is believed to be useful in the assessment of the
relevance of verbal content of the stimuli for prediction of non-test behavior. Only the
statistical relationship between item response and the other behaviors is informative.
The method is also known as criterion-keying, the criterion oriented (Wilde, 1977), the
empirical (Kelly, 1967), and the actuarial (Cronbach, 1990) method. Well-known
questionnaires developed by means of this method are the MMPI (Hathaway & McKinley,
1967), the SVIB (Strong & Campbell, 1966), the CPI (Gough, 1969), and the ACL (Gough &
Heilbrun, 1980).
The construction does not require a theory, or a concept analysis. The concepts are
established by the (predicted) non-test behavior or criterion, not by psychological theory
(Mumford & Owen, 1987). For example, in constructing a questionnaire to assess academic
success of students, the number of study credits obtained in a given period might feature as
the non-test behavior, or criterion.

Item specification

The most important requirement is that the item pool consist of heterogeneous items
(Edwards, 1970). The diversity of the items is very important. Only if as many different
aspects as possible are represented in the item pool will individual items make some
contribution to the prediction of the non-test behavior. The concept of a heterogeneous item
content does, however, introduce a difficulty. Assessment of the heterogeneity of the items
requires a judgment of item content. Therefore, although it is deemed conceptually
questionable, considerations relating to item content may play a role in establishing the item
pool. The development of the item pool of the ACL is an example (Gough & Heilbrun, 1980).

Item production

Any procedure that generates a heterogeneous item pool is suitable. Usually the items
of an existing questionnaire are used. The MMPI items are often used as the item pool.
Although the MMPI was developed as a questionnaire to measure psychopathology, at
present keys are available for over 300 new scales, such as ego strength, dominance, and
social status. Most of these were derived by means of the external method (Anastasi, 1988).

Scale construction

The selection of the items for the scales is based mainly on the strength of the
relationship between items and criterion. Behavioral measures (number of study credits),
judgments by others (peer- or teacher-ratings), group membership (vocational group), or an
outcome of experimental manipulation (fear induction) can all feature as criteria. Items with
strong positive or negative relationships with the criterion are selected. Items with a strong
negative relation are mirrored in the scoring rule.
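The criterion-keying step can be sketched as follows. The data are simulated (a binary criterion such as membership of a successful vs. unsuccessful group, plus a small hypothetical item pool); the threshold of .3 is an arbitrary illustration, not a value prescribed by the method:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
criterion = rng.integers(0, 2, size=n)   # e.g., successful vs. unsuccessful group
noise = rng.normal(size=(n, 4))
# Hypothetical heterogeneous pool: item 0 relates positively to the criterion,
# item 1 negatively, items 2-3 not at all.
items = np.column_stack([
    criterion + 0.8 * noise[:, 0],
    -criterion + 0.8 * noise[:, 1],
    noise[:, 2],
    noise[:, 3],
])

def criterion_key(items, criterion, threshold=0.3):
    """Select items correlating at least |threshold| with the criterion;
    items with negative correlations are reverse-keyed."""
    key = {}
    for j in range(items.shape[1]):
        r = np.corrcoef(items[:, j], criterion)[0, 1]
        if abs(r) >= threshold:
            key[j] = 1 if r > 0 else -1   # -1: mirror the item in the scoring rule
    return key

key = criterion_key(items, criterion)
```

Only the statistical item-criterion relationship decides inclusion, which is precisely the external method's principle: the verbal content of items 2 and 3 plays no role in their rejection.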

Evaluation

In this step the reliability and validity of the scales are assessed. Generally, the external
method produces scales with a low internal consistency. This is not surprising as the scales
consist of items that are heterogeneous in content. The test-retest reliability of the scales, and
stability of the prediction, are considered to be important (Wilde, 1977). For the estimation of
the criterion validity a cross-validation procedure is warranted, as the selection may be
biased by the error covariance structure, thereby influencing the stability of the item criterion
relations.

Comment

The external method is an a-theoretical method that may be useful if the goal of the
questionnaire is the prediction of a specific criterion. The emphasis is on the empirical
relationships between test behavior and non-test behavior. The content of the scales is
diffuse, and the communicability of the concept measured by the scales is low (Burisch, 1984;
Cronbach, 1990).

THE CONSTRUCT METHOD

The construct method (Jackson, 1971, 1973) is a deductive, or theory oriented method. The
construction starts at the theoretical level with the identification and definition of the
constructs. The subsequent steps are elaborations of this specification.
The reaction to the purely instrumental (external) construction strategy, which was
popular in the 1960s and 70s, provided the impetus for the development of this method. The
criticism of the external approach was based on the notion that it is nonsensical to identify
relevant items solely on the basis of the empirical relations between item and criterion.
According to Jackson, ‘even an inexperienced item writer would be superior to empirical
selection with a typical heterogeneous item pool’ (Jackson, 1971). The construction process
should be directed towards optimization of construct validity instead of the optimization of
criterion validity. It is claimed that items that are designed to optimize construct validity
will outperform those based on the external method with respect to both criterion validity
and generalizability.
An additional argument in favor of the construct method, is that it does not require the
availability of an observable criterion. As observable criteria do not exist for many
psychological constructs, the application of a purely criterion oriented method is often not
feasible. Indicators used as criteria in the assessment of validity, such as peer-ratings, or
group membership, are confounded to the same extent as, or even more confounded than,
the measures to be validated (Burisch, 1984; De Groot, 1961; Wilde, 1977). Construct
validation offers a solution for this problem as well. In the validity assessment, the relations
between the construct and other constructs in the nomological network are verified
empirically.
The method described here is also known as the substantive method (Wiggins, 1973),
or the rational method (Broughton, 1984). The Personality Research Form (PRF, Jackson, 1984),
and the SchoolVragenlijst (SVL, Vorst, 1990) are examples of questionnaires based on the
construct method.

Theoretical framework

The construct method requires that the construction be guided by theory. The theory is
often expressed in a nomological network. The nomological network contains the basic
constructs to be operationalized, and the specification of the relationships between these
constructs. Moreover, related and confounding variables must also be taken into account.
Related variables are related empirically to the constructs of interest, but are supposed to be
conceptually distinct. Extroversion, for example, could feature as a related variable in the
development of a dominance scale, as extroversion and dominance can be difficult to
distinguish empirically. Confounding variables are variables like social desirability, and
other response sets, that may bias the measurement.
The nomological network must be specified and justified in advance: ‘unless the
network makes contact with observation, and exhibits explicit public steps of inference,
construct validity cannot be claimed’ (Cronbach & Meehl, 1955).

Concept analysis

The concept analysis involves defining and delineating the constructs of interest. The
designer must define the constructs under consideration, and determine the kinds of
observations expected to be useful as indicators. Furthermore, different conceptualizations of
the domain should be identified and taken into account. The similarities and differences
between these conceptualizations should be pointed out. Finally, the separation of related
constructs should be addressed. Subsequent validation will be seriously hindered if the
convergent and discriminant properties of the constructs are not addressed at this stage.

Item specification

It is important that the item content be taken into account. Questions to be addressed
are: What kind of judgments is the respondent able to make? What knowledge can be taken
for granted? What is the relation between manifest item content and the latent construct?
Furthermore, the constructor must pay attention to aspects such as the item format and
wording. The items may consist of statements, or of questions; the items may or may not be
written in the first person, etc.
Generally, item clustering in the construct method is unambiguous, i.e., if one knows
the measurement aims of the questionnaire, one is able to infer the item keying (Jackson,
1971).

Item production

The concept analysis and the actual specification of the items determine a number of
important aspects relating to the content of the items. They are central to the actual
formulation of items. It is important that each item be an independent replication in the
measurement procedure. To achieve this independence, it is recommended to pay attention
to the convergent and discriminant validity of the items. As the interrelationships of the
items are sensitive to semantic overlap, the wording of the items requires particular care. The
repetition of the same word, or of strongly related words, for instance, should be avoided.
Furthermore, the possibility should be taken into account that an item may relate to a scale
other than the intended one (Jackson, 1971; Oosterveld, 1990).

Item judgment

Before administering the questionnaire, the theoretical relevance, the content, and the
semantic features of the items are evaluated. Both experts and potential respondents can act
as judges.
Furthermore, it is recommended to carry out a pilot study to verify whether the items
are comprehensible and clear, and to check some basic psychometric item characteristics. If
necessary, items are rewritten or discarded.

Scale construction

Item selection takes place on the basis of content saturation (Jackson, 1973). Content
saturation refers to the convergent and discriminant validity of the items. Items that correlate
strongly with the intended scale scores, and weakly, or at least more weakly, with the other
scales are characterized by good content saturation and are retained. Poor items are
identified and discarded.
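The selection rule can be sketched in a few lines of code. The data, scale names, and retain-or-discard rule below are illustrative assumptions only, not part of the method as documented; a real analysis would use larger samples and more refined correlational criteria.

```python
# Illustrative sketch of content-saturation item selection.
# All response data and scale names are invented for the example.

def pearson(x, y):
    """Product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def totals(responses, items):
    """Sum scores over a set of items, per respondent."""
    return [sum(scores) for scores in zip(*(responses[i] for i in items))]

def saturated(responses, scales):
    """Retain an item when it correlates more strongly with its own scale
    (the item itself removed) than with any of the other scales."""
    keep = {}
    for scale, items in scales.items():
        for item in items:
            rest = totals(responses, [i for i in items if i != item])
            r_own = pearson(responses[item], rest)
            r_other = max(pearson(responses[item], totals(responses, other))
                          for name, other in scales.items() if name != scale)
            keep[item] = r_own > r_other
    return keep

# Six respondents; item 'bad' was written for extraversion but tracks dominance.
responses = {
    "d1": [1, 2, 3, 4, 5, 6], "d2": [1, 3, 2, 5, 4, 6],
    "e1": [6, 5, 4, 3, 2, 1], "e2": [5, 6, 4, 2, 3, 1],
    "bad": [1, 2, 4, 3, 5, 6],
}
scales = {"dominance": ["d1", "d2"], "extraversion": ["e1", "e2", "bad"]}
keep = saturated(responses, scales)
```

In this toy data set the item 'bad' shows poor content saturation and is discarded, while the remaining items are retained.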

Validation

In establishing the theoretical framework, the constructs are defined within a
nomological network. Specific hypotheses about the correlations of the scales are based on
the presumed relationships in the nomological network. Theoretically related scales are
expected to correlate more highly than theoretically unrelated scales. The validity of the
questionnaire is examined by testing the hypothesis derived from the nomological network.
The reliability, convergent and discriminant validity coefficients can be estimated. Preferably
this is done in a sample that is independent of the one used in the scale construction.
Criterion validity generally plays a minor role in the construct method. As mentioned
above, true, observable criteria are often unavailable. Of course, if a criterion is available, it
should be used. Otherwise, the multitrait-multimethod design (Campbell & Fiske, 1959),
which employs several quasi-criteria (Burisch, 1984), provides the best evidence of construct
validity (Cronbach & Meehl, 1955).
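The multitrait-multimethod reasoning can be made concrete with a toy sketch; the traits, methods, and correlation values below are invented for illustration and carry no empirical claim.

```python
# Toy multitrait-multimethod (MTMM) check in the spirit of Campbell & Fiske
# (1959). Traits, methods, and correlations are invented for illustration.
traits = ["dominance", "sociability"]

# Correlations between trait-method units (symmetric; invented values).
r = {
    (("dominance", "self"), ("dominance", "peer")): 0.55,     # convergent
    (("sociability", "self"), ("sociability", "peer")): 0.60,  # convergent
    (("dominance", "self"), ("sociability", "peer")): 0.15,
    (("sociability", "self"), ("dominance", "peer")): 0.20,
    (("dominance", "self"), ("sociability", "self")): 0.30,    # same method
    (("dominance", "peer"), ("sociability", "peer")): 0.25,    # same method
}

def corr(a, b):
    """Look up a symmetric correlation."""
    return r.get((a, b)) or r.get((b, a))

def convergent_ok():
    """First Campbell & Fiske requirement: each monotrait-heteromethod
    (convergent) value exceeds the heterotrait-heteromethod values that
    share its row or column of the MTMM matrix."""
    for t in traits:
        conv = corr((t, "self"), (t, "peer"))
        hetero = [corr((t, "self"), (u, "peer")) for u in traits if u != t]
        hetero += [corr((u, "self"), (t, "peer")) for u in traits if u != t]
        if not all(conv > h for h in hetero):
            return False
    return True
```

With these invented values the convergent validities (0.55 and 0.60) exceed the heterotrait-heteromethod correlations, so the check passes.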

Comment

The construct method can be regarded as a deductive or hypothesis-testing strategy.
Ideally, the assumptions derived from the nomological network and the concept definition
determine the content of the items. These assumptions are tested empirically. The
construction has a cyclic character: if the items or the scales are found to violate the
theoretical assumptions, construction is undertaken anew by revising the questionnaire.
Clearly the construct method is only applicable if sufficient formal knowledge is available to
set up a nomological network.

THE FACET DESIGN METHOD

The objective of the facet design method is the optimization of content validity by means
of a systematic and, ideally, exhaustive specification of the concept. The method was
introduced by Guttman (1954, 1965b) as a generalization of Fisher's model for experiments.
The generalization was intended to be suitable for all methods of data collection. Fisher
discerns factors, factor levels, and cells. In the facet design an analogous distinction is made
between facets, facet elements (also referred to as structs), and structuples. The facets form
the main aspects of a concept domain, the facet elements are specifications of these aspects,
and the structuples define the indicators of the concept. The facet design is a useful tool in
the construction of questionnaires.
Examples of questionnaires constructed according to the facet design method are the
Reasons for slimming and weight loss questionnaire (Gough, 1985), the Social Anxiety Scale
for children (Cohen-Kettenis & Dekking, 1980; Dekking & Raadsheer, 1977), the Dental
Anxiety Questionnaire (Stouthard, 1989; Stouthard, Mellenbergh, & Hoogstraten, 1994), and
questionnaires for achievement motives (Talsma, 1995), and well-being among the elderly
(Hoff, 1995).
Two conceptualizations of the facet design exist. In the first, the facet design is
conceived as a method for systematizing observations (Roskam, 1987). In the second, the
facet design is viewed as a method to develop a hypothesis or auxiliary theory about a
concept (Hox, 1986; Hox & Mellenbergh, 1989). Following De Groot (1961), in the first
approach, one is dealing with an empirical concept, and in the second, with a hypothetical
concept. With regard to the first approach one could speak of a domain analysis, and in the
second, of a concept analysis. The present author focuses on the concept analysis.

Theoretical framework

Unlike the construct method, in which the constructs, their relationships, and related
and biasing constructs must be specified, the facet design does not require a comprehensive
theory. If theoretical notions play a part in the construction at all, they serve to delineate the
construct and, where possible, to identify the main constituent aspects of the construct. The
theoretical framework provides the context for the concept analysis.

Concept analysis

The concept analysis forms the core of the facet design. The description of the concept
defines the item content. The content analysis must result in a systematic description of the
concepts and their constituent parts. The concept analysis consists of four parts.
Firstly, an inventory is made of the behavioral features and underlying processes that
are essential to the definition of the concept. Fear, for example, can be viewed as a
physiological reaction to a threat, an autonomous cognitive process, an affectional state, or a
behavioral response.
Secondly, the facets are defined. Facets are independent and mutually exclusive aspects
that describe a given concept or domain. In the facet theory, there are three different types of
facet: content, population, and response facets (Borg, 1979; Stouthard, 1991). The content
facets constitute the kernel of the concept definition, and represent the essential aspects.
Thirdly, the elements of the facets are determined. These elements are mutually
exclusive categories within the facet. To illustrate, Stouthard (1989, 1991) distinguishes five
dental anxiety facets: an undivided population facet (X), three content facets, time (A),
reaction (B), and situation (C), and finally, a response facet (R). The time facet is divided in
four time periods in which dental anxiety may occur: at home (a1), on your way to the
dentist (a2), in the dentist’s waiting room (a3), and in the dental chair (a4). These four
periods are the elements of the time facet (see Figure 2.1).

Figure 2.1 - Mapping sentence for Dental Anxiety (Stouthard, 1989)

The degree to which a person worries
    [at home (a1) / on the way to the dentist (a2) /
     in the dentist’s waiting room (a3) / in the dental chair (a4)]
about the
    [introductory aspects (c1) / dentist-patient interaction (c2) /
     actual dental treatment (c3)]
as it is expressed in the
    [subjective feelings (b1) / physical reactions (b2) / cognitive reactions (b3)]
which leads to being
    [not at all (r1) ... extremely (r5)]
anxious.
The set of elements has to cover the meaning of a facet: it should be possible to subsume
each moment at which dental anxiety may occur within one of the time periods. The elements
should be mutually exclusive within every facet: it should be possible to classify each
instance of dental anxiety in a single element of each facet.
Fourthly, the final structure of the facet design is determined. The facets, their
elements, and the relations between the facets are represented in a so-called mapping
sentence. The mapping sentence is the verbal expression of the facet design and produces a
number of specific descriptions of the concept. Every element of a content facet is combined
with one of the elements of the other facets (a Cartesian product). A specific combination of
facet elements is called a structuple (for example, a1,b2,c3 in Figure 2.1). Every structuple
defines a manifestation of the concept. In Figure 2.1, the mapping sentence for dental anxiety
is given.
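The Cartesian product that generates the structuples is easily made concrete. The sketch below reproduces the element labels of Stouthard's design; the choice of language and of data structures is, of course, an illustration only.

```python
from itertools import product

# Content facets of the dental anxiety design (Stouthard, 1989), with element
# labels as in Figure 2.1. The undivided population facet is omitted here.
facets = {
    "time (A)": ["a1", "a2", "a3", "a4"],
    "reaction (B)": ["b1", "b2", "b3"],
    "situation (C)": ["c1", "c2", "c3"],
}

# Every combination of one element per content facet is a structuple,
# i.e., one specific manifestation of the concept.
structuples = list(product(*facets.values()))

# With at least one item per structuple, a complete representation of this
# design requires 4 x 3 x 3 = 36 items.
minimum_items = len(structuples)
```

The structuple (a1, b2, c3) mentioned in the text, for instance, is one of the 36 combinations produced.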

Item specification

In this phase, a number of features of the items, such as format, wording, and number, are
specified. Generally, the facet design method gives rise to clear and unambiguous items. The
items are derived from the structuple definitions. Each item must be specific to a single
structuple. The number of items depends on the size of the facet design. For a complete
representation, at least one item per structuple is needed. If structuple scores are required,
however, several items are needed to obtain a reliable measure. The number of items,
therefore, also depends on the desired level of scoring.

Item production

As the structuples are defined by their constituent facet elements, the item content is
precisely defined. Furthermore, the mapping sentence provides a guide to the actual writing
of the items. The content facets are mainly used in writing the items. The population facets in
most facet designs are of no importance for item content, unless specific subpopulations
require specific item contents.

Judgment of items

Canter (1985) points out that during the process of formulating, and reformulating
items, the meaning of a given item may change to such an extent that it no longer covers the
intended structuple. The items are judged with respect to the extent to which they are
compatible with the facet design. Moreover, problems that arise in producing items may be
indicative of a flawed facet design. The judgment phase can necessitate a modification of the
facet design.

Scale construction

In the scale construction phase a number of postulated scale and item features are
investigated. The facet design can be used to specify a hypothesis about the dimensionality
or spatial structure of the item responses (Borg, 1979; Borg & Shye, 1995; Mellenbergh,
Kelderman, Stijlen, & Zondag, 1979). With respect to the spatial structure, one assumes that
the items that belong to the same structuple are more alike than items that belong to different
structuples. The same applies to the clustering with respect to facet elements. The specific
structure is dependent on the facet design and the hypothesized relations between the
elements (Levy, 1985, 1990). Multidimensional scaling can be used to determine whether the
item responses are compatible with the hypothesized structure. In the factor analytic
approach, it is assumed that the facet design can be represented by a number of factors, e.g.,
a general factor, and a factor for every facet element. The hypothesized factor structure can
be tested by means of confirmatory factor analysis (Jöreskog & Sörbom, 1988).
In both approaches, items violating the model assumptions can be identified. Item
selection is based on this information.
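The assumption that items of the same structuple cluster together can be illustrated with a toy check. The item labels, structuple assignments, and correlations below are invented; an actual analysis would use multidimensional scaling or confirmatory factor analysis.

```python
# Toy check of the structural hypothesis: items sharing a structuple should
# correlate more strongly than items from different structuples.
# All item labels, assignments, and correlations are invented.
items = {"i1": ("a1", "b1"), "i2": ("a1", "b1"),
         "i3": ("a2", "b2"), "i4": ("a2", "b2")}
correlations = {frozenset(pair): value for pair, value in [
    (("i1", "i2"), 0.62), (("i3", "i4"), 0.58),
    (("i1", "i3"), 0.21), (("i1", "i4"), 0.18),
    (("i2", "i3"), 0.25), (("i2", "i4"), 0.19),
]}

def mean(values):
    return sum(values) / len(values)

within = [v for pair, v in correlations.items()
          if len({items[i] for i in pair}) == 1]   # same structuple
between = [v for pair, v in correlations.items()
           if len({items[i] for i in pair}) == 2]  # different structuples
structure_supported = mean(within) > mean(between)
```

In this toy matrix the mean within-structuple correlation (0.60) clearly exceeds the mean between-structuple correlation, consistent with the hypothesized clustering.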

Validation

The validity of the spatial structure or factor model must be tested in an independent
sample to assess the effects of capitalization on chance in the item selection phase. Moreover,
the reliability and validity of the questionnaire must be established. The facet design method
does not include specific procedures to these ends.

Comments

Like the construct method, the facet design method is a hypothesis testing method. The
assumptions relating to the domain specification, that are incorporated into the facet design,
are tested empirically. Theoretical considerations, however, generally do not play a major
role in the construction. Through an elaborate concept analysis, the content validity of the
questionnaire is optimized. The method is particularly suitable if formal knowledge of the
concept domain and of the domain of indicators is available, or can be acquired easily.

DISCUSSION

In Table 2.1 the steps of the six methods are summarized. The methods are subsumed within
three classes that are denoted intuitive, inductive and deductive. These classes are based on
the relationship between the theoretical constructs and their operationalizations. In the
rational and prototypical methods there is only an intuitive link between the latent construct
and manifest items. The internal and external method are data-oriented methods and the
clustering of items is derived from the observed relations between the items, or between the
item and a criterion. Therefore, these methods are labeled inductive. The construct and facet
method are based on a deductive line of reasoning.
The three classes are analogous to the correspondence, instrumental, and substantive
points of view that are distinguished by Wiggins (1973). The deductive class is similar to
Burisch’s (1984) deductive methods. Burisch, however, also considers the rational and the
prototypical method to be deductive methods. Wiggins, in contrast, points out that, although
in the construction by means of the rational method, a notion of the construct is present prior
to the construction, the construct is not specified sufficiently clearly to allow the formulation
of hypotheses about the relations between the measurements. Furthermore, the inductive
methods presented here do not resemble Burisch’s inductive method. Burisch’s inductive
method is similar to our internal method.
Another issue concerns the steps distinguished in describing the methods. The seven
steps, which were introduced without any real justification, are based on similar divisions in
the literature, and on the author’s own preferences. The number of steps can be reduced to
four: conception (theoretical framework, and concept analysis), elaboration (specification of
items, item production, and item judgment), scale construction, and evaluation (validation).
This division gives rise to extensive descriptions of certain steps of some methods and
obscures the specific differences between the methods. The division in seven steps makes
these differences clear, and has an obvious appeal, at least for the present author. In short,
discussion of the methods in terms of the seven steps is based more on pragmatic than on
fundamental considerations.
Table 2.1 Questionnaire design methods

Intuitive class
  Rational method (optimizes face validity)
    theoretical framework:  working definition
    concept analysis:       global description or typology
    item specification:     informal criteria
    item production:        diagnostic questions
    item judgment:          review by experts
    scale construction:     face validity
    validation:             diagnostic comparison, validity, test of mini theory
  Prototypical method (optimizes process validity)
    theoretical framework:  -
    concept analysis:       -
    item specification:     -
    item production:        act nomination
    item judgment:          redaction
    scale construction:     prototypicality ratings
    validation:             reliability, retest reliability
Inductive class
  Internal method (optimizes homogeneity)
    theoretical framework:  -
    concept analysis:       -
    item specification:     homogeneous
    item production:        no guidelines
    item judgment:          no guidelines
    scale construction:     dimensional analysis
    validation:             cross validation
  External method (optimizes criterion validity)
    theoretical framework:  -
    concept analysis:       -
    item specification:     heterogeneous
    item production:        no guidelines
    item judgment:          no guidelines
    scale construction:     item-criterion relation
    validation:             cross validation
Deductive class
  Construct method (optimizes construct validity)
    theoretical framework:  nomological network
    concept analysis:       precise definitions, demarcation
    item specification:     clear, homogeneous, content saturated
    item production:        based on definitions
    item judgment:          item content and pilot results
    scale construction:     convergent and discriminant item validities
    validation:             reliability, convergent and discriminant validity
  Facet method (optimizes content validity)
    theoretical framework:  theoretical context
    concept analysis:       facets and facet elements
    item specification:     mapping sentence
    item production:        based on definitions
    item judgment:          item content and pilot results
    scale construction:     dimensionality
    validation:             reliability, model fit
The central issues associated with this taxonomy concern the mutual exclusiveness of
the methods, the exhaustiveness of the taxonomy, and its practical usefulness.
The mutual exclusiveness of the six methods is supported by the distinction of the six
specific psychometric criteria optimized in each of the methods. That these psychometric
criteria are viewed as independent qualities in the literature supports the present
categorization of methods of construction.
The six methods described can be viewed as ideal types. Specific procedures may vary
in practice, and hybrid methods are conceivable. One could, for example, generate items
according to the act nomination procedure, and subsequently perform a factor analysis of the
item responses. Most hybrid methods can be fitted into the taxonomy. As the internal
method does not specify how the items should be generated, the hybrid method just
mentioned could be assigned to the internal method. Furthermore, one should take care to
determine which criteria a hybrid method actually optimizes, if any. One could argue that
the hybrid mentioned optimizes both homogeneity and process validity. However, in this
specific case, the criterion of process validity is accorded secondary importance and would
not be optimized effectively.
Furthermore, it should be recognized that certain methods involve incompatible
assumptions regarding the concepts. The facet design method, the construct method, and, to
some extent, the internal method are aimed at the delineation of the concepts and the
coverage of the entire concept domain by the items. The prototypical method, on the other
hand, is aimed at the restriction of the item contents to the core of the concept, and, in fact,
rejects the notion of a clear delineation of constructs. The external method, in turn, is not
even concerned with the representation of the construct. These incompatibilities render
certain hybrid methods implausible. Although item generation by means of the facet
design combined with item selection by means of prototypicality ratings is imaginable, it
would be hard to justify such a procedure.
The exhaustiveness of the categorization into six methods is supported by the
distinction of the six criteria that are optimized. In a thorough review of the literature on
conceptions of validity, Van Berkel (1984) identified four test and questionnaire related
validity types: criterion related validity, content related validity, construct related validity,
and quasi-validity (face validity). Together with homogeneity, a psychometric criterion not
related to validity, and process validity, a newly identified validity, all important criteria are
represented in the present categorization. The development of a new construction method
would seem to entail a new psychometric criterion. For instance, in the case of the
prototypical method, a relatively new method, a new criterion was distinguished, namely
process validity.
It is stressed, however, that this argument offers only limited support for the claim that
each method is directed towards the optimization of a particular psychometric criterion. The
proposed relationship between method and psychometric criterion served to structure the
approaches to questionnaire construction and is not a finding of the literature review. The
fact that a new concept of validity could be identified with a new construction method does,
however, lend support to the credibility of this claim.

The practical value of the taxonomy lies in its usefulness in choosing a given
construction method with a view to meeting well-defined criteria. In addition, the taxonomy
will facilitate communication about the quality of questionnaires.
Regarding the choice of construction method, it appears that the optimality of a given
construction method depends on the state of knowledge concerning the constructs of interest.
The intuitive methods of construction (the rational or prototypical method) seem suitable
when the constructor’s knowledge of the concept to be measured is limited. An inductive
method (the internal or external method) is preferable to an intuitive method when there is
some knowledge about the construct and perhaps one or more provisional instruments are
available. A deductive method (the construct or facet method) depends on the presence of
extensive knowledge about the construct: the beginnings of a nomological network, or
substantial knowledge about the content and structure of the constructs. Thus the usefulness
of the construction methods depends greatly on the amount of available knowledge about
the construct.
The taxonomy offers a summary and systematic description of the procedures for
questionnaire construction. As mentioned, a comprehensive taxonomy, including the latest
methods, has hitherto been lacking. The communication between designers and users of
questionnaires can benefit greatly from the availability of the taxonomy. In providing
information concerning the questionnaire, an indication of the method of construction used,
together with details of any deviations from the method’s exact requirements, would suffice.
If additional knowledge is available about the relationships between methods and quality of
the questionnaire, or about the specific threats to validity associated with the methods, a
more rational and better motivated choice of instrument could be made.
At present, little can be said about the relation between the qualities of the instruments
and the method of construction followed. In fact, an important aim of the present study is to
compare the various methods empirically. This statement may seem inconsistent, as it was
argued above that each method is aimed at the optimization of a certain criterion. The
problem is, however, that the effectiveness of these procedures in achieving a given aim is
generally unknown. The external method is directed towards criterion validity, but how
valid are external scales? Another question concerns the effect of the optimization of a given
criterion on the other criteria. How does the optimization of, say, content validity affect
criterion validity? It should also be noted that the relation between criterion optimized and
the objective of the designer may be more subtle than presented so far. As noted above,
Jackson (1971) claimed that a construct questionnaire will outperform an external
questionnaire with respect to criterion validity, and Broughton (1990) has asserted that the
most prototypical items are the most valid. Therefore, focusing on one psychometric criterion
does not imply that the other aspects are unimportant. The criterion optimized may be
regarded as a sub-goal, or the other criteria may be seen as dependent on the criterion
optimized. It is for these reasons that an empirical comparison of the methods of the
construction of questionnaires is required. The remainder of this text will focus on these
issues.
