

Quality & Quantity 30: 19—35, 1996.


© 1996 Kluwer Academic Publishers. Printed in the Netherlands.

A common base for quality control criteria in quantitative and qualitative research

PETER G. SWANBORN
University of Amsterdam, IJsbaanpad 9, 1076 CV Amsterdam, The Netherlands

Abstract. In the social sciences, several scientific paradigms are mutually isolated owing to
their use of specific sets of methodological criteria and quality control procedures. In this
article, the central hypothesis, to be tested by conceptual analysis and logical reasoning, is
that recommended procedures for quality control in quantitative as well as qualitative
research can be derived from a common base of regulative ideas. By 'qualitative', we mean
the complex of ethnographic, anthropological, symbolic interactionist, ethnoscience and
related approaches. A second goal is to demonstrate the use of regulative ideas as a
parsimonious and fruitful base for a comparative analysis of methodological canons.
Although our focus is on the comparison of quantitative and qualitative (or: naturalistic)
research, we also pay attention to policy research as opposed to fundamental research.

1. Introduction

An analysis of methodology texts, originating from different research paradigms, with respect to central concepts such as reliability, validity,
generalizability, etc., quickly leads to the emergence of a plethora of
concepts and quality control demands, as well as to the conjecture that more
or less the same basic ideas are expressed in different terminologies.
To check this conjecture, we start with the presentation of a parsimonious
and hierarchical scheme of concepts, as used in standard (quantitative)
methodology. It is parsimonious because it is based on a restricted number
of so-called regulative ideas (Section 2.1—2.5).
Secondly, we try to reconstruct within the context of our scheme several
procedures in use to enhance research quality, as advocated within the
different traditions (Section 3). A regulative idea for science is a very
general idea that governs and directs scientific research. The concept
originates from the German philosopher Kant (1787). Popper (1966)
applied it to modern philosophy of science.
As regulative ideas for science we mention:
— striving after intersubjective agreement
— striving after valid argumentation
— striving after efficiency.
And, as an 'extra' for fundamental research:
— striving after maximally informative knowledge
and, as an 'extra' for policy research:
— striving after usable knowledge.

In this article, we do not pay attention to the regulative ideas of valid argumentation and efficiency, because these are not differentially conceived
or do not lead to diverging 'quality control' procedures in the above-
mentioned approaches. The striving after valid argumentation or internal
consistency of theories concerns the logical quality of connections between
propositions. 'Efficiency' refers to the ratio between costs and expected
benefits of the research project or program.
The regulative idea of maximizing information content is tackled in
Section 4, and we conclude with a short discussion on the striving after
usable knowledge (Section 5).

2. Striving after intersubjective agreement

2.1. Theories of truth

As the central regulative idea in science one would of course expect 'striving after truth'. To avoid, however, being drawn into fruitless debates
between protagonists of correspondence-, coherence- and consensus-
theories of truth, most scientists do not use the concept of truth any longer.
We assume that modern researchers agree on the fact that an empirical
science cannot do without observation and experimentation; that they
furthermore agree on the fallible and provisional character of scientific
knowledge and the theoryladenness of observations; that, finally, they agree
on the important role of the scientific community. With respect to this last
characteristic, it is clear that consensus within the scientific community
over research results, whether these results concern so-called objective facts
or whether they take the form of perceptions and subjective constructions
of people, is a central demand. Therefore we regard intersubjective agreement as a generally agreed-upon stand-in for the regulative idea of striving after truth.
In the following the methodological norms deduced from the demand for
intersubjective agreement are discussed.
2.2. Controllability: a necessary condition

Intersubjective agreement on propositions or models regarding the empirical world presupposes controllability. To achieve this end, several
necessary conditions have to be fulfilled:
— research products, and the procedures used to produce them, are public;
— research products are expressed in a precise language;
— research products are falsifiable.
The last requirement, falsifiability, is the overarching demand that renders
the others superfluous, since a vaguely formulated statement or a statement
that is not made public cannot be falsified. And without the possibility that
results may be false, the concepts of intersubjective agreement and control
are senseless. We mention the other necessary properties explicitly,
however, because they can be easily recognized and criticized. If one
dislikes the Popperian connotation of falsifiability one may as well use the
concept of testability.
Controllability is, of course, nothing more than a necessary condition for
intersubjective agreement. This point needs our attention because many
qualitative methodologists conceive it, erroneously, as a sufficient
condition.1

2.3. Reliability: a common criterion

The next question is which criteria are applied in evaluating research results
and striving for intersubjective agreement. There exists virtually no
difference of opinion among researchers of very different background with
respect to the concept of reliability (this point of view is illustrated in
Section 3). The person of the researcher, time and circumstances of the
measurement as well as the measuring instruments are generally regarded
as irrelevant variables, leading to stochastic error and therewith lack of
reliability. Therefore, scientists usually agree that propositions about the
empirical world should be, as much as possible, independent in at least
three respects:
— researcher-independent (intersubjective agreement in a restricted sense);
— time-independent;
— instrument-independent.
'As much as possible' is added, because absolute independence, which
would mean objectivity (in the sense that the researcher's activities can be
completely programmed for a computer) can only be obtained with respect
to a very limited subset of research activities.2
If empirical propositions are unreliable, they are not admitted to further
debate within the scientific community. So, reliability is a very central
demand to put to all kinds of research results.
Reliability controls are in principle implemented by replication, and by calculating some coefficient of agreement. The procedures, and respective concepts, are of course well-known:
— If, with replication over researchers, it turns out that knowledge is researcher-independent (e.g. as shown by Cohen's kappa), the label inter-researcher reliability is legitimate. Replication is a very common procedure, also in qualitative research. In practice, however, for many researcher-roles (e.g. interviewer, writer of the report) replication is hard to realize; reliability can only be assumed.
— If, with replication over time (settings) the conclusions remain stable,
we use the label stability (in quantitative research stability is generally
determined by calculating some correlation coefficient).
— If, with replication over instruments, conclusions remain the same, the
label instrumental independence would be adequate. In research, this
kind of replication usually takes the form of working with a set of items
that serve as mini-instruments. If the units of observation are ranked in an
identical way by these instruments we conclude that the scale is
internally consistent or reliable (in quantitative research usually
Cronbach's alpha is calculated). If really different instruments are used
(e.g. observation and interviewing) we calculate correlation coefficients.
In naturalistic research often the label 'triangulation' is applied when
different instruments are used, but the context of this operation is
different (see below).
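The two coefficients mentioned above can be illustrated in a few lines of code. The following is a minimal sketch, not a full psychometric implementation: Cohen's kappa for the nominal codes of two researchers, and Cronbach's alpha for a set of items treated as mini-instruments. The data in the assertions are invented for illustration only.

```python
def cohens_kappa(rater1, rater2):
    """Inter-researcher reliability: chance-corrected agreement of two coders."""
    n = len(rater1)
    categories = set(rater1) | set(rater2)
    # observed proportion of agreement
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # agreement expected by chance, given each coder's marginal frequencies
    p_expected = sum((rater1.count(c) / n) * (rater2.count(c) / n)
                     for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbachs_alpha(items):
    """Internal consistency of a scale; `items` is a list of per-item score lists."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # scale total per respondent
    return (k / (k - 1)) * (1 - sum(variance(i) for i in items) / variance(totals))
```

In practice one would of course use a dedicated statistics package rather than this toy version, but the sketch shows that both coefficients follow the same logic: replicate, then quantify agreement.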

2.4. The ultimate criterion: validity

It is, however, well known that reliable results do not necessarily constitute
valid results. Validity means, in a very general sense, that our propositions
describe and explain the empirical world in a correct way; in a stricter
sense: that they are free from random as well as systematic errors.
However, once again, the mixture of correspondence- and consensus-theories of truth presents itself. Intersubjective agreement with
respect to what are, and what are not, systematic errors is far more difficult
to reach than with respect to reliability. If our propositions refer to
statistical properties of operational variables and their relations, reliability
may be sufficient. But an important part of our research results is based on
argumentation and interpretation, and here the concept of validity is at
stake.
For instance, a considerable subset of research results concerns causal
interpretations of covariances. Intersubjective agreement demands at least
some explicit attempts to falsify alternative causal interpretations (by
means of research designing and/or data analysis). The relevant criterion is
called internal validity.3 We have to remark that causal thinking is
generally absent in naturalistic research, so the necessity for controlling for
third variables is not apparent to each qualitative researcher. If, however,
the need is felt for eliminating alternative causal interpretations, verbal
argumentation and sometimes supplementary data are used.
Another important subset of research results is expressed in the language
of theoretical concepts. The problem, of course, is whether research
findings, expressed in operational language, can be generalized to relations
on a theoretical level. Here, the concept of construct validity is at stake.
Systematic errors, concerning the measurement of not-intended concepts or
the incomplete coverage of intended concepts threaten construct validity.
Such errors, which do not disappear by repeating measurements, may be detected:
— by inspection of instruments (content validity);
— by bringing in theory, and calculating correlations with variables that are predicted by our instruments (pragmatic validity) and with other variables that are, according to theory, supposed to correlate with the measured variable (construct validity in a restricted sense).
Procedures for the determination of content validity are applied in quantitative as well as in naturalistic research. The calculation of correlation coefficients, however, in the framework of testing pragmatic and construct validity, is restricted to quantitative research.

Whether in a certain case construct validity, as well as internal validity, is regarded as 'sufficient', results from consensus between researchers. One more kind of validity has not been mentioned up till now: the concept of external validity, which refers to the open question whether obtained research results are generalizable to other populations, places and points in time. The problem boils down to the regulative idea of maximizing information content (see Section 4).

2.5. A scheme for regulative ideas


We conclude this discussion by representing regulative ideas and derived
methodological demands in Scheme 1.
regulative ideas                           quality control demands

striving after intersubjective agreement → controllability (a necessary condition)
                                           researcher-independency
                                           time-independency
                                           instrument-independency
                                           internal validity
                                           construct validity
striving after max. information content  → external validity
striving after valid argumentation
striving after efficiency
striving after usability

Scheme 1. Regulative ideas and derived quality control demands.

3. How to reach intersubjective agreement?

By which means do researchers try to maximize chances that they will agree, with respect to reliability as well as validity? The manifold of
existing research procedures can be grouped into five categories. In
discussing each category, we consider its role in quantitative as well as in
qualitative research.

1. Objectivation. For example by using instruments instead of subjective researchers for measurements and calculations. In social research this is only possible on a small scale. The use of extended databanks for objective data instead of subjective interviewing or observation, and the avoidance of clerical errors by using laptops, are examples. In qualitative research objectivation is also advocated. The widespread use of personal computers for protocol and, in general, documentary analysis has given rise to a manifold of programs for qualitative text analysis (see Miles & Huberman, 1994: 311-317). During the last decennia, the application of audio-visual auxiliary apparatus (documents, films, videotapes, audio recordings, pictures) to document verbal and nonverbal (re)actions has served objectivation. Serious problems of interpretation may, of course, arise afterwards, but the main advantage is that a premature false interpretation of observation or interview data by one researcher stands open to correction. All these procedures are directed towards controllability in the first place, and reliability (researcher-independency) in the second place.
2. Standardization. Standardization means a uniform approach to respondents, groups or phenomena. Objectivation may lead to standardization and vice versa, but the two are not identical. General instructions for interviewers, experimenters and observers, and precoded questionnaires are dominant characteristics of quantitative research nowadays. Standardization is perhaps also the most apparent characteristic lacking in its qualitative counterpart: in the latter research tradition, it is almost completely absent. The use of pre-coded questionnaires in interviewing is even one of the central points of attack of 'naturalists' with respect to quantitative research. The concept of standardization is close to Smaling's 'regimentation' (Smaling, 1992).

3. Repeated measurements and taking the mean. Working with more than one observer, or applying scales consisting of several items instead of working with single questions, are well-known examples of applying the principle "it is always better to take the mean of a number of imperfect measures than to rely on one imperfect measure". In naturalistic research, also, the general idea is applied, for instance by having a scene of social interaction documented by audio-visual means, interpreted by several researchers, and finally determining the common kernel of their scores. Outside the context of audio-visual means, also, the same principle is recommended, for instance in the parallel working of two halved research teams, each analyzing half of the material, with daily discussion.
Within the context of replication of instruments one encounters the well-known triangulation, though its status is very unclear. The problem with triangulation is whether diverging results are interpreted as unreliability of one or more instruments, or whether they are explained as resulting from diverging 'actor's perspectives'. As far as we know, in most cases discrepancies are gratefully accepted and explained as proof of the existence of diverging subjective perspectives: triangulation is generally used to obtain a 'more complete' picture of reality (for another critique of triangulation, see Blaikie, 1991).

4. Verbal and nonverbal explication of procedures and products, and encouraging colleagues to criticize the researcher. In 'quantitative
methodology' textbooks this requirement is often mentioned in a general
sense, but seldom elaborated. Quite the contrary is true with respect to
qualitative methodology texts. Thick description, keeping a diary,
especially with respect to all interpretation steps of the inquirer; peer
debriefing; the complete documenting of all research steps and asking an
external 'auditor' to examine the documented research process; all these
procedures refer immediately to the necessary condition of accessibility and
controllability of the products of our knowledge and the procedures used. In
doing so, we realize that with the explication canon only a necessary
condition is formulated; it is not indicated in which way differences of
interpretations between researchers are to be solved. In so far as, besides
explication, active attempts for scientific communication and mutual
criticism result, this reflects a category of procedures that are especially
advocated in qualitative research but that have a general applicability.

5. Asking respondents whether they agree with descriptions, interpretations and conclusions; so-called member checks. This procedure
has a unique position in naturalistic research; the goal to be reached is
called 'credibility' (Guba, 1981). For this author, it is the ultimate criterion.
Its origins are often, rightly or wrongly, attributed to Schuetz's postulate of
adequacy (Schuetz, 1966). With Lincoln & Guba (1985) it also results from
their constructivist approach. As reality, in their view, is only a product of
our knowledge, it can not serve as a check on our interpretations; only the
human objects of our research are able to judge the researcher's
interpretations. For several reasons, however, 'member checks' are of
limited significance: (1) respondents' opinions are regarded as a rock
bottom, as the base for truthfulness of scientific statements; this exclusive
correspondence idea of truth has been an outmoded view since Popper's
theory of science; (2) respondents' opinions can only be assessed by asking
them, and this again introduces researcher's bias; (3) in many situations
respondents are stakeholders themselves and their answers can hardly be
viewed as trustworthy; (4) often respondents simply cannot give opinions:
children, mentally handicapped persons, people who do not speak the researcher's language, etc. Perhaps as a consequence, in actual research member checks are very rare. In a study of 200 articles in the symbolic
interactionist tradition Swanborn & van Zijl (1984) found only one instance
where member checking was applied.
These critical remarks do not imply, of course, that asking respondents
about the researcher's ideas and conclusions has no sense. On the contrary,
member checks should be tried more often than is the case now. It is quite
possible that in certain cases only a member check leads to intersubjectivity
among researchers, providing as it does a sudden insight as a correction for
different researchers' interpretations. In a special form the general idea is
applied in the well-known 'mirroring' of an interview; replaying the
interview tape for the interviewee a week later, and asking for corrections.
But reactions of respondents to the researcher's ideas are nothing special or sacrosanct.
In Lincoln & Guba (1985) several other techniques for improving the quality of research results are advocated: prolonged engagement at a site; persistent observation; the use of referential adequacy materials (audio/visual recordings, or written documents); the use of overlapping methods; stepwise replication. These procedures, however, have a very general if not trivial character ('the more data the better') and, in our survey of procedures, do not require a special heading, nor is their significance restricted to qualitative research.

Scheme 2. The role of quality control procedures in quantitative and naturalistic research
                        Quantitative    Naturalistic
Objectivation
Standardization
Repeated measurements
Explication
Member checks
Where does this enumeration of quality control procedures, as practiced by quantitative and qualitative researchers respectively, lead us (see Scheme 2)?
1. It does not seem too far-fetched to conclude that the procedures in
use in several research traditions may be conceived as different means to the
same end: reaching intersubjective agreement by firstly striving towards
fulfillment of the necessary condition of controllability, and secondly
aiming at reliable and valid measurements. Thus far, the hypothesis we
started with is not falsified. Research procedures for quality control in
qualitative research can be founded on the same base of regulative ideas as
those used in quantitative research.
2. A pragmatic explanation can be given for the several traditions using different procedures. In quantitative research, quality control manifests itself mainly in calculating the strength of (cor)relations, with regard to unsystematic errors (reliability) as well as systematic errors (validity). The character of naturalistic research (in short: non-numerical measurement) implies that these procedures simply cannot be used.
Also, the fact that quantitative researchers do not embrace the procedures advocated by qualitative researchers is partly explained by the fact that these procedures often seem superfluous to them in the light of the far-reaching codification of quantitative methods: a diary or log book, or other forms of communication with colleagues to clarify what exactly has been done, is not necessary if one applies a regression or factor analysis; quantitative procedures are, in principle, perfectly controllable.
3. An excursion such as the one above leads us even to the conclusion
that researchers may profit by being less one-sided in their choice of quality
control procedures. In quantitative research, many steps, for instance
choosing a model of reality, selecting a technique for data collection,
carrying out data collection and data analysis, content validation (another
word: face validity!), controlling internal validity and construct validity, are
vulnerable to subjective biases, and could profit from explication, and
control by colleagues, as recommended by naturalistic researchers. One
might state that in quantitative research, by a strong preoccupation with
calculations, especially aiming at reliability of measurements, attention
regrettably has shifted away from other important problems.

3.1. Other criteria

The question is in order whether we have done justice to qualitative research. Several naturalistic researchers are well-known for their
'lists of alternative criteria' as a replacement for traditional demands such as
reliability and validity. Our hypothesis can still be falsified if it can be
shown that these new 'methodological criteria' can only be derived from
other regulative ideas.
An ambitious attempt has been undertaken by Lincoln & Guba (1985).
These educational scientists have systematically replaced traditional criteria
by a set of parallel criteria. An example of their argumentation is, for
instance, that perfect inter-researcher reliability cannot be reached because no two researchers are exactly the same, no two observers will observe in exactly the same way, or — another example — that perfect stability cannot
be attained because no two points in time are identical in all respects. That is
why they introduce new concepts: confirmability and dependability,
respectively. Dependability, for example, is defined as stability after
discounting for such conscious and unpredictable (a second inquirer might choose a different path from the data!) changes (Guba & Lincoln, 1982). In
the same vein, external validity or generalizability is replaced by
transferability, and internal validity by credibility. The logical flaw in their
reasoning concerns the conclusion that a new concept is needed because
'100% stability' or '100% internal validity' cannot be reached. Accepting the
premises, this conclusion is not at all necessary. In our opinion, no new
regulative ideas are involved.4
Confusion is also created by the pair of concepts internal reliability and
external reliability (LeCompte & Goetz, 1982). As with Guba & Lincoln,
their argument runs that as no two research situations are the same, and
factual replication under identical conditions is impossible, one can not
expect identical results. Their dichotomy originates from the specific
context of anthropological research. The famous Lewis/Redfield contrasting
results on the same Mexican village (and more of such pairs) inspired the
concept of external reliability: it "addresses the issue of whether
independent researchers would discover the same phenomena or generate
the same constructs in the same or similar settings". Internal reliability
refers to the convergence/divergence of procedures, interpretations, ideas of
multiple researchers in one multiple-site research study. Although maybe acceptable in a restricted context, this unfortunate pair of concepts cannot be regarded as basic in methodology. The quest for external reliability
leads to procedures that can all be based on the explication canon: define
explicitly the situation and conditions of data collection, such as the
researcher's role, informant choices, theory and methods used. The search to
enhance internal reliability leads to recommendations for the team in the
field: use data on a low level of abstraction; employ well-trained observers,
and local cooperators; use peer examination and mechanically recorded data
(LeCompte & Goetz, 1982). In fact, almost the same procedural steps as
advocated by Guba in his plea for 'confirmability' (or, as we would prefer:
inter-researcher reliability).
To conclude: the procedures for enhancing controllability and intercollegial control as advocated by naturalistic researchers are very
welcome; application in quantitative research as well can only be
recommended. The introduction, however, of new criteria suggests that
each tradition in science keeps to its own criteria, and draws attention away
from the fact that in the end all these procedures are directed towards the
same basic goals.

4. An additional criterion for fundamental research: maximizing content, or depth
A central regulative idea is not, as we learned from Popper, the striving for
certainty, but the striving for informative knowledge. The argument runs as
follows. A combined striving for true and certain knowledge is not an
adequate combination of regulative ideas because one can very easily
materialize this aim by formulating results that have almost no content.
A standard example is the true and certain proposition:
"tomorrow it will rain or it will not rain".
The criterion of truth, therefore, has to be supplemented by some criterion
aiming at content. According to Popper, a scientific proposition about the
empirical world has to be falsifiable, that is to say that there exists at least
one proposition describing the real world that potentially falsifies the
original proposition. In this minimal form we already used the concept of
falsifiability as the legitimization of controllability in Section 2. As a
regulative idea it takes the form: 'strive for maximal information content'.
The more potential falsifiers a proposition has, the more content it has, and
the more stringently it can be tested. An example: the proposition
A: "tomorrow the temperature will be above 20 degrees Celsius"
is more informative than the proposition
B: "tomorrow the temperature will be above 10 degrees Celsius",
the difference being that B's set of falsifiers is a strict subset of A's set of
falsifiers (the difference is constituted by those situations of the real world
with a temperature, tomorrow, between 10 and 20 degrees Celsius).
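The subset relation between the two sets of falsifiers can be made concrete with a small sketch; restricting attention to whole degrees over a finite range of 'possible worlds' is an assumption made purely for illustration:

```python
# Possible "states of the world": tomorrow's temperature, as whole degrees.
worlds = range(-30, 51)

# A proposition is falsified by every world in which it is false.
falsifiers_A = {t for t in worlds if not t > 20}  # falsifiers of "above 20 degrees"
falsifiers_B = {t for t in worlds if not t > 10}  # falsifiers of "above 10 degrees"

# B's falsifiers form a strict subset of A's: A is the more informative proposition.
assert falsifiers_B < falsifiers_A
# The difference is exactly the worlds with a temperature between 10 and 20 degrees.
assert falsifiers_A - falsifiers_B == set(range(11, 21))
```

The more worlds that could refute a proposition, the more the proposition excludes, and hence the more it says.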
Formally the information content of a proposition can be enlarged in three different ways:
— by enlarging the domain of the proposition (a proposition about the inhabitants of Europe is more informative than the same proposition with respect to the inhabitants of the U.K.);
— by making the 'if'- or the 'the more'-part of the proposition more general, less specific. The 'if'-part contains the conditions under which the rest of the proposition is true. The less specific these conditions are (the fewer restrictions provided) the easier the proposition can be falsified and, accordingly, the more informative it is (a weather forecast is not very informative if it is preceded by a combination of ten or fifteen specified conditions);
— by narrowing the content of the rest of the proposition, or, in other words, by making the 'then'-part more specific.
It is not very difficult to forecast that under certain circumstances
people will show deviant behavior. It is far more difficult — and hence
informative — to forecast that people will show a very specific form of
deviant behavior.
In social science research the information content criterion is replaced by
the much vaguer concept of 'depth'. This replacement, which was already made by Popper, is in order because relations between social science
theoretical propositions (and especially between theoretical and operational
propositions) very seldom have a strictly logical character, as is the case in
the temperature-example; they can almost never be represented in a
terminology of sets and subsets. The depth-criterion can be translated
approximately as generality. This brings us to the insight that information
content or depth may be expressed as the degree of 'external validity': the
kind of validity we have not yet discussed: one explores the limits of a
domain for some proposition. And in a comparable way, the concept of
construct validity may be interpreted in the context of enlarging information
content.
We now see that the earlier-discussed striving for independence from
researchers, circumstances and instruments can also be linked to the
regulative idea 'strive for maximal information'. To illustrate: a proposition
stating that a relation holds for, let's say, a period of some months has more
content than the same proposition referring to only the week the interviews
were held (because it has more potential falsifiers). The same holds for
independence from the inquirer and from instruments. It constitutes, as it
were, a first bottomline for information content. An empirical statement that
is dependent on the person of the researcher is not informative enough to be
taken seriously by the scientific community. Therefore, in Scheme 1 an
arrow is drawn from this regulative idea towards the methodological
demand for reliability.
To conclude, in fundamental science our goal is to formulate rather
specific propositions that are valid for a large domain of units of
observation under not too restrictive conditions. Herewith a 'quality
characteristic' of theories, the goal of fundamental research, is expressed.5
In applied science, or policy research, striving for maximizing the content
of research propositions (beyond a bottomline constituted by the general
reliability and internal- and pragmatic validity demands) is not a regulative
idea. In policy research the overriding interest is in the specific target-for-policy domain: research conclusions have to refer to this domain here
and now; a policy agency is little interested in whether the research
conclusions refer to a larger domain.
Absence of this striving for maximizing does not imply that in policy
research the concept of external validity is of no importance. Using this
concept in policy research means that we wonder whether results of our
samples, or cases, are generalizable to the target domain. Since often no
random samples are available, one has to look for other than statistical ways
to generalize. As the argument often takes the form of 'comparing cases',
going from one case to the other on the basis of similarity, the procedure of
'thick description' is advocated, which, again, falls under the heading of
'explication'.
The regulative idea 'striving for depth' is not found in every separate
research project, but it can surely be applied to most research programs. It is
not valid for idiographic science — if idiographic science is not a
'contradictio in terminis'.
5. An additional criterion for policy research: usability

Whilst in policy research one regulative idea, the striving for maximizing
information content, is lost, another one comes to the fore: the pragmatic
criterion 'does it work', or in short: usability. As a necessary condition for
usability one may, of course, refer to contractual criteria; these criteria
include obligations that stem from the contract between policy agency and
researcher, such as fixed intermediate and final dates for reports; but they
also refer to a basic agreement between policy agents and researchers on the
research goals.
Within the framework of usability, Leeuw (1983) differentiates between
implementary and strategic criteria.
By implementary criteria are meant criteria concerning the quality of the
theory or model as a base for policy measures:

a. Specificity. Is the proposed model sufficiently specified for the
target situation? Is it possible to deduce predictions for relevant subgroups
and conditions? Are the assumed functional relations specified with regard
to linearity, strength, step functions, etcetera? Leeuw (1983) labels this
property 'practical information content' in contrast to 'theoretical
information content' (see above).

b. Manipulability. Are the independent variables in the model
manipulable by policy agencies?

c. Expected impact. Are the expected results worthwhile? What is the
impact of policy advising based on manipulable variables compared with
the influence of non-manipulable variables? And if the influence of
manipulables is conditioned by non-manipulables, is this also clear? Are
there any side effects to be expected? Are the effects of the policy measures
transparent, also in the long run?
By strategic criteria (or: strategic feasibility) are meant factors that concern
the tuning of the theory or model to the target situation:

d. Reality content. Are the social conditions which are assumed in our
model present in the target situation? If important assumptions are not
realized, the chances of success of an otherwise usable theory can be
annihilated. Does the policy system have power over organizational means?
Are other actors present (e.g. mass communication media) that exert
counterpressures? What are the costs of the program? Are there any actors
in the field who are willing to bear the responsibility for, and to carry out,
the new program?

e. Timing. It usually takes a lot of time to bring a policy to life, and
again a lot of time before effects can be measured. Is the implementation
time part of the model?

f. Social acceptability. A policy measure can be (un)acceptable, or
legitimized or not; a theory (e.g. the theory behind 'apartheid') can hardly
be called legitimized or not. With regard to policy research, one of the
well-known problems is, of course, whether the policy advice is in
accordance with the value patterns of the policy agency, and of the target
groups. A general formula might be: can it be asked of the several
participating individuals and groups in the field?
We stress the fact that the regulative idea of the striving for intersubjective
agreement with all its methodological derivatives (reliability, validity of
several kinds), as developed in mainstream social research, is a condition
for usability. Discussions about policy research methods in some quarters
used to be dominated by an antagonism between 'academic science' and a
distinct 'policy research paradigm'. If, however, no intersubjective
agreement on data quality can be reached, it seems useless to spend time on
usability. Important with regard to the domain (the extent of the target
groups) is, for instance, non-response in public opinion polling designs: in
the case of selective non-response the generalizability to target groups is in
grave danger.
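As a minimal illustration (ours, not the article's) of how selective non-response endangers generalizability to the target groups, and of post-stratification weighting as one standard statistical repair, consider the following sketch; all group shares and support figures are invented for the example.

```python
# Illustrative sketch only; all numbers are hypothetical.
# The target population is 40% group A and 60% group B, with different
# true support for some policy measure in each group. If group A responds
# far more readily than group B (selective non-response), the unweighted
# poll estimate no longer generalizes to the target domain.

pop_share  = {"A": 0.40, "B": 0.60}   # shares in the target population
resp_share = {"A": 0.64, "B": 0.36}   # shares among actual respondents
support    = {"A": 0.70, "B": 0.30}   # true support within each group

# Unweighted estimate: biased, because it mirrors the respondent mix.
naive = sum(resp_share[g] * support[g] for g in support)

# Post-stratified estimate: reweight respondents to population shares.
weighted = sum(pop_share[g] * support[g] for g in support)

print(f"unweighted: {naive:.3f}")    # 0.556 -- overestimates support
print(f"weighted:   {weighted:.3f}")  # 0.460 -- the population value
```

The repair assumes, of course, that support within each group is measured correctly among the respondents; weighting cannot compensate for non-response that is selective within the very groups used for weighting.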
Notes

1. Compare Smaling (1992: 170): "Intersubjectivity in the sense of consensus,
unanimity, ... may easily be recognized in the following methodological ideas:
intersubjective verifiability, ... interobserver agreement, interobserver reliability." The
author overlooks the essential difference between verifiability as a necessary condition on
the one hand, and reliability as a goal to be reached on the other.
Smaling mentions three 'traditional concepts of methodological intersubjectivity':
consensual intersubjectivity, intersubjectivity by regimentation and intersubjectivity by
explicitness. The latter two refer, in our view, to procedures or conditions to better reach
the former. Smaling continues by mentioning argumentative and dialogical
intersubjectivity. The first one concerns the democratic character of scientific discussions,
referring to 'the ideal speech situation' as well as Habermas' 'Herrschaftsfreie Dialog'; the
second refers to the interaction between researcher and the objects of his research; the
procedure of member checks fits into it. In comparison with our analysis, Smaling leaves
out objectivation and repeated measurements (not surprising for a qualitative
methodologist!) and he adds the democratic character of the scientific discourse. We do
not, of course, object to this addition, but regard it as self-evident.
2. It is remarkable that qualitative researchers, in characterizing the leading ideas of their
opponents (positivist and postpositivist researchers) still use the criterion of objectivity
instead of intersubjective agreement, together with reliability, internal and external
validity (Guba & Lincoln, in Denzin & Lincoln, 1994: 114). One of the aims of our
contribution is to clarify that these four criteria cannot simply be juxtaposed.
3. We keep to the Cook & Campbell definition. In the Cronbach tradition, the concept of
internal validity is interpreted in a more general sense as the ability to generalize from
empirical results to the population, not only with respect to causal propositions, but also
with respect to descriptions and interpretations. This approach, mainly adopted by
qualitative researchers, seems to have the advantage of an elegant parallelism to external
validity. Generalizability to the researched population and generalizability to other
populations, times, circumstances are, however, more or less points on a continuum;
besides, the Cook & Campbell definition is still the one most widely used (Cook &
Campbell, 1979; Cronbach et al., 1980; Cronbach, 1983; Denzin & Lincoln, 1994).
4. Lincoln & Guba (1985) developed still another set of criteria (besides the criteria
mentioned in Section 3), called 'authenticity criteria' for naturalistic (in later publications
they replaced the term by 'constructivist') research. This set refers to 'fairness' in the
relations between researchers and researched; a balanced representation of the views of all
stakeholders; the capacity of the project to initiate change, and so on and so forth. These
new criteria could, eventually, be classified under the label 'social acceptability' or under
one of the other derivatives of the regulative idea 'usability'. None of these criteria refers
to the quality of design, data collection and data analysis.
Criteria such as these, however, fit into a 'postmodern', 'poststructural' or constructivist
paradigm, in which 'the aim of inquiry is understanding and reconstruction of the
constructions that people hold, aiming toward consensus but still open to new
interpretations as information and sophistication improve' (Guba & Lincoln, in Denzin &
Lincoln, 1994: 113). We do not equate these ideas with qualitative research. Some of the
general ideas of the constructivist paradigm, as expressed in many contributions,
especially those of the editors, in Denzin & Lincoln (1994), are too remote from those
represented in this article to search for a common denominator. Although important in the
context of action research and evaluation, especially in organizations research, these
criteria do not refer to the quality of design, data collection and data analysis, but to ethics
and 'change agency'.
5. A relation exists between this regulative idea and the methodological norm that goes
under the name of simplicity, elegance or, most often, parsimony. Striving after
simplicity means, among other things, in theory-building: one abstains from ad hoc
adjustments, which mainly take the form of complicating the initial conditions or
restricting the domain of propositions. Probably the linking sentence 'the smaller the
number of parameters in the model, the more informative it is' is valid.

References

Blaikie, N. W. H. (1991). A critique of the use of triangulation in social research. Quality
and Quantity 25: 115-136.
Cook, Th. D. and Campbell, D. T. (1979). Quasi-Experimentation. Chicago: Rand McNally.
Cronbach, L. J., et al. (1980). Toward Reform of Program Evaluation. San Francisco:
Jossey-Bass.
Cronbach, L. J. (1983). Designing Evaluations of Educational and Social Programs. San
Francisco: Jossey-Bass.
Denzin, N. K. & Lincoln, Y. S. (eds.) (1994). Handbook of Qualitative Research. Thousand
Oaks: Sage.
Goetz, J. P. & LeCompte, M. D. (1981). Ethnographic research and the problem of data
reduction. Anthropol. and Education Quart. 12: 51-70.
Guba, E. G. (1981). Criteria for assessing the trustworthiness of naturalistic inquiries. Educat.
Comm. Technol. J. 29: 75-91.
Guba, E. G. & Lincoln, Y. S. (1982). Epistemological and methodological bases of
naturalistic inquiry. Educat. Comm. Technol. J. 30: 233-252.
Kant, I. (1787). Kritik der reinen Vernunft, 2nd edn.
LeCompte, M. D. & Goetz, J. P. (1982). Problems of reliability and validity in ethnographic
research. Review of Educat. Res. 52: 31-60.
Leeuw, F. L. (1983). Bevolkingsbeleid en reproductief gedrag. Doct. Diss., Leiden
University.
Lincoln, Y. S. & Guba, E. G. (1985). Naturalistic Inquiry. Newbury Park: Sage.
Miles, M. B. & Huberman, A. M. (1994). Qualitative Data Analysis: An Expanded
Sourcebook, 2nd edn. Thousand Oaks: Sage.
Popper, K. R. (1966). Of clouds and clocks. Pp. 61-73 in: Objective Knowledge. London,
1973.
Schuetz, A. (1966). Studies in Phaenomenological Philosophy, Collected Papers. The
Hague: Nijhoff.
Smaling, A. (1992). Varieties of methodological intersubjectivity - the relation with
qualitative and quantitative research, and with objectivity. Quality and Quantity 26:
169-180.
Smith, J. K. & Heshusius, L. (1986). Closing down the conversation: the end of the
quantitative-qualitative debate among educational researchers. Educat. Researcher 15:
4-12.
Swanborn, P. G. & van Zijl, P. (1984). Interactionists do it only symbolically. Mens en
Maatschappij 59: 142-164.
