Chapter 3 Kazdin

Chapter 3

Drawing Valid Inferences II: Construct and Statistical Conclusion Validity

CHAPTER OUTLINE

Construct Validity
  Threats to Construct Validity
    Attention and Contact with the Clients
    Single Operations and Narrow Stimulus Sampling
    Experimenter Expectancies
    Cues of the Experimental Situation
  General Comments
Statistical Conclusion Validity
  Overview of Essential Concepts
    Statistical Tests and Decision Making
    Effect Size
  Threats to Statistical Conclusion Validity
    Low Statistical Power
    Variability in the Procedures
    Subject Heterogeneity
    Unreliability of the Measures
    Multiple Comparisons and Error Rates
  General Comments
Experimental Precision and Its Price
Summary and Conclusions

Internal and external validity are fundamental to research and nicely convey the underpinnings for many methodological practices. Two other types of validity, referred to as construct validity and statistical conclusion validity, must also be addressed to draw valid inferences. These types of validity are no less central to research design. Yet, they are less familiar to researchers and consumers of research and reflect slightly more complex concepts and design considerations than do internal and external validity. This chapter considers construct and statistical conclusion validity and the interrelations and priorities of the different types of validity. As in the previous chapter, the goal is to describe the nature of these types of validity and the threats they raise. Subsequent chapters focus on strategies to address these threats.

CONSTRUCT VALIDITY

Construct validity has to do with interpreting the basis of the causal relation demonstrated within an experiment. The meaning requires careful delineation of construct from internal validity. Internal validity, as you recall, focuses on whether some intervention is responsible for change or whether other factors (e.g., history, maturation, testing) can plausibly account for the effect.
Assume for a moment that these threats have been ruled out by randomly assigning subjects to treatment and control groups, by assessing both groups in the same way and at the same time, and so on. We can thus presume that the group differences are not likely to have resulted from threats to internal validity but from the intervention. At this point the discussion of construct validity can begin. What is the intervention and why did it produce the effect? Is the reason for the relation between the intervention and behavior change due to the construct (explanation, interpretation) given by the investigator? Construct validity addresses the presumed cause or the explanation of the causal relation between the intervention or experimental manipulation and the outcome.

Several features within the experiment can interfere with the interpretation. These are often referred to as confounds. We say an experiment is confounded or that there is a confound to refer to the possibility that a specific factor varied (or covaried) with the intervention that could in whole or in part be responsible for the change. In an experiment, some component other than the one of interest to the investigator may be responsible for change. Features associated with the intervention that interfere with drawing inferences about the basis for the difference between groups are referred to as threats to construct validity.

Threats to Construct Validity

Attention and Contact with the Clients

Attention and contact accorded the client in the experimental group or differential attention across experimental and control groups may be the basis for the group differences and threaten construct validity. The intervention may have exerted its influence because of the attention provided, not because of special characteristics unique to the intervention. A familiar example from psychiatric research is the effect of placebos in the administration of medication.
Suppose investigators provide a drug for depression to some patients but no drug to other patients. Assume further that groups were formed through random assignment and that the threats to internal validity were all superbly addressed. At the end of the study, patients who had received the drug are greatly improved and significantly different from patients who did not receive the drug. The investigator may then discuss the effect of the drug and how the particular medication affects critical biological processes that control symptoms of depression. We accept the fact that the intervention was responsible for the outcome (internal validity). Yet the intervention consists of all aspects associated with the administration of the medication in addition to the medication itself. We know that taking any drug might decrease depression because of expectancies for improvement on the part of the patients and on those administering the drug. Indeed, such expectancies can exert marked therapeutic effects on a variety of psychological and medical dysfunctions (White, Tursky, & Schwartz, 1985). The intervention might have been effective because of such expectations and the change they generate. In the present example, these effects were not examined; thus, the investigator cannot identify the explanation of the effect.

To examine the basis for the effects (construct validity), it would be essential to include a third group that received a placebo on the same schedule of administration. A placebo is a substance that has no active pharmacological properties that would be expected to produce change. A neutral substance that is known to be inactive in relation to the clinical problem is used. Administration of a placebo to another group would be extremely useful in this example to address construct validity. A placebo might be a pill, capsule, or tablet of the same size and shape (and perhaps share other characteristics of the active medication).
Those who administer the drug (physicians or nurses) and those who receive the drug should be naive ("blind") to the conditions to which subjects are assigned. Thus, expectations for improvement might be constant between drug and placebo groups. With a placebo-control group, attention and contact with the client and expectations on the part of experimenters or clients become less plausible constructs to explain the effects the investigator wishes to attribute to the medication.

In a somewhat parallel fashion, treatment and no-treatment are often compared in psychotherapy research. The treatment may, for example, focus on cognitive processes that the investigator believes to be critical to the clinical problem. Assessment completed after treatment may reveal that the treatment group is significantly better than the no-treatment group on various outcome measures. If the investigator explains the finding as support for the importance of altering cognitions or using this particular treatment, there is a problem in construct validity. We must question whether plausible features associated with the intervention ought to be ruled out.

In fact, the treatment and no-treatment groups differ on several dimensions, such as providing regular meetings with a therapist, generating patient expectations for improvement, and providing a palpable effort to resolve or address the client's problems. Might these dimensions plausibly improve symptoms even if cognitions are not the focus of treatment? Some writers about psychotherapy answer affirmatively (Eysenck, 1995; Frank & Frank, 1991); in addition, evidence suggests that when control subjects are led to expect improvement, they often improve irrespective of whether they received a veridical treatment (Bootzin, 1985).
Hence, in a psychotherapy study, if the investigator wishes to explain the findings in terms of specific mechanisms or processes of the treatment (e.g., changes in cognitions, therapeutic alliance), attention and expectations associated with the treatment ought to be controlled. Otherwise, other constructs could plausibly explain the findings. We discuss this further when control groups are examined (Chapter 6).

In general, there is a threat to construct validity when attention, contact with the subjects, and their expectations might plausibly account for the findings and have not been controlled for or evaluated in the design. A design that does not control for these factors is not necessarily flawed. The intention of the investigator, the control procedures, and the specificity of the conclusions the investigator wishes to draw determine the extent to which construct validity threats can be raised. If the investigator wishes to discuss why the intervention achieved its effects, attention and contact ought to be ruled out as rival interpretations of the results.

Single Operations and Narrow Stimulus Sampling

In any study, the investigator is usually interested in some general way in a phenomenon, variable, intervention, and its effects. For example, the investigator may believe that a particular intervention will reduce anxiety. The investigator develops procedures to test the idea. In moving from the idea to the specific procedure, we know that decisions may raise concerns over external validity, that is, whether the procedures used will produce results that generalize to the situations we have in mind. The way in which the idea is operationalized may also affect construct validity, that is, our ability to decide whether the treatment of interest or some feature associated with the intervention is responsible for the results.

In many studies, the intervention is operationalized so that it is associated with and is inseparable from features that are assumed to be irrelevant by the investigator. These irrelevancies may contribute to the results and help explain the basis of the intervention effect. For example, two different treatments might be compared. Let us say we recruit therapists expert in treatment A to administer that treatment and other therapists skilled in treatment B to administer that treatment. Thus, different therapists provide different treatments. This is reasonable because we may wish to use experts who practice their special techniques. At the end of the study, assume that therapy A is better than B in the outcome achieved with the patient sample. Because therapists were different for the two treatments, we cannot really separate the impact of therapists from treatment. We might say that treatment A was better than treatment B. Yet, a colleague obsessed with construct validity might propose that therapists who administered treatment A may have simply been much better therapists than those who administered treatment B and that therapist competence may account for the results. The confound of treatment with therapists raises a significant ambiguity.

There is a more subtle variation that may emerge as a threat to construct validity. Suppose we are comparing two treatments and we use one therapist. This therapist provides both treatments and sees clients in each of the treatment conditions. At the end of the investigation, suppose that one treatment is clearly more effective than the other. The investigator may wish to discuss how one technique is superior and explain on conceptual grounds why this might be expected. We accept the finding that one intervention was more effective than the other. In deference to construct validity we ask what the intervention was. The comparison consisted of the therapist giving treatment A versus the same therapist giving treatment B.
With one therapist, it is possible that the different outcomes are somewhat due to the treatment and somewhat to how the therapist administered the treatments (e.g., enthusiasm, expectancies, fidelity). We cannot separate the influence of the therapist combined with the treatment in accounting for the results.

One might say that the therapist was "held constant" because he or she was used in both groups. But it is possible that the therapist was more credible, comfortable, competent, and effective with one technique than with the other. Perhaps the therapist believed in the efficacy of one technique more than another, performed one technique with greater fidelity than the other, or aroused patients' expectancies for improvement with one technique. The differential effects of treatment could be due to the interaction of therapist × treatment, rather than a main effect of treatment. The study yields somewhat ambiguous results because the effect of the therapist was not separable in the data analyses from the different treatment conditions. In the history of psychotherapy outcome research, there are many examples in which one therapist administered two or more treatments (Ellis, 1957; Lazarus, 1961; Shapiro, 1989). In these cases, the more or most effective treatment was the one developed by the therapist-investigator and predicted to be more effective. We tend to be skeptical of the results until they are replicated because the particular therapist in combination with one of the treatments (therapist × treatment effect) may have been responsible for the effects, rather than the treatment alone.

Construct validity could be improved by sampling across a wider range of conditions associated with treatment delivery (i.e., therapists) so the effects of treatment can be evaluated in the design.
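To make the distinction between a main effect of treatment and a therapist × treatment interaction concrete, the following Python sketch uses entirely hypothetical improvement scores for two therapists who each deliver both treatments. The numbers, the names `scores`, `cell_means`, and `treatment_effect_by_therapist`, and the use of simple differences between cell means are illustrative assumptions, not data or procedures from any study cited here. The sketch only describes the pattern of means; deciding whether such a pattern is reliable would require a formal statistical test.

```python
from statistics import mean

# Hypothetical improvement scores (higher = more improvement); illustrative only.
# Keys are (therapist, treatment); each therapist delivers both treatments.
scores = {
    ("therapist_1", "A"): [12, 14, 13, 15],
    ("therapist_1", "B"): [6, 7, 5, 6],
    ("therapist_2", "A"): [9, 8, 10, 9],
    ("therapist_2", "B"): [8, 9, 9, 10],
}


def cell_means(data):
    """Mean outcome for every therapist-by-treatment cell."""
    return {cell: mean(values) for cell, values in data.items()}


def treatment_effect_by_therapist(means):
    """Advantage of treatment A over treatment B, computed per therapist."""
    therapists = sorted({therapist for therapist, _ in means})
    return {t: means[(t, "A")] - means[(t, "B")] for t in therapists}


means = cell_means(scores)
effects = treatment_effect_by_therapist(means)

# Main effect of treatment: the A-minus-B advantage averaged over therapists.
main_effect = mean(list(effects.values()))

# Interaction contrast: how much the A-minus-B advantage differs by therapist.
# A value near zero would mean both therapists show the same treatment effect.
interaction_contrast = effects["therapist_1"] - effects["therapist_2"]

print("A-minus-B advantage per therapist:", effects)
print("Average advantage (main effect):", main_effect)
print("Difference between therapists (interaction):", interaction_contrast)
```

In these made-up data, treatment A looks better on average, yet the advantage appears for only one of the two therapists. With a single therapist that pattern could not be seen at all, which is why the treatment alone could not be credited with the result.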
Two or more therapists could be included, each of whom would administer both treatments. At the end of the study, the impact of therapists could be separated from the impact of treatment (e.g., by an analysis of variance). If the effectiveness of treatment varied between the therapists, this could be detected in the interaction (treatment × therapist) term.

As another example, consider a laboratory experiment designed to evaluate opinions held about mental illness. The purpose is to see whether people evaluate the personality, intelligence, and friendliness of others differently if they believe the other persons have been mentally ill. College students are the subjects and are assigned randomly to one of two conditions. In the experimental condition, the students see a slide of a 30-year-old man. They then listen to a tape that describes him as holding a factory job and living at home with his wife and two children. The description also includes a passage noting that the man has been mentally ill, experienced strange delusions, and was hospitalized 2 years ago. In the control condition, students see the slide and hear the same description except for the passages that talk about mental illness and hospitalization. At the end of the tape, participants rate the personality, intelligence, and friendliness of the person in the slide. Alas, the hypothesis is supported: subjects who heard the mentally ill description showed greater rejection of the person than subjects in the control group.

The investigator wishes to conclude that the content of the description that focused on mental illness is the basis for the group differences. After all, this is the only part of the content of the slide and description that distinguished experimental and control groups. Yet, there is a construct validity problem here. The use of a single case in the slide (i.e., the 30-year-old man) is problematic.
It is possible that rejection of the mental illness description occurred because of special characteristics of this particular case presented on the slide. The difference could be due to the manipulation of the description of mental illness or to the interaction of the description with characteristics of this case. One would want slides of different persons varying in age, sex, and other characteristics so that mental illness status could be separated from the specific characteristics of the case. In general, it is important to represent the stimuli in ways so that potential irrelevancies (e.g., the case, unique features of the task) can be separated from the intervention or variable of interest. Without separating the irrelevancies, the conclusions of the study are limited (Maher, 1978a).

The use of a narrow range of stimuli and the limitations that such use imposes sound similar to external validity. Actually, sampling a narrow range of stimuli as a threat can apply to both external and construct validity. If the investigator wishes to generalize to other stimulus conditions (e.g., other therapists or types of cases in the above two examples), then the narrow range of stimulus conditions is a threat to external validity. To generalize across stimulus conditions of the experiment requires sampling across the range of these conditions if it is plausible that the conditions may influence the results (Brunswik, 1955). If the investigator wishes to explain why a change occurred, then the problem is one of construct validity because the investigator cannot separate the construct of interest (e.g., treatment or types of description of treatment) from the conditions of its delivery (e.g., the therapist or case vignette).

Experimenter Expectancies

In both laboratory and clinical research, it is quite possible that the expectancies, beliefs, and desires about the results on the part of the experimenter influence how the subjects perform.
The effects are sometimes referred to as unintentional expectancy effects to emphasize that the experimenter may not do anything on purpose to influence subjects' responses. Depending on the experimental situation and experimenter-subject contact, expectancies may lead to changes in tone of voice, posture, facial expressions, delivery of instructions, and adherence to the prescribed procedures and hence influence how participants respond. Expectancy effects are a threat to construct validity if they provide a plausible rival interpretation of the effects otherwise attributed to the intervention.

Expectancy effects received considerable attention in the mid-1960s, primarily in the context of social psychological research (Rosenthal, 1966, 1976). However, it is not difficult to imagine their impact in clinical research. In treatment research, the expectancy effects might be suspected in situations in which the experimenter has a strong investment in the outcome and has contact with subjects in various treatment and control conditions. For example, in therapy outcome studies (Ellis, 1957; Lazarus, 1961; Shapiro, 1989), the treatment developed by the investigator surpassed the effectiveness of other conditions to which the treatment was compared. In each case, the investigator was the therapist for all conditions. It is plausible that expectancies of the investigator might be quite different for the treatments, and perhaps these were conveyed to the individuals during the course of treatment or assessment. Expectancies might be much less plausible if the treatment was not preferred by the investigator or was less effective than one of the other treatments. Because of the investigator's position, expectancy as a possible threat to construct validity cannot be easily dismissed.
We would very much want to see the study replicated with more and different therapists and perhaps even assess therapist expectancies at the beginning of the study to see whether they correlated with outcome.

The notion of experimenter expectancies as a threat to validity is infrequently invoked for at least two reasons. First, both the construct and the ways through which it achieves its effects are unclear. Second and related, many more parsimonious interpretations may serve as confounds before the notion of expectancies needs to be invoked. For example, differential adherence of the experimenter to the conditions, explicit and differential instructions to subjects, and changes in the measurement criteria (instrumentation) for subjects in different conditions might reflect more concretely why two conditions differ in their effects. Nevertheless, in a given situation, expectations on the part of the experimenter may plausibly serve as a source of ambiguity and threaten the construct validity of the experiment.

Cues of the Experimental Situation

Cues of the situation refer to the seemingly ancillary factors associated with the intervention that may contribute to the results. These cues have been referred to as the demand characteristics of the experimental situation (Orne, 1962). Demand characteristics may include sources of influence such as information conveyed to prospective subjects prior to their arrival to the experiment (e.g., rumors about the experiment, information provided during subject recruitment), instructions, procedures, and any other features of the experiment that may seem incidental to the overall manipulation.

The influence of cues in the experiment distinct from the independent variable was dramatically illustrated in a study that examined the role of demand characteristics in a sensory-deprivation experiment (Orne & Scheibe, 1964). Sensory deprivation consists of minimizing for the subject as many sources of sensory stimulation as possible.
Isolating individuals from visual, auditory, tactile, and other stimulation for prolonged periods has been associated with distorted perception, visual hallucinations, inability to concentrate, and disorientation. These reactions usually are attributable to the physical effects of being deprived of sensory stimulation. Orne and Scheibe suggested that cues from the experimental situation in which sensory-deprivation experiments are conducted might contribute to the reactions. They completed an experiment in which subjects were exposed to the accouterments of the procedures of a sensory-deprivation experiment but actually were not deprived of stimulation. Subjects received a physical exam, provided a short medical history, were assured that the procedures were safe, and were exposed to a tray of drugs and medical instruments conspicuously labeled "Emergency Tray." Subjects were told to report any unusual visual imagery, fantasy, or feelings, difficulties in concentration, disorientation, or similar problems. They were informed that they would be placed in a room to work on an arithmetic task. If they wanted to escape, they could do so by pressing a red "emergency alarm." In short, subjects were given a variety of cues to convey that strange experiences were in store.

The subjects were placed in the room with food, water, and materials for the task. No attempt was made to deprive subjects of sensory stimulation. They could move about, hear many different sounds, and work at a task. This arrangement departs from true sensory-deprivation experiments in which the subjects typically rest, have their eyes and ears covered, and cease movement as much as possible. A control group in the study did not receive the cues preparing them for unusual experiences and were told they could leave the room by merely knocking on the window.
At the end of the isolation period, the experimental group showed greater deterioration on a number of measures, including the reporting of symptoms characteristically revealed in sensory-deprivation experiments. Although sensory deprivation was not administered, the cues usually associated with deprivation studies may have contributed to or accounted for the results.

Demand characteristics can threaten construct validity if it is plausible that extraneous cues associated with the intervention could explain the findings. The situation described here conveys the potential impact of such cues. Whether demand characteristics can exert such impact in diverse areas of research is not clear. Also, in many areas of clinical research, the independent variable may include cues that cannot be so easily separated from the portion of the manipulation that is considered to be crucial. For example, different variations of treatment or levels of an independent variable (e.g., high, medium, and low) may necessarily require different cues and hence be intertwined with different demand characteristics. The cues that may give subjects hints on how to perform may not be considered as extraneous but as part of the manipulation itself. In such cases, it may not be especially meaningful to note that demand characteristics accounted for the results.

On the other hand, when several conditions are different from control conditions (e.g., a no-treatment control group), one might weigh the plausibility of demand characteristics as an influence. It may be that an implicit demand conveyed to control subjects is that they are not expected to improve from one test occasion to another. Presumably if cues were provided to convey this expectation, treatment and no-treatment differences might be due to different demand characteristics across the assessment conditions.
The means of evaluating demand characteristics are discussed further in Chapter 13.

General Comments

The discussion has noted common threats to construct validity. However, construct validity threats are not easily enumerated because the threats have to do with interpretation of the reason for the outcome in an experiment. Thus, theoretical views and substantive knowledge about how the experimental manipulation works or the mechanisms responsible for change are also at issue, apart from the issue of experimental confounds. The questions of construct validity are twofold: What is the intervention? and Why did this intervention lead to change? The first question emphasizes that the intervention may be embedded in or confounded by other conditions that influence and account for the outcome. The second question emphasizes the related issue of the interpretation of what led the intervention to change performance. Here we do not speak of confound as much as better understanding of the mechanism, process, or theory to explain the change. The questions encompass construct validity because they affect interpretation of the basis for a given finding.

STATISTICAL CONCLUSION VALIDITY

Internal, external, and construct validity and their threats codify many of the concerns to which methodology is directed. The list of concerns is already long; what more can remain? Actually, a great deal. Assume we have designed our wonderful experiment to address the bulk of the threats already highlighted. Shall we find reliable differences between the groups? Even if the intervention and control conditions would produce differences in their outcomes, whether we find such differences depends on multiple considerations. Statistical conclusion validity refers to the facets of the quantitative evaluation that influence the conclusions we reach about the experimental condition and its effect.

Statistical evaluation often is viewed and taught from two standpoints.
The first pertains to understanding the tests themselves and their bases. This facet may emphasize what the tests accomplish and the formulae and derivations of the tests. The second and complementary standpoint pertains to the computational aspects of statistical tests. Here application of the tests to data sets and their interpretation are emphasized. There is another facet that might be considered as a superordinate level, namely, the role of statistical evaluation in relation to research design and other threats to validity. Statistical conclusion validity reflects this level of concern with quantitative evaluation and is often the Achilles' heel of research. Because this type of validity is often neglected, failure to consider statistical issues commonly undermines the quality of an investigation. There are several facets of the results and statistical evaluation that can obscure interpretation of the experiment. These are referred to as threats to statistical conclusion validity.

Overview of Essential Concepts

Statistical Tests and Decision Making

Before discussing threats to validity, it is important to review a few of the essential concepts of statistical evaluation. In most psychological research, the conclusions in an experiment depend heavily on hypothesis testing and statistical evaluation. The null hypothesis specifies that there are no differences between groups (e.g., treatment vs. control group). Statistical tests are completed to evaluate whether the differences that are obtained are reliable or beyond what one is likely to find due to chance fluctuations. We can reject the null hypothesis of no difference if we find a statistically significant difference or accept the null hypothesis if we do not. The rejection and acceptance of hypotheses are weighty topics, only part of which we can treat here. The decision-making process is based on selecting a probability level that specifies the degree of risk of reaching a false conclusion.
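As a minimal sketch of this decision process, the Python example below compares a hypothetical treatment group with a hypothetical control group and reaches a decision about the null hypothesis at a chosen probability level. Several things here are assumptions made for illustration: the scores are invented, the probability level is set at .05, and an exact permutation test stands in for whatever statistical test an investigator would actually select. The function name `permutation_p_value` is likewise hypothetical.

```python
from itertools import combinations
from statistics import mean

ALPHA = 0.05  # assumed probability level chosen before looking at the data


def permutation_p_value(treatment, control):
    """Exact two-sided permutation test on the difference between group means.

    Under the null hypothesis of no difference, group labels are arbitrary,
    so every way of splitting the pooled scores into two groups is equally
    likely. The p value is the share of splits whose mean difference is at
    least as large as the one actually observed.
    """
    pooled = treatment + control
    n_treat = len(treatment)
    n_control = len(control)
    total = sum(pooled)
    observed = abs(mean(treatment) - mean(control))
    as_extreme = 0
    n_splits = 0
    for indices in combinations(range(len(pooled)), n_treat):
        group_sum = sum(pooled[i] for i in indices)
        difference = abs(group_sum / n_treat - (total - group_sum) / n_control)
        n_splits += 1
        if difference >= observed - 1e-12:  # small tolerance for float rounding
            as_extreme += 1
    return as_extreme / n_splits


def decide(p_value, alpha=ALPHA):
    """Apply the decision rule: reject the null only if p falls below alpha."""
    return "reject null" if p_value < alpha else "do not reject null"


# Hypothetical outcome scores (higher = better functioning); illustrative only.
control = [9, 10, 8, 11, 10, 12]
clearly_different = [14, 15, 13, 16, 15, 17]
no_different = [10, 9, 12, 8, 11, 10]

for label, treatment in [("clear effect", clearly_different),
                         ("no effect", no_different)]:
    p = permutation_p_value(treatment, control)
    print(f"{label}: p = {p:.4f} -> {decide(p)}")
```

The permutation approach was chosen for the sketch because it needs only the standard library and makes the logic of "beyond what one is likely to find due to chance" directly visible; it is practical only for small groups, since the number of possible splits grows quickly.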
If the statistical difference between groups passes this probability level, we state that the difference is reliable and represents an effect of the intervention. If the difference fails to pass the threshold, we say that the difference is not statistically significant and that the groups are not different.

It is important to note at this point that the present discussion makes critical assumptions that are not fully agreed on in science generally or psychological research in particular. The utility of statistical tests and probability levels (alpha) as a basis for drawing inferences, at least as currently practiced, is a matter of debate (Schmidt, 1996b; Thompson, 1996). Many of these issues are addressed in Chapter 14. In our present discussion, these issues are skirted in recognition of the fact that the bulk of research in psychology is based on drawing inferences from statistical evaluation. As such, there are common weaknesses of research that can be identified under the rubric of statistical conclusion validity.

Figure 3.1 notes the outcomes of an investigation on the basis of conclusions we might draw from statistical evaluation. The four cells represent the combination of our decision (there is a difference vs. there is no difference) and the true state of affairs in the world (there really is a difference or there is no difference). Our goal in experimentation is to draw conclusions that reflect the true state of affairs in the world. That is, if there are differences between two or more conditions (i.e., if the intervention is truly effective), we wish to reflect the differences in our decision (Cell B). If there is no difference between the conditions in the world, we would like to conclude that fact also (Cell C). Oc
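The four combinations of decision and true state of affairs can be illustrated with a small simulation. The Python sketch below is a hedged illustration only: it assumes normally distributed scores with a known standard deviation of 1, a true effect of one standard deviation when a difference exists, 20 subjects per group, and a two-sided z test at alpha = .05. None of these values come from the text, and the names `finds_difference` and `tally_cells` are hypothetical. Only the two correct-decision cells are tied to the cell letters given above; the two error cells are described in words.

```python
import random
from math import sqrt
from statistics import mean

CRITICAL_Z = 1.96  # two-sided cutoff for alpha = .05 under the normal curve


def finds_difference(true_effect, n_per_group, rng):
    """Simulate one two-group study; return True if we conclude a difference."""
    # Assumption: scores are normal with a known standard deviation of 1.
    treatment = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    standard_error = sqrt(2.0 / n_per_group)
    z = (mean(treatment) - mean(control)) / standard_error
    return abs(z) > CRITICAL_Z


def tally_cells(n_studies=500, n_per_group=20, effect=1.0, seed=0):
    """Count simulated studies in each (true state, decision) combination."""
    rng = random.Random(seed)
    cells = {(truth, decision): 0
             for truth in (True, False) for decision in (True, False)}
    for truth in (True, False):
        true_effect = effect if truth else 0.0
        for _ in range(n_studies):
            decision = finds_difference(true_effect, n_per_group, rng)
            cells[(truth, decision)] += 1
    return cells


cells = tally_cells()
labels = {
    (True, True): "difference exists, we conclude a difference (Cell B)",
    (False, False): "no difference exists, we conclude none (Cell C)",
    (False, True): "no difference exists, we conclude one (error)",
    (True, False): "difference exists, we fail to detect it (error)",
}
for key, label in labels.items():
    print(f"{label}: {cells[key]}")
```

Varying `n_per_group` or `effect` in the sketch changes how many studies land in the correct-decision cells when a difference truly exists, which is the kind of consideration the remainder of the discussion of statistical conclusion validity takes up.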
