Observational Study

PAUL R. ROSENBAUM
Volume 3, pp. 1451–1462

in

Encyclopedia of Statistics in Behavioral Science

ISBN-13: 978-0-470-86080-9
ISBN-10: 0-470-86080-4

Editors

Brian S. Everitt & David C. Howell

 John Wiley & Sons, Ltd, Chichester, 2005

Observational Studies Defined

In the ideal, the effects caused by treatments are investigated in experiments that randomly assign subjects to treatment or control, thereby ensuring that comparable groups are compared under competing treatments [1, 5, 15, 23]. In such an experiment, comparable groups prior to treatment ensure that differences in outcomes after treatment reflect effects of the treatment (see Clinical Trials and Intervention Studies). Random assignment uses chance to form comparable groups; it does not use measured characteristics describing the subjects before treatment. As a consequence, random assignment tends to make the groups comparable both in terms of measured characteristics and characteristics that were not or could not be measured. It is the unmeasured characteristics that present the largest difficulties when randomization is not used. More precisely, random assignment ensures that the only differences between treated and control groups prior to treatment are due to chance (the flip of a coin in assigning one subject to treatment, another to control), so if a common statistical test rejects the hypothesis that the difference is due to chance, then a treatment effect is demonstrated [18, 22].

Experiments with human subjects are often ethical and feasible when (a) all of the competing treatments under study are either harmless or intended and expected to benefit the recipients, (b) the best treatment is not known, and in light of this, subjects consent to be randomized, and (c) the investigator can control the assignment and delivery of treatments. Experiments cannot ethically be used to study treatments that are harmful or unwanted, and experiments are not practical when subjects refuse to cede control of treatment assignment to the experimenter. When experiments are not ethical or not feasible, the effects of treatments are examined in an observational study. Cochran [12] defined an observational study as an empiric comparison of treated and control groups in which:

    the objective is to elucidate cause-and-effect relationships [. . . in which it] is not feasible to use controlled experimentation, in the sense of being able to impose the procedures or treatments whose effects it is desired to discover, or to assign subjects at random to different procedures.

When subjects are not assigned to treatment or control at random, when subjects select their own treatments or their environments inflict treatments upon them, differing outcomes may reflect these initial differences rather than effects of the treatments [12, 59]. Pretreatment differences or selection biases are of two kinds: those that have been accurately measured, called overt biases, and those that have not been measured but are suspected to exist, called hidden biases. Removing overt biases and addressing uncertainty about hidden biases are central issues in observational studies. Overt biases are removed by adjustments, such as matching, stratification or covariance adjustment (see Analysis of Covariance), which are discussed in the Section titled ‘Adjusting for Biases Visible in Observed Covariates’. Hidden biases are addressed partly in the design of an observational study, discussed in the Sections titled ‘Design of Observational Studies’ and ‘Elaborate Theories’, and partly in the analysis of the study, discussed in the Sections titled ‘Appraising Sensitivity to Hidden Bias’ and ‘Elaborate Theories’ (see Quasi-experimental Designs, Internal Validity, and External Validity).

Examples of Observational Studies

Several examples of observational studies are described. Later sections refer to these examples.

Long-term Psychological Effects of the Death of a Close Relative

In an attempt to estimate the long-term psychological effects of bereavement, Lehman, Wortman, and Williams [30] collected data following the sudden death of a spouse or a child in a car crash. They matched 80 bereaved spouses and parents to 80 controls drawn from 7581 individuals who came to renew a driver’s license. Specifically, they matched for gender, age, family income before the crash, education level, and number and ages of children. Contrasting their findings with the views of Bowlby and Freud, they concluded:

    Contrary to what some early writers have suggested about the duration of the major symptoms
    of bereavement . . . both spouses and parents in our study showed clear evidence of depression and lack of resolution at the time of the interview, which was 5 to 7 years after the loss occurred.

Effects on Children of Occupational Exposures to Lead

Morton, Saah, Silberg, Owens, Roberts, and Saah [39] asked whether children were harmed by lead brought home in the clothes and hair of parents who were exposed to lead at work. They matched 33 children whose parents worked in a battery factory to 33 unexposed control children of the same age and neighborhood, and used Wilcoxon’s signed rank test to compare the level of lead found in the children’s blood, finding elevated levels of lead in exposed children. In addition, they compared exposed children whose parents had varied levels of exposure to lead at the factory, finding that parents who had higher exposures on the job in turn had children with more lead in their blood. Finally, they compared exposed children whose parents had varied hygiene upon leaving the factory at the end of the day, finding that poor hygiene of the parent predicted higher levels of lead in the blood of the child.

Effects on Criminal Violence of Laws Limiting Access to Handguns

Do laws that ban purchases of handguns by convicted felons reduce criminal violence? It would be difficult, perhaps impossible, to study this in a randomized experiment, and yet an observational study faces substantial difficulties as well. One could not reasonably estimate the effects of such a law by comparing the rate of criminal violence among convicted felons barred from handgun purchases to the rate among all other individuals permitted to purchase handguns. After all, convicted felons may be more prone to criminal violence and may have greater access to illegally purchased guns than typical purchasers of handguns without felony convictions. For instance, in his ethnographic account of violent criminals, Athens (1997, p. 68) depicts their sporadic violent behavior as consistent with stable patterns of thought and interaction, and writes: “. . . the self-images of violent criminals are always congruent with their violent criminal actions.”

Wright, Wintemute, and Rivara [68] compared two groups of individuals in California: (a) individuals who attempted to purchase a handgun but whose purchase was denied because of a prior felony conviction, and (b) individuals whose purchase was approved because their prior felony arrest had not resulted in a conviction. The comparison looked forward in time from the attempt to purchase a handgun, recording arrest charges for new offenses in the subsequent three years. Presumably, group (b) is a mixture of some individuals who did not commit the felony for which they were arrested and others who did. If this presumption were correct, group (a) would be more similar to group (b) than to typical purchasers of handguns, but substantial biases may remain.

Design of Observational Studies

Observational studies are sometimes referred to as natural experiments [36, 56] or as quasi-experiments [61] (see Quasi-experimental Designs). These differences in terminology reflect certain differences in emphasis, but a shared theme is that the early stages of planning or designing an observational study attempt to reproduce, as nearly as possible, some of the strengths of an experiment [47].

A treatment is a program, policy, or intervention which, in principle, may be applied to or withheld from any subject under study. A variable measured prior to treatment is not affected by the treatment and is called a covariate. A variable measured after treatment may have been affected by the treatment and is called an outcome. An analysis that does not carefully distinguish covariates and outcomes can introduce biases into the analysis where none existed previously [43]. The effect caused by a treatment is a comparison of the outcome a subject exhibited under the treatment the subject actually received with the potential but unobserved outcome the subject would have exhibited under the alternative treatment [40, 59]. Causal effects so defined are sometimes said to be counterfactual (see Counterfactual Reasoning), in the specific sense that they contrast what did happen to a subject under one treatment with what would have happened under the other treatment. Causal effects cannot be calculated for individuals, because each individual is observed under treatment or under control, but not both. However, in a randomized experiment, the treated-minus-control difference
in mean outcomes is an unbiased and consistent estimate of the average effect of the treatment on the subjects in the experiment.

In planning an observational study, one attempts to identify circumstances in which some or all of the following elements are available [47].

• Key covariates and outcomes are available for treated and control groups. The most basic elements of an observational study are treated and control groups, with important covariates measured before treatment, and outcomes measured after treatment. If data are carefully collected over time as events occur, as in a longitudinal study (see Longitudinal Data Analysis), then the temporal order of events is typically clear, and the distinction between covariates and outcomes is clear as well. In contrast, if data are collected from subjects at a single time, as in a cross-sectional study (see Cross-sectional Design) based on a single survey interview, then the distinction between covariates and outcomes depends critically on the subjects’ recall, and may not be sharp for some variables. Age and sex are covariates whenever they are measured, but current recall of past diseases, experiences, moods, habits, and so forth can easily be affected by subsequent events.

• Haphazard treatment assignment rather than self-selection. When randomization is not used, treated and control groups are often formed by deliberate choices reflecting either the personal preferences of the subjects themselves or else the view of some provider of treatment that certain subjects would benefit from treatment. Deliberate selection of this sort can lead to substantial biases in observational studies. For instance, Campbell and Boruch [10] discuss the substantial systematic biases in many observational studies of compensatory programs intended to offset some disadvantage, such as the US Head Start Program for preschool children. Campbell and Boruch note that the typical study compares disadvantaged subjects eligible for the program to controls who were not eligible because they were not sufficiently disadvantaged. When randomization is not possible, one should try to identify circumstances in which an ostensibly irrelevant event, rather than deliberate choice, assigned subjects to treatment or control. For instance, in the United States, class sizes in government run schools are largely determined by the degree of wealth in the local region, but in Israel, a rule proposed by Maimonides in the 12th century still requires that a class of 41 must be divided into two separate classes. In Israel, what separates a class of size 40 from classes half as large is the enrollment of one more student. Angrist and Lavy [2] exploited Maimonides’ rule in their study of the effects of class size on academic achievement in Israel. Similarly, Oreopoulos [41] studies the economic effects of living in a poor neighborhood by exploiting the policy of Toronto’s public housing program of assigning people to housing in quite different neighborhoods simply based on their position in a waiting list. Lehman et al. [30], in their study of bereavement in the Section titled ‘Long-term Psychological Effects of the Death of a Close Relative’, limited the study to car crashes for which the driver was not responsible, on the grounds that car crashes for which the driver was responsible were relatively less haphazard events, perhaps reflecting forms of addiction or psychopathology. Random assignment is a fact, but haphazard assignment is a judgment, perhaps a mistaken one; however, haphazard assignments are preferred to assignments known to be severely biased.

• Special populations offering reduced self-selection. Restriction to certain subpopulations may diminish, albeit not eliminate, biases due to self-selection. In their study of the effects of adolescent abortion, Zabin, Hirsch, and Emerson [69] used as controls young women who visited a clinic for a pregnancy test, but whose test result came back negative, thereby ensuring that the controls were also sexually active. The use in the Section titled ‘Effects on Criminal Violence of Laws Limiting Access to Handguns’ of controls who had felony arrests without convictions may also reduce hidden bias.

• Biases of known direction. In some settings, the direction of unobserved biases is quite clear even if their magnitude is not, and in certain special circumstances, a treatment effect that overcomes a bias working against it may yield a relatively unambiguous conclusion. For
instance, in the Section titled ‘Effects on Criminal Violence of Laws Limiting Access to Handguns’, one expects that the group of convicted felons denied handguns contains fewer innocent individuals than does the arrested-but-not-convicted group who were permitted to purchase handguns. Nonetheless, Wright et al. [68] found fewer subsequent arrests for gun and violent offenses among the convicted felons, suggesting that the denial of handguns may have had an effect large enough to overcome a bias working in the opposite direction. Similarly, it is often claimed that payments from disability insurance provided by US Social Security deter recipients from returning to work by providing a financial disincentive. Bound [6] examined this claim by comparing disability recipients to rejected applicants, where the rejection was based on an administrative judgment that the injury or disability was not sufficiently severe. Here, too, the direction of bias seems clear: rejected applicants should be healthier. However, Bound found that even among the rejected applicants, relatively few returned to work, suggesting that even fewer of the recipients would return to work without insurance. Some general theory about studies that exploit biases of known direction is given in Section 6.5 of [49].

• An abrupt start to intense treatments. In an experiment, the treated and control conditions are markedly distinct, and these conditions become active at a specific known time. Lehman et al.’s [30] study of the psychological effects of bereavement in the Section titled ‘Long-term Psychological Effects of the Death of a Close Relative’ resembles an experiment in this sense. The study concerned the effects of the sudden loss of a spouse or a child in a car crash. In contrast, the loss of a distant relative or the gradual loss of a parent to chronic disease might possibly have effects that are smaller, more ambiguous, more difficult to discern. In a general discussion of studies of stress and depression, Kessler [28] makes this point clearly: “. . . a major problem in interpret[ation] . . . is that both chronic role-related stresses and the chronic depression by definition have occurred for so long that deciding unambiguously which came first is difficult . . . The researcher, however, may focus on stresses that can be assumed to have occurred randomly with respect to other risk factors of depression and to be inescapable, in which case matched comparison can be used to make causal inferences about long-term stress effects. A good example is the matched comparison of the parents of children having cancer, diabetes, or some other serious childhood physical disorder with the parents of healthy children. Disorders of this sort are quite common and occur, in most cases, for reasons that are unrelated to other risk factors for parental psychiatric disorder. The small amount of research shows that these childhood physical disorders have significant psychiatric effects on the family.” (p. 197)

• Additional structural features in quasi-experiments intended to provide information about hidden biases. The term quasi-experiment is often used to suggest a design in which certain structural features are added in an effort to provide information about hidden biases; see Section titled ‘Elaborate Theories’ for detailed discussion. In the Section titled ‘Effects on Children of Occupational Exposures to Lead’, for instance, data were collected for control children whose parents were not exposed to lead, together with data about the level of lead exposure and the hygiene of parents exposed to lead. An actual effect of lead should produce a quite specific pattern of associations: more lead in the blood of exposed children, more lead when the level of exposure is higher, more lead when the hygiene is poor. In general terms, Cook et al. [14] write that: “. . . the warrant for causal inferences from quasi-experiments rests [on] structural elements of design other than random assignment–pretests, comparison groups, the way treatments are scheduled across groups . . . – [which] provide the best way of ruling out threats to internal validity . . . [C]onclusions are more plausible if they are based on evidence that corroborates numerous, complex, or numerically precise predictions drawn from a descriptive causal hypothesis.” (pp. 570-1)

Randomization will produce treated and control groups that were comparable prior to treatment, and it will do this mechanically, with no understanding of the context in which the study is being conducted. When randomization is not used, an understanding of the context becomes much more important.
Context is important whether one is trying to identify what covariates to measure, or to locate settings that afford haphazard treatment assignments or subpopulations with reduced selection biases, or to determine the direction of hidden biases. Ethnographic and other qualitative studies (e.g., [3, 21]) may provide familiarity with context needed in planning an observational study, and moreover qualitative methods may be integrated with quantitative studies [55].

Because even the most carefully designed observational study will have weaknesses and ambiguities, a single observational study is often not decisive, and replication is often necessary. In replicating an observational study, one should seek to replicate the actual treatment effects, if any, without replicating any biases that may have affected the original study. Some strategies for doing this are discussed in [48].

Adjusting for Biases Visible in Observed Covariates

Matched Sampling

Selecting from a Reservoir of Potential Controls. Among methods of adjustment for overt biases, the most direct and intuitive is matching, which compares each treated individual to one or more controls who appear comparable in terms of observed covariates. Matched sampling is most common when a small treated group is available together with a large reservoir of potential controls [57]. The structure of the study of bereavement by Lehman et al. [30] in the Section titled ‘Long-term Psychological Effects of the Death of a Close Relative’ is typical. There were 80 bereaved spouses and parents and 7581 potential controls, from whom 80 matched controls were selected. Routine administrative records were used to identify and match bereaved and control subjects, but additional information was needed from matched subjects for research purposes, namely, psychiatric outcomes. It is neither practical nor important to obtain psychiatric outcomes for all 7581 potential controls, and instead, matching selected 80 controls who appear comparable to treated subjects.

Most commonly, as in both Lehman et al.’s [30] study of bereavement in the Section titled ‘Long-term Psychological Effects of the Death of a Close Relative’ and Morton et al.’s [39] study of lead exposure in the Section titled ‘Effects on Children of Occupational Exposures to Lead’, each treated subject is matched to exactly one control, but other matching structures may yield either greater bias reduction or estimates with smaller standard errors or both. In particular, if the reservoir of potential controls is large, and if obtaining data from controls is not prohibitively expensive, then the standard errors of estimated treatment effects can be substantially reduced by matching each treated subject to several controls [62]. When several controls are used, substantially greater bias reduction is possible if the number of controls is not constant, instead varying from one treated subject to another [37].

Multivariate Matching Using Propensity Scores. In matching, the first impulse is to try to match each treated subject to a control who appears nearly the same in terms of observed covariates; however, this is quickly seen to be impractical when there are many covariates. For instance, with 20 binary covariates, there are 2^20 or about a million types of individuals, so even with thousands of potential controls, it will often be difficult to find a control who matches a treated subject on all 20 covariates.

Randomization produces covariate balance, not perfect matches. Perfect matches are not needed to balance observed covariates. Multivariate matching methods attempt to produce matched pairs or sets that balance observed covariates, so that, in aggregate, the distributions of observed covariates are similar in treated and control groups. Of course, unlike randomization, matching cannot be expected to balance unobserved covariates.

The propensity score is the conditional probability (see Probability: An Introduction) of receiving the treatment rather than the control given the observed covariates [52]. Typically, the propensity score is unknown and must be estimated, for instance, using logistic regression [19] of the binary category, treatment or control, on the observed covariates. The propensity score is defined in terms of the observed covariates, even when there are concerns about hidden biases due to unobserved covariates, so estimating the propensity score is straightforward because the needed data are available. For nontechnical surveys of methods using propensity scores, see [7, 27], and see [33] for discussion of propensity scores for doses of treatment.
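
As a rough illustration of estimating a propensity score by logistic regression and then pairing each treated subject with a control, the following Python sketch uses only the standard library. Everything in it is an assumption made for illustration: the data are simulated, the function names (`fit_logit`, `score`, `pair_match`, `mean_diff`) are hypothetical, the logistic regression is fitted by a simple gradient ascent rather than a statistical package, and the pairing is a greedy nearest-neighbor rule, not an optimal matching algorithm.

```python
import math
import random

def fit_logit(X, z, steps=400, lr=0.5):
    """Logistic regression of z on X by gradient ascent; returns [intercept, slopes...]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for x, zi in zip(X, z):
            resid = zi - score(beta, x)          # observed minus fitted probability
            grad[0] += resid
            for j, v in enumerate(x):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def score(beta, x):
    """Estimated propensity score: fitted probability of treatment given covariates x."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))

def pair_match(e, z):
    """Greedy 1:1 matching: each treated unit takes the nearest unused control on e."""
    controls = {i for i, zi in enumerate(z) if zi == 0}
    treated = [i for i, zi in enumerate(z) if zi == 1]
    pairs = []
    for t in sorted(treated, key=lambda i: -e[i]):   # highest scores are matched first
        c = min(controls, key=lambda i: abs(e[i] - e[t]))
        controls.remove(c)
        pairs.append((t, c))
    return pairs

def mean_diff(X, j, treated, controls):
    """Treated-minus-control difference in the mean of covariate j."""
    return (sum(X[i][j] for i in treated) / len(treated)
            - sum(X[i][j] for i in controls) / len(controls))

# Simulated data (hypothetical): two covariates; treatment is more likely when they are large.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
z = [1 if rng.random() < 1 / (1 + math.exp(2 - x[0] - 0.5 * x[1])) else 0 for x in X]

beta = fit_logit(X, z)
e = [score(beta, x) for x in X]                      # estimated propensity scores
pairs = pair_match(e, z)

all_t = [i for i, zi in enumerate(z) if zi == 1]
all_c = [i for i, zi in enumerate(z) if zi == 0]
before = [mean_diff(X, j, all_t, all_c) for j in range(2)]
after = [mean_diff(X, j, [t for t, _ in pairs], [c for _, c in pairs]) for j in range(2)]
print("mean difference before matching:", [round(d, 2) for d in before])
print("mean difference after matching: ", [round(d, 2) for d in after])
```

Comparing the two printed lines is a crude version of a covariate balance check: the matched sample is judged by whether the covariate means of treated subjects and their matched controls are close, not by whether individual pairs agree on every covariate.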

Matching on one variable, the propensity score, tends to balance all of the observed covariates, even though matched individuals will typically differ on many observed covariates. As an alternative, matching on the propensity score and one or two other key covariates will also tend to balance all of the observed covariates. If it suffices to adjust for the observed covariates (that is, if there is no hidden bias due to unobserved covariates), then it also suffices to adjust for the propensity score alone. These results are Theorems 1 through 4 of [52]. A study of the psychological effects of prenatal exposures to barbiturates balanced 20 observed covariates by matching on an estimated propensity score and sex [54].

One can and should check to confirm that the propensity score has done its job. That is, one should check that, after matching, the distributions of observed covariates are similar in treated and control groups; see [53, 54] for examples of this simple process. Because theory says that a correctly estimated propensity score should balance observed covariates, this check on covariate balance is also a check on the model used to estimate the propensity score. If some covariates are not balanced, consider adding to the logit model interactions or quadratics involving these covariates, then check covariate balance with the new propensity score.

Bergstralh, Kosanke, and Jacobsen [4] provide SAS software for an optimal matching algorithm.

Stratification

Stratification is an alternative to matching in which subjects are grouped rather than paired. Cochran [13] showed that five strata formed from a single continuous covariate can remove about 90% of the bias in that covariate. Strata that balance many covariates at once can often be formed by forming five strata at the quintiles of an estimated propensity score. A study of coronary bypass surgery balanced 74 covariates using five strata formed from an estimated propensity score [53]. The optimal stratification, that is, the stratification that makes treated and control subjects as similar as possible within strata, is a type of matching called full matching in which a treated subject can be matched to several controls or a control can be matched to several treated subjects [45]. An optimal full matching, hence also an optimal stratification, can be determined using network optimization.

Model-based Adjustments

Unlike matched sampling and stratification, which compare treated subjects directly to actual controls who appear comparable in terms of observed covariates, model-based adjustments, such as covariance adjustment, use data on treated and control subjects without regard to their comparability, relying on a model, such as a linear regression model, to predict how subjects would have responded under treatments they did not receive. In a case study from labor economics, Dehejia and Wahba [20] compared the performance of model-based adjustments and matching, and Rubin [58, 60] compared performance using simulation. Rubin found that model-based adjustments yielded smaller standard errors than matching when the model is precisely correct, but model-based adjustments were less robust than matching when the model is wrong. Indeed, he found that if the model is substantially incorrect, model-based adjustments may not only fail to remove overt biases, they may even increase them, whereas matching and stratification are fairly consistent at reducing overt biases. Rubin found that the combined use of matching and model-based adjustments was both robust and efficient, and he recommended this strategy in practice.

Appraising Sensitivity to Hidden Bias

With care, matching, stratification, model-based adjustments and combinations of these techniques may often be used to remove overt biases accurately recorded in the data at hand, that is, biases visible in imbalances in observed covariates. However, when observational studies are subjected to critical evaluation, a common concern is that the adjustments failed to control for some covariate that was not measured. In other words, the concern is that treated and control subjects were not comparable prior to treatment with respect to this unobserved covariate, and had this covariate been measured and controlled by adjustments, then the conclusions about treatment effects would have been different. This is not a concern in randomized experiments, because randomization balances both observed and unobserved covariates. In an observational study, a sensitivity analysis (see Sensitivity Analysis in Observational Studies) asks how such hidden biases of various magnitudes might alter the conclusions of
the study.

Observational studies vary greatly in their sensitivity to hidden bias. Cornfield et al. [17] conducted the first formal sensitivity analysis in a discussion of the effects of cigarette smoking on health. The objection had been raised that smoking might not cause lung cancer, but rather that there might be a genetic predisposition both to smoke and to develop lung cancer, and that this, not an effect of smoking, was responsible for the association between smoking and lung cancer. Cornfield et al. [17] wrote:

    . . . if cigarette smokers have 9 times the risk of nonsmokers for developing lung cancer, and this is not because cigarette smoke is a causal agent, but only because cigarette smokers produce hormone X, then the proportion of hormone X-producers among cigarette smokers must be at least 9 times greater than among nonsmokers. (p. 40)

Though straightforward to compute, their sensitivity analysis is an important step beyond the familiar fact that association does not imply causation. A sensitivity analysis is a specific statement about the magnitude of hidden bias that would need to be present to explain the associations actually observed. Weak associations in small studies can be explained away by very small biases, but only a very large bias can explain a strong association in a large study.

A simple, general method of sensitivity analysis introduces a single sensitivity parameter Γ that measures the degree of departure from random assignment of treatments. Two subjects with the same observed covariates may differ in their odds of receiving the treatment by at most a factor of Γ. In an experiment, random assignment of treatments ensures that Γ = 1, so no sensitivity analysis is needed. In an observational study with Γ = 2, if two subjects were matched exactly for observed covariates, then one might be twice as likely as the other to receive the treatment because they differ in terms of a covariate not observed. Of course, in an observational study, Γ is unknown. A sensitivity analysis tries out several values of Γ to see how the conclusions might change. Would small departures from random assignment alter the conclusions? Or, as in the studies of smoking and lung cancer, would only very large departures from random assignment alter the conclusions? For each value of Γ, it is possible to place bounds on a statistical inference: perhaps for Γ = 3, the true P value is unknown, but must be between 0.0001 and 0.041. Analogous bounds may be computed for point estimates and confidence intervals. How large must Γ be before the conclusions of the study are qualitatively altered? If for Γ = 9, the P value for testing no effect is between 0.00001 and 0.02, then the results are highly insensitive to bias; only an enormous departure from random assignment of treatments could explain away the observed association between treatment and outcome. However, if for Γ = 1.1, the P value for testing no effect is between 0.01 and 0.3, then the study is extremely sensitive to hidden bias; a tiny bias could explain away the observed association. This method of sensitivity analysis is discussed in detail with many examples in Section 4 of [49] and the references given there.

For instance, Morton et al.’s [39] study in the Section titled ‘Effects on Children of Occupational Exposures to Lead’ used Wilcoxon’s signed-ranks test to compare blood lead levels of 33 exposed and 33 matched control children. The pattern of matched pair differences they observed would yield a P value less than 10^-5 in a randomized experiment. For Γ = 3, the range of possible P values is from about 10^-15 to 0.014, so a bias of this magnitude could not explain the higher lead levels among exposed children. In words, if Morton et al. [39] had failed to control by matching a variable strongly related to blood lead levels and three times more common among exposed children, this would not have been likely to produce a difference in lead levels as large as the one they observed. The upper bound on the P value is just about 0.05 when Γ = 4.35, so the study is quite insensitive to hidden bias, but not as insensitive as the studies of heavy smoking and lung cancer. For Γ = 5 and Γ = 6, the upper bounds on the P value are 0.07 and 0.12, respectively, so biases of this magnitude could explain the observed association. Sensitivity analyses for point estimates and confidence intervals for this example are in Section 4.3.4 and Section 4.3.5 of [49]. Several other methods of sensitivity analysis are discussed in [16, 24, 26], and [31].

Elaborate Theories

Elaborate Theories and Pattern Specificity

What can be observed to provide evidence about hidden biases, that is, biases due to covariates that
Elaborate Theories

Elaborate Theories and Pattern Specificity

What can be done to address hidden biases, that is, biases due to covariates that were not observed? Cochran [12] summarizes the view of Sir Ronald Fisher, the inventor of the randomized experiment:

“About 20 years ago, when asked in a meeting what can be done in observational studies to clarify the step from association to causation, Sir Ronald Fisher replied: ‘Make your theories elaborate.’ The reply puzzled me at first, since by Occam’s razor, the advice usually given is to make theories as simple as is consistent with known data. What Sir Ronald meant, as subsequent discussion showed, was that when constructing a causal hypothesis one should envisage as many different consequences of its truth as possible, and plan observational studies to discover whether each of these consequences is found to hold.”

Similarly, Cook & Shadish [15] (1994, p. 565): “Successful prediction of a complex pattern of multivariate results often leaves few plausible alternative explanations. (p. 95)” Some patterns of response are scientifically plausible as treatment effects, but others are not [25, 65]. “[W]ith more pattern specificity,” writes Trochim [63], “it is generally less likely that plausible alternative explanations for the observed pattern will be forthcoming. (p. 580)”

Example of Reduced Sensitivity to Hidden Bias Due to Pattern Specificity

Morton et al.’s [39] study of lead exposures in the Section titled ‘Effects on Children of Occupational Exposures to Lead’ provides an illustration. Their elaborate theory predicted: (a) higher lead levels in the blood of exposed children than in matched control children, (b) higher lead levels in exposed children whose parents had higher lead exposure on the job, and (c) higher lead levels in exposed children whose parents practiced poorer hygiene upon leaving the factory. Since each of these predictions was consistent with observed data, to attribute the observed associations to hidden bias rather than an actual effect of lead exposure, one would need to postulate biases that could produce all three associations.

In a formal sense, elaborate theories play two roles: (a) they can aid in detecting hidden biases [49], and (b) they can make a study less sensitive to hidden bias [50, 51]. In the Section titled ‘Effects on Children of Occupational Exposures to Lead’, if exposed children had lower lead levels than controls, or if higher exposure predicted lower lead levels, or if poor hygiene predicted lower lead levels, then this would be difficult to explain as an effect caused by lead exposure, and would likely be understood as a consequence of some unmeasured bias, some way children who appeared comparable were in fact not comparable. Indeed, the pattern specific comparison is less sensitive to hidden bias. In detail, suppose that the exposure levels are assigned numerical scores, 1 for a child whose father had either low exposure or good hygiene, 2 for a father with high exposure and poor hygiene, and 1.5 for the several intermediate situations. The sensitivity analysis discussed in the Section titled ‘Appraising Sensitivity to Hidden Bias’ used the signed rank test to compare lead levels of the 33 exposed children and their 33 matched controls, and it became sensitive to hidden bias at Γ = 4.35, because the upper bound on the P value for testing no effect had just reached 0.05. Instead, using the dose-signed-rank statistic [46, 50] to incorporate the predicted pattern, the comparison becomes sensitive at Γ = 4.75; that is, again, the upper bound on the P value for testing no effect is 0.05. In other words, some biases that would explain away the higher lead levels of exposed children are not large enough to explain away the pattern of associations predicted by the elaborate theory. To explain the entire pattern, noticeably larger biases would need to be present.

A reduction in sensitivity to hidden bias can occur when a correct elaborate theory is strongly confirmed by the data, but an increase in sensitivity can occur if the pattern is contradicted [50]. It is possible to contrast competing design strategies in terms of their ‘design sensitivity,’ that is, their ability to reduce sensitivity to hidden bias [51].

Common Forms of Pattern Specificity

There are several common forms of pattern specificity or elaborate theories [44, 49, 61].

• Two control groups. Campbell [8] advocated selecting two control groups to systematically vary an unobserved covariate, that is, to select two different groups not exposed to the treatment, but known to differ in terms of certain unobserved covariates. For instance, Card and Krueger [11] examined the common claim among economists that increases in the minimum wage cause many minimum wage earners to lose their jobs. They did this by looking at changes in employment at fast food restaurants (Burger Kings, KFCs, Wendy’s, and Roy Rogers’) when New Jersey increased its minimum wage by about 20%, from $4.25 to $5.05 per hour, on 1 April 1992, comparing employment before and after the increase. In certain analyses, they compared New Jersey restaurants initially paying $4.25 to two control groups: (a) restaurants in the same chains across the Delaware River in Pennsylvania where the minimum wage had not increased, and (b) restaurants in the same chains in affluent sections of New Jersey where the starting wage was at least $5.00 before 1 April 1992. An actual effect of raising the minimum wage should have negligible effects on both control groups. In contrast, one anticipates differences between the two control groups if, say, Pennsylvania Burger Kings were poor controls for New Jersey Burger Kings, or if employment changes in relatively affluent sections of New Jersey are very different from those in less affluent sections. Card and Krueger found similar changes in employment in the two control groups, and similar results in their comparisons of the treated group with either control group. An algorithm for optimal pair matching with two control groups is illustrated with Card and Krueger’s study in [32].

• Coherence among several outcomes and/or several doses. Hill [25] emphasized the importance of a coherent pattern of associations and of dose-response relationships, and Weiss [66] further developed these ideas. Campbell [9] wrote: “. . . great inferential strength is added when each theoretical parameter is exemplified in two or more ways, each mode being as independent as possible of the other, as far as the theoretically irrelevant components are concerned (p. 33).” Webb [64] speaks of triangulation. The lead example in the Section titled ‘Example of Reduced Sensitivity to Hidden Bias Due to Pattern Specificity’ provides one illustration and Reynolds and West [42] provide another. Related statistical theory is in [46, 50, 51] and the references given there.

• Unaffected outcomes, ineffective treatments. An elaborate theory may predict that certain outcomes should not be affected by the treatment or certain treatments should not affect the outcome; see Section 6 of [49] and [67]. For instance, in a case-crossover study [34], Mittleman et al. [38] asked whether bouts of anger might cause myocardial infarctions or heart attacks, finding a moderately strong and highly significant association. Although there are reasons to think that bouts of anger might cause heart attacks, there are also reasons to doubt that bouts of curiosity cause heart attacks. Mittleman et al., therefore, also examined bouts of curiosity and found curiosity was not associated with myocardial infarction, writing: “the specificity observed for anger . . . as opposed to curiosity . . . argue against recall bias.” McKillip [35] suggests that an unaffected or ‘control’ outcome might sometimes serve in place of a control group, and Legorreta et al. [29] illustrate this possibility in a study of changes in the demand for a type of surgery following a technological change that reduced cost and increased safety.

Summary

In the design of an observational study, an attempt is made to reconstruct some of the structure and strengths of an experiment. Analytical adjustments, such as matching, are used to control for overt biases, that is, pretreatment differences between treated and control groups that are visible in observed covariates. Analytical adjustments may fail because of hidden biases, that is, important covariates that were not measured and therefore not controlled by adjustments. Sensitivity analysis indicates the magnitude of hidden bias that would need to be present to alter the qualitative conclusions of the study. Observational studies vary markedly in their sensitivity to hidden bias; therefore, it is important to know whether a particular study is sensitive to small biases or insensitive to quite large biases. Hidden biases may leave visible traces in observed data, and a variety of tactics involving pattern specificity are aimed at distinguishing actual treatment effects from hidden biases. Pattern specificity may aid in detecting hidden bias or in reducing sensitivity to hidden bias.

E. E. 331–332. A.A. & Shadish. tic achievement. Association 94. in Evaluation and Experiment.L. & Snell.W. [16] Copas. S. P. The Design of Experiments. 533–575. (2003). A.R. American Economic tectomy. C.R. & Wynder. [10] Campbell.W. C. R. the alternatives: six ways in which quasi-experimental 327–333. A. & Lavy. pp. The planning of observational [30] Lehman. 270. Social experiments: Some developments over the past fifteen years. American Journal of Epidemiology 150. 295–313. 93. & Wahba. D. J. J.H. (2000). Kobylin- [11] Card. Edinburgh. Methodology and Epistemol. Urbana. Kosanke. D. motor vehicle crash.T.R. (1997). M. [32] Lu.M. nonexperimental studies: Reevaluating the evaluation of [4] Bergstralh. D. some questions. Rosenthal & ation or causation? Proceedings of the Royal Society of R. Propen- for randomized assignment to treatments by considering sity scores. (1997). Making it Crazy: An Ethnography edu/hsr/sasmac.D. Smoking [1] Angrist.G. & Peracchio. Chapman & Hall/CRC. Shimkin. H. Series A 128..N.P. Springer-Verlag.R.. ning and Evaluation. (1968).. New York. Ben. References [17] Cornfield. Prospective: artifact and Control. Campbell. Jurimetrics 33. (1994). (1994). Dunnette & L. (2003). Hough. 126–132. [12] Cochran. Organizational Psychology. NBER Reporter. Haenszel.F.. Medicine 58. L. studies of human populations (with Discussion). (1989). Randomized Experiments for Plan. W. and Graphical Statistics 13. J. (2004).D. . New [7] Braitman. Journal of the National Cancer Institute Summer. (1985). 295–300. pp. S. Annals of Internal Medicine 137. by subclassification in removing bias in observational [31] Lin. Epidemiology 7. (1959). Methods for assessing the sensi- propensity scores.J. (1997). The effects of stressful life timate effects. E. Chicago..D. & Li.L. The health and earnings of rejected & Boyd. Uni- [5] Boruch. common treatments: analytic strategies using [24] Gastwirth.R. E. Sage Publications. (1999). Lumsdaine.B.. Thousand Oaks. Psaty. 1053–1062. eds. 
University of Illinois Press. Palo Alto. & Zatz. V. C. (1965).A. [21] Estroff. S. & Rosenbaum..G. R. [3] Athens. Journal of Personality and Social [13] Cochran. 422–434. Biometrics 24. (1999). Biometrics 54. D. http://www. & Rosenbaum. R. L. the US National Science Foundation. J. S. American Economic Review Chicago Press. L.H. Quarterly Journal of Economics 114.A. B. P. Causal effects in Revisited.T. Academic Press. comes. (1989). training programs. omitted variables.G. W. Annual Review of Psychology 48.R. DeMets. D. B.C. D. evaluations in compensatory education tend to underes. [8] Campbell. The environment and disease: associ- in Artifact in Behavioral Research.T. Journal of the American Medical Association Review 84. [28] Kessler.. Journal of the Royal Statistical Society. & Rosenbaum. 491–576. (1975). Costantino. Violent Criminal Acts and Actors [20] Dehejia. Making the case [27] Joffe. (1998). [19] Cox. (1993). Analysis of Binary Data. and lung cancer: recent evidence and a discussion of experiments in education research. [2] Angrist. 134–155. Rare out. A. 545–580. disability insurance applicants. [22] Fisher. W. London. eds.L.B. 195–296. R. & Jacobsen. tivity of statistical comparisons used in title VII cases to 693–695. 772–793. ies. Annual This work was supported by grant SES-0345113 from Review of Psychology 45. ing with two control groups.M.mayo. [25] Hill. R. (1935).T. W. G. Randomized trials and quasi. The effectiveness of adjustment Psychology 52. & Krueger. 22. Rosnow. 55–96. M. T. American Economic [23] Friedman. [26] Imbens. (1999). [9] Campbell. ing the sensitivity of regression results to unmeasured [14] Cook.L. Minimum wages and ski. Series B 59. York. Increased cholecystec- employment: a case study of the fast-food industry tomy rate after the introduction of laparoscopic cholecys- in New Jersey and Pennsylvania. University of tions in program evaluation. Journal of Computational pp. D. 191–214. J. 19–34. (1965). 
Journal of the American Statistical Software for optimal matching in observational stud. Berkeley. 315–333. T.Y. Fundamentals of Clinical Trials. (1996). 11–14. Academic Press. & Furberg. R. in Handbook of Industrial and 948–963.L. Sensitivity to exogeneity assump- ogy for Social Science: Selected Papers. versity of California Press. J.J. Wortman. (1997). J. 482–503. Chapmann & Hall. (2002).A. 218–231. Lilien- feld. M. (1987). & Boruch. E. (1998). confounders in observational studies. Quasi-experimentation. Optimal pair match- eds. nett & A.. (1990). & Kronmal. Using Maimonides’ [18] Cox. The Theory of the Design rule to estimate the effect of class size on scholas. of Experiments. R.M. [29] Legorreta. (1992). (1988). Hammond. & Williams. New York. G.html of Psychiatric Clients in an American Community.D.E. Inference for non- random samples (with discussion). J. Assess- studies. D. Journal Long-term effects of losing a spouse or a child in a of the Royal Statistical Society.. 173–203. Silber. N. Oliver [6] Bound. & Read. (1969).. A. R.. D. P. D. events on depression. L. Review 79. New York. Consulting Psychologists Press.10 Observational Study Acknowledgment [15] Cook.D. 1429–1432.

eralized Causal Inference. 318–328. (2002). [46] Rosenbaum. 185–203.P. & Rubin. & Wolpin. Biometrics 29. and conceptualization in program evaluation. D. Observational Studies. Mulry. Estimating causal effects of treat- Section 9 (In Polish). Series A 147. R. . (1981). K. 41–55. 34–43. (2002). Using multivariate matched sam- [41] Oreopoulos..L. after surgery. P. Literature 38. J. Natural “nat- Benson. of the propensity score in observational studies for causal [35] McKillip. J.K. Prince- Statistical Society. Journal of the American Review 9. F. D. (2000).E. New York. Should we use tional studies. Pattern matching. Maclure. P. (2003). B. (1973a). in Methodological Issues [53] Rosenbaum.. Biometrics 29. (1979). Replicating effects and biases. (1991). D.R. 597–610. On the application of probability studies.T. (1923). 159–183. Series B 53. 415–425. S. M. [36] Meyer. Educational Testing Service. oration of the criterion of ‘dose-response’. W. Unconventionality. eds. [67] Weiss. & Rubin. A multiplist [61] Shadish. Plenum Press. Sherwood. Reprinted in English with Discussion in Statistical nal of Educational Psychology 66. (1966). validity [44] Rosenbaum. 487–90. Association 79. P. American Statistical Association 96. that incorporate the propensity score. 516–524. & Mittleman. K. [60] Rubin. W. Boston. 1533–1575. (1983). H. Constructing a con- economics. 656–666. 463–480. Substantial gains cian 39. M. Journal of the American Statistical pp. P. Jour- 1–51. 827–874. (2004). (1997). American Statistician 55. R. [50] Rosenbaum. Matching and of controls. Journal of the Royal Statistical Society. Owens. American Statisti- [37] Ming. & Hursting. E. (1992). [64] Webb. D. Journal of the American Statistical Economics 118. P. triangulation and [45] Rosenbaum. (1985). P. J.. Hornik. 728. 1245–1253. (1995). 1720–1725. 549–555. Cook.Correction: Roberts. Does a dose-response relation- (2001). P. A characterization of optimal inference.D.R.D.H. 
bility in causal inference: current method and practice. strategy for strengthening nonequivalent control group Experimental and Quasi-Experimental Designs for Gen- designs.R. 691–714. regression adjustment to remove bias in observational [40] Neyman. Proceedings of the Invitational Conference on designs for observational studies. Edwards in observational studies using subclassification on the & R. B. 325–353.R. & Campbell. The use of matched sampling and American Journal of Epidemiology 115. Springer-Verlag. Biometrics 53.R. (2002). D.B. 259–304.. New York. M. Bryant. pp. Saah.. (1995). 30. Tindale. [56] Rosenzweig.R. in bias reduction from matching with a variable number [55] Rosenbaum. D.. Biostatistics 2. [57] Rubin.R. [43] Rosenbaum.R.C.G. (2001). Evaluation tion in observational studies. M. Statistical Association 79. D.B. in children of employees in a lead-related industry. theory to agricultural experiments: essay on principles. Silberg. 151–161.R. J.. Quarterly Journal of vational studies. ton. & Rubin.L. propensity score. Journal of the 1–10. T. (1999). M. Matching with doses in an observational study ship reduce sensitivity to hidden bias? Biostatistics 4. N. 193–221.B. Journal of Business and Economic Statistics trol group using multivariate matched sampling methods 13. 217–232. (1985).. hypothesis? Epidemiology 13. by the treatment. S. Zanutto. R. Can the “specificity” of an associ- [49] Rosenbaum. (1974). (1997). Circulation 92.M. Roczniki Nauk Roiniczych. P. 575–604. 153–164. & Muller.R. P. 2nd ation be rehabilitated as a basis for supporting a causal Edition. 223–227. D.R. Journal of the Royal Testing Problems. (1973b). M. 41–48. Reducing bias in Applied Social Psychology. (2000). The long-run consequences of pling and regression adjustment to control bias in obser- living in a poor neighborhood. & Silber.B. Choice as an alternative to con. P. (2000). Sociological Methodology 27. Natural and quasi-experiments in [54] Rosenbaum. Tom X. P.. (2001). 
6–8. Association 74. 118–124. & Saah. G.R. J.B..I. Observational Study 11 [33] Lu. D. Statistical [66] Weiss. Design sensitivity in observa- [34] Maclure. (1987). Inferring causal relationships: elab- Science 14. A. Matching with multiple controls ment for a concomitant variable that has been affected to estimate treatment effects in observational studies. (1984a). vational studies.S.R.. Triggering of acute ural experiments” in economics. 159–175.S. J. thick description in an observational study of mortality [38] Mittleman. M. P. The consequences of adjust. E. (1998). Friedman.D.. N. [63] Trochim.H. Research without control groups: effects. a case-crossover design? Annual Review of Public Health [52] Rosenbaum. 33–38. H.. Biologic plausi- coherent predictions. American Journal of Epidemiology 147. Science 1990. Journal of Epidemiology 113. K. (1984b). 556–566. American [48] Rosenbaum. 5. Matching to remove bias in obser- [39] Morton. & Rosenbaum.A. [47] Rosenbaum. Evaluation Review 11.J. 688–701. P.R.B. [58] Rubin. W. Biometrika 91. Jacobs. Journal of Economic myocardial infarction onset by episodes of anger. & Rosenbaum.. Biometrics 56.. of a media campaign against drug abuse. S. a control construct design. P. From association to causa. & West. P. Tofler. [59] Rubin.A. ments in randomized and nonrandomized studies. [51] Rosenbaum. The central role 21.. (1984).R. Signed rank statistics for [65] Weed.D. Biometrika 70. Lead absorption 1974. P. [62] Smith. Houghton-Mifflin. S. (1982). trol in observational studies (with Discussion). (2003). D. [42] Reynolds.

[68] Wright, M.A., Wintemute, G.J. & Rivara, F.P. (1999). Effectiveness of denial of handgun purchase to persons believed to be at high risk for firearm violence, American Journal of Public Health 89, 88–90.
[69] Zabin, L.S., Hirsch, M.B. & Emerson, M.R. (1989). When urban adolescents choose abortion: effects on education, psychological status, and subsequent pregnancy, Family Planning Perspectives 21, 248–255.

PAUL R. ROSENBAUM