Downloaded from https:/www.cambridge.org/core. UCL, Institute of Education, on 07 Apr 2017 at 17:38:02, subject to the Cambridge Core
terms of use, available at https:/www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511762888.004
102 Controlling Observables and Unobservables
Definition 4.2 (Script): The context of the instructions and information given
to subjects in an experiment.
4.1 Control in Experiments 103
proceeded in fixed order, with the pace controlled by the participant. All
instructions appeared onscreen. Participants were consented before the ses-
sion, debriefed and paid $20 after.” We discuss the benefits of debriefing in
Sections 12.1.2 and 13.6.3.
Procedures: The subjects participated in six consecutive experiments in a
single, one-hour session. Subjects were also given a survey of political attitudes,
demographics, and so on. We describe each experiment in the order in which
it was conducted.
Study 1: Subjects were first given a subliminal prime of the phrase “af-
firmative action” and then a target word or nonword, which the subject was
asked to identify as either a word or nonword. The target words came from
six sets of words with an equal number of nonword foils. The nonwords were
pronounceable anagrams. The six sets were (p. 10) “Black stereotype targets
(rhythm, hip-hop, basketball, hostile, gang, nigger); White stereotype targets
(educated, hopeful, ambitious, weak, greedy, uptight); female stereotype tar-
gets (caring, nurturing, sociable, gossipy, jealous, fickle); individualism targets
(earn, work-ethic, merit, unfair, undeserved, hand-outs); egalitarianism tar-
gets (equality, opportunity, help, need, oppression, disadvantage); big govern-
ment targets (government, public, Washington, bureaucracy, debt, mandate);
and pure affect targets (gift, laughter, rainbow, death, demon, rabies). . . . In
addition to these affirmative action trials, there were also interspersed an
approximately equal number of trials involving the prime ‘immigration’ and
a different set of targets” which Taber (2009) does not discuss. “In total, there
were 72 affirmative action/real target trials, 72 baseline/real target trials, and
144 non-word tries, not including the immigration trails. On average study 1
took approximately ten minutes to complete.”
Note that the target words were of three types: stereotype targets, principle
targets, or baseline targets.
The prime and target were presented (pp. 7–8) “in the following way . . . :
a forward mask of jumbled letters flashed center screen (e.g., KQHYTPDQF-
PBYL) for 13 ms, followed by a prime (e.g. affirmative action) for 39 ms, a
backward mask (e.g. DQFPBYLKQHYTP) for 13 ms, and then a target (e.g.,
merit or retim, rhythm or myhrth), which remained on screen until the subject
pressed a green (Yes, a word) or red (No, not a word) button. Trials were
separated by a one second interval. Where precise timing is critical, masks are
necessary to standardize (i.e., overwrite) the contents of visual memory and
to ensure that the effective presentation of the prime is actually just 39 ms.
Conscious expectancies require around 300 ms to develop.”
Taber measured the response times on word trials, discarding the nonword
trials.
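The timing in the quoted passage can be laid out as a simple schedule. The following is a minimal sketch, not Taber's software: the names `Frame`, `TRIAL_FRAMES`, `onset_schedule`, and `mean_word_rt` are hypothetical, and only the durations (13 ms, 39 ms, 13 ms) and the roughly 300 ms threshold come from the text. It adds up the frame durations to find when the target appears and averages response times over word trials only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    name: str
    duration_ms: int


# Durations taken from the quoted description of a single trial.
TRIAL_FRAMES = [
    Frame("forward_mask", 13),
    Frame("prime", 39),
    Frame("backward_mask", 13),
]
CONSCIOUS_EXPECTANCY_MS = 300  # approximate threshold quoted in the text


def onset_schedule(frames):
    """Return ([(name, onset_ms, offset_ms), ...], target_onset_ms)."""
    schedule, clock = [], 0
    for frame in frames:
        schedule.append((frame.name, clock, clock + frame.duration_ms))
        clock += frame.duration_ms
    return schedule, clock


def mean_word_rt(trials):
    """Average response time over real-word trials, discarding nonword foils.

    `trials` is a list of (is_word, response_time_ms) pairs (hypothetical layout).
    """
    word_rts = [rt for is_word, rt in trials if is_word]
    return sum(word_rts) / len(word_rts)


schedule, target_onset = onset_schedule(TRIAL_FRAMES)
prime_onset = next(onset for name, onset, _ in schedule if name == "prime")
# Time from prime onset to target onset: 39 ms prime plus 13 ms backward mask.
prime_to_target_ms = target_onset - prime_onset
```

Under these durations the target appears 52 ms after the prime begins, well inside the interval the passage says conscious expectancies need to develop.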
Study 2: Subjects “were asked to think about affirmative action and told
that they might be asked to discuss this issue with another participant after
the study. One third were told that this discussion partner would be a conser-
vative opponent of affirmative action, one third were told to expect a liberal
supporter of affirmative action, and for one third the discussion partner was
left unspecified.” Then subjects completed the same task of identifying words
and nonwords as in study 1 without the subliminal primes.
Study 3: In this study Taber used the race stereotype words as primes for
the principle targets and vice versa, mixed in with a larger set of trials designed
to test unreported hypotheses. He used the same subliminal priming procedure
as in study 1.
Study 4: This study used black and white stereotype words as primes for pure
affect target words using an equal number of positive and negative examples.
Study 5: Taber replicated a famous experiment by Sniderman and Carmines
(1997). “Participants . . . read a realistic one-page description of a
fictional school funding proposal that sought to provide
$30 to $60 million per year to disadvantaged school districts in the partici-
pant’s home state. The proposal was broken into an initial summary, which
manipulated whether the program would be publicly or privately funded,
and a brief case study of a particular school that would receive funding
through the proposed program, which manipulated race of recipients in three
conditions . . . : the school was described as predominantly white, black, or
racially mixed. . . . After reading the summary and case study, participants
were asked a single question . . . : Do you support of [sic] oppose this proposed
policy? Responses were collected on a 7 pt. Likert-type scale” (p. 21).
Study 6: This study replicated study 5 “with a simpler affirmative action
program using different manipulations.” Taber manipulates “need versus merit,
and target race, but this time the brief proposal mentions a particular disad-
vantaged child as a target recipient. The race of the child is subtly manipulated
by using stereotypical white, black and racially-ambiguous names (Brandon,
Jamar, and James, respectively). The child is described either as struggling aca-
demically with a need for special tutoring he cannot afford or as a high achieving
student who would be targeted by the program because of exceptional ability
and effort” (p. 23). Subjects were asked again whether they support or oppose
the proposed program on a seven-point scale.
Results: In study 1, Taber found evidence that both conservative opponents
of affirmative action and liberal supporters of affirmative action had shorter
response times to black stereotype targets as compared to other targets. In study
2, he found that explicitly thinking about affirmative action in the absence of
an expectation about the discussion partner led to shorter response times to
Taber found that, for conservative opponents of affirmative action, the
subliminal primes affected response times for words related to racial
stereotypes relative to other words; he argues that this suggests that racial
prejudice is a factor in explaining conservative opposition to the policy.
Observing such response times to subliminal primes would be difficult to
accomplish outside of a laboratory environment. That said, Taber took the
lab to his subjects to some extent, recruiting the subjects to temporary lab-
oratories set up in local library or hotel conference rooms in five U.S. cities.
Such an experiment is what is often called a “lab in the field” experiment,
which we investigate more fully in Section 8.2.3 and define in Definition 8.5.
Taber’s experiment is also a good example of how a researcher work-
ing with naturally occurring words and situations may need to conduct a
manipulation check to be sure that the manipulation he or she is conducting
4.2 Control Functions in Regressions 109
1 See Wooldridge (2002, chapter 15) for a discussion of the assumptions needed for these
procedures.
2 In footnote 14, Bartels reports results from a multinomial specification for 1992 that
includes Perot supporters.
3 Many researchers now typically use probit or logit estimation techniques to estimate
probability models. We discuss such models subsequently. See Wooldridge (2002, chapter
15) for a discussion.
$P_0 = \mu_0 + u_0$, (4.1)
$P_1 = \mu_1 + u_1$, (4.2)
$P = T P_1 + (1 - T) P_0$. (4.3)
We can then plug in Equations (4.1) and (4.2) into Equation (4.3), yielding
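Carrying out that substitution term by term gives the following. This is a worked sketch built only from Equations (4.1) to (4.3); the grouping of terms is a choice made here and the original typesetting may differ.

```latex
\begin{align*}
P &= T(\mu_1 + u_1) + (1 - T)(\mu_0 + u_0) \\
  &= \mu_0 + (\mu_1 - \mu_0)\,T + u_0 + (u_1 - u_0)\,T .
\end{align*}
```

Written this way, the coefficient on $T$ is $\mu_1 - \mu_0$, and the composite error term $u_0 + (u_1 - u_0)T$ itself depends on $T$.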
the means of the stochastic terms are not zero. Because the focus in this
chapter is on control of observables, for ease of exposition we drop M
from our analysis for the rest of this chapter. However, inclusion of M is
straightforward whenever Z is included or discussed.
If a political scientist thinks that these variables are related, can he or she
simply estimate an OLS regression with T as an exogenous variable, using the
coefficient on T as an estimate of the causal effect as in Equation (4.4)? The
answer is no. The main problem is that now the potential choices, $P_j$, may
be correlated with the treatment variable, making it difficult to determine
treatment effects. Thus, we may not be able to estimate causal relationships.
How can we deal with the problem? The principal assumption necessary
is called ignorability of treatment by statisticians Rosenbaum and Rubin
(1983) and selection on the observables by econometricians Heckman and
Robb (1985) and Moffitt (1996). There are two forms of this assumption,
which are given in the following axioms.
4 See Wooldridge (2002, proposition 18.1).
5 Information, or treatment, can affect the utility that Louise and Sam receive from voting
or not voting, and so can have a causal effect on the probability of turnout in this
example. The point is that confounding by unobservable variables can bias our estimate
of the causal effect of information, in this case overstating that effect.
6 See Wooldridge (2002, proposition 18.2).
7 This is often not done in political science applications, however. Instead, political scientists
usually evaluate effects with the independent variables at their means to show the same
relationships.
possible: 0.05, 0.2, 0.5, 0.8, or 0.95. To keep within the notation used in this
chapter, we label this variable T . However, these numbers are best viewed as
approximate assignments in an unspecified interval of information ranges
around them because within these categories respondents vary in informa-
tion level. So, for example, some of those classified as very high information
and assigned T = 0.95 may have information levels above 0.95 and some
may have information levels below 0.95. Bartels assumes that the variable
T is bounded between 0 and 1; when T = 1, a voter is fully informed, and
when T = 0, a voter is fully uninformed.8
As control variables in the probit equations, Bartels includes demographic
variables that measure the following characteristics: age (which is entered
nonlinearly), education, income, race, gender, marital status, homeowner-
ship, occupational status, region and urban, and religion. In the probit, he
interacts these independent variables with both T and (1 − T ) so assigned.
This is a generalization of the switching regression model in Equation (4.4).
Bartels argues then that the coefficients on the independent variables when
interacted with T are the effects of these variables on voting behavior when
a voter is fully informed and that the coefficients on the independent vari-
ables when interacted with (1 − T ) are the effects of these variables on
voting behavior when a voter is completely uninformed. He then compares
the goodness of fit of the model that includes the interacted information
variable with the goodness of fit of a probit estimation of voting choice as a
function of the independent variables without the information variables; he
finds that in every presidential election year from 1972 to 1992 the estimation
including information effects improves the fit and that in 1972, 1984, and 1992
the improvement is large enough to reject the hypothesis of no information
effects.
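The interaction scheme described above can be illustrated with a small sketch. It is not Bartels's specification or his estimates: the function names, the covariate values, and both coefficient vectors are hypothetical and chosen only to show the mechanics. Each covariate enters the probit index twice, once multiplied by T and once by (1 − T), so that at T = 1 only the "informed" coefficients matter and at T = 0 only the "uninformed" ones do.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, the link function of a probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interacted_row(x, t):
    """Design row in which every covariate is scaled by t and by (1 - t)."""
    return [t * xi for xi in x] + [(1.0 - t) * xi for xi in x]


def vote_probability(x, t, beta_informed, beta_uninformed):
    """Probit probability for covariates x at information level t."""
    row = interacted_row(x, t)
    coefs = list(beta_informed) + list(beta_uninformed)
    index = sum(r * c for r, c in zip(row, coefs))
    return normal_cdf(index)


# Hypothetical inputs: a constant plus two made-up covariates,
# and made-up coefficient vectors for the two information states.
x = [1.0, 0.4, -1.2]
beta_informed = [0.2, 0.9, -0.3]
beta_uninformed = [-0.1, 0.1, 0.5]

p_full = vote_probability(x, 1.0, beta_informed, beta_uninformed)  # simulated fully informed
p_none = vote_probability(x, 0.0, beta_informed, beta_uninformed)  # simulated fully uninformed
p_mid = vote_probability(x, 0.5, beta_informed, beta_uninformed)   # index is an equal mix
```

Comparing `p_full` with `p_none` for the same covariates mirrors, in miniature, the kind of informed-versus-uninformed comparison that the coding of T makes possible.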
Using simulations and the clever way that he has coded and interacted the
information variable, Bartels then makes a number of comparisons of how
different types of voters, according to demographic characteristics, would
or would not change their vote choices if they moved from completely
uninformed to fully informed and how electoral outcomes may actually
have been different if the electorate had been fully informed. He finds that
there are large differences in how his simulated fully informed and fully
uninformed women, Protestants, and Catholics vote but that the effects of
education, income, and race on voting behavior are similar for the simulated
fully informed and fully uninformed voters. He argues that his results show
8 In private communication with the authors, Bartels reports that alternative assignments
of the values do not change the results significantly as long as the order is maintained.
voter choices such as partisan identification, the results overstate the effect
of information on voting behavior. The reasons for excluding these variables
are not in contention; that is, the values of these variables are likely to be
affected by voter information and thus mask the effect of information on
voting behavior in the survey data. Partisan identification is a mediating
variable, a variable which is a function of treatment that affects potential
outcomes as well. This issue is an important one in choosing control vari-
ables in analysis of causal effects that is often ignored in political science
research.
We use a simple graph to illustrate why Bartels omits the mediating vari-
able partisan identification. In Figure 4.1 we are interested in the causal
effect of T , say information, on Y , say voting behavior, which is repre-
sented by the arrow that goes from T to Y . Variable Y is also a function
of an observable variable, X, say partisan identification, and unobservable
variables, U , which we assume do affect X but not T . Thus, U is not a
confounder of the relationship between T and Y . But if we control for X,
partisan identification, in estimating the effect of information on voting
behavior, then, because X is a descendant or a function of both U and T,
and U and T are otherwise independent, conditioning on X induces an
association between U and T.
Controlling for X makes U a confounder of the relationship between T
and Y . Intuitively, when measuring the effect of T on Y and including X
as a control variable, we remove part of the effect of information on vot-
ing behavior that is mediated through partisan identification because of
the confounding. Greenland and Brumback (2002) note a similar problem
when researchers analyze the effects of weight on health and the researcher
adjusts for serum lipids and blood pressure.
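A small simulation can make this concrete. The sketch below is an illustration under stated assumptions, not an analysis of Bartels's data: it assumes linear relationships with unit coefficients and standard normal noise, and the functions `ols` and `simulate` are hypothetical helpers written for this example. T and U are drawn independently, X depends on both, and Y depends on T, X, and U, so the total effect of T on Y is 2 (a direct effect of 1 plus 1 through X).

```python
import random


def ols(y, regressors):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in regressors]
    k = len(cols)
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(ci * cj for ci, cj in zip(cols[i], cols[j])) for j in range(k)]
        + [sum(ci * yi for ci, yi in zip(cols[i], y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[best] = a[best], a[p]
        for r in range(k):
            if r != p:
                factor = a[r][p] / a[p][p]
                a[r] = [vr - factor * vp for vr, vp in zip(a[r], a[p])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n=20000, seed=0):
    """Draw (T, X, Y) where X is a mediator that also depends on unobserved U."""
    rng = random.Random(seed)
    t, x, y = [], [], []
    for _ in range(n):
        ti = rng.gauss(0, 1)                  # treatment, independent of U
        ui = rng.gauss(0, 1)                  # unobservable
        xi = ti + ui + rng.gauss(0, 1)        # mediator: function of T and U
        yi = ti + xi + ui + rng.gauss(0, 1)   # outcome: function of T, X, and U
        t.append(ti)
        x.append(xi)
        y.append(yi)
    return t, x, y


t, x, y = simulate()
effect_without_x = ols(y, [t])[1]     # close to the total effect, 2
effect_with_x = ols(y, [t, x])[1]     # close to 0.5 under these assumptions
```

Under these assumed coefficients, the regression that omits X recovers the total effect, while the regression that controls for X returns a coefficient near 0.5, which is neither the direct effect (1) nor the total effect (2), because conditioning on X ties T to the unobserved U.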
However, leaving out these variables does result in models of voting
behavior that do not fit well and there can be numerous misclassification
errors, which bothers Sekhon. This is probably why many political sci-
entists tend to include variables that arguably create such confounding.
Bartels’ work is exceptional in this respect. Yet, can the poor fit be a prob-
lem in itself? Sekhon argues that the estimated effects Bartels finds may
be inaccurate if the consequences of the misclassifications are not consid-
ered. Sekhon reestimates the model with a matching procedure designed
to determine treatment effects that are robust to these errors (which we
explore in Section 4.6.3) and finds that the information effects are not
significant.
Sometimes researchers may be interested in identifying the effect of the
mediating variable on outcomes or measuring the effects of treatment as
transmitted through the mediating variable (as independent of the direct
effects of treatment). We discuss how researchers attempt to estimate these
particular causal effects in Section 4.7.
$Y_j^* = X\beta_{Y_j} + u_j$, (4.7a)
$T^* = X\beta_T + Z\theta + v$, (4.7b)
$Y = T Y_1 + (1 - T) Y_0$, (4.7e)
where $Y_j^*$ is the latent utility that a voter receives from voting for the
Republican candidate over the Democratic opponent under treatment $j$ and $Y_j$
is a binary variable that represents whether an individual votes Republican
or Democrat under treatment $j$. Define $u = T u_1 + (1 - T) u_0$. We assume
that (u, v) is independent of X and Z and distributed as bivariate normal
with mean zero and that each has a unit variance. Because there is no manip-
ulation by an experimenter or nature, we do not include M as an exogenous
variable in Equation (4.7b); however, if there was manipulation, then M
would be included there. In our formulation, the equivalent equation to
what Bartels estimates is the following probit model:
Thus, unobserved variables like cognitive abilities and the value that voters
place on citizen duty might lead to a correlation between the error terms
and inconsistent estimations of the effect of information on voting.9
Again the problem can be illustrated using a simple graph. In Figure 4.2
we are interested in the effect of T on Y , where T is a function of observables,
Z, and unobservables, V , and Y is a function of T and U , which contains
both observables and unobservables. Note that we allow for Z to be related
to U and even overlap with some of the observables; U and T are correlated
through the observables. When we use control functions, we are controlling
for these effects. The assumption of ignorability of treatment is that this
is the only avenue through which U and T are correlated. If U and V
are correlated, then selection on unobservables is occurring and there are
common omitted variables in the estimation of the treatment effect. In the
top graph of Figure 4.2, ignorability of treatment holds – there is no arrow
connecting U and V – but in the bottom graph of Figure 4.2, ignorability
of treatment fails – there is an arrow connecting U and V .
How can a researcher find out if ρ = 0? Rivers and Vuong (1988) pro-
pose a test of exogeneity that can be applied to our problem. To do so, a
researcher first estimates a probit of T on X and Z and saves the residuals.
The researcher then estimates Equation (4.8), including the residuals as an
additional independent variable. If the coefficient on the residuals is signifi-
cantly different from zero, then the researcher has an endogeneity problem.
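The two steps can be written schematically as follows. This is a sketch under stated assumptions: Equation (4.8) is not reproduced in this excerpt, so the second-stage index is written generically as $g(X, T; \beta)$; the residual is taken to be the raw probit residual, which is one possible choice and may not be the one intended; and $\lambda$ is a symbol introduced here for the coefficient on the residual.

```latex
\begin{align*}
\text{Step 1:}\quad & \Pr(T = 1 \mid X, Z) = \Phi(X\beta_T + Z\theta),
  \qquad \hat{v} = T - \Phi(X\hat{\beta}_T + Z\hat{\theta}), \\
\text{Step 2:}\quad & \Pr(Y = 1 \mid X, T, \hat{v})
  = \Phi\bigl(g(X, T; \beta) + \lambda\,\hat{v}\bigr), \\
\text{Test:}\quad & H_0 : \lambda = 0 .
\end{align*}
```

Rejecting $H_0$ corresponds to the case described above in which the coefficient on the residuals is significantly different from zero.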
Evidence suggests that, in studies of the effect of information on voting
behavior, endogeneity can be a problem. For example, Lassen (2005), in
Example 2.8, conducts such a test in his work on the effect of information on
voting behavior and finds evidence that supports endogeneity of voter infor-
mation. As Wooldridge (2002, chapter 15) discusses, consistent estimates
9 Lassen (2005), Example 2.8, suggests that systematic measurement error in calculating
voter information is another unobservable that may cause estimation of Equation (4.8) to
be inconsistent. We return to this point when we discuss his empirical estimation more
fully.
the charitable organization through which they were recruited. Some subjects
were unpaid volunteers, although this was a small percentage of the sample.
The experiment did not attempt to recruit a random sample but did gather
demographic information on the subjects that suggested a diverse population.
Environment: The candidates in the election were hypothetical but were given
characteristics to make them realistic. The experiment used a technique to
measure the effects of voter information on voter choices developed by LR,
called a dynamic process-tracing methodology, which is a variant of the
“information board” used by behavioral decision theorists for studying decision
making.10 LR (2001, pp. 955–956) described the technology as follows:
“The standard information board presents decision makers with an m by
n matrix, with [sic] the columns of the matrix are headed by the different
alternatives (e.g. candidates) and the rows of the matrix are labeled with dif-
ferent attributes (e.g. issue stands, past experience, and so forth). None of the
specific information is actually visible, however, and decision makers must
actively choose what information they want to learn by clicking on a box on
a computer screen. The researcher can record and analyze what information
was accessed, the order in which it was accessed, how long it was studied, and
so on. . . . Our dynamic process-tracing methodology retains the most essential
features of the standard information board while making it a better analog of
an actual political campaign. Our guiding principle was to devise a technique
that would mimic crucial aspects of an actual election campaign while still
providing a detailed record of the search process employed by voters. . . . We
accomplished . . . [this] by designing a radically revised information board in
which the information about the candidates scrolls down a computer screen
rather than being in a fixed location. There are only a limited number of
attribute labels (six) visible on the computer screen – and thus available for
access – at any given time. . . . The rate of scrolling is such that most people
can read approximately two labels before the positions change. Subjects can
‘access’ (i.e., read) the information behind the label by clicking a mouse. . . . The
scrolling continues while subjects process the detailed information they have
accessed, so that typically there is a completely new screen when subjects return
to scrolling – thus mimicking the dynamic, ongoing nature to the political
campaign. . . . [A]t periodic intervals the computer screen is taken over by a
twenty-second political advertisement for one of the candidates in the cam-
paign. Voters can carefully watch these commercials or avert their eyes while
they are on the screen, but they cannot gather any other information relevant
to the campaign while the commercial is on.”
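The standard information board described at the start of the quotation can be sketched as a small data structure. This is an illustration only, not LR's software: the class name `InformationBoard`, its fields, and the sample candidates and attributes are all hypothetical, and it models the static m by n board rather than the scrolling variant. Cell contents stay hidden until accessed, and each access is logged with its order and duration.

```python
from dataclasses import dataclass, field


@dataclass
class InformationBoard:
    alternatives: list   # column headings, e.g. candidates
    attributes: list     # row labels, e.g. issue stands
    contents: dict       # (alternative, attribute) -> hidden text
    log: list = field(default_factory=list)

    def access(self, alternative, attribute, opened_at, closed_at):
        """Reveal one cell and record what was opened, in what order, for how long."""
        text = self.contents[(alternative, attribute)]
        self.log.append(
            {
                "cell": (alternative, attribute),
                "order": len(self.log) + 1,
                "seconds": closed_at - opened_at,
            }
        )
        return text

    def search_order(self):
        """Cells in the order the decision maker opened them."""
        return [entry["cell"] for entry in self.log]


# Hypothetical two-candidate, two-attribute board.
board = InformationBoard(
    alternatives=["Candidate A", "Candidate B"],
    attributes=["issue stands", "past experience"],
    contents={
        ("Candidate A", "issue stands"): "made-up text A1",
        ("Candidate A", "past experience"): "made-up text A2",
        ("Candidate B", "issue stands"): "made-up text B1",
        ("Candidate B", "past experience"): "made-up text B2",
    },
)
first = board.access("Candidate B", "issue stands", opened_at=0.0, closed_at=4.5)
second = board.access("Candidate A", "issue stands", opened_at=6.0, closed_at=8.0)
```

The log holds the quantities the quotation says a researcher can record: what was accessed, in what order, and for how long.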
10 See Carroll and Johnson (1990) for more details.
4.3 Using Time to Control for Confounding Unobservables 127
$P_t = \alpha + \beta T_t + I + U_t$, (4.9)
where $t$ denotes the time period of the observation, $T_t$ represents the
information state of an individual at time $t$, $I$ is the unknown characteristic
that is individual-specific and constant across $t$, and $U_t$ is the time- and
individual-specific unobservable error. So that our analysis is clear, we are
assuming that the only things that affect voters’ choices are their information
and their individual characteristics. The problem is that if $I$ is correlated
with $T_t$ and we simply leave $I$ in the error term, we cannot consistently
estimate the effect of information on voting behavior. If we have just a single
cross section, then our estimate of the effect of information on voting
behavior is problematic unless we come up with a good proxy for $I$ or take
an instrumental variable approach (discussed in the next chapter).
Can we estimate Equation (4.9) if we have panel data, using repeated
observations for the same individuals as a way to control for I ? A number
of relatively common estimation procedures are used by political scientists
to control for I such as random-effects estimators, fixed-effects estimators,
dummy variables, and first differencing methods.11 All of these methods
assume, at a minimum, strict exogeneity; that is, once $T_t$ and $I$ are
controlled for, $T_s$ has no partial effect on $P_t$ for $s \neq t$.
Formally, in OLS, researchers assume the following:
$E(P_t \mid T_1, T_2, \ldots, T_s, I) = E(P_t \mid T_t, I) = \alpha + \beta T_t + I$. (4.10)
The second equality is an assumption of linearity. When Equation (4.10)
holds, we say that $T_t$ is strictly exogenous conditional on the unobserved
effect.
Definition 4.7 (Strict Exogeneity): Once $T_t$ and $I$ are controlled for, $T_s$ has
no partial effect on $P_t$ for $s \neq t$.
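To see why repeated observations help, consider a two-period simulation of Equation (4.9). This is a minimal sketch with made-up parameter values (β = 0.5, α = 1, and a loading of 0.8 of the individual effect on the information variable); the function names are hypothetical. The data are generated so that strict exogeneity holds. Because the individual effect is constant over time, differencing the two periods removes it, whereas a pooled regression that leaves it in the error term is biased.

```python
import random


def slope(x, y):
    """OLS slope from a bivariate regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_panel(n=5000, beta=0.5, alpha=1.0, seed=1):
    """Two periods per individual; the fixed effect is correlated with T."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        i_fixed = rng.gauss(0, 1)                       # individual effect I
        periods = []
        for _period in range(2):
            t_val = 0.8 * i_fixed + rng.gauss(0, 1)     # T_t correlated with I
            p_val = alpha + beta * t_val + i_fixed + rng.gauss(0, 1)
            periods.append((t_val, p_val))
        panel.append(periods)
    return panel


def pooled_and_first_difference(panel):
    """Return (pooled OLS slope, first-difference slope)."""
    t_all = [t for periods in panel for t, _ in periods]
    p_all = [p for periods in panel for _, p in periods]
    d_t = [periods[1][0] - periods[0][0] for periods in panel]
    d_p = [periods[1][1] - periods[0][1] for periods in panel]
    return slope(t_all, p_all), slope(d_t, d_p)


pooled, first_diff = pooled_and_first_difference(simulate_panel())
```

With these assumed values the pooled slope lands near 1 rather than 0.5, while the first-difference slope is close to the true β, since the differenced equation no longer contains $I$.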
11 Wooldridge (2002, chapter 10) provides a review and discussion of these methods and
their underlying assumptions.
4.4 Propensity Scores in Regressions 131
specific – in this case states – and that could confound the causal effect as
discussed earlier.
Holbrook and McClurg use as a dependent variable the first difference
in turnout, but they do not difference their independent variables. They do
include lagged turnout as an independent variable, which, because it is a
function of the other independent variables in the previous period, is an
indirect method of first differencing. Because lagged turnout is also a
function of the state-specific unobservable variable, including it brings
that variable into the estimation, which creates the possibility that the
error terms are correlated across years.12
A potential problem with Holbrook and McClurg’s analysis is the endo-
geneity of campaign activities, which are probably functions of the expected
turnout effects, even when differenced. That is, campaign activities are
clearly choices that candidates make to maximize their probability of win-
ning. Work by Stromberg (2008) demonstrates that campaign visits and
advertisement choices closely match those predicted by a game-theoretic
model in which candidates make these choices strategically in reaction to
both anticipated effects on voter choices in a state and the activities of their
competitors. Thus, an estimate of the causal effect of campaign activities
should take this endogeneity into account, which is not possible in a single-
equation regression. We deal with models of causality that allow for such
endogeneity in Chapter 6.
Ho et al. (forthcoming) point out that the last assumption, that the
parametric estimator of the propensity score is consistent, is untestable.
As they demonstrate, if the parametric estimation of the propensity score
is inaccurate, then the estimated causal effects are inconsistent. They
advocate that researchers make use of what they call the
“propensity score tautology”; that is, they note that (p. 219):
The estimated propensity score is a balancing score when we have a consistent
estimate of the true propensity score. We know we have a consistent estimate of the
propensity score when matching on the propensity score balances the raw covariates.
Of course, once we have balance on the covariates, we are done and do not need to
look back. That is, it works when it works, and when it does not work, it does not
work (and when it does not work, keep working at it).
Ho et al. suggest that researchers first begin with a simple estimation of the
propensity score using a logistic regression of T on W. To check for balance,
the researcher matches each treated unit to the control unit with the most
similar value of the estimated propensity score (nearest-neighbor matching
on the propensity score).13 If the matching shows that W is balanced (Ho
et al. discuss how to check for balance), then the researcher should use the
propensity scores. If not, the researcher respecifies the estimation by adding
in interaction terms, squared terms, or both and checks balance again. If
this fails, the researcher turns to more elaborate specifications.
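The first pass of this procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the software provided by Ho et al. It assumes a single covariate W and simulated data, fits the logit by plain gradient ascent, and uses the standardized difference in means as the balance diagnostic, which is our choice of one common measure. The function names are hypothetical.

```python
import math
import random

def fit_logit(w, t, steps=500, lr=0.5):
    """Fit Pr(T=1|W) = 1/(1+exp(-(a+b*W))) by gradient ascent on the mean log-likelihood."""
    a = b = 0.0
    n = len(w)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for w_i, t_i in zip(w, t):
            resid = t_i - 1.0 / (1.0 + math.exp(-(a + b * w_i)))
            grad_a += resid
            grad_b += resid * w_i
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def nearest_neighbor_match(scores, t):
    """Pair each treated unit with the control closest in score (with replacement)."""
    controls = [i for i, t_i in enumerate(t) if t_i == 0]
    return [(i, min(controls, key=lambda c: abs(scores[c] - scores[i])))
            for i, t_i in enumerate(t) if t_i == 1]

def std_mean_diff(x_treated, x_control):
    """Standardized difference in means between two groups."""
    m1 = sum(x_treated) / len(x_treated)
    m0 = sum(x_control) / len(x_control)
    v1 = sum((x - m1) ** 2 for x in x_treated) / (len(x_treated) - 1)
    v0 = sum((x - m0) ** 2 for x in x_control) / (len(x_control) - 1)
    return (m1 - m0) / math.sqrt((v1 + v0) / 2.0)

# Simulated data (an assumption): treatment is more likely when W is high.
rng = random.Random(1)
w = [rng.gauss(0, 1) for _ in range(400)]
t = [1 if rng.random() < 1.0 / (1.0 + math.exp(-w_i)) else 0 for w_i in w]

a_hat, b_hat = fit_logit(w, t)
scores = [1.0 / (1.0 + math.exp(-(a_hat + b_hat * w_i))) for w_i in w]
pairs = nearest_neighbor_match(scores, t)

smd_before = std_mean_diff([w_i for w_i, t_i in zip(w, t) if t_i == 1],
                           [w_i for w_i, t_i in zip(w, t) if t_i == 0])
smd_after = std_mean_diff([w[i] for i, _ in pairs], [w[j] for _, j in pairs])
print(f"standardized mean difference in W before matching: {smd_before:.3f}")
print(f"standardized mean difference in W after matching:  {smd_after:.3f}")
```

If the post-matching difference were still large, the next step in the procedure described above would be to refit the score with squared or interaction terms and repeat the check. That loop is omitted here to keep the sketch short.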
Which is better in regressions estimating treatment effects – standard
control functions or propensity scores as control functions? Sekhon (2005)
contends that an advantage of propensity scores is that they are estimated
independently of outcome or choice data whereas control functions in
regressions are not. This approach allows for the application of a standard
for determining the optimal model that balances the covariates, indepen-
dently of the outcome data. Propensity scores also appear parsimonious as
compared to the kitchen sink–like regression with control functions.
13 We discuss matching more extensively later. Ho et al. provide free software for such
matching and, in the documentation to the software, discuss different matching techniques.
The appearance of parsimony is deceiving because the propensity scores
themselves are estimated in a similar fashion. If the propensity score is
estimated in linear regressions without interaction effects as in Equation
(4.6), the estimations are identical. Moreover, the standard errors in the
kitchen sink regression have known reliability, and adjustments for
problems can be made as is standard in regression analysis. However, in
regressions with propensity scores, the first-stage estimation is sometimes
ignored in computing the standard error and the proper adjustments are not
made. Researchers who use propensity scores need to be careful in the
construction of the standard error and in making these adjustments.
Furthermore, the two techniques rely on different assumptions, and arguably
the ones underlying propensity scores are more restrictive and less likely
to be satisfied. Finally, the use of propensity scores depends on the
assumption that the parametric estimator of the propensity score is
consistent.
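The equivalence claimed for the linear case can be checked numerically. The sketch below assumes that the propensity score is fitted as a linear probability model of T on the controls and that the controls enter the outcome regression additively. Equation (4.6) is not reproduced here, so this reading of it is an assumption. The data are simulated, and the `ols` helper is a hypothetical stand-in for any regression routine.

```python
import random

def ols(columns, y):
    """OLS coefficients of y on the given columns; the intercept comes first."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = []
    for r in range(k):
        row = [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
        row.append(sum(X[i][r] * y[i] for i in range(n)))
        A.append(row)
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][k] / A[r][r] for r in range(k)]

# Simulated data; every coefficient below is an illustrative assumption.
rng = random.Random(42)
n = 500
w1 = [rng.gauss(0, 1) for _ in range(n)]
w2 = [rng.gauss(0, 1) for _ in range(n)]
t = [1.0 if a + 0.5 * b + rng.gauss(0, 1) > 0 else 0.0 for a, b in zip(w1, w2)]
p = [1.0 + 2.0 * t_i + a - 0.5 * b + rng.gauss(0, 1)
     for t_i, a, b in zip(t, w1, w2)]

# First stage: linear "propensity score" from regressing T on the controls.
c0, c1, c2 = ols([w1, w2], t)
pscore = [c0 + c1 * a + c2 * b for a, b in zip(w1, w2)]

# Outcome regression with the controls entered directly ...
beta_controls = ols([t, w1, w2], p)[1]
# ... and with the fitted linear score as the only control.
beta_pscore = ols([t, pscore], p)[1]

print(f"coefficient on T with controls:         {beta_controls:.6f}")
print(f"coefficient on T with the linear score: {beta_pscore:.6f}")
```

The two coefficients on T agree up to rounding error because, in both regressions, the part of T left over after partialling out the controls is the same residual, T minus its fitted value. The sketch deliberately says nothing about standard errors, which is where the two approaches differ in practice.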
4.5 Use of Controls and Propensity Scores Without Regression
14 See Härdle and Linton (1994).
propensity scores are bounded between 0 and 1 (that is, every observation
has a positive probability of being in either state of the world), and that
the parametric estimation of the propensity score is correct such that the
propensity scores are consistent. Then15
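\[
\mathrm{ATE} = E\left\{ \frac{[T - \pi(W)]\,P}{\pi(W)\,[1 - \pi(W)]} \right\} \tag{4.11}
\]
and
\[
\mathrm{ATT} = \frac{E\left\{ \dfrac{[T - \pi(W)]\,P}{1 - \pi(W)} \right\}}{\Pr(T = 1)}. \tag{4.12}
\]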
These can then be estimated after estimating the propensity scores, both
nonparametrically and with flexible parametric approaches. Again, note
that both of these approaches assume ignorability of treatment, with the
propensity score approach making the more restrictive assumption and
additionally requiring that the propensity score estimates be consistent.
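A sample analog of Equation (4.11) replaces the expectation with an average over observations. The sketch below is illustrative only. It assumes simulated data in which the true propensity score is known and stays well inside (0, 1), so the first-stage estimation step is skipped entirely. The function name `weighted_ate` and all parameter values are hypothetical.

```python
import math
import random

def weighted_ate(t, p, pscore):
    """Sample average of [T - pi(W)] * P / (pi(W) * [1 - pi(W)])."""
    terms = [(t_i - pi_i) * p_i / (pi_i * (1.0 - pi_i))
             for t_i, p_i, pi_i in zip(t, p, pscore)]
    return sum(terms) / len(terms)

# Simulated data (assumed): W raises both the chance of treatment and the outcome.
rng = random.Random(7)
n = 50000
w = [rng.uniform(-1.5, 1.5) for _ in range(n)]
pscore = [1.0 / (1.0 + math.exp(-w_i)) for w_i in w]   # true pi(W), known by construction
t = [1 if rng.random() < pi_i else 0 for pi_i in pscore]
# The true treatment effect is 2.0 for every unit.
p = [1.0 + 2.0 * t_i + w_i + rng.gauss(0, 1) for t_i, w_i in zip(t, w)]

ate_hat = weighted_ate(t, p, pscore)

# Naive comparison of means, which ignores W.
treated = [p_i for p_i, t_i in zip(p, t) if t_i == 1]
control = [p_i for p_i, t_i in zip(p, t) if t_i == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

print(f"propensity-weighted ATE estimate: {ate_hat:.3f}")
print(f"naive difference in means:        {naive:.3f}")
```

The weighted estimate should land near the assumed effect of 2.0, whereas the naive difference in means is inflated because treated units tend to have higher W. With estimated rather than known scores, the result would also depend on the consistency of the first-stage model, as noted above.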
15 See Ho et al. (forthcoming) and Rosenbaum and Rubin (1983).
4.6 Control by Matching
to adjust the data prior to the parametric analysis so that (1) the relationship
between T and W is eliminated or reduced and (2) little bias and inefficiency is
induced. . . . An assumption of ignorability is still necessary, but we would no longer
need to model the full parametric relationship between the dependent variable
and the multidimensional W. This also eliminates an important source of model
dependence in the resulting parametric analysis stemming from the functional form
specification and the curse of dimensionality. For data sets where preprocessing
reduces the extent of the relationship between T and W but is unable to make them
completely unrelated, model dependence is not eliminated but will normally be
greatly reduced. If nonparametric preprocessing results in no reduction in model
dependence, then it is likely that the data have little information to support causal
inferences by any method, which of course would also be useful information.
4.7 Causal Effects Through Mediating Variables
informed and uninformed voters in U.S. elections at the time of the election
(he finds effects at earlier points in the election campaign). In contrast,
he finds that a similar analysis of Mexican voters in 2000 at the time of
the election shows significant differences. Sekhon conjectures that in more
advanced democracies it is easier for voters to vote as if they were informed,
even if they are not, due to communications that occur during election
campaigns in advanced democracies. Sekhon uses panel data as a way to
control for observables, not unobservables. He explicitly assumes, as in
other matching procedures, mean ignorability of treatment.
\[
X = T X_1 + (1 - T) X_0. \tag{4.14}
\]
The causal mediation effect (CME) is the effect on the outcome of chang-
ing the mediator value as affected by the treatment without actually changing
the treatment value. That is, CME is given by
\[
\mathrm{CME}(T) = Y_{T X_1} - Y_{T X_0}. \tag{4.16}
\]
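A small worked example may help fix the definition. The sketch below assumes a hypothetical potential-outcome function and hypothetical mediator values. None of these numbers come from the text, and real applications never observe both mediator values for the same unit.

```python
def outcome(t, x):
    """Hypothetical potential outcome under treatment t and mediator value x.

    The functional form and coefficients are illustrative assumptions.
    """
    return 1.0 + 0.5 * t + 2.0 * x + 1.0 * t * x

# Hypothetical mediator values with treatment (X1) and without it (X0).
X1, X0 = 0.8, 0.3

def observed_mediator(t):
    """Equation (4.14): X = T*X1 + (1 - T)*X0."""
    return t * X1 + (1 - t) * X0

def cme(t):
    """Causal mediation effect: move the mediator from X0 to X1, holding treatment at t."""
    return outcome(t, X1) - outcome(t, X0)

print(f"observed mediator when T = 1: {observed_mediator(1)}")
print(f"CME(0) = {cme(0):.2f}")
print(f"CME(1) = {cme(1):.2f}")
```

Because the assumed outcome function includes an interaction between treatment and the mediator, the mediation effect differs by treatment status: here it is 1.0 when T = 0 and 1.5 when T = 1.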
16 We draw on the discussion of mediators in a counterfactual framework in Imai et al. (2009).
4.8 Chapter Summary
variables, they can make some variables that are normally unobservable
observable, and they can use within-subjects designs and financial incen-
tives to control for other unobservables. In observational data, researchers
use statistical methods to attempt to control for observable variables. These
measures typically make strong assumptions about unobservable vari-
ables and how observable variables affect treatments and potential choices.
Regression methods rely on the assumption of mean ignorability of treat-
ment and in most cases the assumption that the expected effects of observ-
able variables on potential choices are equivalent. Most of the control and
matching methods based on RCM also rely on the assumptions of ignor-
ability of treatment or mean ignorability of treatment. Regressions with
panel data typically assume strict exogeneity. Propensity score and match-
ing methods that estimate ATE rely on the stricter version combined with the
assumption that all observations have some positive probability of existing
in either state of the world (being either informed or uninformed). Some