
Proxy Variables

William Matcham
February 29, 2016
Reference: Slide Set 1, 30–33 and Wooldridge 298–303 (Wooldridge adds in an extra regressor in the
wage example, but the slides and the text below omit this term in order to make the discussion clearer.)

1 Introduction
• A very common problem in econometrics is not observing (i.e. not having data on) covariates that
are considered important to the analysis.
• Proxy variables provide one way to mitigate the problems that arise when we cannot include in
our regression a covariate that we would like to include.
• Motivating example: suppose we wish to understand how education influences the wage of an
individual. Inherent natural ability is a factor that affects an individual's wage and is also
correlated with education level.
• Ability is therefore a confounding factor in the model.
• The regression we may consider is

log(wage) = β0 + β1 educ + β2 abil + u        (1)

• Inherent ability is very difficult, if not impossible, to measure, so without any better options we
may just leave out ability and run

log(wage) = β0 + β1 educ + w,    w = β2 abil + u        (2)

• OLS estimation on (2) leads to an inconsistent estimator of β1, because educ is correlated with
the abil term contained in the error w.
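The inconsistency of OLS on (2) can be illustrated with a small simulation. The sketch below is not part of the original handout: the coefficient values, the way educ is generated from abil, and the function name short_regression_slope are all assumptions chosen for illustration. It compares the slope from the short regression with the standard omitted-variable limit β1 + β2 Cov(educ, abil) / Var(educ).

```python
import numpy as np


def short_regression_slope(n=200_000, beta1=0.08, beta2=0.5, seed=0):
    """Simulate model (1), then estimate beta1 from the short regression (2).

    All numerical values are illustrative assumptions, not estimates from data.
    """
    rng = np.random.default_rng(seed)
    abil = rng.normal(size=n)                          # unobserved confounder
    educ = 12 + 2 * abil + rng.normal(size=n)          # education correlated with ability
    u = rng.normal(scale=0.3, size=n)                  # error, independent of educ and abil
    log_wage = 1.0 + beta1 * educ + beta2 * abil + u   # model (1)

    # OLS slope of log_wage on educ alone, i.e. regression (2)
    cov_ew = np.cov(educ, log_wage)
    slope = cov_ew[0, 1] / cov_ew[0, 0]

    # Limit implied by the omitted-variable formula
    cov_ea = np.cov(educ, abil)
    predicted = beta1 + beta2 * cov_ea[0, 1] / cov_ea[0, 0]
    return slope, predicted


if __name__ == "__main__":
    slope, predicted = short_regression_slope()
    print(f"short-regression slope: {slope:.3f}")
    print(f"omitted-variable limit: {predicted:.3f}")
```

Under these assumed values Cov(educ, abil) / Var(educ) = 2/5, so the short regression settles near 0.08 + 0.5 × 0.4 = 0.28 rather than the true β1 = 0.08, however large the sample.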
• An attempt to deal with this inconsistency comes from using a proxy variable for ability. Loosely,
a proxy should be a variable correlated with the unobserved variable in question. It's essentially an
observable variable that provides a similar measure to the unobserved confounder.
• In the motivating example above, one choice of a proxy for natural ability would be IQ.
• The proxy need not be exactly equivalent to the unobserved quantity, but it should be as highly
correlated with it as possible. We know, for example, that IQ is not a perfect measure of ability,
but it is somewhat related to it.
• Given that we have a proxy, the most obvious way to use it to mitigate our inconsistency problem
is to plug the proxy variable in as a regressor, in the hope that it acts as a good replacement for
the unobserved confounder, i.e. that it does a good job of mimicking the unobserved variable.


• The plug-in approach gives the regression model

log(wage) = α0 + β1 educ + α2 IQ + e        (3)

• This seems logical, but how do we know that it works? The rest of this handout explains the
conditions that a proxy needs to satisfy in order for an OLS regression on (3) to provide a
consistent estimator of β1.

2 Required Conditions for a Proxy
• Let us work more generally than the wage regression example, so that we have a model

y = β0 + β1 x1 + β2 x2* + u        (4)

where MLR1-5 hold on this model. We have data on y and x1. The variable x2* is unobserved, but we
have data on a proxy for x2*, denoted x2.

2.0 Requirement 0
• The first requirement is that x2 should have a relationship (correlation) with x2*. In other
words, in the regression

x2* = δ0 + δ2 x2 + v        (5)

we should have δ2 ≠ 0. (You'll see why we skipped the index 1 later.)
• The δ0 term exists in (5) because the proxy x2 and x2* may have different units, and the v exists
to represent the notion that the proxy and the unobserved variable are not exactly the same.
• So far, the long and short is that if δ2 = 0, then x2 cannot be a proxy for x2*. This is a
somewhat obvious, but necessary, condition.

2.1 Requirement 1
• The next requirement is that x2 should not be in the main regression (4). In other words, x2* is
the factor that directly affects y and not x2. The only channel from the proxy x2 into y is
through x2*.
• This is a bit like the instrumental variable exclusion restriction.
• In the wage regression example, we have to believe that a higher IQ score in itself will not lead
to a higher wage (people don't tend to put IQ score on their CV anyway). We can (and must) have,
however, that a higher IQ will be associated with a higher innate ability, and then the higher
ability will be associated with a higher wage.
• The mathematical way of stating the above is that we require

E(y | x1, x2*) = E(y | x1, x2*, x2)        (6)

which in words says that the explanatory power of x1 and x2* in explaining the mean of y is exactly
the same as the explanatory power of x1, x2* and x2 in explaining the mean value of y. In other
words, given that x1 and x2* are already in the regression, x2 cannot improve our prediction of the
mean value of y.

• That is, once we control for x1 and x2*, x2 cannot improve our prediction of the mean value of y.
• NOTE: by substituting y from (4) into both sides of (6), we obtain

E(β0 + β1 x1 + β2 x2* + u | x1, x2*) = E(β0 + β1 x1 + β2 x2* + u | x1, x2*, x2)
⇕
β0 + β1 x1 + β2 x2* + E(u | x1, x2*) = β0 + β1 x1 + β2 x2* + E(u | x1, x2*, x2)
⇕
E(u | x1, x2*) = E(u | x1, x2*, x2)

• Note that since MLR1-5 hold on (4), E(u | x1, x2*) = 0 and therefore the above derivation shows
us that

E(u | x1, x2*, x2) = 0

• In other words, requirement 1 ensures that the error term u is not only uncorrelated with x1 and
x2*, but also x2.
• The above result implies, by the tower property, that E(u | x1, x2) = 0.

2.2 Requirement 2
• Now go back and consider the proxy regression (5). The second requirement related to the proxy is
that once x2 is controlled for, the mean value of x2* shouldn't depend upon x1.
• In mathematics, this requirement is given by

E(x2* | x1, x2) = E(x2* | x2)        (7)

• In other words, once x2 is partialled out, x2* should have no correlation with x1. Another way of
seeing this is that if we considered

x2* = δ0 + δ1 x1 + δ2 x2 + v

then δ1 = 0 should hold, so we get back to obtaining (5).
• Similar to above, substituting (5) into (7) gives

E(δ0 + δ2 x2 + v | x1, x2) = E(δ0 + δ2 x2 + v | x2)
⇕
δ0 + δ2 x2 + E(v | x1, x2) = δ0 + δ2 x2 + E(v | x2)
⇕
E(v | x1, x2) = E(v | x2)

• Since E(v | x2) = 0, we thus obtain E(v | x1, x2) = 0.
• v should not just be uncorrelated with x2, but also x1.

• To cast this requirement in the wage regression example, we are saying that

E(abil | educ, IQ) = E(abil | IQ) = δ0 + δ2 IQ

• We are saying that the expected value of ability only changes with IQ, and not education. Some may
argue that this is not reasonable, but we may want to think of this ability variable as innate, as
something natural that cannot be taught or learnt. I will leave it up to you to decide whether you
believe such a variable exists.

3 Consistency of OLS with a correct proxy
• To finish, let's see why a proxy variable satisfying the three above requirements will allow for
consistent estimation of β1.
• In the first case where all the requirements are met, recall that we have

y = β0 + β1 x1 + β2 x2* + u        (8)

and

x2* = δ0 + δ2 x2 + v        (9)

• Substituting (9) into (8), we obtain

y = β0 + β1 x1 + β2 δ0 + β2 δ2 x2 + β2 v + u

which leaves

y = α0 + β1 x1 + α2 x2 + e        (10)

where
1. α0 = β0 + β2 δ0
2. α2 = β2 δ2
3. e = β2 v + u
• Note that now we cannot identify β2, the marginal effect of x2* on y, but in many settings,
identifying α2 will be more interesting anyway: in the wage regression example, we may be more
interested in the marginal effect of one more IQ point, rather than the marginal effect of one more
unit of innate ability, which is a vague notion at best.
• Consider running OLS on (10). For unbiased estimation of β1 and α2, we need that
E(e | x1, x2) = 0. Observe, noting the two results E(u | x1, x2) = 0 and E(v | x1, x2) = 0 derived
under Requirements 1 and 2, that

E(e | x1, x2) = β2 E(v | x1, x2) + E(u | x1, x2) = 0 + 0 = 0

• Therefore we have found a way to obtain an unbiased estimator for β1.
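The argument of this section can be checked numerically. The following is a minimal sketch, not taken from the handout: the parameter values, the data-generating process and the function name proxy_regression are assumptions made purely for illustration. The data are built so that Requirements 0, 1 and 2 hold by construction, and OLS on (10) is then compared with the regression that simply omits the unobserved variable.

```python
import numpy as np


def proxy_regression(n=200_000, seed=0):
    """Simulate (8) and (9) with a valid proxy and run OLS on (10).

    All parameter values below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    beta0, beta1, beta2 = 1.0, 0.5, 2.0   # coefficients in model (8)
    delta0, delta2 = 3.0, 0.7             # proxy equation (9); delta2 != 0 (Requirement 0)

    x2 = rng.normal(size=n)               # observed proxy
    x1 = 0.8 * x2 + rng.normal(size=n)    # x1 is related to x2* only through x2
    v = rng.normal(size=n)                # independent of (x1, x2): Requirement 2
    x2_star = delta0 + delta2 * x2 + v    # unobserved variable, equation (9)
    u = rng.normal(size=n)                # independent of (x1, x2*, x2): Requirement 1
    y = beta0 + beta1 * x1 + beta2 * x2_star + u

    # OLS on (10): regress y on a constant, x1 and the proxy x2
    X_proxy = np.column_stack([np.ones(n), x1, x2])
    coef_proxy, *_ = np.linalg.lstsq(X_proxy, y, rcond=None)

    # For comparison: leave x2* out and use no proxy at all
    X_short = np.column_stack([np.ones(n), x1])
    coef_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)
    return coef_proxy, coef_short


if __name__ == "__main__":
    coef_proxy, coef_short = proxy_regression()
    print("proxy regression (alpha0, beta1, alpha2):", np.round(coef_proxy, 3))
    print("short regression (intercept, slope on x1):", np.round(coef_short, 3))
```

With these assumed values the proxy regression should return coefficients close to α0 = β0 + β2 δ0 = 7, β1 = 0.5 and α2 = β2 δ2 = 1.4, while the slope on x1 in the short regression stays well above 0.5. As in the text, β2 itself is not recovered, only the product β2 δ2.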