Proxy Variables


William Matcham

February 29, 2016

Reference: Slide Set 1, 30–33 and Wooldridge 298–303. (Wooldridge adds an extra regressor in the wage example, but the slides and the text below omit this term to make the discussion clearer.)

1 Introduction

A very common problem in econometrics is that we do not observe (i.e. do not have data on) covariates that are considered important to the analysis.

Proxy variables provide one way to mitigate the problems that arise when we cannot include in our regression a covariate that we would like to include.

Motivating example: suppose we wish to understand how education influences the wage of an individual. A factor that affects the wage of an individual, and that is correlated with education level, is inherent natural ability.

Ability is therefore a confounding factor in the model.

The regression we may consider is

log(wage) = β0 + β1 educ + β2 abil + u   (1)

Inherent ability is very difficult, if not impossible, to measure, so without any better options we may just leave out ability and run

log(wage) = β0 + β1 educ + w,   w = β2 abil + u   (2)

OLS estimation on (2) leads to an inconsistent estimator of β1, because educ is correlated with abil and therefore with the error term w.
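The inconsistency can be illustrated with a small simulation. The sketch below is not part of the original handout: all parameter values and the data-generating process (ability standard normal, educ = 12 + 2·abil + noise) are assumptions chosen purely for illustration. Under these assumptions the slope from (2) converges to β1 + β2·Cov(educ, abil)/Var(educ) = 0.08 + 0.5·(2/5) = 0.28 rather than to β1 = 0.08.

```python
import numpy as np


def ols(y, *regressors):
    """Return OLS coefficients of y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 200_000

# Hypothetical parameter values, chosen only for illustration.
beta0, beta1, beta2 = 1.0, 0.08, 0.5

abil = rng.normal(size=n)                    # unobserved in practice
educ = 12 + 2 * abil + rng.normal(size=n)    # education is correlated with ability
u = rng.normal(scale=0.3, size=n)            # error term of regression (1)
log_wage = beta0 + beta1 * educ + beta2 * abil + u

long_fit = ols(log_wage, educ, abil)   # infeasible regression (1)
short_fit = ols(log_wage, educ)        # regression (2), ability omitted

print("slope on educ from (1):", long_fit[1])
print("slope on educ from (2):", short_fit[1])
```

In this simulated setting the slope from (1) is close to 0.08, while the slope from (2) is close to 0.28 however large the sample is made, which is what inconsistency means here.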

An attempt to deal with this inconsistency comes from using a proxy variable for ability. Loosely, a proxy should be a variable correlated with the unobserved variable in question. It is essentially an observable variable that provides a similar measure to the unobserved confounder.

In the motivating example above, one choice of a proxy for natural ability would be IQ.

The proxy need not be exactly equivalent to the unobserved quantity, but it should be as strongly correlated with it as possible. We know, for example, that IQ is certainly not a perfect measure of ability, but it is somewhat related to it.

Given that we have a proxy, the most obvious way to use it to mitigate our inconsistency problem is to plug the proxy variable in as a regressor, in the hope that it acts as a good replacement for the unobserved confounder, that is, that it does a good job of mimicking the unobserved variable.


The plug-in approach gives the regression model

log(wage) = α0 + β1 educ + α2 IQ + e   (3)

This seems logical, but how do we know that it works? The rest of this handout explains the conditions that a proxy needs to satisfy in order for an OLS regression on (3) to provide a consistent estimator of β1.

2 Required Conditions for a Proxy

Let us work more generally than the wage regression example, so that we have a model

y = β0 + β1 x1 + β2 x2* + u   (4)

where MLR1-5 hold for this model. We have data on y and x1. The variable x2* is unobserved, but we have data on a proxy for x2*, denoted x2.

2.0 Requirement 0

The first requirement is that x2 should have a relationship (correlation) with x2*. In other words, in the regression

x2* = δ0 + δ2 x2 + v   (5)

we should have δ2 ≠ 0. [1] The δ0 term exists in (5) because the proxy x2 and x2* may have different units, and the v exists to represent the notion that the proxy and the unobserved variable are not exactly the same. This is a somewhat obvious, but necessary, condition. The long and short so far is that if δ2 = 0, then x2 cannot be a proxy for x2*.

[1] You'll see why we skipped the index 1 later.

2.1 Requirement 1

The next requirement is that x2 should not be in the main regression (4), given that x1 and x2* are already in the regression. In other words, x2* is the factor that directly affects y, and not x2. The only channel from the proxy x2 into y is through x2*. This is a bit like the instrumental variable exclusion restriction.

In the wage regression example, we have to believe that a higher IQ score in itself will not lead to a higher wage (people don't tend to put IQ score on their CV anyway). We can (and must) have, however, that a higher IQ will be associated with a higher innate ability, and then the higher ability will be associated with a higher wage.

The mathematical way of stating the above is that we require

E(y | x1, x2*) = E(y | x1, x2*, x2)   (6)

which in words says that the explanatory power of x1 and x2* in explaining the mean of y is exactly the same as the explanatory power of x1, x2* and x2 in explaining the mean value of y. That is, once we control for x1 and x2*, x2 cannot improve our prediction of the mean value of y.

NOTE: by substituting y from (4) into both sides of (6), we obtain

E(β0 + β1 x1 + β2 x2* + u | x1, x2*) = E(β0 + β1 x1 + β2 x2* + u | x1, x2*, x2)
⇔
β0 + β1 x1 + β2 x2* + E(u | x1, x2*) = β0 + β1 x1 + β2 x2* + E(u | x1, x2*, x2)
⇔
E(u | x1, x2*) = E(u | x1, x2*, x2)

Note that since MLR1-5 hold for (4), E(u | x1, x2*) = 0, and therefore the above derivation shows us that

E(u | x1, x2*, x2) = 0

In other words, requirement 1 ensures that the error term u is not only uncorrelated with x1 and x2*, but also with x2. The above result implies that

E(u | x1, x2) = 0 [2]

[2] Tower property: E(u | x1, x2) = E[ E(u | x1, x2*, x2) | x1, x2 ] = E[ 0 | x1, x2 ] = 0.

2.2 Requirement 2

Now go back and consider the proxy regression (5). The second requirement related to the proxy is that once x2 is controlled for, the mean value of x2* should not depend upon x1. In other words, once x2 is partialled out, x2* should have no correlation with x1. In mathematics, this requirement is given by

E(x2* | x1, x2) = E(x2* | x2)   (7)

Another way of seeing this is that if we considered

x2* = δ0 + δ1 x1 + δ2 x2 + v

then δ1 = 0 should hold, so we get back to obtaining (5).

Similar to above, substituting (5) into (7) gives

E(δ0 + δ2 x2 + v | x1, x2) = E(δ0 + δ2 x2 + v | x2)
⇔
δ0 + δ2 x2 + E(v | x1, x2) = δ0 + δ2 x2 + E(v | x2)
⇔
E(v | x1, x2) = E(v | x2)

Since E(v | x2) = 0, we thus obtain

E(v | x1, x2) = 0

That is, v should not just be uncorrelated with x2, but also with x1.

To cast this requirement in the wage regression example, we are saying that

E(abil | educ, IQ) = E(abil | IQ) = δ0 + δ2 IQ

We are saying that the expected value of ability only changes with IQ, and not with education. Some may argue that this is not reasonable, but we may want to think of this ability variable as innate, as something natural that cannot be taught or learnt. I will leave it up to you to decide whether you believe such a variable exists.

3 Consistency of OLS with a correct proxy

To finish, let's see why a proxy variable satisfying the three requirements above will allow for consistent estimation of β1. In the case where all the requirements are met, recall that we have

y = β0 + β1 x1 + β2 x2* + u   (8)

and

x2* = δ0 + δ2 x2 + v   (9)

Substituting (9) into (8), we obtain

y = β0 + β1 x1 + β2 δ0 + β2 δ2 x2 + β2 v + u

which leaves

y = α0 + β1 x1 + α2 x2 + e   (10)

where

1. α0 = β0 + β2 δ0
2. α2 = β2 δ2
3. e = β2 v + u

Consider running OLS on (10). For unbiased estimation of β1 and α2, we need that E(e | x1, x2) = 0. Observe, using the two results derived above, E(u | x1, x2) = 0 from Requirement 1 and E(v | x1, x2) = 0 from Requirement 2, that

E(e | x1, x2) = β2 E(v | x1, x2) + E(u | x1, x2) = 0 + 0 = 0

Therefore we have found a way to obtain an unbiased estimator for β1. Since E(e | x1, x2) = 0 also implies that e is uncorrelated with x1 and x2, OLS on (10) is consistent for β1 as well.

Note that now we cannot identify β2, the marginal effect of x2* on y. In many settings, however, identifying α2 will be more interesting anyway: in the wage regression example, we may be more interested in the marginal effect of one more IQ point than in the marginal effect of one more unit of innate ability, which is a vague notion at best.
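The argument above can be checked numerically. The following is a minimal sketch, not part of the original handout: the function name `simulate`, every parameter value, and the distributions are assumptions made for illustration. It generates data from (8) with x2* = δ0 + δ1 x1 + δ2 x2 + v and fits the plug-in regression (10). Setting δ1 = 0 makes Requirement 2 hold; a nonzero δ1 violates it, in which case the coefficient on x1 converges to β1 + β2 δ1 instead of β1.

```python
import numpy as np


def ols(y, *regressors):
    """Return OLS coefficients of y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate(delta1, n=200_000, seed=0):
    """Fit the plug-in regression (10) on simulated data.

    delta1 = 0 satisfies Requirement 2; delta1 != 0 violates it.
    All numerical values below are hypothetical.
    """
    rng = np.random.default_rng(seed)
    beta0, beta1, beta2 = 1.0, 0.08, 0.5
    delta0, delta2 = -5.0, 0.05

    x2 = rng.normal(100, 15, size=n)                   # observed proxy (IQ-like)
    x1 = 12 + 0.05 * (x2 - 100) + rng.normal(size=n)   # regressor of interest
    v = rng.normal(scale=0.5, size=n)                  # independent of x1 and x2
    u = rng.normal(scale=0.3, size=n)                  # independent of x1, x2 and x2*

    x2_star = delta0 + delta1 * x1 + delta2 * x2 + v   # unobserved variable
    y = beta0 + beta1 * x1 + beta2 * x2_star + u       # model (8)

    return ols(y, x1, x2)                              # coefficients of (10)


ok = simulate(delta1=0.0)    # all requirements hold
bad = simulate(delta1=0.3)   # Requirement 2 fails

print("requirements hold  (alpha0, beta1, alpha2):", ok)
print("requirement 2 fails (coefficient on x1):   ", bad[1])
```

With these assumed values, the first fit should be close to α0 = β0 + β2 δ0 = −1.5, β1 = 0.08 and α2 = β2 δ2 = 0.025, while in the second fit the coefficient on x1 is close to 0.08 + 0.5·0.3 = 0.23, so the proxy no longer removes the bias.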
